Question: In this assignment, you will work with a retail sales dataset ( provided in csv file format ) to perform not only data analysis but
In this assignment, you will work with a retail sales datasetprovided in csv file format to perform not only data analysis but also predictive modeling. You will explore the data, handle missing values, and build a machine learning model to predict key outcomes. Finally, you will save the trained model and integrate it into a web or mobile application. Use Anaconda and Jupyter
Dataset Columns:
invoiceid: Unique identifier for each transaction.
branch: The branch where the transaction took place.
city: The city in which the branch is located.
customertype: Type of customer eg Member, Nonmember
gendercustomer: Gender of the customer.
productline: Category of the product purchased.
unitcost: Cost per unit of the product.
quantity: Number of units purchased in the transaction.
pctmarkup: The markup applied to the unit cost.
revenue: Total revenue generated from the transaction.
date: The date of the transaction.
time: The time the transaction occurred.
paymentmethod: The payment method used by the customer.
cogs: Cost of Goods Sold COGS for the transaction.
gmpct: Gross Margin Percentage.
grossincome: Gross income from the transaction.
rating: Customer satisfaction rating for the transaction.
Tasks:
Data Cleaning and Preparation:
Imputation: Identify and impute missing values using appropriate techniques eg meanmedian for numerical data, mode for categorical data, or predictive imputation methods
Convert data types if necessary eg ensure date and time columns are in the correct format
Feature engineering: Create any additional features that could enhance the predictive power of your model, such as a "time of day" category derived from the time column.
Descriptive Statistics and Data Exploration:
Calculate summary statistics for numerical columns and explore the distribution of categorical columns.
Create visualizations to explore patterns in the data, such as the distribution of revenue and gross income across branches and product lines.
Explore relationships between key variables, such as the impact of quantity and unit cost on revenue and gross income.
Predictive Modeling:
Model Selection: Choose at least four machine learning models to predict either revenue, grossincome or rating.
Training and Testing: Split the dataset into training and testing sets. Train the model using the training data and evaluate its performance on the testing data.
Hyperparameter Tuning: Optimize your model's performance by tuning hyperparameters using crossvalidation.
Categorization: Use the trained model to categorize transactions into different performance segments eg high revenue vs low revenue transactions
Model Evaluation: Evaluate the model using appropriate metrics RMSE accuracy, precision or F
Saving and Deploying the Model:
Model Saving: Save the trained model to a file using a format like joblib or pickle.
API Development: Develop a REST API using Flask or FastAPI to serve the model, allowing it to be accessed by the application.
Frontend Integration: Incorporate the model's predictions into a web or mobile interface. This could involve displaying predictive insights to users or categorizing transactions in realtime.
Deliverables:
A detailed report covering data cleaning, analysis, and predictive modeling, including visualizations and statistical insights
A Notebook with your code, analysis and comments
The saved model file egjoblib or pkl
Integration with a web or mobile interface.
Step by Step Solution
There are 3 Steps involved in it
1 Expert Approved Answer
Step: 1 Unlock
Question Has Been Solved by an Expert!
Get step-by-step solutions from verified subject matter experts
Step: 2 Unlock
Step: 3 Unlock
