Question: In this assignment, you will work with a retail sales dataset ( provided in csv file format ) to perform not only data analysis but

In this assignment, you will work with a retail sales dataset(provided in csv file format) to perform not only data analysis but also predictive modeling. You will explore the data, handle missing values, and build a machine learning model to predict key outcomes. Finally, you will save the trained model and integrate it into a web or mobile application. (Use Anaconda and Jupyter )
Dataset Columns:
invoice_id: Unique identifier for each transaction.
branch: The branch where the transaction took place.
city: The city in which the branch is located.
customer_type: Type of customer (e.g., Member, Non-member).
gender_customer: Gender of the customer.
product_line: Category of the product purchased.
unit_cost: Cost per unit of the product.
quantity: Number of units purchased in the transaction.
5pct_markup: The 5% markup applied to the unit cost.
revenue: Total revenue generated from the transaction.
date: The date of the transaction.
time: The time the transaction occurred.
payment_method: The payment method used by the customer.
cogs: Cost of Goods Sold (COGS) for the transaction.
gm_pct: Gross Margin Percentage.
gross_income: Gross income from the transaction.
rating: Customer satisfaction rating for the transaction.
Tasks:
1.
Data Cleaning and Preparation:
Imputation: Identify and impute missing values using appropriate techniques (e.g., mean/median for numerical data, mode for categorical data, or predictive imputation methods).
Convert data types if necessary (e.g., ensure date and time columns are in the correct format).
Feature engineering: Create any additional features that could enhance the predictive power of your model, such as a "time of day" category derived from the time column.
2.
Descriptive Statistics and Data Exploration:
Calculate summary statistics for numerical columns and explore the distribution of categorical columns.
Create visualizations to explore patterns in the data, such as the distribution of revenue and gross income across branches and product lines.
Explore relationships between key variables, such as the impact of quantity and unit cost on revenue and gross income.
3.
Predictive Modeling:
Model Selection: Choose at least four machine learning models to predict either revenue, gross_income or rating.
Training and Testing: Split the dataset into training and testing sets. Train the model using the training data and evaluate its performance on the testing data.
Hyperparameter Tuning: Optimize your model's performance by tuning hyperparameters using cross-validation.
Categorization: Use the trained model to categorize transactions into different performance segments (e.g., high revenue vs. low revenue transactions).
Model Evaluation: Evaluate the model using appropriate metrics (RMSE accuracy, precision or F1).
4.
Saving and Deploying the Model:
Model Saving: Save the trained model to a file using a format like joblib or pickle.
API Development: Develop a REST API using Flask or FastAPI to serve the model, allowing it to be accessed by the application.
Frontend Integration: Incorporate the model's predictions into a web or mobile interface. This could involve displaying predictive insights to users or categorizing transactions in real-time.
Deliverables:
A detailed report covering data cleaning, analysis, and predictive modeling, including visualizations and statistical insights
A Notebook with your code, analysis and comments
The saved model file (e.g.,.joblib or .pkl).
Integration with a web or mobile interface.

Step by Step Solution

There are 3 Steps involved in it

1 Expert Approved Answer
Step: 1 Unlock blur-text-image
Question Has Been Solved by an Expert!

Get step-by-step solutions from verified subject matter experts

Step: 2 Unlock
Step: 3 Unlock

Students Have Also Explored These Related Programming Questions!