Question: Assignment One This assignment expects you to make a use of multiple machine learning algorithms to make predictions from the following datasets. Refer to our

Assignment One
This assignment expects you to make a use of multiple machine learning algorithms
to make predictions from the following datasets. Refer to our lecture notes and
practicals to use appropriate algorithms as per the outline defined.
Datasets
The dataset "film_collection_dataset.csv" contains information about movies and
their marketing, production expense, budget of the movie, length of the movie, critic
rating etc and the money earned.
The dataset "loan_dataset.csv" contains people's personal information and a
classification field "loan_status" states whether or not their request to loan was
approved based on their education, income and credit score.
The dataset "marketing_campaign_dataset.csv" contains data about people's
education, marital status, income, number of kids in the household etc and their
preferences to multiple products and their binary response (acceptance/rejection) to
multiple offers made in campaigns (from columns AcceptedCmp1 to AcceptedCmp2
and the response column). The dataset also contains information about the amount of
money spent on products such as Gold, Fruits, Meat, Fish, Sweets and Wines in the
last two years.
Outline
Create optimum training/testing split to form appropriate machine learning
models for both classification and regression problems and also make a use of
cross validation methods to avoid model overfitting problems.
Achieve necessary data pre-processing steps including outlier removals and
appropriate visualisation steps such as pair plots or correlation matrix to better
understand the data distribution.
Create Linear and Multiple regression models to predict the revenue of movies
by proposing unseen input data by keeping in mind the concept of
multicollinearity. Also calculate the coefficient of determination squared). Perform the analysis with and without data standardisation to
differentiate the prediction effect.
Make a decision tree model (for a regression problem) using optimum training/
testing split and calculate the Mean Squared Error (MSE) to check the model's
accuracy. Also predict some unseen movies data and compare the model's
accuracy against the regression model to find out which model performs better.
Train the Logistic Regression and Decision Tree models with optimum
train/test split for solving classification problems using GridSearch
Hyperparameter tuning to predict whether or not a loan of a certain profiles of
individuals would be approved. Also employ the Random Forest classification
model to predict the class of the same unseen data (calculating the accuracy of
the model) and compare the results with the Logistic Regression and Decision
Tree models and evaluate your analysis.
Use the same classification models for the Marketing campaign dataset and
predict whether individual profiles with certain characteristics (such as marital
status, income or education level) is likely to respond to the campaigns made.
Also use the K-means clustering algorithm to identify clusters of people with
certain characteristics (such as education, marital status or income level) and
the money they spent on products like Gold, Fruits, Meat, Fish, Sweets and
Wines etc.
In your report, show appropriate visualisations, confusion matrix and
classification report for each classification model wherever necessary.
Note: Students are allowed to structure the report as they find appropriate. Please use
the IEEE referencing style in your work to cite appropriate sources.
Assignment One This assignment expects you to

Step by Step Solution

There are 3 Steps involved in it

1 Expert Approved Answer
Step: 1 Unlock blur-text-image
Question Has Been Solved by an Expert!

Get step-by-step solutions from verified subject matter experts

Step: 2 Unlock
Step: 3 Unlock

Students Have Also Explored These Related Programming Questions!