Question: Assignment One

This assignment expects you to use multiple machine learning algorithms to make predictions from the following datasets. Refer to our lecture notes and practicals to use appropriate algorithms as per the outline defined.

Datasets

The dataset "film_collection_dataset.csv" contains information about movies and their marketing, production expense, budget of the movie, length of the movie, critic rating etc. and the money earned.

The dataset "loan_dataset.csv" contains people's personal information and a classification field "loan_status" that states whether or not their loan request was approved based on their education, income and credit score.

The dataset "marketing_campaign_dataset.csv" contains data about people's education, marital status, income, number of kids in the household etc., their preferences for multiple products, and their binary response (acceptance/rejection) to multiple offers made in campaigns (from columns AcceptedCmp1 to AcceptedCmp2 and the response column). The dataset also contains information about the amount of money spent on products such as Gold, Fruits, Meat, Fish, Sweets and Wines in the last two years.

Outline

Create an optimum training/testing split to form appropriate machine learning models for both classification and regression problems, and use cross-validation methods to avoid model overfitting.
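The split-plus-cross-validation workflow can be sketched as follows, assuming scikit-learn is available. The synthetic data from `make_regression`, the 80/20 split and the 5 folds are illustrative assumptions, not values given in the assignment.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score, train_test_split

# Synthetic stand-in for one of the assignment datasets (assumption).
X, y = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=0)

# Hold out 20% of the rows as a final, untouched test set.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# 5-fold cross-validation on the training portion only.
model = LinearRegression()
cv_scores = cross_val_score(model, X_train, y_train, cv=5, scoring="r2")

# Fit on the full training portion and score once on the held-out rows.
model.fit(X_train, y_train)
test_r2 = model.score(X_test, y_test)

print("CV R^2 per fold:", cv_scores.round(3))
print("Mean CV R^2:", round(cv_scores.mean(), 3))
print("Held-out test R^2:", round(test_r2, 3))
```

A large gap between the mean cross-validation score and the training score is one common sign of overfitting; trying a few values of `test_size` is one way to explore what "optimum" means for a given dataset.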

Carry out the necessary data pre-processing steps, including outlier removal, and appropriate visualisation steps such as pair plots or a correlation matrix to better understand the data distribution.
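One possible approach to outlier removal is the interquartile-range (IQR) rule, followed by a Pearson correlation matrix. The sketch below assumes pandas and NumPy; the column names `budget`, `marketing` and `revenue`, the helper `remove_iqr_outliers` and the generated values are hypothetical, not taken from the actual CSV files.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Hypothetical columns standing in for the film dataset (names are assumptions).
df = pd.DataFrame({
    "budget": rng.normal(100.0, 10.0, 200),
    "marketing": rng.normal(50.0, 5.0, 200),
})
df["revenue"] = 2.0 * df["budget"] + df["marketing"] + rng.normal(0.0, 5.0, 200)
df.loc[0, "budget"] = 10_000.0  # inject one obvious outlier


def remove_iqr_outliers(frame, columns, k=1.5):
    """Keep rows whose values lie within [Q1 - k*IQR, Q3 + k*IQR] for every column."""
    q1 = frame[columns].quantile(0.25)
    q3 = frame[columns].quantile(0.75)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    mask = ((frame[columns] >= lower) & (frame[columns] <= upper)).all(axis=1)
    return frame[mask]


clean = remove_iqr_outliers(df, ["budget", "marketing", "revenue"])
corr = clean.corr()  # Pearson correlation matrix of the cleaned data

print(len(df), "rows before,", len(clean), "rows after")
print(corr.round(2))
```

If seaborn is installed, `seaborn.pairplot(clean)` would draw the pair plots mentioned in the outline for the same frame.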

Create Linear and Multiple regression models to predict the revenue of movies for proposed unseen input data, keeping in mind the concept of multicollinearity. Also calculate the coefficient of determination (R squared). Perform the analysis with and without data standardisation to compare the effect on prediction.
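A minimal sketch of a multicollinearity check and the standardised-versus-raw comparison, assuming scikit-learn. The three predictors are synthetic and hypothetical, with `x2` built deliberately as a near copy of `x0`; the helper `vif` computes the variance inflation factor as 1 / (1 - R²) from regressing one column on the others.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
n = 300

# Hypothetical predictors (assumption): x2 is almost a copy of x0.
x0 = rng.normal(50.0, 10.0, n)
x1 = rng.normal(5.0, 2.0, n)
x2 = x0 + rng.normal(0.0, 0.5, n)
X = np.column_stack([x0, x1, x2])
y = 3.0 * x0 + 10.0 * x1 + rng.normal(0.0, 5.0, n)


def vif(X, j):
    """Variance inflation factor: 1 / (1 - R^2) from regressing column j on the others."""
    others = np.delete(X, j, axis=1)
    r2 = LinearRegression().fit(others, X[:, j]).score(others, X[:, j])
    return 1.0 / (1.0 - r2)


vifs = [vif(X, j) for j in range(X.shape[1])]
print("VIF per column:", np.round(vifs, 1))

# Drop the redundant column, then fit with and without standardisation.
X_reduced = X[:, :2]
X_train, X_test, y_train, y_test = train_test_split(
    X_reduced, y, test_size=0.2, random_state=0
)

raw = LinearRegression().fit(X_train, y_train)
scaled = make_pipeline(StandardScaler(), LinearRegression()).fit(X_train, y_train)

r2_raw = r2_score(y_test, raw.predict(X_test))
r2_scaled = r2_score(y_test, scaled.predict(X_test))
print("R^2 raw:", round(r2_raw, 4), "| R^2 standardised:", round(r2_scaled, 4))
```

For ordinary least squares with an intercept, the two R² values should match, because rescaling the inputs does not change the fitted predictions. What standardisation changes is the scale of the coefficients, which become directly comparable across features.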

Make a decision tree model (for a regression problem) using an optimum training/testing split and calculate the Mean Squared Error (MSE) to check the model's accuracy. Also predict some unseen movie data and compare the model's accuracy against the regression model to find out which model performs better.
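The tree-versus-linear comparison could look roughly like the sketch below, assuming scikit-learn. The data is synthetic, and `max_depth=5` is an arbitrary illustrative choice rather than a tuned value.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in for the film data (assumption).
X, y = make_regression(n_samples=300, n_features=4, noise=15.0, random_state=1)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=1
)

tree = DecisionTreeRegressor(max_depth=5, random_state=1).fit(X_train, y_train)
linear = LinearRegression().fit(X_train, y_train)

# Lower MSE means predictions are closer to the true values.
mse_tree = mean_squared_error(y_test, tree.predict(X_test))
mse_linear = mean_squared_error(y_test, linear.predict(X_test))

print("Decision tree MSE:", round(mse_tree, 1))
print("Linear regression MSE:", round(mse_linear, 1))

# Predict for "unseen" rows: here simply the first two held-out rows.
unseen = X_test[:2]
print("Tree predictions:", tree.predict(unseen).round(1))
print("Linear predictions:", linear.predict(unseen).round(1))
```

Because this synthetic target is linear by construction, the linear model is expected to win here; on the real film data the ranking may differ, which is the point of the comparison.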

Train the Logistic Regression and Decision Tree models with an optimum train/test split for solving classification problems, using GridSearch hyperparameter tuning, to predict whether or not a loan would be approved for individuals with certain profiles. Also employ the Random Forest classification model to predict the class of the same unseen data (calculating the accuracy of the model), compare the results with the Logistic Regression and Decision Tree models, and evaluate your analysis.
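A hedged sketch of grid-search tuning for the two classifiers plus a Random Forest baseline, assuming scikit-learn's `GridSearchCV`. The data comes from `make_classification` rather than the loan file, and the parameter grids are small illustrative assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Synthetic binary problem standing in for loan approval (assumption).
X, y = make_classification(
    n_samples=400, n_features=6, n_informative=4, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Illustrative grids; real grids would depend on the dataset.
searches = {
    "logistic_regression": GridSearchCV(
        make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
        param_grid={"logisticregression__C": [0.01, 0.1, 1.0, 10.0]},
        cv=5,
    ),
    "decision_tree": GridSearchCV(
        DecisionTreeClassifier(random_state=0),
        param_grid={"max_depth": [2, 4, 6, None], "min_samples_leaf": [1, 5, 10]},
        cv=5,
    ),
}

accuracies = {}
for name, search in searches.items():
    search.fit(X_train, y_train)  # cross-validated search on training data only
    accuracies[name] = accuracy_score(y_test, search.predict(X_test))
    print(name, "best params:", search.best_params_)

forest = RandomForestClassifier(n_estimators=100, random_state=0)
forest.fit(X_train, y_train)
accuracies["random_forest"] = accuracy_score(y_test, forest.predict(X_test))

for name, acc in accuracies.items():
    print(f"{name}: accuracy = {acc:.3f}")
```

All three models are scored on the same held-out rows, so the accuracies are directly comparable.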

Use the same classification models for the Marketing campaign dataset and predict whether individual profiles with certain characteristics (such as marital status, income or education level) are likely to respond to the campaigns made. Also use the K-means clustering algorithm to identify clusters of people with certain characteristics (such as education, marital status or income level) and the money they spent on products like Gold, Fruits, Meat, Fish, Sweets and Wines etc.
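The clustering step might be sketched as below, assuming scikit-learn's `KMeans`. The two numeric features (income and total spend) and the two generated groups are hypothetical. Categorical fields such as education or marital status would first need a numeric encoding (for example one-hot), which is not shown here.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Two hypothetical numeric features (assumption): income and total spend.
low = np.column_stack([rng.normal(30_000, 3_000, 100), rng.normal(200, 50, 100)])
high = np.column_stack([rng.normal(80_000, 5_000, 100), rng.normal(1_500, 200, 100)])
X = np.vstack([low, high])

# Standardise so income does not dominate the Euclidean distances.
X_scaled = StandardScaler().fit_transform(X)

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
labels = kmeans.fit_predict(X_scaled)

# Summarise each cluster in the original (unscaled) units.
for k in range(2):
    members = X[labels == k]
    print(
        f"cluster {k}: n={len(members)}, "
        f"mean income={members[:, 0].mean():.0f}, "
        f"mean spend={members[:, 1].mean():.0f}"
    )
```

The number of clusters is fixed at 2 only because the synthetic data has two groups; on the real data it would normally be chosen with a method such as the elbow plot or silhouette score.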

In your report, show appropriate visualisations, a confusion matrix and a classification report for each classification model wherever necessary.
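For the reporting step, a confusion matrix and classification report can be produced with scikit-learn's `confusion_matrix` and `classification_report`. This sketch uses a synthetic binary problem and a plain logistic regression as a stand-in for any of the classifiers above.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.model_selection import train_test_split

# Synthetic binary classification data (assumption).
X, y = make_classification(n_samples=300, n_features=5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
y_pred = clf.predict(X_test)

# Rows are true classes, columns are predicted classes.
cm = confusion_matrix(y_test, y_pred)

# Per-class precision, recall and F1, plus overall accuracy.
report = classification_report(y_test, y_pred)

print(cm)
print(report)
```

The same two calls work unchanged for the Decision Tree and Random Forest predictions, so each model can get its own matrix and report.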

Note: Students are allowed to structure the report as they find appropriate. Please use the IEEE referencing style in your work to cite appropriate sources.

