Question:

The dataset UniversalBank.csv below contains data on 5000 customers. The data include customer demographic information (age, income, etc.), the customer's relationship with the bank (mortgage, securities account, etc.), and the customer response to the last personal loan campaign (PersonalLoan). Among these 5000 customers, only 480 (= 9.6%) accepted the personal loan that was offered to them in the earlier campaign.

Partition the dataset into training (60%) and validation (40%) sets, then consider the following customer:

Age = 40, Experience = 10, Income = 84, Family = 2, CCAvg = 2, Education_1 = 0, Education_2 = 1, Education_3 = 0, Mortgage = 0, Securities Account = 0, CD Account = 0, Online = 1, and Credit Card = 1.

Second part of the problem

Consider the following customer:

Age = 40, Experience = 10, Income = 84, Family = 2, CCAvg = 2, Education_1 = 0, Education_2 = 1, Education_3 = 0, Mortgage = 0, Securities Account = 0, CD Account = 0, Online = 1, and Credit Card = 1.

Classify the above customer using the best k.

Repartition the data, this time into training, validation, and test sets (50% : 30% : 20%).

Apply the k-NN method with the k chosen above.

Compare the confusion matrix of the test set with that of the training and validation sets.

Comment on the differences and the reasons for them.
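The 50% : 30% : 20% repartition asked for above could be sketched as follows. This is only an illustration in base R: `n` stands in for `nrow(bank_dummy)`, and the names `train.rows`, `valid.rows`, and `test.rows` are hypothetical, not part of the original code.

```r
# Sketch: split row indices 50% / 30% / 20% with no overlap.
# n stands in for nrow(bank_dummy); all names here are illustrative.
set.seed(1)
n <- 5000

idx <- sample(n)                 # random permutation of 1..n
n.train <- round(0.5 * n)
n.valid <- round(0.3 * n)

train.rows <- idx[1:n.train]
valid.rows <- idx[(n.train + 1):(n.train + n.valid)]
test.rows  <- idx[(n.train + n.valid + 1):n]   # whatever remains (about 20%)

# Sizes of the three partitions
c(train = length(train.rows), valid = length(valid.rows), test = length(test.rows))
```

Each index vector can then be used to subset the data frame, for example `bank_dummy[train.rows, ]`. Drawing one permutation and slicing it guarantees that no row lands in two partitions.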

Dataset and my current code (some of it is not working):

Dataset: https://github.com/MyGitHub2120/Personal-Loan-Acceptance

Here is my code:

library("dplyr")

library("tidyr")

library("ggplot2")

library("rpart")

library("rpart.plot")

library("caret")

library("randomForest")

library("tidyverse")

library("glmnet")

library("Hmisc")

library("dummies")

library('tinytex')

library('GGally')

library('gplots')

library("class")  # provides knn(), which is called further down

library("caTools")

library("reshape")

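# read.csv() (base R) turns spaces in column names into dots, e.g. "Personal Loan" becomes Personal.Loan; read_csv() keeps the spaces, so names like Personal.Loan would not exist
df <- read.csv("C:/Users/andyt/OneDrive/Desktop/UniversalBank.csv")

View(df)  # the data frame is called df; there is no object named UniversalBank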

bank<-df

names(bank)

bank$Education <- as.factor(bank$Education)

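# Problem: could not categorize the variable 'Zip Code'; this needs to be resolved before the next step.
# select() needs the exact column name printed by names(bank). ZIP.Code below is an assumption; use whatever names(bank) shows.
bank_dummy <- dummy.data.frame(select(bank, -c(ZIP.Code, ID)))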

bank_dummy$Personal.Loan = as.factor(bank_dummy$Personal.Loan)

bank_dummy$CCAvg = as.integer(bank_dummy$CCAvg)

set.seed(1)

train.index <- sample(row.names(bank_dummy), 0.6*dim(bank_dummy)[1])## need to look at hints

test.index <- setdiff(row.names(bank_dummy), train.index)

train.df <- bank_dummy[train.index, ]

valid.df <- bank_dummy[test.index, ]

new.df = data.frame(Age = as.integer(40), Experience = as.integer(10), Income = as.integer(84), Family = as.integer(2), CCAvg = as.integer(2), Education1 = as.integer(0), Education2 = as.integer(1), Education3 = as.integer(0), Mortgage = as.integer(0), Securities.Account = as.integer(0), CD.Account = as.integer(0), Online = as.integer(1), CreditCard = as.integer(1))

norm.values <- preProcess(train.df[, -c(10)], method=c("center", "scale"))

train.df[, -c(10)] <- predict(norm.values, train.df[, -c(10)])

valid.df[, -c(10)] <- predict(norm.values, valid.df[, -c(10)])

new.df <- predict(norm.values, new.df)

knn.1 <- knn(train = train.df[,-c(10)],test = new.df, cl = train.df[,10], k=5, prob=TRUE)

knn.attributes <- attributes(knn.1)
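One possible way to pick the "best k" and then build a confusion matrix is to score a range of k values on the validation set. The sketch below runs on synthetic stand-in data so that it is self-contained; `x`, `y`, `accuracy`, `best.k`, and `conf.mat` are hypothetical names, and in the real analysis the normalized `train.df` and `valid.df` would take the place of the simulated predictors. The range 1 to 15 is also just an assumption.

```r
library(class)  # knn()

# Synthetic stand-in for normalized predictors and a 0/1 response.
# In the real analysis, use train.df / valid.df instead.
set.seed(1)
n <- 400
x <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
y <- factor(as.integer(x$x1 + x$x2 + rnorm(n, sd = 0.5) > 1), levels = c(0, 1))

# 60% training, 40% validation
train.rows <- sample(n, round(0.6 * n))
train.x <- x[train.rows, ]
valid.x <- x[-train.rows, ]
train.y <- y[train.rows]
valid.y <- y[-train.rows]

# Validation accuracy for k = 1..15
accuracy <- sapply(1:15, function(k) {
  pred <- knn(train = train.x, test = valid.x, cl = train.y, k = k)
  mean(pred == valid.y)
})
best.k <- which.max(accuracy)  # first k with the highest validation accuracy

# Confusion matrix on the validation set at the chosen k
pred.best <- knn(train = train.x, test = valid.x, cl = train.y, k = best.k)
conf.mat <- table(Predicted = pred.best, Actual = valid.y)
conf.mat
```

Running the same `table()` call on the training, validation, and test predictions gives the three confusion matrices to compare. Because k is tuned on the validation set, only the test-set matrix is an untouched estimate of performance on new customers.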
