Question:

The file UniversalBank.csv contains data on 5000 customers of Universal Bank. The data include customer demographic information (age, income, etc.), the customer’s relationship with the bank (mortgage, securities account, etc.), and the customer response to the last personal loan campaign (Personal Loan). Among these 5000 customers, only 480 (9.6%) accepted the personal loan that was offered to them in the earlier campaign. In this exercise, we focus on two predictors, Online (whether or not the customer is an active user of online banking services) and Credit Card (abbreviated CC below; whether or not the customer holds a credit card issued by the bank), and on the class label Personal Loan (abbreviated Loan below).

Partition the data into training (60%) and holdout (40%) sets. 
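The exercise performs the split in RapidMiner. As a cross-check outside the tool, here is a minimal Python sketch of a random 60/40 partition. The function name partition and the seed value are illustrative assumptions. A different seed, or RapidMiner's own sampler, will produce a different split and therefore slightly different counts in the later parts.

```python
import random

def partition(records, train_frac=0.6, seed=1):
    """Randomly split records into (training, holdout) lists.

    train_frac=0.6 gives the 60% / 40% split the exercise asks for;
    seed is an arbitrary choice that only makes the split reproducible.
    """
    shuffled = list(records)               # copy so the caller's data is not reordered
    random.Random(seed).shuffle(shuffled)  # local RNG, global random state untouched
    n_train = round(train_frac * len(shuffled))
    return shuffled[:n_train], shuffled[n_train:]

# Usage: 5000 placeholder row ids stand in for the 5000 customer records.
train, holdout = partition(range(5000))
print(len(train), len(holdout))  # 3000 2000
```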

a. Create a pivot table for the training data with Online as a column grouping attribute, CC as a row attribute (i.e., group by attribute), and Loan as a secondary row attribute (i.e., group by attribute). The values inside the table should convey the count. Consider using the Turbo Prep view for building the pivot table.
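Outside Turbo Prep, you can sketch the same count table in plain Python. The sketch assumes each training record is a dict with 0/1 values under the hypothetical keys "CC", "Online" and "Loan". These keys borrow the exercise's abbreviations, and the CSV's actual column names may differ. The toy records are made up for illustration.

```python
from collections import Counter

def pivot_cc_loan_by_online(records):
    """Counts with (CC, Loan) as row groups and Online as columns.

    records: iterable of dicts holding 0/1 flags under "CC", "Online", "Loan".
    Returns {(cc, loan): {online: count}}, including zero-count cells.
    """
    counts = Counter((r["CC"], r["Loan"], r["Online"]) for r in records)
    return {
        (cc, loan): {online: counts[(cc, loan, online)] for online in (0, 1)}
        for cc in (0, 1)      # primary row attribute
        for loan in (0, 1)    # secondary row attribute
    }

def print_pivot(table):
    print("CC  Loan  | Online=0  Online=1")
    for (cc, loan), row in table.items():
        print(f"{cc:<3} {loan:<5} | {row[0]:>8}  {row[1]:>8}")

# Hypothetical toy records, not the UniversalBank training partition.
toy = [
    {"CC": 1, "Online": 1, "Loan": 1},
    {"CC": 1, "Online": 1, "Loan": 0},
    {"CC": 1, "Online": 1, "Loan": 0},
    {"CC": 0, "Online": 1, "Loan": 0},
    {"CC": 0, "Online": 0, "Loan": 1},
    {"CC": 1, "Online": 0, "Loan": 0},
]
print_pivot(pivot_cc_loan_by_online(toy))
```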

b. Consider the task of classifying a customer who owns a bank credit card and is actively using online banking services. Looking at the pivot table, what is the probability that this customer will accept the loan offer? [This is the probability of loan acceptance (Loan = true) conditional on having a bank credit card (CC = true) and being an active user of online banking services (Online = true).]
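Read off the pivot table, the answer to (b) has two parts. The numerator is the Loan = 1 count in the CC = 1 rows under the Online = 1 column. The denominator is the total of the two CC = 1 cells in that column. The sketch below assumes the pivot is stored as {(cc, loan): {online: count}}. The counts are hypothetical placeholders to be replaced with your own table.

```python
from fractions import Fraction

def pivot_probability(table, cc=1, online=1):
    """Exact estimate of P(Loan = 1 | CC = cc, Online = online) from the pivot.

    table maps (cc, loan) -> {online: count}; only the two cells in the
    CC = cc rows under the Online = online column are used.
    Raises ZeroDivisionError if no training record has that CC/Online pair.
    """
    accepted = table[(cc, 1)][online]
    total = table[(cc, 0)][online] + accepted
    return Fraction(accepted, total)

# Hypothetical counts for illustration only.
table = {
    (0, 0): {0: 10, 1: 15},
    (0, 1): {0: 1, 1: 2},
    (1, 0): {0: 4, 1: 9},
    (1, 1): {0: 0, 1: 1},
}
p = pivot_probability(table)
print(p, float(p))  # 1/10 0.1
```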

c. Create two separate pivot tables for the training data. One will have Loan (rows) as a function of Online (columns), and the other will have Loan (rows) as a function of CC (columns).
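Both two-way tables come from one helper that counts (Loan, predictor) pairs. This is a sketch assuming the same hypothetical dict records with 0/1 values under "CC", "Online" and "Loan". The toy data are invented.

```python
from collections import Counter

def loan_by(records, predictor):
    """Two-way count table: Loan in rows, one 0/1 predictor in columns.

    Returns {loan: {predictor value: count}}, including zero-count cells.
    """
    counts = Counter((r["Loan"], r[predictor]) for r in records)
    return {loan: {v: counts[(loan, v)] for v in (0, 1)} for loan in (0, 1)}

# Hypothetical toy records, not the UniversalBank training partition.
toy = [
    {"CC": 1, "Online": 1, "Loan": 1},
    {"CC": 0, "Online": 1, "Loan": 0},
    {"CC": 1, "Online": 0, "Loan": 0},
    {"CC": 0, "Online": 1, "Loan": 0},
]
print(loan_by(toy, "Online"))  # Loan (rows) by Online (columns)
print(loan_by(toy, "CC"))      # Loan (rows) by CC (columns)
```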

d. Compute the following quantities [P(A | B) means “the probability of A given B”]:

i. P(CC = true | Loan = true) (the proportion of credit card holders among the loan acceptors)

ii. P(Online = true | Loan = true)

iii. P(Loan = true) (the proportion of loan acceptors)

iv. P(CC = true | Loan = false)

v. P(Online = true | Loan = false)

vi. P(Loan = false)
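Each conditional in (d) is one cell of a two-way table divided by its Loan row total. Each class proportion is a row total divided by the number of training records. Below is a sketch with hypothetical counts in the layout {loan: {value: count}}. Exact fractions defer rounding to the final print.

```python
from fractions import Fraction

# Hypothetical counts; replace them with the two pivot tables from part (c).
loan_by_cc = {0: {0: 50, 1: 20}, 1: {0: 6, 1: 4}}
loan_by_online = {0: {0: 30, 1: 40}, 1: {0: 4, 1: 6}}

def conditional(table, value, loan):
    """P(predictor = value | Loan = loan): a cell divided by its row total."""
    row = table[loan]
    return Fraction(row[value], sum(row.values()))

n_total = sum(sum(row.values()) for row in loan_by_cc.values())
quantities = {
    "i.   P(CC=1 | Loan=1)": conditional(loan_by_cc, 1, 1),
    "ii.  P(Online=1 | Loan=1)": conditional(loan_by_online, 1, 1),
    "iii. P(Loan=1)": Fraction(sum(loan_by_cc[1].values()), n_total),
    "iv.  P(CC=1 | Loan=0)": conditional(loan_by_cc, 1, 0),
    "v.   P(Online=1 | Loan=0)": conditional(loan_by_online, 1, 0),
    "vi.  P(Loan=0)": Fraction(sum(loan_by_cc[0].values()), n_total),
}
for name, value in quantities.items():
    print(f"{name:<27} {value} ({float(value):.4f})")
```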

e. Use the quantities computed above to compute the naive Bayes probability P(Loan = true | CC = true, Online = true).
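The naive Bayes approach assumes CC and Online are conditionally independent given Loan. Under that assumption, the estimate is P(Loan = true | CC = true, Online = true) = (i × ii × iii) / (i × ii × iii + iv × v × vi), using the quantities from (d). Below is a direct Python transcription. The function name and the input values are hypothetical.

```python
from fractions import Fraction

def naive_bayes_posterior(p_cc_l1, p_on_l1, p_l1, p_cc_l0, p_on_l0, p_l0):
    """Naive Bayes estimate of P(Loan = 1 | CC = 1, Online = 1).

    Arguments are quantities i-vi from part (d), in that order.
    """
    score_1 = p_cc_l1 * p_on_l1 * p_l1  # naive approximation of P(CC=1, Online=1, Loan=1)
    score_0 = p_cc_l0 * p_on_l0 * p_l0  # naive approximation of P(CC=1, Online=1, Loan=0)
    return score_1 / (score_1 + score_0)  # normalize over the two classes

# Hypothetical values for i-vi, not results from the UniversalBank data.
p = naive_bayes_posterior(
    Fraction(2, 5), Fraction(3, 5), Fraction(1, 8),  # i, ii, iii  (Loan = 1)
    Fraction(2, 7), Fraction(4, 7), Fraction(7, 8),  # iv, v, vi   (Loan = 0)
)
print(p, f"({float(p):.4f})")  # 21/121 (0.1736)
```

Plain floats work as well, and only the final value would carry rounding error.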

f. Compare this value with the one obtained from the pivot table in (b). Which is a more accurate estimate?
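The two estimates coincide exactly when CC and Online are conditionally independent given Loan in the training data. Otherwise they diverge. The sketch below computes both from the same made-up records, in which CC and Online move together within each class, so the gap is visible. It illustrates the mechanism and says nothing about the actual UniversalBank values.

```python
from fractions import Fraction

def exact_estimate(records):
    """P(Loan=1 | CC=1, Online=1) using only records matching both predictors."""
    match = [r for r in records if r["CC"] == 1 and r["Online"] == 1]
    return Fraction(sum(r["Loan"] for r in match), len(match))

def naive_estimate(records):
    """Same quantity under naive Bayes: class prior times per-predictor conditionals."""
    scores = {}
    for loan in (0, 1):
        cls = [r for r in records if r["Loan"] == loan]
        prior = Fraction(len(cls), len(records))
        p_cc = Fraction(sum(r["CC"] for r in cls), len(cls))
        p_online = Fraction(sum(r["Online"] for r in cls), len(cls))
        scores[loan] = prior * p_cc * p_online
    return scores[1] / (scores[0] + scores[1])

# Hypothetical records in which CC and Online are strongly associated within each class.
toy = (
    [{"CC": 1, "Online": 1, "Loan": 1}] * 3
    + [{"CC": 0, "Online": 0, "Loan": 1}] * 1
    + [{"CC": 1, "Online": 1, "Loan": 0}] * 2
    + [{"CC": 0, "Online": 0, "Loan": 0}] * 6
)
print("exact:", exact_estimate(toy))  # exact: 3/5
print("naive:", naive_estimate(toy))  # naive: 9/11
```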

g. In RapidMiner, run naive Bayes on the data. Examine the model output on the training data, and find an entry that corresponds to P(Loan = true | CC = true, Online = true). Compare this with the number you obtained in (e).
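RapidMiner's Naive Bayes operator has a Laplace correction option. When smoothing is applied, the model's probabilities can differ slightly from the hand computation in (e). The sketch below fits a two-predictor naive Bayes with an optional add-alpha smoothing term. That smoothing formula is a common textbook form and is an assumption here, not necessarily RapidMiner's exact implementation. The function names and toy records are hypothetical.

```python
from fractions import Fraction

def fit_naive_bayes(records, predictors=("CC", "Online"), alpha=0):
    """Class priors and conditional tables P(predictor = v | Loan = c).

    alpha is a non-negative integer; alpha > 0 applies add-alpha (Laplace)
    smoothing to the binary-predictor conditionals, alpha = 0 gives plain
    relative frequencies. Priors are left unsmoothed.
    """
    model = {}
    for c in (0, 1):
        cls = [r for r in records if r["Loan"] == c]
        prior = Fraction(len(cls), len(records))
        cond = {
            p: {v: Fraction(sum(r[p] == v for r in cls) + alpha, len(cls) + 2 * alpha)
                for v in (0, 1)}
            for p in predictors
        }
        model[c] = (prior, cond)
    return model

def predict_proba(model, record):
    """Normalized naive Bayes posterior P(Loan = 1 | the record's predictor values)."""
    scores = {}
    for c, (prior, cond) in model.items():
        score = prior
        for p, table in cond.items():
            score *= table[record[p]]
        scores[c] = score
    return scores[1] / (scores[0] + scores[1])

# Hypothetical toy records, not the UniversalBank training partition.
toy = (
    [{"CC": 1, "Online": 1, "Loan": 1}] * 3
    + [{"CC": 0, "Online": 0, "Loan": 1}] * 1
    + [{"CC": 1, "Online": 1, "Loan": 0}] * 2
    + [{"CC": 0, "Online": 0, "Loan": 0}] * 6
)
query = {"CC": 1, "Online": 1}
print(predict_proba(fit_naive_bayes(toy), query))           # 9/11
print(predict_proba(fit_naive_bayes(toy, alpha=1), query))  # 200/281
```

With alpha = 0 the result matches the formula in (e). A nonzero alpha pulls the conditionals toward 1/2, which is why a smoothed model's output may not match a hand calculation digit for digit.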



Related book: Machine Learning for Business Analytics, 1st Edition, by Galit Shmueli, Peter C. Bruce, Amit V. Deokar, and Nitin R. Patel (ISBN 9781119828792).
