Universal Bank has begun a program to encourage its existing customers to borrow via a consumer loan program

Question:

Universal Bank has begun a program to encourage its existing customers to borrow via a consumer loan program. The bank has promoted the loan to 5000 customers, of whom 480 accepted the offer. The data are available in file UniversalBank.xls. The bank now wants to develop a model to predict which customers have the greatest probability of accepting the loan, to reduce promotion costs and send the offer only to a subset of its customers.

We will develop several models, then combine them in an ensemble. The models we will use are (1) logistic regression, (2) \(k\)-nearest-neighbors with \(k=3\), and (3) naive Bayes. Pre-process the data as follows:

- Bin the following variables so they can be used in naive Bayes: Age, Experience, Income, CC Average, and Mortgage. Set the number of bins to 20, use "equal count per bin," and show the bin mean in the resulting binned variable.

- Education and Family can be used as is, without binning.

- Zip code can be ignored.

- Partition the data: \(60 \%\) training, \(40 \%\) validation.
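
The pre-processing steps above can be sketched in plain Python. This is only an illustration on made-up numbers, not the tool the exercise expects: the helper names `equal_count_bin_means` and `partition`, the random seed, and the toy `incomes` list are all assumptions. Tied values may be split across neighboring bins here, which spreadsheet add-ins or libraries may handle differently.

```python
import random
from statistics import mean


def equal_count_bin_means(values, n_bins=20):
    """Replace each value with the mean of its equal-count bin (hypothetical helper)."""
    n = len(values)
    # Row positions sorted by value, so consecutive slices form equal-count bins.
    order = sorted(range(n), key=lambda i: values[i])
    binned = [None] * n
    for b in range(n_bins):
        members = order[b * n // n_bins:(b + 1) * n // n_bins]
        if not members:
            continue  # more bins than rows
        bin_mean = mean(values[i] for i in members)
        for i in members:
            binned[i] = bin_mean
    return binned


def partition(n_rows, train_frac=0.6, seed=1):
    """Randomly split row indices into training and validation sets (hypothetical helper)."""
    idx = list(range(n_rows))
    random.Random(seed).shuffle(idx)
    cut = int(round(train_frac * n_rows))
    return idx[:cut], idx[cut:]


# Toy stand-in for one numeric column such as Income (not the real data).
incomes = [float(x) for x in range(1, 101)]
binned = equal_count_bin_means(incomes, n_bins=20)
train_idx, valid_idx = partition(len(incomes))
print(len(set(binned)), len(train_idx), len(valid_idx))
```

On the real file, the same binning would be applied to each of the five listed columns before partitioning.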

a. Fit models to the data for (1) logistic regression, (2) \(k\)-nearest-neighbors with \(k=3\), and (3) naive Bayes. Use Personal Loan as the target variable. Report the confusion matrix for each of the three models.
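
As a rough sketch of part (a), the three models could be fit with scikit-learn, assuming it is installed. The data below are synthetic (three random predictors and a made-up target), not UniversalBank.xls. Standardizing the predictors for \(k\)-nearest-neighbors and using `KBinsDiscretizer` with `CategoricalNB` for the binned naive Bayes inputs are assumptions of this sketch; here the bins are coded as ordinal integers rather than bin means, which gives naive Bayes the same categories.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import CategoricalNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import KBinsDiscretizer, StandardScaler

# Synthetic stand-in data (assumption): 1000 rows, 3 numeric predictors.
rng = np.random.default_rng(1)
n = 1000
X = rng.normal(size=(n, 3))
logit = -3.0 + 2.0 * X[:, 0] + 1.0 * X[:, 1]
y = (rng.random(n) < 1 / (1 + np.exp(-logit))).astype(int)

# 60% training, 40% validation.
X_train, X_valid, y_train, y_valid = train_test_split(
    X, y, train_size=0.6, random_state=1
)

# k-NN is distance based, so this sketch standardizes using training statistics.
scaler = StandardScaler().fit(X_train)
Xs_train, Xs_valid = scaler.transform(X_train), scaler.transform(X_valid)

# Equal-count (quantile) binning into 20 bins for naive Bayes.
binner = KBinsDiscretizer(n_bins=20, encode="ordinal", strategy="quantile")
Xb_train = binner.fit_transform(X_train).astype(int)
Xb_valid = binner.transform(X_valid).astype(int)

models = {
    "logistic": (LogisticRegression(max_iter=1000), X_train, X_valid),
    "knn_k3": (KNeighborsClassifier(n_neighbors=3), Xs_train, Xs_valid),
    "naive_bayes": (CategoricalNB(min_categories=20), Xb_train, Xb_valid),
}

matrices = {}
for name, (model, X_tr, X_va) in models.items():
    model.fit(X_tr, y_train)
    # Rows are actual classes (0, 1); columns are predicted classes (0, 1).
    matrices[name] = confusion_matrix(y_valid, model.predict(X_va), labels=[0, 1])
    print(name)
    print(matrices[name])
```

Each model's `predict_proba` method would supply the probability column needed in part (b).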

b. In a new worksheet, copy from each of the three model outputs the columns for actual outcome, predicted outcome, and probability. Report the first 10 rows of these columns.

c. Add two columns to this worksheet for (1) a majority vote of predicted outcomes, and (2) the average of the predicted probabilities. Using the classifications generated by these two methods, derive a confusion matrix for each method and note the overall error rate.
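
The two ensemble columns in part (c) can be illustrated with a small sketch. The six rows below are invented for demonstration, the function names are hypothetical, and the 0.5 cutoff for turning the averaged probability into a class is an assumption, since the exercise does not state one.

```python
def majority_vote(pred_rows):
    """Class 1 when more than half of the models predict 1."""
    return [1 if 2 * sum(row) > len(row) else 0 for row in pred_rows]


def average_probability(prob_rows, cutoff=0.5):
    """Average the models' probabilities, then classify at the (assumed) cutoff."""
    averages = [sum(row) / len(row) for row in prob_rows]
    return averages, [1 if p >= cutoff else 0 for p in averages]


def confusion_and_error(actual, predicted):
    """Counts keyed by (actual, predicted), plus the overall error rate."""
    counts = {(a, p): 0 for a in (0, 1) for p in (0, 1)}
    for a, p in zip(actual, predicted):
        counts[(a, p)] += 1
    error = (counts[(0, 1)] + counts[(1, 0)]) / len(actual)
    return counts, error


# Invented rows: columns are (logistic regression, k-NN, naive Bayes).
actual = [1, 0, 1, 0, 1, 0]
preds = [(1, 1, 0), (0, 0, 1), (0, 1, 1), (1, 0, 1), (0, 0, 1), (0, 0, 0)]
probs = [
    (0.80, 0.67, 0.40),
    (0.20, 0.00, 0.60),
    (0.45, 0.67, 0.55),
    (0.55, 0.33, 0.90),
    (0.30, 0.33, 0.95),
    (0.10, 0.00, 0.20),
]

vote = majority_vote(preds)
avg_prob, avg_class = average_probability(probs)
vote_counts, vote_error = confusion_and_error(actual, vote)
avg_counts, avg_error = confusion_and_error(actual, avg_class)
print("majority vote:", vote_counts, vote_error)
print("average probability:", avg_counts, avg_error)
```

The two methods can disagree: in the fifth invented row, two models vote 0 but one very confident model pulls the averaged probability above the cutoff.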

d. Compare the error rates for the three individual methods and the two ensemble methods.
