# Question

A consumer advocacy agency, Equitable Ernest, is interested in providing a service in which an individual can estimate their own credit score (a continuous measure used by banks, insurance companies, and other businesses when granting loans, quoting premiums, and issuing credit). The file CreditScore contains data on an individual's credit score and other variables.

Partition the data into training (50 percent), validation (30 percent), and test (20 percent) sets. Predict the individuals' credit scores using k-nearest neighbors with up to k = 20. Use CreditScore as the output variable and all the other variables as input variables. In Step 2 of XLMiner's k-Nearest Neighbors Prediction procedure, be sure to Normalize input data and to Score on best k between 1 and specified value. Generate a Detailed Scoring for all three sets of data.
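The workflow described above (partition, normalize, fit k-NN for k = 1 to 20, keep the k with the lowest validation RMSE) can be sketched outside XLMiner as follows. This is a minimal illustration in plain Python, not XLMiner's implementation. The data are synthetic stand-ins because the contents of CreditScore are not shown here, and the helper names (`zscore_fit`, `knn_predict`, `rmse`) are hypothetical. It assumes z-score normalization computed from the training rows and Euclidean distance.

```python
import math
import random

def zscore_fit(rows):
    """Column means and standard deviations, computed from the given rows."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    # A constant column would give a zero std; fall back to 1.0 to avoid dividing by zero.
    stds = [math.sqrt(sum((v - m) ** 2 for v in c) / len(c)) or 1.0
            for c, m in zip(cols, means)]
    return means, stds

def zscore_apply(rows, means, stds):
    return [[(v - m) / s for v, m, s in zip(r, means, stds)] for r in rows]

def knn_predict(train_x, train_y, query, k):
    """Average the outputs of the k training rows closest to query."""
    order = sorted(range(len(train_x)), key=lambda i: math.dist(train_x[i], query))
    return sum(train_y[i] for i in order[:k]) / k

def rmse(train_x, train_y, xs, ys, k):
    sq = [(y - knn_predict(train_x, train_y, x, k)) ** 2 for x, y in zip(xs, ys)]
    return math.sqrt(sum(sq) / len(sq))

# Synthetic stand-in data (assumption): two inputs and a noisy continuous output.
rng = random.Random(0)
X = [[rng.uniform(20, 70), rng.uniform(0, 50000)] for _ in range(200)]
y = [500 + 3 * a - 0.002 * b + rng.gauss(0, 20) for a, b in X]

# 50 / 30 / 20 percent partition after shuffling the row indices.
idx = list(range(len(X)))
rng.shuffle(idx)
n_train, n_valid = int(0.5 * len(idx)), int(0.3 * len(idx))
train = idx[:n_train]
valid = idx[n_train:n_train + n_valid]
test = idx[n_train + n_valid:]

# Normalize every partition with statistics from the training rows only (assumption).
means, stds = zscore_fit([X[i] for i in train])
Z = zscore_apply(X, means, stds)
tx, ty = [Z[i] for i in train], [y[i] for i in train]
vx, vy = [Z[i] for i in valid], [y[i] for i in valid]
sx, sy = [Z[i] for i in test], [y[i] for i in test]

# "Score on best k": pick the k in 1..20 with the lowest validation RMSE.
best_k = min(range(1, 21), key=lambda k: rmse(tx, ty, vx, vy, k))
test_rmse = rmse(tx, ty, sx, sy, best_k)
print(best_k, round(test_rmse, 2))
```

Choosing k on the validation set and reporting RMSE on the untouched test set keeps the final error estimate from being biased by the search over k.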

a. For k = 1, why is the root mean squared error greater than zero on the training set? Why would we expect the root mean squared error to be zero for k = 1 on the training set?

b. What value of k minimizes the root mean squared error (RMSE) on the validation data?

c. How does the RMSE on the test set compare to the RMSE on the validation set?

d. What is the average error on the test set? Analyze the output in the KNNP_TestScore1 worksheet, paying particular attention to the observations that had the largest overprediction (large negative residuals) and the largest underprediction (large positive residuals). Explain what may be contributing to the inaccurate predictions and suggest possible ways to improve the k-NN approach.
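The quantities in parts (a) and (d) can be illustrated with a short sketch. It takes residual = actual - predicted, which matches the sign convention in part (d) (negative residuals are overpredictions). All numbers are hypothetical and are not taken from CreditScore. The tie-breaking rule in the k = 1 example (earliest row wins) is an assumption, and how XLMiner resolves ties may differ.

```python
import math

# Part (d): hypothetical actual and predicted scores for five test records.
actual = [700.0, 640.0, 580.0, 720.0, 610.0]
predicted = [690.0, 665.0, 600.0, 680.0, 615.0]

residuals = [a - p for a, p in zip(actual, predicted)]
average_error = sum(residuals) / len(residuals)   # signed errors can cancel
rmse = math.sqrt(sum(r * r for r in residuals) / len(residuals))

# Most negative residual = largest overprediction; most positive = largest underprediction.
worst_over = min(range(len(residuals)), key=residuals.__getitem__)
worst_under = max(range(len(residuals)), key=residuals.__getitem__)

# Part (a): two training records share identical inputs but have different scores.
train_x = [[1.0, 2.0], [1.0, 2.0], [3.0, 4.0]]
train_y = [600.0, 650.0, 700.0]

def nearest_output(query):
    # k = 1: output of the closest training row; min() returns the earliest row on a tie.
    i = min(range(len(train_x)), key=lambda j: math.dist(train_x[j], query))
    return train_y[i]

k1_train_residuals = [y - nearest_output(x) for x, y in zip(train_x, train_y)]
print(average_error, round(rmse, 2), worst_over, worst_under, k1_train_residuals)
```

In this toy data the average error is zero while the RMSE is not, which shows why the two measures should be read together. The second part shows one possible way a k = 1 training RMSE can be positive: a record whose inputs duplicate another record's inputs can be matched to that other record rather than to itself.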
