Question:
Make a Standard Partition of the data into Training, Validation, and Test sets. Select all 9 variables to be in the partition, use 12345 as the seed for the randomized sampling, and assign 50% of the observations to the training set, 30% to the validation set, and 20% to the test set.
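For readers who want to see what the partition step does outside XLMiner, here is a minimal Python sketch of a seeded 50/30/20 random split. The function name `standard_partition` and the row count of 1000 are assumptions made for illustration. Python's random number generator differs from XLMiner's, so seed 12345 will not reproduce the same rows that XLMiner selects.

```python
import random

def standard_partition(n_rows, fractions=(0.5, 0.3, 0.2), seed=12345):
    """Randomly split row indices 0..n_rows-1 into training, validation, and test sets."""
    if abs(sum(fractions) - 1.0) > 1e-9:
        raise ValueError("fractions must sum to 1")
    indices = list(range(n_rows))
    # A seeded generator makes the shuffle, and so the split, repeatable.
    random.Random(seed).shuffle(indices)
    n_train = round(fractions[0] * n_rows)
    n_valid = round(fractions[1] * n_rows)
    train = indices[:n_train]
    valid = indices[n_train:n_train + n_valid]
    test = indices[n_train + n_valid:]  # whatever remains goes to the test set
    return train, valid, test

# Hypothetical row count; the exercise's worksheet size is not stated here.
train_idx, valid_idx, test_idx = standard_partition(1000)
print(len(train_idx), len(valid_idx), len(test_idx))  # 500 300 200
```

Every row lands in exactly one of the three sets, and rerunning with the same seed returns the same split.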
Predict the individuals’ credit scores using k-Nearest Neighbors with k up to 20. Use CreditScore as the output variable and all the other variables as input variables. In Step 2 of XLMiner’s k-Nearest Neighbors Prediction procedure, be sure to select Normalize input data and Score on best k between 1 and specified value. Select Summary Report for both Score Training Data and Score Validation Data. Select Detailed Report, Summary Report, and Lift Charts for Score Test Data.
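The procedure described above can be sketched in plain Python to show what "normalize input data" and "score on best k" amount to. This is an illustration under stated assumptions, not XLMiner's implementation: it assumes z-score normalization using the population standard deviation of the training rows, Euclidean distance, an unweighted average of the k nearest outputs, and the lowest validation RMSE as the criterion for the best k. All function names are hypothetical, and the data at the bottom are toy values, not the CreditScore data.

```python
import math

def zscore_fit(rows):
    """Column means and standard deviations, computed from the training rows only."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    # Fall back to 1.0 for a constant column to avoid division by zero.
    stds = [math.sqrt(sum((v - m) ** 2 for v in c) / len(c)) or 1.0
            for c, m in zip(cols, means)]
    return means, stds

def zscore_apply(rows, means, stds):
    """Normalize any rows with the training means and standard deviations."""
    return [[(v - m) / s for v, m, s in zip(r, means, stds)] for r in rows]

def knn_predict(train_x, train_y, query, k):
    """Average the outputs of the k training rows closest to the query."""
    nearest = sorted((math.dist(x, query), y) for x, y in zip(train_x, train_y))[:k]
    return sum(y for _, y in nearest) / k

def rmse(actual, predicted):
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

def best_k(train_x, train_y, valid_x, valid_y, max_k=20):
    """Return (k, validation RMSE) for the k in 1..max_k with the lowest RMSE."""
    scores = {}
    for k in range(1, min(max_k, len(train_x)) + 1):
        preds = [knn_predict(train_x, train_y, q, k) for q in valid_x]
        scores[k] = rmse(valid_y, preds)
    k = min(scores, key=scores.get)
    return k, scores[k]

# Toy data for illustration only (two inputs, one output).
raw_train = [[1, 100], [2, 250], [3, 280], [5, 400], [6, 650]]
y_train = [10.0, 22.0, 29.0, 41.0, 63.0]
raw_valid = [[2.4, 260], [5.5, 500]]
y_valid = [25.0, 52.0]

means, stds = zscore_fit(raw_train)
x_train = zscore_apply(raw_train, means, stds)
x_valid = zscore_apply(raw_valid, means, stds)
print(best_k(x_train, y_train, x_valid, y_valid, max_k=20))
```

Normalizing matters because an unscaled variable such as TotalCredit, measured in thousands, would otherwise dominate the distance calculation. The validation and new rows are scaled with the training statistics so that all distances are measured on the same scale.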
Based on the results from XLMiner, answer the following questions.
- What is the best k chosen? What does it mean?
- Compare the RMSE on the test set to the RMSE on the validation set. Please comment.
- What is the average error on the test set? What does it suggest?
- Predict the CreditScore for two individuals with the following information, using the best k:
BureauInquiries | CreditUsage | TotalCredit | CollectionReports | MissedPayments | HomeOwner | CreditAge | TimeOnJob |
2 | 0.5 | 14,000 | 1 | 2 | 0 | 5 | 3 |
3 | 0.2 | 25,000 | 0 | 0 | 1 | 7 | 8 |
Hint: For your convenience, this table is stored in the worksheet NewData. After the k-Nearest Neighbors prediction procedure is completed, select the worksheet NewData, and click any cell in the range of the data. Click the XLMINER PLATFORM tab on the ribbon. Click Score in the Tools group. In the Data to be Scored area, confirm that the Worksheet is NewData, and the box for First Row Contains Headers is checked. You should see the same variables appear in both the Variables in New Data and Model Variables areas. Click Match By Name. Click OK. A new worksheet KNNP_ModelScore is then generated. Report the predicted values of CreditScore (rounded to the nearest integers) for the two individuals.
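For the RMSE and average error questions, it can help to see how the two measures differ. The sketch below assumes the error is defined as actual minus predicted, so a negative average error means the model over-predicts on average; check the sign convention against XLMiner's Summary Report. The function names and the four score pairs are hypothetical values chosen for illustration, not XLMiner output.

```python
import math

def rmse(actual, predicted):
    """Root mean squared error: typical size of an error, ignoring its sign."""
    errors = [a - p for a, p in zip(actual, predicted)]
    return math.sqrt(sum(e * e for e in errors) / len(errors))

def average_error(actual, predicted):
    """Mean signed error (actual - predicted): shows the direction of any bias."""
    return sum(a - p for a, p in zip(actual, predicted)) / len(actual)

# Hypothetical scores for illustration only.
actual = [700, 650, 720, 610]
predicted = [690, 665, 700, 640]

print("RMSE:", rmse(actual, predicted))
print("Average error:", average_error(actual, predicted))
```

Positive and negative errors cancel in the average error but not in the RMSE, so an average error near zero alongside a sizeable RMSE suggests predictions that are roughly unbiased yet still imprecise. A test-set RMSE close to the validation-set RMSE suggests that the chosen k generalizes to new data rather than being tuned to the validation rows.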
Essentials of Business Analytics
ISBN: 978-1285187273
1st edition
Authors: Jeffrey Camm, James Cochran, Michael Fry, Jeffrey Ohlmann, David Anderson, Dennis Sweeney, Thomas Williams