Telecommunications companies providing cell phone service are interested in customer retention. In particular, identifying customers who are about to churn (cancel their service) is potentially worth millions of dollars if the company can proactively address the reason a customer is considering cancellation and thereby retain that customer. The WEBfile Cellphone contains customer data to be used to classify a customer as a churner or not.
In XLMiner's Partition with Oversampling procedure, partition the data so that 50 percent of the training set consists of successes (churners) and 40 percent of the validation data is set aside as test data. Classify the data using k-nearest neighbors with up to k = 20.
Use Churn as the output variable and all the other variables as input variables. In Step 2 of XLMiner's k-nearest neighbors Classification procedure, be sure to Normalize input data and to Score on best k between 1 and specified value. Generate lift charts for both the validation data and test data.
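To make the partitioning step concrete, the following Python sketch illustrates the idea of oversampling outside of XLMiner. It is not XLMiner's algorithm: the function name `partition_with_oversampling`, the choice to send half of the churners to the training set, and the synthetic labels are all assumptions made for illustration. In this simplified version the remaining records are split into validation and test sets without any further rebalancing.

```python
import random

def partition_with_oversampling(labels, train_success_share=0.5,
                                test_share_of_validation=0.4, seed=0):
    """Return (train, validation, test) lists of row indices.

    Hypothetical helper: a simplified sketch, not XLMiner's procedure.
    """
    rng = random.Random(seed)
    successes = [i for i, y in enumerate(labels) if y == 1]
    failures = [i for i, y in enumerate(labels) if y == 0]
    rng.shuffle(successes)
    rng.shuffle(failures)

    # Assumption: half of the churners are placed in the training set.
    n_train_succ = len(successes) // 2
    # Non-churners needed so that successes make up the requested share.
    n_train_fail = round(
        n_train_succ * (1 - train_success_share) / train_success_share
    )
    n_train_fail = min(n_train_fail, len(failures))

    train = successes[:n_train_succ] + failures[:n_train_fail]

    # Everything not used for training forms the validation pool.
    rest = successes[n_train_succ:] + failures[n_train_fail:]
    rng.shuffle(rest)

    # Set aside a share of the validation pool as test data.
    n_test = round(len(rest) * test_share_of_validation)
    test, validation = rest[:n_test], rest[n_test:]
    return train, validation, test

# Synthetic labels (made up, not the Cellphone data): 14 percent churners.
labels = [1] * 140 + [0] * 860
train, validation, test = partition_with_oversampling(labels)
train_churn_share = sum(labels[i] for i in train) / len(train)
print(len(train), len(validation), len(test), train_churn_share)
```

Because churners are rare in the full data set, a purely random training partition would contain few of them. Balancing the training set gives the classifier enough churners to learn from, while the validation and test sets are left unbalanced so that error rates are measured on data that resemble the original population.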
a. Why is partitioning with oversampling advised in this case?
b. For the cutoff probability value 0.5, what value of k minimizes the overall error rate on the validation data?
c. What is the overall error rate on the test data?
d. What are the Class 1 error rate and the Class 0 error rate on the test data?
e. Compute and interpret the sensitivity and specificity for the test data.
f. How many false positives and false negatives did the model commit on the test data? What percentage of predicted churners were false positives? What percentage of predicted nonchurners were false negatives?
g. Examine the decile-wise lift chart on the test data. What is the first decile lift on the test data? Interpret this value.
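The quantities requested in parts (c) through (g) can all be derived from the confusion matrix and the ranked churn probabilities. The Python sketch below shows the arithmetic on a small made-up set of 20 observations. The function names `classification_summary` and `first_decile_lift` and the data are hypothetical, and the values are not results from the Cellphone file. Class 1 is taken to mean churner, and the cutoff is 0.5.

```python
def classification_summary(actual, predicted):
    """Error rates and related measures from actual and predicted classes."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == 1 and p == 1)  # churners caught
    tn = sum(1 for a, p in pairs if a == 0 and p == 0)  # nonchurners cleared
    fp = sum(1 for a, p in pairs if a == 0 and p == 1)  # false positives
    fn = sum(1 for a, p in pairs if a == 1 and p == 0)  # false negatives
    n = tp + tn + fp + fn
    return {
        "false_positives": fp,
        "false_negatives": fn,
        "overall_error": (fp + fn) / n,
        "class1_error": fn / (tp + fn),   # share of actual churners missed
        "class0_error": fp / (tn + fp),   # share of actual nonchurners flagged
        "sensitivity": tp / (tp + fn),    # 1 - Class 1 error rate
        "specificity": tn / (tn + fp),    # 1 - Class 0 error rate
        "fp_share_of_predicted_1": fp / (tp + fp),
        "fn_share_of_predicted_0": fn / (tn + fn),
    }

def first_decile_lift(actual, scores):
    """Churn rate in the top 10 percent of scores divided by the overall rate."""
    ranked = sorted(zip(scores, actual), key=lambda t: t[0], reverse=True)
    top_n = max(1, len(ranked) // 10)
    top_rate = sum(a for _, a in ranked[:top_n]) / top_n
    overall_rate = sum(actual) / len(actual)
    return top_rate / overall_rate

# Made-up churn probabilities and actual classes for 20 customers.
scores = [0.95, 0.90, 0.80, 0.70, 0.60, 0.45, 0.40, 0.35, 0.30, 0.30,
          0.25, 0.20, 0.20, 0.15, 0.15, 0.10, 0.10, 0.05, 0.05, 0.00]
actual = [1, 1, 0, 1, 0, 0, 1, 0, 0, 0,
          0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
predicted = [1 if s >= 0.5 else 0 for s in scores]  # cutoff probability 0.5

summary = classification_summary(actual, predicted)
lift = first_decile_lift(actual, scores)
print(summary)
print(lift)
```

In this toy example the two highest-scoring customers are both churners, while only 4 of the 20 customers churn overall, so the first decile lift is 1.0 / 0.2 = 5. Read this way, a first decile lift of 5 means that targeting the top 10 percent of customers ranked by the model finds five times as many churners as contacting the same number of customers at random.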