Refer to the scenario described in Problem 10 and the file BlueOrRed. Partition the data into training (50 percent), validation (30 percent), and test (20 percent) sets. Use logistic regression to classify observations as undecided (or decided) using Age, HomeOwner, Female, Married, HouseholdSize, Income, and Education as input variables and Undecided as the output variable. Perform an exhaustive-search best subset selection with the number of subsets equal to 2.
a. From the generated set of logistic regression models, select one that you believe is a good fit. Express the model as a mathematical equation relating the output variable to the input variables.
b. Increases in which variables increase the chance of a voter being undecided? Increases in which variables decrease the chance of a voter being undecided?
c. Using the default cutoff value of 0.5 for your logistic regression model, what is the overall error rate on the test data?
d. Examine the decile-wise lift chart for your model on the test data. What is the first decile lift? Interpret this value.
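
The quantities asked for in the problem statement and in parts (c) and (d) can be illustrated with a short sketch. This is not the output of any particular analytics package and does not use the BlueOrRed data. The function names, the random seed, the convention that a probability at or above the cutoff counts as class 1 (undecided), and the toy probabilities below are all assumptions made for illustration.

```python
import random


def partition(rows, seed=0, fractions=(0.5, 0.3, 0.2)):
    """Shuffle rows and split them into training, validation, and test lists.

    The 50/30/20 fractions follow the problem statement; the seed is arbitrary.
    """
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_train = int(round(fractions[0] * len(shuffled)))
    n_valid = int(round(fractions[1] * len(shuffled)))
    train = shuffled[:n_train]
    valid = shuffled[n_train:n_train + n_valid]
    test = shuffled[n_train + n_valid:]  # whatever remains, about 20 percent
    return train, valid, test


def overall_error_rate(probs, actual, cutoff=0.5):
    """Share of observations whose predicted class differs from the actual class.

    Assumption: a predicted probability >= cutoff is classified as 1 (undecided).
    """
    predicted = [1 if p >= cutoff else 0 for p in probs]
    wrong = sum(1 for yhat, y in zip(predicted, actual) if yhat != y)
    return wrong / len(actual)


def first_decile_lift(probs, actual):
    """Rate of 1s among the top 10 percent of observations, ranked by predicted
    probability, divided by the rate of 1s among all observations."""
    ranked = sorted(zip(probs, actual), key=lambda pair: pair[0], reverse=True)
    top_n = max(1, len(ranked) // 10)
    top_rate = sum(y for _, y in ranked[:top_n]) / top_n
    overall_rate = sum(actual) / len(actual)
    return top_rate / overall_rate


# Toy test-set scores (made up): predicted probabilities and actual classes.
toy_probs = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4, 0.3, 0.2, 0.1, 0.05]
toy_actual = [1, 1, 0, 1, 0, 0, 1, 0, 0, 0]

train, valid, test = partition(range(10))
error_rate = overall_error_rate(toy_probs, toy_actual)
lift = first_decile_lift(toy_probs, toy_actual)
```

A first decile lift of k means that the 10 percent of test observations the model ranks as most likely to be undecided contain k times as many undecided voters as a randomly chosen 10 percent would be expected to contain. The sketch ignores ties in predicted probability, which a real tool may handle differently.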

  • Created November 21, 2015
  • Files Included