Complete the following steps and then answer the exercise questions below.
Step 1. Import the training and scoring data sets for this exercise into data frames in RStudio.
Step 2. Load the R library required to create a logistic regression model.
Step 3. Make a logistic regression model to predict RenewedSubscription. Do not include PatronID as an independent variable. Coerce the dependent variable to be treated as a factor. Use the summary() function to inspect the p-values of your independent variables. Do not remove any independent variables from the model.
Step 4. Using a subset() command, remove observations, if any, from the scoring data set where one or more attributes exceed the range established in the training data set. For example, the range for DifferentUsers in the training data set is 2 to 12. If any observations in the scoring data set have DifferentUsers values below 2 or above 12, remove them. Check all attributes to ensure all scoring observations are within ranges established by the training data.
Step 5. Using the predict() function, apply your logistic regression model to the scoring data. Make sure the type of prediction you generate is the model's "response" (type = "response"), so that predict() returns probabilities.
Step 6. Combine the predictions with the scoring data into a new data frame. View the data frame and answer the following questions.
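Steps 1 through 6 can be sketched roughly as follows. This is only an illustration under stated assumptions: the exercise's data files are not available here, so the training and scoring data frames are simulated stand-ins, the helper make_patrons() and the column name RenewalProbability are hypothetical, and only two of the exercise's attributes are included. With the real files, Step 1 would use an import function such as read.csv() instead.

```r
# Step 1 (stand-in): simulate small training and scoring data frames.
# make_patrons() is a hypothetical helper, not part of the exercise.
set.seed(42)
make_patrons <- function(n, users_range) {
  data.frame(
    PatronID = seq_len(n),
    DifferentUsers = sample(users_range, n, replace = TRUE),
    PerformancesAttended = sample(1:10, n, replace = TRUE)
  )
}
training <- make_patrons(200, 2:12)
p_renew <- plogis(-2 + 0.4 * training$PerformancesAttended -
                    0.1 * training$DifferentUsers)
training$RenewedSubscription <- ifelse(runif(200) < p_renew, "Yes", "No")
scoring <- make_patrons(50, 1:14)  # deliberately wider than the training range

# Step 2: glm() lives in the stats package, which R attaches by default,
# so this call is shown only for completeness.
library(stats)

# Step 3: coerce the dependent variable to a factor, then fit the model
# on every attribute except PatronID.
training$RenewedSubscription <- as.factor(training$RenewedSubscription)
model <- glm(RenewedSubscription ~ . - PatronID,
             data = training, family = binomial())
summary(model)  # the coefficient table includes the p-values

# Step 4: keep only scoring rows that fall inside the training ranges.
in_range <- subset(
  scoring,
  DifferentUsers >= min(training$DifferentUsers) &
    DifferentUsers <= max(training$DifferentUsers) &
    PerformancesAttended >= min(training$PerformancesAttended) &
    PerformancesAttended <= max(training$PerformancesAttended)
)
removed <- nrow(scoring) - nrow(in_range)  # number of rows dropped

# Step 5: predicted probabilities of the second factor level ("Yes").
probs <- predict(model, newdata = in_range, type = "response")

# Step 6: combine predictions with the scoring data.
results <- data.frame(in_range, RenewalProbability = probs)
head(results)  # in RStudio, View(results) opens the full data frame
```

Because the factor levels sort alphabetically ("No", then "Yes"), type = "response" returns the probability of "Yes" for each scoring observation. With the real data, every attribute would need its own range check in the subset() condition.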
Which attribute is the single poorest predictor of season ticket renewal?
- AvgMinutesBeforeCurtain
- PerformancesAttended
- PricePerTicket
- ConcessionVouchers
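One way to approach this question is to pull the p-values out of the model summary and sort them; the attribute with the largest p-value has the weakest evidence of a relationship with renewal. The sketch below uses the attribute names from the answer choices, but the values and the simulated relationship are invented for illustration, so its output says nothing about the answer for the real exercise data.

```r
# Simulated stand-in for the training data (values are made up).
set.seed(1)
n <- 300
training <- data.frame(
  AvgMinutesBeforeCurtain = round(rnorm(n, mean = 15, sd = 10)),
  PerformancesAttended = sample(1:10, n, replace = TRUE),
  PricePerTicket = round(runif(n, 20, 90), 2),
  ConcessionVouchers = sample(0:5, n, replace = TRUE)
)
# Arbitrary simulated outcome, unrelated to the real data set.
p_renew <- plogis(-2.5 + 0.45 * training$PerformancesAttended +
                    0.03 * training$AvgMinutesBeforeCurtain)
training$RenewedSubscription <- factor(ifelse(runif(n) < p_renew, "Yes", "No"))

model <- glm(RenewedSubscription ~ ., data = training, family = binomial())

# Extract the p-value column from the coefficient table, dropping the intercept.
coefs <- coef(summary(model))
p_values <- coefs[rownames(coefs) != "(Intercept)", "Pr(>|z|)"]

sort(p_values, decreasing = TRUE)       # largest p-value first
weakest <- names(which.max(p_values))   # attribute with the largest p-value
weakest
```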
Of the first-year season ticket patrons in the scoring data set, how many are typically late to the performances they attend?
- 16
- 8
- 51
- 3
How many observations had to be removed from the scoring data set because one or more independent variable values exceeded the range established by the training data?
- 0
- 13
- 148
- 1
How many "No" predictions have a post-probability confidence percent higher than 95%?
- 80
- 68
- 9
- 71
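Since type = "response" gives the probability of "Yes", the confidence in a "No" prediction is one minus that probability. A minimal sketch of the count, using a tiny hand-made results data frame: the column names RenewalProbability, Prediction, and Confidence are hypothetical, and the 0.5 cutoff for labeling a prediction "Yes" is an assumption rather than something the exercise specifies.

```r
# Hand-made stand-in for the combined predictions data frame.
results <- data.frame(
  PatronID = 1:6,
  RenewalProbability = c(0.01, 0.04, 0.30, 0.97, 0.049, 0.50)
)

# Label each row with the more likely class (assumed 0.5 cutoff).
results$Prediction <- ifelse(results$RenewalProbability >= 0.5, "Yes", "No")

# Confidence is the probability of whichever class was predicted.
results$Confidence <- ifelse(results$Prediction == "Yes",
                             results$RenewalProbability,
                             1 - results$RenewalProbability)

# Keep "No" predictions whose confidence exceeds 95%.
confident_no <- subset(results, Prediction == "No" & Confidence > 0.95)
nrow(confident_no)  # 3 for this toy data
```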
If you wished to test the accuracy of your logistic regression model in R, to which data set would you apply the predict() function?
- The training data
- The test data
- The validation data
- The scoring data
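Measuring accuracy requires observations whose outcome is already known, so that predictions can be compared with what actually happened. The sketch below illustrates the idea by holding out part of a simulated labeled data set, predicting on the held-out rows, and comparing predictions with the known outcomes. The data, the 100-row holdout size, and the 0.5 cutoff are all assumptions made for illustration.

```r
# Simulated labeled data (values are made up).
set.seed(7)
n <- 400
labeled <- data.frame(
  PerformancesAttended = sample(1:10, n, replace = TRUE),
  DifferentUsers = sample(2:12, n, replace = TRUE)
)
p_renew <- plogis(-2 + 0.4 * labeled$PerformancesAttended)
labeled$RenewedSubscription <- factor(ifelse(runif(n) < p_renew, "Yes", "No"),
                                      levels = c("No", "Yes"))

# Hold out 100 labeled rows that the model never sees during fitting.
test_rows <- sample(n, size = 100)
train <- labeled[-test_rows, ]
test <- labeled[test_rows, ]

model <- glm(RenewedSubscription ~ ., data = train, family = binomial())

# Predict on the held-out rows and convert probabilities to class labels.
test_prob <- predict(model, newdata = test, type = "response")
test_pred <- factor(ifelse(test_prob >= 0.5, "Yes", "No"),
                    levels = c("No", "Yes"))

# Compare predictions with the known outcomes.
confusion <- table(Predicted = test_pred, Actual = test$RenewedSubscription)
accuracy <- mean(test_pred == test$RenewedSubscription)
confusion
accuracy
```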
