Question: For data mining, in Python or R, please:
1. Using the churn data set, determine how many records need to be resampled in order to have 20% of the rebalanced data set have true Churn variable values. Create the rebalanced data set and confirm that the new data set has 20% true Churn variable values.
2. Partition the rebalanced data set so that 67% of the records are included in the training data set and 33% are included in the test data set. Use a bar graph to confirm the proportions. Validate the training and test data sets by testing for the difference in the training and test means using Day.Mins (t-test) and the Z-test on the Churn variable. Try forming new training and test sets if there is enough evidence to reject the null hypothesis.
3. Create a CART model using the training set with the Churn target variable and whatever predictor variables you think appropriate. Try at least 3 different models. Compare the confusion tables and accuracies of the 3 different models.
4. Create a C5.0 model using the training set with the Churn target variable and whatever predictor variables you think appropriate. Try at least 3 different models. Compare the confusion tables and accuracies of the 3 different models. How does C5.0 compare in performance to CART?
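For task 2, the partition-and-validate step might be sketched as below. This is only an illustration under stated assumptions: the records are synthetic stand-ins (the real churn file is not read here), the helper names `split`, `mean_difference_test`, and `proportion_difference_test` are hypothetical, and the p-value for the Day.Mins test uses a normal approximation to the t distribution, which is reasonable only because the samples are large. The bar graph is left out.

```python
import math
import random
from statistics import NormalDist, mean, variance

def split(records, train_share=0.67, seed=0):
    """Shuffle a copy of the records and cut it into training and test lists."""
    shuffled = records[:]
    random.Random(seed).shuffle(shuffled)
    cut = round(train_share * len(shuffled))
    return shuffled[:cut], shuffled[cut:]

def two_sided_p(stat):
    """Two-sided p-value from the standard normal distribution."""
    return 2 * (1 - NormalDist().cdf(abs(stat)))

def mean_difference_test(a, b):
    """Welch t statistic for two means; p-value by normal approximation."""
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    t = (mean(a) - mean(b)) / se
    return t, two_sided_p(t)

def proportion_difference_test(x1, n1, x2, n2):
    """Two-sample Z-test for proportions using the pooled proportion."""
    p_pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(p_pooled * (1 - p_pooled) * (1 / n1 + 1 / n2))
    z = (x1 / n1 - x2 / n2) / se
    return z, two_sided_p(z)

# Synthetic stand-in for the rebalanced data (assumed size and distributions).
rng = random.Random(42)
records = [
    {"day_mins": rng.gauss(180, 54), "churn": rng.random() < 0.20}
    for _ in range(3563)
]

# Try new partitions until neither test rejects at the 5% level.
for seed in range(20):
    train, test = split(records, seed=seed)
    t, p_t = mean_difference_test(
        [r["day_mins"] for r in train], [r["day_mins"] for r in test]
    )
    z, p_z = proportion_difference_test(
        sum(r["churn"] for r in train), len(train),
        sum(r["churn"] for r in test), len(test),
    )
    if p_t > 0.05 and p_z > 0.05:
        break

print(len(train), len(test))
print(f"Day.Mins: t={t:.3f}, p={p_t:.3f}; Churn: z={z:.3f}, p={p_z:.3f}")
```

With real data, the two lists passed to `mean_difference_test` would be the Day.Mins columns of the training and test sets, and the counts passed to `proportion_difference_test` would be the numbers of true Churn records in each.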
Step by Step Solution
1. In order to have 20% of the rebalanced data set have true Churn variable values, we would need to re...
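The resampling count in this step can be worked out by solving (rare + x) / (records + x) = p for x. The sketch below assumes the widely circulated version of the churn data set, with 3,333 records of which 483 have Churn = True; those counts and the function name `records_to_resample` are assumptions, so check them against your own file.

```python
import math

def records_to_resample(n_records, n_rare, target_share):
    """Rare-class records to add (sampling with replacement) so the rare
    class makes up target_share of the rebalanced data set."""
    # Solve (n_rare + x) / (n_records + x) = target_share for x.
    x = (target_share * n_records - n_rare) / (1 - target_share)
    # Small tolerance so floating-point noise does not push an exact
    # integer answer up by one.
    return max(0, math.ceil(x - 1e-9))

# Assumed counts for the churn data set (verify against your copy).
N_RECORDS, N_TRUE = 3333, 483

x = records_to_resample(N_RECORDS, N_TRUE, 0.20)
new_share = (N_TRUE + x) / (N_RECORDS + x)
print(x, round(new_share, 4))  # 230 0.2001
```

Under those assumed counts, 230 true-Churn records would be drawn with replacement and appended, giving 713 true values out of 3,563 records.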
