Question: For data mining, in Python or R, please:

1. Using the churn data set, determine how many records need to be resampled so that 20% of the records in the rebalanced data set have a true Churn value. Create the rebalanced data set and confirm that 20% of its records have a true Churn value.

2. Partition the rebalanced data set so that 67% of the records are in the training data set and 33% are in the test data set. Use a bar graph to confirm the proportions. Validate the partition by testing for a difference between the training and test sets: a t-test on the mean of Day.Mins and a Z-test on the proportion of true Churn values. Form new training and test sets if there is enough evidence to reject the null hypothesis.


3. Create a CART model using the training set with the Churn target variable and whatever predictor variables you think appropriate. Try at least 3 different models. Compare the confusion tables and accuracies of the 3 different models.

4. Create a C5.0 model using the training set with the Churn target variable and whatever predictor variables you think appropriate. Try at least 3 different models. Compare the confusion tables and accuracies of the 3 different models. How does C5.0 compare in performance to CART?
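For task 1, here is a minimal Python sketch. It assumes the data are already in a pandas DataFrame with a Boolean `Churn` column; in some copies of the file the values are strings, so `rare_value` may need changing. It also assumes that "resampling" means adding true-Churn records drawn with replacement. If n is the number of records, r the number with true Churn and p the target proportion, solving (r + x)/(n + x) = p gives x = (p·n − r)/(1 − p). The function names are illustrative, not from the question.

```python
import math

import pandas as pd


def records_to_resample(n_total: int, n_rare: int, p: float) -> int:
    """Rare-class records to add so the rare class reaches proportion p."""
    # Solve (n_rare + x) / (n_total + x) = p for x.
    x = (p * n_total - n_rare) / (1 - p)
    # round() guards against floating-point noise before rounding up.
    return max(0, math.ceil(round(x, 9)))


def rebalance(df: pd.DataFrame, target: str = "Churn", rare_value=True,
              p: float = 0.20, seed: int = 7):
    """Append resampled rare-class rows; return the new frame and the count added."""
    rare = df[df[target] == rare_value]
    x = records_to_resample(len(df), len(rare), p)
    # Sample with replacement from the true-Churn records only.
    extra = rare.sample(n=x, replace=True, random_state=seed)
    balanced = pd.concat([df, extra], ignore_index=True)
    return balanced, x
```

Calling `balanced, x = rebalance(churn)` and then checking `(balanced["Churn"] == True).mean()` confirms the proportion. With 3,333 records of which 483 are churners (the counts usually quoted for this data set; check your own copy), the formula gives x = 230 and a rebalanced proportion of 713/3,563, about 20.0%.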
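For task 2, one possible sketch of the partition and its validation. It assumes a DataFrame with a unique index, a numeric `Day.Mins` column and a Boolean `Churn` column. The t-test here is Welch's version (no equal-variance assumption), which is a choice made for this sketch, and the 0.05 significance level and function names are likewise assumptions.

```python
import math

import pandas as pd
from scipy import stats


def partition(df: pd.DataFrame, train_frac: float = 0.67, seed: int = 0):
    """Randomly split df into training and test sets (needs a unique index)."""
    train = df.sample(frac=train_frac, random_state=seed)
    test = df.drop(train.index)
    return train, test


def two_proportion_z_test(x1, n1, x2, n2):
    """Two-sided Z-test for a difference in proportions, using the pooled estimate."""
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (x1 / n1 - x2 / n2) / se
    return z, 2 * stats.norm.sf(abs(z))


def validate(train, test, numeric="Day.Mins", target="Churn", rare_value=True):
    """Return the p-values of the t-test on `numeric` and the Z-test on `target`."""
    _, t_p = stats.ttest_ind(train[numeric], test[numeric], equal_var=False)
    x1 = int((train[target] == rare_value).sum())
    x2 = int((test[target] == rare_value).sum())
    _, z_p = two_proportion_z_test(x1, len(train), x2, len(test))
    return {"t_test": float(t_p), "z_test": float(z_p)}


def partition_until_valid(df, alpha=0.05, max_tries=20):
    """Try new random partitions until neither test rejects at level alpha."""
    for seed in range(max_tries):
        train, test = partition(df, seed=seed)
        if min(validate(train, test).values()) > alpha:
            return train, test, seed
    raise RuntimeError("no acceptable partition found")


def plot_proportions(train, test):
    """Bar graph of the share of records in each partition."""
    import matplotlib.pyplot as plt  # imported here so the rest runs without a display

    total = len(train) + len(test)
    plt.bar(["Training", "Test"], [len(train) / total, len(test) / total])
    plt.ylabel("Proportion of records")
    plt.show()
```

A p-value above the chosen level in both tests means there is no evidence that the two sets differ on these variables, so the partition can be kept.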
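For task 3, a hedged sketch using scikit-learn's `DecisionTreeClassifier`, which implements a CART-style tree with the Gini criterion. It assumes `train` and `test` DataFrames with numeric predictors; categorical columns would need encoding first. Apart from `Day.Mins` and `Churn`, the column names in the usage note are guesses at how the file is usually labelled, and the depth limit is arbitrary.

```python
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.tree import DecisionTreeClassifier


def compare_trees(train, test, predictor_sets, target="Churn",
                  criterion="gini", max_depth=5, seed=7):
    """Fit one tree per predictor set; return confusion table and accuracy for each."""
    results = {}
    for name, cols in predictor_sets.items():
        model = DecisionTreeClassifier(
            criterion=criterion, max_depth=max_depth, random_state=seed
        )
        model.fit(train[cols], train[target])
        predicted = model.predict(test[cols])
        results[name] = {
            # Rows are actual classes, columns are predicted classes.
            "confusion": confusion_matrix(test[target], predicted),
            "accuracy": accuracy_score(test[target], predicted),
        }
    return results
```

For example, `compare_trees(train, test, {"minutes": ["Day.Mins", "Eve.Mins"], "calls": ["CustServ.Calls"], "all": ["Day.Mins", "Eve.Mins", "CustServ.Calls"]})` yields three models whose confusion tables and accuracies can be compared side by side. Because the training set was rebalanced, it is worth looking at the true-Churn row of each table and not only at overall accuracy.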
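For task 4, note that scikit-learn does not include C5.0. In R, the `C50` package provides it. The Python sketch below swaps in an entropy-criterion decision tree as a rough stand-in, since C5.0 also uses an information-based splitting measure. It is not C5.0 itself: the pruning and other details differ, so treat the comparison as indicative only. The function name and the leaf limit are assumptions.

```python
import pandas as pd
from sklearn.metrics import accuracy_score
from sklearn.tree import DecisionTreeClassifier


def accuracy_table(train, test, predictor_sets, target="Churn",
                   max_leaf_nodes=10, seed=7):
    """Test-set accuracy of Gini (CART-style) and entropy trees per predictor set."""
    rows = []
    for name, cols in predictor_sets.items():
        row = {"model": name}
        for criterion in ("gini", "entropy"):
            tree = DecisionTreeClassifier(
                criterion=criterion,
                max_leaf_nodes=max_leaf_nodes,
                random_state=seed,
            )
            tree.fit(train[cols], train[target])
            row[criterion] = accuracy_score(test[target], tree.predict(test[cols]))
        rows.append(row)
    # One row per model, one column per splitting criterion.
    return pd.DataFrame(rows).set_index("model")
```

Each row of the returned table puts the two criteria next to each other for the same predictors, which is the comparison the question asks for between C5.0 and CART.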

Step by Step Solution

Step: 1

1. In order to have 20% of the rebalanced data set have true Churn variable values, we would need to re...
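The required count can be worked out symbolically. This derivation assumes that rebalancing means adding x true-Churn records drawn with replacement (the answer text here is truncated, so that reading is an assumption). The symbols are introduced for illustration: n is the original number of records, r the number with true Churn, and p = 0.20 the target proportion.

```latex
\begin{align*}
\frac{r + x}{n + x} &= p \\
r + x &= p\,n + p\,x \\
x\,(1 - p) &= p\,n - r \\
x &= \frac{p\,n - r}{1 - p}
\end{align*}
```

Since x must be a whole number of records, round the result up.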
