For data mining, in Python or R, please: 1. Using the churn data set, determine how many

Question:

For data mining, in Python or R, please:

1. Using the churn data set, determine how many records need to be resampled so that 20% of the records in the rebalanced data set have a true Churn value. Create the rebalanced data set and confirm that 20% of its records have a true Churn value.
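
For part 1, one possible approach is sketched here in Python. If the data set has n records, r of which have Churn = True, and x churn records are resampled with replacement, the new share is (r + x) / (n + x). Setting this equal to the target p and solving gives x = (p*n - r) / (1 - p), rounded up. The counts 3333 and 483 are assumptions about the churn data set and should be checked against your own copy; the records list is a stand-in that carries only the Churn flag, and the function name is made up for illustration.

```python
import math
import random

def records_to_resample(n_total, n_rare, p):
    """Number of rare records to add so the rare share reaches at least p."""
    x = (p * n_total - n_rare) / (1 - p)
    # round() guards against floating-point noise before rounding up
    return max(0, math.ceil(round(x, 9)))

# Assumed counts; replace with the counts from your own copy of the data.
n_total, n_true = 3333, 483

x = records_to_resample(n_total, n_true, 0.20)

# Stand-in records: only the Churn flag matters for this check.
records = [{"Churn": True}] * n_true + [{"Churn": False}] * (n_total - n_true)
churners = [r for r in records if r["Churn"]]

# Resample churners with replacement and append them to the original data.
rng = random.Random(0)
rebalanced = records + rng.choices(churners, k=x)

# Confirm the new proportion of true Churn values.
share_true = sum(r["Churn"] for r in rebalanced) / len(rebalanced)
print(x, len(rebalanced), round(share_true, 4))
```

With a real data frame the same idea applies: filter the rows with Churn = True, sample x of them with replacement, and append them to the original rows.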

2. Partition the rebalanced data set so that 67% of the records are in the training data set and 33% are in the test data set. Use a bar graph to confirm the proportions. Validate the partition by testing for a difference between the training and test sets: a t-test for the difference in mean Day.Mins and a Z-test for the difference in the proportion of true Churn values. If there is enough evidence to reject the null hypothesis in either test, form new training and test sets.
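
For part 2, a minimal sketch using only the Python standard library follows. The data here are synthetic stand-ins (the Day.Mins distribution and the churn rate are invented), the bar graph is drawn as text, and the t-test p-value uses a normal approximation, which is reasonable only because the samples are large. With SciPy available, `scipy.stats.ttest_ind` would be the usual choice for the t-test. The helper names are hypothetical.

```python
import math
import random
from statistics import NormalDist, mean, variance

rng = random.Random(42)

# Synthetic stand-in for the rebalanced data (hypothetical values).
data = [{"Day.Mins": rng.gauss(180, 54), "Churn": rng.random() < 0.20}
        for _ in range(3563)]

# Random 67/33 partition: shuffle a copy, then cut at 67%.
shuffled = data[:]
rng.shuffle(shuffled)
cut = round(0.67 * len(shuffled))
train, test = shuffled[:cut], shuffled[cut:]

# Text bar graph of the partition proportions.
for name, part in (("train", train), ("test", test)):
    share = len(part) / len(data)
    print(f"{name:<5} {'#' * round(50 * share)} {share:.1%}")

def two_sided_p(stat):
    """Two-sided p-value from the standard normal distribution."""
    return 2 * (1 - NormalDist().cdf(abs(stat)))

def mean_diff_test(a, b):
    """Welch t statistic for two means; p-value via normal approximation."""
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    t = (mean(a) - mean(b)) / se
    return t, two_sided_p(t)

def proportion_diff_test(x1, n1, x2, n2):
    """Two-sample Z-test for proportions using the pooled estimate."""
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (x1 / n1 - x2 / n2) / se
    return z, two_sided_p(z)

t_stat, t_p = mean_diff_test([r["Day.Mins"] for r in train],
                             [r["Day.Mins"] for r in test])
z_stat, z_p = proportion_diff_test(sum(r["Churn"] for r in train), len(train),
                                   sum(r["Churn"] for r in test), len(test))
print(f"Day.Mins: t = {t_stat:.3f}, p = {t_p:.3f}")
print(f"Churn:    z = {z_stat:.3f}, p = {z_p:.3f}")
```

A large p-value in both tests gives no evidence that the partition is unbalanced. If either p-value falls below the chosen significance level, reshuffle with a different seed and test again.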

3. Create a CART model using the training set with the Churn target variable and whatever predictor variables you think appropriate. Try at least 3 different models. Compare the confusion tables and accuracies of the 3 different models.
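
For part 3, one way to compare three CART-style models is sketched below with scikit-learn, whose `DecisionTreeClassifier` is documented as an optimised version of CART. Everything about the data is an assumption: the values are synthetic, the predictor names `CustServ.Calls` and `Intl.Plan` are guesses at columns in the churn data set, and the choice of `max_depth=4` is arbitrary. The three models differ only in which predictors they use.

```python
import numpy as np
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
n = 2000

# Synthetic stand-in predictors (hypothetical names and values).
X = {
    "Day.Mins": rng.normal(180, 54, n),
    "CustServ.Calls": rng.poisson(1.5, n),
    "Intl.Plan": rng.integers(0, 2, n),
}

# Synthetic target: churn is made more likely by each predictor.
logit = (-3 + 0.012 * (X["Day.Mins"] - 180)
         + 0.6 * X["CustServ.Calls"] + 1.2 * X["Intl.Plan"])
y = (rng.random(n) < 1 / (1 + np.exp(-logit))).astype(int)

cut = round(0.67 * n)  # first 67% for training, remainder for testing

predictor_sets = {
    "A": ["Day.Mins"],
    "B": ["Day.Mins", "CustServ.Calls"],
    "C": ["Day.Mins", "CustServ.Calls", "Intl.Plan"],
}

results = {}
for name, cols in predictor_sets.items():
    features = np.column_stack([X[c] for c in cols])
    model = DecisionTreeClassifier(criterion="gini", max_depth=4,
                                   random_state=0)
    model.fit(features[:cut], y[:cut])
    pred = model.predict(features[cut:])
    # Rows are actual classes, columns are predicted classes (0, then 1).
    cm = confusion_matrix(y[cut:], pred, labels=[0, 1])
    acc = accuracy_score(y[cut:], pred)
    results[name] = (cm, acc)
    print(f"Model {name} {cols}: accuracy = {acc:.3f}")
    print(cm)
```

Because the rebalanced data are only 20% churners, accuracy alone can flatter a model that rarely predicts churn, so the confusion tables are worth reading alongside it.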

4. Create a C5.0 model using the training set with the Churn target variable and whatever predictor variables you think appropriate. Try at least 3 different models. Compare the confusion tables and accuracies of the 3 different models. How does C5.0 compare in performance to CART?
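
For part 4, note that scikit-learn does not ship a C5.0 implementation; in R, the `C50` package provides one. A Python tree fitted with `criterion="entropy"` is only an approximation of C5.0, not the algorithm itself. What can be shown directly is the difference in split criteria: CART scores splits with the Gini index, while C5.0 is commonly described as using entropy-based information gain (this sketch leaves out refinements such as gain ratio, pruning, and boosting). The class counts below are invented to show that the two criteria can rank the same candidate splits differently.

```python
import math

def gini(counts):
    """Gini index of a node given its class counts."""
    total = sum(counts)
    return 1 - sum((c / total) ** 2 for c in counts)

def entropy(counts):
    """Entropy in bits of a node given its class counts."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c)

def split_gain(impurity, parent, children):
    """Impurity reduction from splitting `parent` into `children`."""
    total = sum(parent)
    weighted = sum(sum(ch) / total * impurity(ch) for ch in children)
    return impurity(parent) - weighted

# Hypothetical class counts (Churn False, Churn True).
parent = (80, 20)
split_a = [(60, 5), (20, 15)]   # both branches mixed
split_b = [(40, 0), (40, 20)]   # one pure branch, one mixed

gains = {}
for name, impurity in (("gini", gini), ("entropy", entropy)):
    gains[name] = {
        "a": split_gain(impurity, parent, split_a),
        "b": split_gain(impurity, parent, split_b),
    }
    best = max(gains[name], key=gains[name].get)
    print(f"{name:<8} a = {gains[name]['a']:.4f}  "
          f"b = {gains[name]['b']:.4f}  preferred: {best}")
```

In this made-up example the Gini index prefers split a and entropy prefers split b, which is one reason the two kinds of tree can differ in structure and accuracy on the same training set.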