Question:
For data mining, in Python or R, please:
1. Using the churn data set, determine how many records need to be resampled so that 20% of the records in the rebalanced data set have a true Churn value. Create the rebalanced data set and confirm that 20% of its records have a true Churn value.
2. Partition the rebalanced data set so that 67% of the records are in the training data set and 33% are in the test data set. Use a bar graph to confirm the proportions. Validate the partition by testing for differences between the training and test sets: a two-sample t-test on the mean of Day.Mins and a two-sample Z-test on the proportion of true Churn values. Form new training and test sets if there is enough evidence to reject the null hypothesis.
3. Create a CART model using the training set with the Churn target variable and whatever predictor variables you think appropriate. Try at least 3 different models. Compare the confusion tables and accuracies of the 3 different models.
4. Create a C5.0 model using the training set with the Churn target variable and whatever predictor variables you think appropriate. Try at least 3 different models. Compare the confusion tables and accuracies of the 3 different models. How does C5.0 compare in performance to CART?
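For task 1, the number of records to resample follows from solving (rare + x) / (total + x) = p for x, which gives x = (p * total - rare) / (1 - p). The Python sketch below applies that formula and then resamples the true-Churn records with replacement. It runs on a small synthetic frame rather than the real churn file, and it assumes the Churn column is boolean; the function names `records_to_resample` and `rebalance` are made up for illustration.

```python
import math

import pandas as pd


def records_to_resample(n_total, n_rare, p):
    """Rare records to add so that the rare class makes up a share p.

    Solves (n_rare + x) / (n_total + x) = p for x and rounds up.
    """
    x = (p * n_total - n_rare) / (1 - p)
    # round() guards against float noise before taking the ceiling
    return max(0, math.ceil(round(x, 9)))


def rebalance(df, target="Churn", p=0.20, seed=0):
    """Append rare rows, resampled with replacement, to reach share p."""
    rare = df[df[target]]  # assumption: the target column is boolean
    x = records_to_resample(len(df), len(rare), p)
    extra = rare.sample(n=x, replace=True, random_state=seed)
    return pd.concat([df, extra], ignore_index=True)


# Synthetic stand-in for the churn data: 1000 records, 100 churners.
demo = pd.DataFrame({"Churn": [True] * 100 + [False] * 900})
balanced = rebalance(demo, "Churn", 0.20)

print(records_to_resample(1000, 100, 0.20))     # 125
print(len(balanced), balanced["Churn"].mean())  # 1125 0.2
```

With the real data, replace `demo` with the loaded churn frame (converting Churn to boolean first if it is stored as text) and check `balanced["Churn"].mean()` in the same way.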
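For task 2, one possible approach is to draw a 67/33 split, run both validation tests, and redraw with a new seed whenever either test rejects at the chosen significance level. The sketch below uses a Welch two-sample t-test from SciPy for Day.Mins and a hand-written pooled two-proportion Z-test for Churn. The data are synthetic, and the helper names, the 0.05 threshold, and the retry limit are assumptions rather than part of the assignment.

```python
import numpy as np
import pandas as pd
from scipy import stats


def partition(df, train_frac=0.67, seed=0):
    """Random split; assumes df has a unique index."""
    train = df.sample(frac=train_frac, random_state=seed)
    return train, df.drop(train.index)


def two_prop_ztest(x1, n1, x2, n2):
    """Two-sided pooled Z-test for the difference of two proportions."""
    p_pool = (x1 + x2) / (n1 + n2)
    se = np.sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
    z = (x1 / n1 - x2 / n2) / se
    return z, 2 * stats.norm.sf(abs(z))


def validate(train, test, num_col="Day.Mins", flag_col="Churn", alpha=0.05):
    """Return both p-values and whether neither test rejects at alpha."""
    _, t_p = stats.ttest_ind(train[num_col], test[num_col], equal_var=False)
    _, z_p = two_prop_ztest(train[flag_col].sum(), len(train),
                            test[flag_col].sum(), len(test))
    return {"t_p": float(t_p), "z_p": float(z_p),
            "ok": bool(t_p >= alpha and z_p >= alpha)}


def split_until_valid(df, max_tries=20):
    """Redraw the partition until both validation tests pass."""
    for seed in range(max_tries):
        train, test = partition(df, seed=seed)
        result = validate(train, test)
        if result["ok"]:
            return train, test, result
    raise RuntimeError("no acceptable partition found")


def plot_partition_sizes(train, test):
    """Bar graph of the train/test proportions (needs matplotlib)."""
    import matplotlib.pyplot as plt
    total = len(train) + len(test)
    fig, ax = plt.subplots()
    ax.bar(["train", "test"], [len(train) / total, len(test) / total])
    ax.set_ylabel("proportion of records")
    return ax


# Synthetic stand-in for the rebalanced churn data.
rng = np.random.default_rng(1)
demo = pd.DataFrame({"Day.Mins": rng.normal(180, 54, 1125),
                     "Churn": rng.random(1125) < 0.20})
train, test, result = split_until_valid(demo)
print(len(train) / len(demo), result)
```

Calling `plot_partition_sizes(train, test)` draws the bar graph; the import sits inside the function so the statistical checks still run where no plotting backend is installed.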
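For task 3, scikit-learn's `DecisionTreeClassifier` is an optimized CART implementation, so three CART models can be compared by varying the predictors and the tree-size settings. The sketch below is a minimal illustration on synthetic data. The predictor names `Eve.Mins` and `CustServ.Calls`, the three candidate configurations, and the helper names are assumptions; only `Day.Mins` and `Churn` come from the question.

```python
import numpy as np
import pandas as pd
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.tree import DecisionTreeClassifier


def make_demo(n=1200, seed=0):
    """Synthetic churn-like data; not the real churn file."""
    rng = np.random.default_rng(seed)
    day = rng.normal(180, 54, n)
    calls = rng.poisson(1.5, n)
    logit = 0.02 * (day - 180) + 0.6 * (calls - 1.5) - 1.5
    churn = rng.random(n) < 1 / (1 + np.exp(-logit))
    return pd.DataFrame({"Day.Mins": day,
                         "Eve.Mins": rng.normal(200, 50, n),
                         "CustServ.Calls": calls,
                         "Churn": churn})


def fit_and_score(model, train, test, predictors, target="Churn"):
    """Fit on train, then return the test confusion table and accuracy."""
    model.fit(train[predictors], train[target].astype(int))
    y_true = test[target].astype(int)
    y_pred = model.predict(test[predictors])
    return {"confusion": confusion_matrix(y_true, y_pred, labels=[0, 1]),
            "accuracy": accuracy_score(y_true, y_pred)}


demo = make_demo()
train, test = demo.iloc[:800], demo.iloc[800:]  # rows are already random

all_cols = ["Day.Mins", "Eve.Mins", "CustServ.Calls"]
candidates = {  # three illustrative CART configurations
    "usage_depth2": (["Day.Mins", "Eve.Mins"], {"max_depth": 2}),
    "all_depth4": (all_cols, {"max_depth": 4}),
    "all_leaf25": (all_cols, {"min_samples_leaf": 25}),
}

results = {}
for name, (cols, params) in candidates.items():
    tree = DecisionTreeClassifier(criterion="gini", random_state=0, **params)
    results[name] = fit_and_score(tree, train, test, cols)
    print(name, round(results[name]["accuracy"], 3))
    print(results[name]["confusion"])  # rows: actual, columns: predicted
```

Because the rebalanced data set is 80% false, a model that always predicts false already scores about 0.80, so the confusion tables are usually more informative than accuracy alone.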
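For task 4, note that scikit-learn does not include C5.0; in R, the C50 package provides it through the `C5.0()` function. In Python, a common stand-in is a decision tree with the entropy (information gain) criterion, which mimics C5.0's splitting measure but not its multiway splits, pruning, or boosting. The sketch below compares that stand-in with a Gini-based CART tree at three tree sizes on synthetic data, so treat it as an approximation of the comparison rather than a true C5.0 result. Column names other than `Day.Mins` and `Churn`, and the helper names, are assumptions.

```python
import numpy as np
import pandas as pd
from sklearn.metrics import accuracy_score
from sklearn.tree import DecisionTreeClassifier


def make_demo(n=1200, seed=0):
    """Synthetic churn-like data; not the real churn file."""
    rng = np.random.default_rng(seed)
    day = rng.normal(180, 54, n)
    calls = rng.poisson(1.5, n)
    logit = 0.02 * (day - 180) + 0.6 * (calls - 1.5) - 1.5
    churn = rng.random(n) < 1 / (1 + np.exp(-logit))
    return pd.DataFrame({"Day.Mins": day, "CustServ.Calls": calls,
                         "Churn": churn})


def compare_criteria(train, test, predictors, leaf_counts, target="Churn"):
    """Test accuracy of Gini (CART) vs entropy (C5.0-like) trees."""
    y_train = train[target].astype(int)
    y_test = test[target].astype(int)
    rows = []
    for leaves in leaf_counts:
        row = {"max_leaf_nodes": leaves}
        for label, criterion in [("cart_gini", "gini"),
                                 ("c5_like_entropy", "entropy")]:
            tree = DecisionTreeClassifier(criterion=criterion,
                                          max_leaf_nodes=leaves,
                                          random_state=0)
            tree.fit(train[predictors], y_train)
            pred = tree.predict(test[predictors])
            row[label] = accuracy_score(y_test, pred)
        rows.append(row)
    return pd.DataFrame(rows)


demo = make_demo()
train, test = demo.iloc[:800], demo.iloc[800:]  # rows are already random
table = compare_criteria(train, test, ["Day.Mins", "CustServ.Calls"],
                         leaf_counts=[4, 8, 16])
print(table)
```

The two criteria often choose similar splits, so the accuracies tend to be close; any difference seen here reflects the synthetic data and the stand-in, not C5.0 itself.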
Statistics The Art and Science of Learning from Data
ISBN: 978-0321755940
3rd edition
Authors: Alan Agresti, Christine A. Franklin