Part B: Use the Simmons data set in module 10. See the Excel file titled Simmons-data-raw...
Question:
Transcribed Image Text:
DATA SCIENCE: Machine Learning TEAM PROJECT

This programming assignment contains 2 independent parts, creatively called Part A and Part B :)

PART A (telco churn analysis): (10 points)

For this part of the project, you will analyze the TelCo CHURN data set. Divide the entire data set into training and test sets (the test set should be 25% of the original data set).

PART A Deliverable: Apply 10-fold cross-validation to build TWO distinct models to predict customer CHURN. The two techniques are a) Decision Trees and b) Logistic Regression. Use the best combination of predictor variables for this purpose (ok to use the variables in the Lab notebook for Decision trees). There are 5 parts to PART A of your submission:

A.1: For both Decision Trees and Logistic Regression, report the accuracy for each of the 10 folds. Also, compute the AVERAGE accuracy across folds as well as the STANDARD DEVIATION of accuracy across the 10 folds. Which technique (LR or Trees) has a higher average accuracy? (2 points)

A.2: Accurate Jupyter notebook pdf in Appendix A.2 of your Decision Tree cross validation code (2 points)

A.3: Accurate Jupyter notebook pdf in Appendix A.3 of your Logistic Regression cross validation code (2 points)

A.4: Consider the 4 cells (p, Y), (p, N), (n, Y) and (n, N) (see chapter on confusion matrix from Provost book). For each of these cells come up with a BENEFIT/COST for every customer that falls into the cell. There is no right or wrong answer here, but this has NOTHING to do with parts A.1, A.2, A.3 above. This is based upon a BUSINESS understanding of the costs/benefits of misclassification. State your rationale for the numbers you provide (2 points).

A.5: Look carefully at ALL the predicted "CHURN/LEAVE" leaf nodes of your decision tree. As a business manager, describe each churning segment in words. Recommend ONE choice of CHURN segment where you will focus your resources to reduce churn. Why did you pick this one segment from all the available alternatives? (2 points)

TOTAL for PART A: 10 points.

Hints for writing a good report for Part A: The report for PART A does not have to be long (no more than 5 pages), but should be super-clear. Imagine you are presenting the report to senior management. Here are some suggestions:

Part A.1: Describe the numbers below in a table:

Cross-validation Fold    Decision Tree    Logistic Regression
Fold 1
Fold 2
Fold 3
Fold 4
Fold 5
Fold 6
Fold 7
Fold 8
Fold 9
Fold 10
Average Error %
Std. Dev. Error %

See IRIS PRACTICE CROSSVAL Jupyter notebook in Module 6 for how to create a 10-fold cross-validation (that notebook shows you a 5-fold).

Next place Part A.4 (not parts A.2, A.3):

       Y            N
p      $ benefit    $ cost
n      $ cost       $ benefit

Write a few sentences supporting the $ benefit/cost numbers. How did you come up with these numbers?

Next place Part A.5 (not parts A.2, A.3):

RULES for identifying    Description in words
Churn segment 1...
Churn segment 2...

The Rules are as stated by the decision tree. WHICH CHURN SEGMENT DO YOU RECOMMEND FOCUSING ON? WHY?

Finally, in appendices for PART A, place Jupyter notebooks Parts A.2 and A.3.

Appendix A.2: Decision Tree cross validation notebook: construct this notebook by combining ideas from Churn_Telco and Iris_practice_crossval notebooks.

Appendix A.3: Logistic regression cross validation notebook: construct this notebook by combining ideas from Churn_Telco, WBCD and Iris_practice_crossval notebooks. NOTE: The data set used for logistic regression is STILL the Telco churn dataset.

Part B: Use the Simmons data set in module 10. See the Excel file titled Simmons-data-raw in Module 10. Watch the video that explains the contents of the file. The data set uses two predictors X1 = Annual spend on a similar credit card and X2 = Presence/Absence of the Simmons loyalty card to PREDICT Y = Will customer use coupon or not? For Part 2, build a logistic regression model to predict Y = coupon usage from X1 and X2 and then answer the following questions.

PartB-1 (2 points): What are the coefficients (BETAs) for the logistic regression model? Answer as below:

LR coefficients    BETA0 (or constant term)    BETA1 (coeff. for X1)    BETA2 (coeff. for X2)
Value

PartB-2 (2 points): Use the model above to compare TWO customers Jack and Jill. Jack spends $2000 annually (note: X1 for Jack = 2) and HAS the Simmons card (X2 = 1). Jill spends $4000 annually (X1 = 4) and does NOT have the Simmons card (X2 = 0). Who is more likely to use the coupon? (Hint: A complete answer must evaluate their probabilities for response).

                           Jack    Jill
Probability of Response

XXXX is more likely to respond because...

PartB-3 (1 point): If you were to ROLL OUT the logistic regression model to PREDICT coupon usage for a LARGE database of customers, what CUTOFF probability will you choose? (Hint: No right or wrong answer here, but a concept such as a CONFUSION MATRIX may help make your call for cutoff probability).
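For the Jack and Jill comparison in PartB-2, a minimal sketch of the probability calculation may help. It assumes the usual logistic form P(Y = 1) = 1 / (1 + exp(-(BETA0 + BETA1*X1 + BETA2*X2))). The function name `response_probability` and the three coefficient values are hypothetical placeholders, not values fitted from Simmons-data-raw; substitute the coefficients your own model reports.

```python
import math


def response_probability(beta0, beta1, beta2, x1, x2):
    """Return P(Y = 1) for a two-predictor logistic regression model."""
    z = beta0 + beta1 * x1 + beta2 * x2  # linear predictor (log-odds)
    return 1.0 / (1.0 + math.exp(-z))


# HYPOTHETICAL placeholder coefficients, NOT fitted from the Simmons data.
BETA0, BETA1, BETA2 = -2.0, 0.3, 1.0

# Jack: X1 = 2 (spends $2000), X2 = 1 (has the Simmons card).
p_jack = response_probability(BETA0, BETA1, BETA2, x1=2, x2=1)
# Jill: X1 = 4 (spends $4000), X2 = 0 (no Simmons card).
p_jill = response_probability(BETA0, BETA1, BETA2, x1=4, x2=0)

print(f"Jack: {p_jack:.3f}  Jill: {p_jill:.3f}")
print("More likely to respond:", "Jack" if p_jack > p_jill else "Jill")
```

Because both customers share BETA0, Jack's log-odds exceed Jill's exactly when BETA2 > 2*BETA1, so the ranking depends on how the card effect compares with the effect of an extra $2000 of spend.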
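The cross-validation report asked for in Part A.1 could be sketched along the following lines with scikit-learn. This is only an illustration under stated assumptions: the data are SYNTHETIC (generated with `make_classification`) as a stand-in for the Telco churn data set, the tree depth and other settings are arbitrary, and running the 10 folds on the 75% training portion is one reading of the assignment, not the only one.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score, train_test_split
from sklearn.tree import DecisionTreeClassifier

# SYNTHETIC stand-in for the Telco churn predictors and CHURN label (assumption).
X, y = make_classification(n_samples=1000, n_features=8, random_state=0)

# Hold out 25% of the data as the test set, as the assignment requires.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# The same 10 stratified folds are reused for both techniques.
folds = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)

models = {
    "Decision Tree": DecisionTreeClassifier(max_depth=4, random_state=0),
    "Logistic Regression": LogisticRegression(max_iter=1000),
}

results = {}
for name, model in models.items():
    # One accuracy value per fold.
    scores = cross_val_score(model, X_train, y_train, cv=folds, scoring="accuracy")
    results[name] = scores
    print(name, np.round(scores, 3))
    print(f"  average accuracy = {scores.mean():.3f}, "
          f"std. dev. = {scores.std(ddof=1):.3f}")
```

The per-fold values in `results` fill the rows of the Part A.1 table; if the table is reported as error %, each entry is 100 * (1 - accuracy).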
Expert Answer:
R Code The following R code should produce the same results data1Y...