A supermarket is offering a new line of organic products. The supermarket's management wants to determine which
Question:
A supermarket is offering a new line of organic products. The supermarket's management wants to determine which customers are likely to purchase these products.
The supermarket has a customer loyalty program. As an initial buyer incentive plan, the supermarket provided coupons for the organic products to all of the loyalty program participants and collected data that includes whether these customers purchased any of the organic products.
The ORGANICS data set (available on Canvas) contains 13 variables and over 22,000 observations. The variables in the data set are shown below with the appropriate roles and levels:
Name | Model | Measurement | Description |
ID | ID | Nominal | Customer loyalty identification number |
DemAffl | Input | Interval | Affluence grade on a scale from 1 to 30 |
DemAge | Input | Interval | Age, in years |
DemCluster | Rejected | Nominal | Type of residential neighborhood |
DemClusterGroup | Input | Nominal | Neighborhood group |
DemGender | Input | Nominal | M = male, F = female, U = unknown |
DemRegion | Input | Nominal | Geographic region |
DemTVReg | Input | Nominal | Television region |
PromClass | Input | Nominal | Loyalty status: tin, silver, gold, or platinum |
PromSpend | Input | Interval | Total amount spent |
PromTime | Input | Interval | Time as loyalty card member |
TargetBuy | Target | Binary | Organics purchased? 1 = Yes, 0 = No |
TargetAmt | Rejected | Interval | Number of organic products purchased |
! Although two target variables are listed, these exercises concentrate on the binary variable TargetBuy.
a. Create a new diagram named Organics.
b. Define the data set ORGANICS as a data source for the project.
1) Set the model roles for the analysis variables as shown above.
2) The variable DemClusterGroup contains collapsed levels of the variable DemCluster. Presume that, based on previous experience, you believe that DemClusterGroup is sufficient for this type of modeling effort. Set the model role for DemCluster to Rejected.
3) As noted above, only TargetBuy will be used for this analysis and should have a role
of Target. Can TargetAmt be used as an input for a model used to predict TargetBuy?
Why or why not? Put your answer in your report. Similarly, answering all following questions in your report.
4) Finish the Organics data source definition.
5) Examine the distribution of the target variable. What is the proportion of individuals who purchased organic products?
c. Add the ORGANICS data source to the Organics diagram workspace.
d. Add a Data Partition node to the diagram and connect it to the Data Source node. Assign 50% of the data for training and 50% for validation.
e. Add a Decision Tree node to the workspace and connect it to the Data Partition node.
f. Create a decision tree model autonomously. Use misclassification error rate as the model assessment statistic. Include the resulting tree in your report.
1) How many leaves are in the optimal tree?
2) Which variable was used for the first split?
g. Add a second Decision Tree node to the diagram and connect it to the Data Partition node.
h. Create a decision tree model autonomously. Use average square error as the model assessment statistic. Include the resulting tree in your report.
1) How many leaves are in the optimal tree?
2) Which variable was used for the first split?
i. Compare the cumulative lift value at the top 20% percentile for the two trees you built in e and g. Which one gives better result?