Question:
Customer Rating of Breakfast Cereals. The dataset Cereals.jmp includes nutritional information, store display, and consumer ratings for 77 breakfast cereals.
Data preprocessing. Note that some cereals have missing values. These cereals will be automatically omitted from the analysis. Use Cols > Columns Viewer to identify which variables have missing values and how many values are missing.
- The variables [ Select ] ["weight, cups, and rating", "carbo, sugars, and potass", "calories, protein, and shelf"] have [ Select ] ["12", "2", "4"] missing values in total.
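Outside JMP, the same check (count missing values per variable, then drop incomplete rows) can be sketched in plain Python. This is only an illustration: the rows and their values below are invented, `None` stands in for a missing cell, and `count_missing` is a hypothetical helper, not part of JMP or of the Cereals data.

```python
# Invented toy rows standing in for a few cereals; None marks a missing value.
rows = [
    {"name": "A", "calories": 70.0, "sugars": 6.0, "weight": 1.0},
    {"name": "B", "calories": None, "sugars": 8.0, "weight": None},
    {"name": "C", "calories": 110.0, "sugars": None, "weight": 1.0},
]


def count_missing(rows, columns):
    """Return {column: number of rows whose value in that column is None}."""
    return {col: sum(1 for r in rows if r[col] is None) for col in columns}


missing = count_missing(rows, ["calories", "sugars", "weight"])
total_missing = sum(missing.values())

# Rows with any missing value are excluded, mirroring the automatic omission
# described in the question.
complete_rows = [r for r in rows if all(v is not None for v in r.values())]

print(missing, total_missing, [r["name"] for r in complete_rows])
```

Note that the total counts missing cells, not affected rows: a single cereal can contribute more than one missing value.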
The Hierarchical platform dialog provides an option to standardize (Standardize Data). Should this be selected? Why?
- The scales used for measurement [ Select ] ["are not significantly", "are vastly"] different, so the distance measure [ Select ] ["would be", "would not be"] dominated by variables with larger values. Hence, the Standardize Data option in the Hierarchical platform [ Select ] ["should not be", "should be"] selected.
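To see why scale matters for a distance measure, here is a minimal sketch in plain Python. It assumes Euclidean distance and z-score standardization with the sample standard deviation; whether JMP's Standardize Data option uses exactly this formula is an assumption here. The values are invented for illustration.

```python
import math
import statistics

# Invented values for three cereals on two variables with different scales.
calories = [70.0, 110.0, 150.0]  # differences of tens of units
fat = [1.0, 3.0, 2.0]            # differences of single units


def euclidean(p, q):
    """Euclidean distance between two equal-length points."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def standardize(values):
    """Convert values to z-scores using the sample standard deviation."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return [(v - mean) / sd for v in values]


def first_variable_share(p, q):
    """Fraction of the squared distance contributed by the first variable."""
    return (p[0] - q[0]) ** 2 / euclidean(p, q) ** 2


raw = list(zip(calories, fat))
scaled = list(zip(standardize(calories), standardize(fat)))

raw_share = first_variable_share(raw[0], raw[1])
scaled_share = first_variable_share(scaled[0], scaled[1])
print(round(raw_share, 4), round(scaled_share, 4))
```

On the raw values, almost all of the distance between the first two cereals comes from calories; after standardization, fat contributes most of it, because the two cereals differ more in fat relative to its spread.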
Apply hierarchical clustering to the data using single linkage and complete linkage (use only continuous variables in Y, Columns and cast the variable name to Label). Look at the dendrograms and the parallel plots. Comment on the structure of the clusters and on their stability.
With [ Select ] ["single linkage", "complete linkage"] , small changes in the distance cause large changes in the number of clusters. For example, the distance range from 55 to 30 clusters is very narrow: the clusters change very quickly over a short distance. So, [ Select ] ["complete linkage", "single linkage"] is less stable. The change in clusters for [ Select ] ["complete linkage", "single linkage"] is more gradual.
- Hence, the [ Select ] ["single linkage", "complete linkage"] method leads to the most insightful or meaningful clusters.
- In the Distance Graph there is a sharp upward bend at cluster number = [ Select ] ["3", "5", "2"] . This gives an idea of the optimal number of clusters to use.
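The difference between the two linkage rules, and the idea of reading a cluster count off a sharp bend in the merge distances, can be sketched with a naive agglomerative routine in plain Python. This is not JMP's implementation; the one-dimensional points are invented, and `agglomerate` and `clusters_before_largest_jump` are hypothetical helper names.

```python
import math


def euclidean(p, q):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def cluster_distance(c1, c2, points, linkage):
    """Single linkage: closest pair; complete linkage: farthest pair."""
    dists = [euclidean(points[i], points[j]) for i in c1 for j in c2]
    return min(dists) if linkage == "single" else max(dists)


def agglomerate(points, linkage):
    """Repeatedly merge the two closest clusters; return the merge distances."""
    clusters = [[i] for i in range(len(points))]
    heights = []
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = cluster_distance(clusters[a], clusters[b], points, linkage)
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        heights.append(d)
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return heights


def clusters_before_largest_jump(heights, n_points):
    """Number of clusters remaining just before the largest jump in merge distance."""
    jumps = [heights[k] - heights[k - 1] for k in range(1, len(heights))]
    k = jumps.index(max(jumps)) + 1  # index of the merge that follows the jump
    return n_points - k


# Invented 1-D points: four evenly spaced values and one distant value.
points = [(0.0,), (1.0,), (2.0,), (3.0,), (10.0,)]
single = agglomerate(points, "single")
complete = agglomerate(points, "complete")
print(single, complete)
print(clusters_before_largest_jump(single, len(points)))
```

In this toy case single linkage chains the four evenly spaced points together at the same merge distance, so the cluster count drops from five to two with no change in distance, while complete linkage spreads its merges over a wider range of distances.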
The public elementary schools would like to choose a set of cereals to include in their daily cafeterias. Every day a different cereal is offered, but all cereals should support a healthy diet. For this goal you are requested to find a cluster of "healthy cereals."
Based on the variables at hand, how would you characterize "healthy cereals"?
[ Select ] ["High", "Low"] calories, [ Select ] ["High", "Low"] protein, [ Select ] ["Low", "High"] fat, [ Select ] ["Low", "High"] fiber, [ Select ] ["Low", "High"] carbo, [ Select ] ["High", "Low"] sugar, [ Select ] ["High", "Low"] potass, [ Select ] ["Low", "High"] vitamins.
Use the red triangle options Cluster Summary, Cluster Means, and Parallel Coord Plots to check cluster means across the variables. Which cluster of cereals is the most "healthy"?
[ Select ] ["Cluster 1", "Cluster 2", "Cluster 4"] is the healthiest, with high protein, fiber, and potass and low calories, fat, and carbs. However, this cluster contains the high-bran, high-fiber cereals that students generally might not like. An alternative might be [ Select ] ["cluster 2", "cluster 4", "cluster 5"] , which is moderately high in the "good" characteristics (protein, vitamins, potassium) and which students would be more likely to eat.
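Comparing clusters by their means, as the Cluster Means option does, amounts to averaging each variable within each cluster. A minimal plain-Python sketch follows; the cluster labels "A" and "B" and all values are invented and do not correspond to the clusters in the answer above, and `cluster_means` is a hypothetical helper.

```python
from collections import defaultdict
from statistics import mean

# Invented rows: each cereal already carries a cluster label.
rows = [
    {"cluster": "A", "fiber": 10.0, "sugars": 5.0},
    {"cluster": "A", "fiber": 14.0, "sugars": 3.0},
    {"cluster": "B", "fiber": 1.0, "sugars": 12.0},
    {"cluster": "B", "fiber": 3.0, "sugars": 10.0},
]


def cluster_means(rows, variables):
    """Return {cluster label: {variable: mean of that variable in the cluster}}."""
    groups = defaultdict(list)
    for r in rows:
        groups[r["cluster"]].append(r)
    return {
        label: {v: mean(m[v] for m in members) for v in variables}
        for label, members in groups.items()
    }


means = cluster_means(rows, ["fiber", "sugars"])
print(means)
```

With a table like this, a "healthy" cluster can be picked by looking for high means on the desirable variables and low means on the undesirable ones.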