Answer the following questions based on the attached web file named FBS.
Please submit both the Word document containing the relevant screenshots (tables generated after executing k-means clustering) and the Excel worksheets containing the generated tables.
1. The Football Bowl Subdivision (FBS) level of the National Collegiate Athletic Association (NCAA) consists of over 100 schools. Most of these schools belong to one of several conferences, or collections of schools, that compete with each other on a regular basis in collegiate sports. Suppose the NCAA has commissioned a study that will propose the formation of conferences based on the similarities of the constituent schools. The file FBS contains data on schools that belong to the FBS. Each row in this file contains information on one school. The variables include football stadium capacity, latitude, longitude, athletic department revenue, endowment, and undergraduate enrollment.
a. Apply k-means clustering with k = 10 using football stadium capacity, latitude, longitude, endowment, and enrollment as variables. Be sure to Normalize Input Data and specify 10 iterations and 10 random starts in Step 2 of the k-Means Clustering procedure. Take screenshots of the three tables generated in the KMC_Output worksheet: Cluster Centers, Inter-Cluster Distances, and Cluster Summary (3 points). Analyze the resultant clusters. What is the size of the smallest cluster (2 points)? What is the average distance in the least dense cluster (2 points)? What makes the least dense cluster so diverse (4 points)?
(Tips: The least dense cluster in k-means is the one with the highest average distance in the cluster. To answer "What makes the least dense cluster so diverse?", you need to 1) describe the most distinctive characteristic of the least dense cluster, by referring to the Cluster Centers table in the KMC_Output worksheet; and 2) compare the inter-cluster distances, by referring to the Inter-Cluster Distances table in the KMC_Output worksheet. What is the nearest distance between this cluster and the others?
*If you are still confused about "What makes the least dense cluster so diverse?", read the following more detailed explanation. First, make sure you know which cluster is the least dense. Then compare this cluster's values in the Cluster Centers table in the KMC_Output worksheet with those of the other clusters, specifically the values of stadium capacity, latitude, longitude, endowment, and enrollment. Are any values significantly different from those of the other clusters? If the data in your table are correct, the answer is there. Second, compare the inter-cluster distances, by referring to the Inter-Cluster Distances table in the KMC_Output worksheet. To answer the question "What is the nearest distance between this cluster and the others?", read the values in the column for the least dense cluster. These values are the distances between this cluster and the others. After you find the smallest value, compare it with the smallest values in the other columns, and you will see the difference.)
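For readers who want to check the logic of the tips outside Excel, here is a minimal Python sketch of the same analysis: normalize, run k-means with 10 iterations and 10 random starts, then read off cluster sizes, average within-cluster distances, and inter-cluster distances. It is not the Excel add-in's procedure and will not reproduce its numbers. The data are synthetic placeholders (not the FBS file), z-score normalization is an assumption about what "Normalize Input Data" does, and the helper names `kmeans` and `cluster_summary` are hypothetical.

```python
import numpy as np

def kmeans(X, k, n_iter=10, n_starts=10, seed=0):
    """Plain Lloyd's k-means; keeps the best of n_starts random starts."""
    rng = np.random.default_rng(seed)
    best = None
    for _ in range(n_starts):
        # Start from k distinct rows chosen at random.
        centers = X[rng.choice(len(X), size=k, replace=False)]
        for _ in range(n_iter):
            # Assign each row to its nearest center (Euclidean distance).
            dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
            labels = dists.argmin(axis=1)
            # Recompute centers; keep the old center if a cluster is empty.
            centers = np.array([
                X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
                for j in range(k)
            ])
        # Final assignment against the last set of centers.
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        sse = float((dists[np.arange(len(X)), labels] ** 2).sum())
        if best is None or sse < best[0]:
            best = (sse, labels, centers)
    return best[1], best[2]

def cluster_summary(X, labels, centers):
    """Cluster sizes, average distance to center, and center-to-center distances."""
    k = len(centers)
    sizes = np.bincount(labels, minlength=k)
    d = np.linalg.norm(X - centers[labels], axis=1)
    avg_dist = np.array(
        [d[labels == j].mean() if sizes[j] else np.nan for j in range(k)]
    )
    inter = np.linalg.norm(centers[:, None, :] - centers[None, :, :], axis=2)
    return sizes, avg_dist, inter

# Synthetic stand-in for the five variables (values are made up).
rng = np.random.default_rng(42)
n = 120
X = np.column_stack([
    rng.normal(50_000, 20_000, n),   # stadium capacity
    rng.normal(37, 5, n),            # latitude
    rng.normal(-95, 12, n),          # longitude
    rng.lognormal(20, 1, n),         # endowment
    rng.normal(20_000, 8_000, n),    # enrollment
])

# Assumed normalization: z-score each column.
Z = (X - X.mean(axis=0)) / X.std(axis=0)

labels, centers = kmeans(Z, k=10, n_iter=10, n_starts=10)
sizes, avg_dist, inter = cluster_summary(Z, labels, centers)

# Least dense cluster = highest average distance to its own center.
least_dense = int(np.nanargmax(avg_dist))
# Nearest other cluster = smallest off-diagonal entry in its column.
nearest = float(np.delete(inter[least_dense], least_dense).min())

print("cluster sizes:", sizes)
print("least dense cluster:", least_dense, "avg distance:", avg_dist[least_dense])
print("nearest distance to another cluster:", nearest)
```

Comparing `nearest` with the smallest off-diagonal value in each other column of `inter` mirrors the comparison the tip describes for the Inter-Cluster Distances table.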
b. What problems do you see with the plan of defining the school membership of the 10 conferences directly from the 10 clusters? (3 points) (Tip: Consider the sizes of the clusters.)
c. Repeat part a, but this time do not Normalize Input Data in Step 2 of the k-Means Clustering procedure. Take screenshots of the three tables generated in the KMC_Output1 worksheet: Cluster Centers, Inter-Cluster Distances, and Cluster Summary (2 points). Analyze the resultant clusters. Do they look quite different from those in part a (1 point)? Identify the dominating factor(s) in the formation of these new clusters (3 points).
(Tips: The dominating factor is the variable that makes the non-normalized clustering different from the normalized clustering. It skews the clustering process because of the large magnitude of its values. When variables are not normalized, the formation of clusters is dominated by the variable on the largest scale. To find the dominating factor, compare the values of the variables in the Cluster Centers table. To gain a better understanding, read the first paragraph on page 257 of your textbook (page 75 in the Lamar customized version of the textbook). You can confirm your answer by clustering the schools solely on the basis of the dominating factor and then noting the similarity of the resulting clusters to the clusters based on all (non-normalized) variables.)
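The scale effect described in this tip can be illustrated with a short Python sketch. Each column's share of the total squared Euclidean distance between rows equals its share of the total variance, so a column measured on a much larger scale accounts for nearly all of the distance that k-means uses. The data below are synthetic placeholders, and the endowment column is the largest only by construction; which variable dominates in the FBS file has to be read from your own Cluster Centers table. The helper name `distance_share` is hypothetical.

```python
import numpy as np

# Synthetic stand-in for the five variables (values are made up).
rng = np.random.default_rng(0)
n = 120
columns = ["capacity", "latitude", "longitude", "endowment", "enrollment"]
X = np.column_stack([
    rng.normal(50_000, 20_000, n),            # stadium capacity
    rng.normal(37, 5, n),                     # latitude
    rng.normal(-95, 12, n),                   # longitude
    rng.lognormal(mean=20, sigma=1, size=n),  # endowment, largest scale by construction
    rng.normal(20_000, 8_000, n),             # enrollment
])

def distance_share(data):
    """Fraction of total squared pairwise distance contributed by each column."""
    var = data.var(axis=0)
    return var / var.sum()

# Without normalization, one column carries almost all of the distance.
raw_share = distance_share(X)

# After z-scoring, every column contributes equally.
Z = (X - X.mean(axis=0)) / X.std(axis=0)
norm_share = distance_share(Z)

dominant = columns[int(raw_share.argmax())]
print("raw shares:", dict(zip(columns, raw_share.round(6))))
print("normalized shares:", dict(zip(columns, norm_share.round(6))))
print("dominating column in this synthetic data:", dominant)
```

This is why clustering on the dominating variable alone tends to give nearly the same groups as clustering on all non-normalized variables.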
