Question: Big Data Analysis, Academic Year 2023-2024

Big Data Analysis
Academic Year 2023-2024
We have access to a database on different consumers according to 5 attributes.
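| Consumer | Average Spending | Revenue | Height | Weight | Age |
|----------|------------------|---------|--------|--------|-----|
| 1        | 100              | 1500    | 165    | 64     | 22  |
| 2        | 78               | 3200    | 175    | 70     | 35  |
| 3        | 210              | 2500    | 167    | 60     | 25  |
| 4        | 50               | 2000    | 176    | 76     | 57  |
| 5        | 140              | 1600    | 167    | 65     | 21  |
| 6        | 98               | 2700    | 178    | 82     | 38  |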
Question 1: Calculate the centroid of these data.
Question 2: Calculate the Euclidean distance between Consumer 1 and the centroid.
Question 3: Calculate the Manhattan distance between Consumer 1 and the centroid.
Question 4: Calculate the Euclidean and Manhattan distances between Consumer 4 and the centroid.
Question 5: Would you say that Consumer 1 is better represented by the centroid than Consumer 4?
Question 6: You want to create two clusters on this dataset. The centroid of the first cluster is given by C1=(80; 2500; 170; 70; 45). The centroid of the second cluster is given by C2=( ; 2300; 169; 65; 38). To which cluster (C1 or C2) will Consumer 3 be assigned? Answer using the Euclidean distance.
Question 7: How would you describe the difference between clusters 1 and 2?
Question 8: How does software usually initialize the centroids? Could this be a problem for the clustering results? Explain.
Question 9: Why can combining the PCA and K-means algorithms increase the quality of the clustering?
Question 10: Why does the combination of PCA and K-means improve the visualization of the clustering?
Question 11: Describe in practice how to combine PCA and K-means.
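
For Questions 1 to 4, the arithmetic can be checked with a short script. The following is a minimal sketch in plain Python, assuming the centroid is the component-wise mean of the six consumers and that distances are computed on the raw, unscaled values from the table. The names `consumers`, `centroid`, `euclidean`, `manhattan`, and `center` are illustrative and do not come from the exercise.

```python
import math

# Rows from the table: (average spending, revenue, height, weight, age).
consumers = {
    1: (100, 1500, 165, 64, 22),
    2: (78, 3200, 175, 70, 35),
    3: (210, 2500, 167, 60, 25),
    4: (50, 2000, 176, 76, 57),
    5: (140, 1600, 167, 65, 21),
    6: (98, 2700, 178, 82, 38),
}


def centroid(points):
    """Component-wise mean of a collection of equal-length points."""
    points = list(points)
    n = len(points)
    return tuple(sum(column) / n for column in zip(*points))


def euclidean(p, q):
    """Square root of the sum of squared coordinate differences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def manhattan(p, q):
    """Sum of absolute coordinate differences."""
    return sum(abs(a - b) for a, b in zip(p, q))


# Assumption: no standardization is applied before measuring distances.
center = centroid(consumers.values())
print("centroid:", tuple(round(value, 2) for value in center))

for consumer_id in (1, 4):
    point = consumers[consumer_id]
    print(
        f"consumer {consumer_id}:",
        "euclidean =", round(euclidean(point, center), 2),
        "manhattan =", round(manhattan(point, center), 2),
    )
```

On raw values, Revenue has by far the largest scale of the five attributes, so it dominates both distances. If the exercise intends standardized attributes, the comparison between Consumer 1 and Consumer 4 may come out differently.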