Question:

This problem lets you see how dissimilarity, the key to clustering, might be calculated and then used for prediction. The file P17_23.xlsx contains data for 10 people. The value of Amount Spent for person 10 is unknown, and the ultimate purpose is to use the data for the first 9 people to predict Amount Spent for person 10. To do so, a common "nearest neighbor" approach is used. You find the three people most similar to person 10 and then use the average of their Amount Spent values as a prediction for person 10.

(In the data mining literature, this approach is called k-nearest neighbors, with k = 3.) Proceed as follows.

a. For each of the five attributes, Gender to Marital Status, fill in the corresponding yellow boxes as indicated. Each box shows how dissimilar each person is to each other person, based on a single attribute only. The box for Gender has been filled in to get you started.

b. These yellow values can be combined in at least three ways, as indicated by the cell comments above the orange boxes. Fill in these orange boxes.

c. Find the dissimilarity between each person and person 10 in three ways in the blue box at the top, following the cell comment in cell I2.

d. Use Excel's RANK function in the green box to rank the dissimilarities in the blue box.

e. Find three predictions of Amount Spent for person 10, each an average of Amount Spent for the three most similar people to person 10. There will be three predictions because each set of rankings in the green box can lead to a different set of three nearest neighbors.
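
For readers who want to check the spreadsheet logic outside Excel, steps a through e can be sketched in Python. This is only an illustration under stated assumptions, not the workbook's actual method. The data values are made up, and `age` stands in for an attribute the problem does not name. The per-attribute rules (0/1 mismatch for categories, range-scaled absolute difference for numbers) are assumed. So are the three combining rules (sum, Euclidean, maximum), because the cell comments that define the real ones are not reproduced in the problem text. All function names are hypothetical.

```python
import math

def categorical_dissimilarity(values):
    """Step a (assumed rule): 0 if two people share the category, 1 otherwise."""
    return [[0.0 if a == b else 1.0 for b in values] for a in values]

def numeric_dissimilarity(values):
    """Step a (assumed rule): absolute difference divided by the attribute's range."""
    spread = max(values) - min(values)
    return [[abs(a - b) / spread if spread else 0.0 for b in values] for a in values]

def combine(matrices, how):
    """Step b: merge per-attribute matrices cell by cell using one of three assumed rules."""
    rules = {
        "sum": sum,
        "euclidean": lambda parts: math.sqrt(sum(p * p for p in parts)),
        "max": max,
    }
    n = len(matrices[0])
    return [[rules[how]([m[i][j] for m in matrices]) for j in range(n)] for i in range(n)]

def rank_ascending(values):
    """Step d: like Excel's RANK with a nonzero order argument.
    The smallest value gets rank 1 and tied values share a rank."""
    return [1 + sum(other < v for other in values) for v in values]

def predict(matrix, amounts, k=3):
    """Steps c to e: take dissimilarities to the last person, rank them,
    and average Amount Spent over the k best-ranked people.
    Ties are broken by position in the list (an assumption)."""
    to_target = matrix[-1][:-1]
    ranks = rank_ascending(to_target)
    nearest = sorted(range(len(ranks)), key=lambda i: ranks[i])[:k]
    return sum(amounts[i] for i in nearest) / k

# Hypothetical data: five known people plus the target in the last position.
gender = ["M", "F", "F", "M", "F", "M"]
age = [25, 40, 55, 30, 35, 32]
amount_spent = [100.0, 200.0, 300.0, 400.0, 500.0]  # unknown for the target

matrices = [categorical_dissimilarity(gender), numeric_dissimilarity(age)]
predictions = {
    how: predict(combine(matrices, how), amount_spent)
    for how in ("sum", "euclidean", "max")
}
print(predictions)
```

Even on this toy data the maximum rule picks a different set of neighbors than the other two, which is why step e can produce three different predictions. The sketch ranks in ascending order because the nearest neighbors are the people with the smallest dissimilarities. In Excel, RANK ranks in descending order unless its optional third argument is nonzero.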