Question:

Problem 4. You have been hired as a data mining consultant for an insurance company. Your first task is to apply clustering to segment the customers who are insured with the company. The customer data set contains only 10 categorical attributes (gender, marital status, occupation, highest education level, etc.).

4) After addressing the missing value problem, you re-run the single-link hierarchical clustering algorithm on the binarized data and show the dendrogram to your supervisor. Your supervisor is still not satisfied because you did not specify the exact number of clusters in the data. So you examine the y-axis of the dendrogram and plot the distribution of Euclidean distances at which smaller clusters are merged into larger ones. Let d(k) be the distance shown on the dendrogram when there are k clusters and d(k-1) be the distance shown on the dendrogram when there are k-1 clusters. You plan to use the gap between d(k-1) and d(k) to decide the number of clusters. What are the minimum and maximum possible values for the gap, d(k-1) - d(k)? Explain why looking at the widest gap between d(k-1) and d(k) may not help you to determine the right number of clusters for this data set.
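
The quantities in the question can be explored with a small sketch. It is not the expert solution, only an illustration under stated assumptions: the data are binarized by one-hot encoding (one 0/1 column per attribute value, exactly one active column per attribute), d(k) is read as the merge distance at which k+1 clusters become k, and the toy records, the `domains` list, and all function names are hypothetical. Under that encoding the squared Euclidean distance between two records is twice the number of attributes on which they differ, so merge distances can only take a few discrete values. Single-link merge distances are also non-decreasing, so the gap is never negative.

```python
import math
from itertools import combinations

def one_hot(records, domains):
    """Binarize categorical records: one 0/1 column per (attribute, value) pair."""
    return [[1 if rec[i] == v else 0 for i, dom in enumerate(domains) for v in dom]
            for rec in records]

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def single_link_heights(points):
    """Merge distances of single-link clustering, in merge order.

    Relies on the fact that single-link merge distances equal the edge
    weights of a minimum spanning tree, found here with Kruskal's algorithm.
    """
    parent = list(range(len(points)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    edges = sorted((euclidean(points[i], points[j]), i, j)
                   for i, j in combinations(range(len(points)), 2))
    heights = []
    for dist, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:  # this edge joins two different clusters
            parent[ri] = rj
            heights.append(dist)
    return heights

# Hypothetical toy data with 3 attributes (the real data set has 10).
domains = [("M", "F"), ("single", "married"), ("clerk", "nurse")]
records = [
    ("M", "single", "clerk"),
    ("M", "single", "clerk"),   # exact duplicate: merges at distance 0
    ("F", "single", "clerk"),
    ("F", "married", "nurse"),
    ("M", "married", "nurse"),
]

points = one_hot(records, domains)
heights = single_link_heights(points)
n = len(points)

# d[k]: distance of the merge that leaves k clusters (assumed reading of d(k)).
d = {k: heights[n - k - 1] for k in range(1, n)}
# gaps[k] = d(k-1) - d(k), defined for k = 2 .. n-1.
gaps = {k: d[k - 1] - d[k] for k in range(2, n)}

for k in sorted(gaps):
    print(f"k={k}: d(k-1)={d[k - 1]:.3f}, d(k)={d[k]:.3f}, gap={gaps[k]:.3f}")
```

Because so many customer pairs sit at exactly the same distance, many gaps in a run like this are zero, and the few nonzero ones reflect the jump between allowed distance values rather than any cluster structure. That is one way to see why the widest gap can be uninformative for this kind of data.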
