Problem 4. You have been hired as a data mining consultant for an insurance company. Your first task is
to apply clustering to segment the customers who are insured with the company. The customer data set
contains only 10 categorical attributes (gender, marital status, occupation, highest education level, etc).
4) After addressing the missing value problem, you re-run the single-link hierarchical clustering algorithm
on the binarized data and show the dendrogram to your supervisor. Your supervisor is still not
satisfied because you did not specify the exact number of clusters in the data. So you examine the
y-axis of the dendrogram and plot the distribution of Euclidean distances at which each of the smaller
clusters is merged into a larger one. Let d(k) be the distance shown on the dendrogram when there are
k clusters and d(k-1) be the distance shown on the dendrogram when there are k-1 clusters. You plan to
use the gap between d(k-1) and d(k) to decide the number of clusters. What are the minimum and maximum
possible values for the gap, d(k-1) - d(k)? Explain why looking at the widest gap between d(k-1) and
d(k) may not help you to determine the right number of clusters for this data set.
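
To make the quantities in part 4 concrete, here is a minimal Python sketch of the procedure described above. It is an illustration under stated assumptions, not a reference solution: the five toy records (three attributes instead of ten) are hypothetical, the helper names one_hot, single_link_heights and gaps are made up for this example, and d(k) is read as the height of the merge that leaves k clusters.

```python
from itertools import combinations
from math import sqrt

def one_hot(records):
    """Binarize categorical records: one 0/1 column per (attribute, value) pair."""
    columns = sorted({(j, v) for r in records for j, v in enumerate(r)})
    return [[1 if r[j] == v else 0 for j, v in columns] for r in records]

def euclidean(a, b):
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def single_link_heights(points):
    """Naive single-link agglomeration; returns merge distances in merge order."""
    clusters = [[i] for i in range(len(points))]
    heights = []
    while len(clusters) > 1:
        # Single link: the distance between two clusters is that of their closest pair.
        dist, a, b = min(
            (min(euclidean(points[i], points[j])
                 for i in clusters[a] for j in clusters[b]), a, b)
            for a, b in combinations(range(len(clusters)), 2)
        )
        clusters[a] += clusters[b]   # a < b, so index a stays valid after the delete
        del clusters[b]
        heights.append(dist)
    return heights

def gaps(heights):
    """Assumed reading: d[k] is the height of the merge that leaves k clusters."""
    n = len(heights) + 1
    d = {n - i - 1: h for i, h in enumerate(heights)}
    return {k: d[k - 1] - d[k] for k in range(2, n)}

# Hypothetical toy records: (gender, marital status, occupation).
records = [
    ("F", "single", "nurse"),
    ("F", "single", "nurse"),    # exact duplicate of the first record
    ("F", "married", "nurse"),
    ("M", "married", "clerk"),
    ("M", "single", "clerk"),
]

heights = single_link_heights(one_hot(records))
gap = gaps(heights)
print("merge heights:", heights)
for k in sorted(gap):
    print(f"gap d({k - 1}) - d({k}) = {gap[k]:.3f}")
```

With one-hot columns and no missing values, two records that differ in m attributes are at Euclidean distance sqrt(2m), so the merge heights can only take a handful of distinct values. Printing the gaps for data like this shows how coarse the resulting signal is.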
