Question: Part 3: Clustering
This part is concerned with the file: /DataMining/data/arff/UCI/credit-g.arff
Clustering of the credit-g data from the earlier part of the assignment. For this part, use only the attributes duration, age, credit amount and job. The aim is to determine the number of clusters in the data and to assess whether any of the clusters are meaningful.
Run the K-means clustering algorithm on this data for the following values of K: Analyse the resulting clusters. What do you conclude? Provide your reasoning.
Choose a value of K and run the algorithm with different seeds. What is the effect of changing the seed? Provide your explanation.
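The two K-means tasks above can be tried outside Weka with a small sketch. This is a plain Lloyd's-style K-means in pure Python, not Weka's SimpleKMeans, and it runs on synthetic two-dimensional points rather than the credit-g attributes. The names `kmeans`, `sq_dist`, `points`, `sse_by_k` and `sse_by_seed` are hypothetical. The idea is to compare the within-cluster sum of squared errors (SSE) across values of K, and then across seeds for one fixed K.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, seed, max_iter=100):
    """Plain Lloyd's K-means; returns (centroids, within-cluster SSE)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # the seed decides the starting centroids
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: sq_dist(p, centroids[j]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster.
        # An empty cluster keeps its previous centroid.
        new_centroids = [
            tuple(sum(col) / len(c) for col in zip(*c)) if c else centroids[j]
            for j, c in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break
        centroids = new_centroids
    sse = sum(min(sq_dist(p, c) for c in centroids) for p in points)
    return centroids, sse

# Synthetic data (an assumption for illustration): three Gaussian blobs.
data_rng = random.Random(0)
blob_centres = [(0.0, 0.0), (10.0, 0.0), (5.0, 8.0)]
points = [
    (data_rng.gauss(cx, 1.0), data_rng.gauss(cy, 1.0))
    for cx, cy in blob_centres
    for _ in range(50)
]

# Vary K with a fixed seed, then vary the seed with a fixed K.
sse_by_k = {k: kmeans(points, k, seed=1)[1] for k in range(1, 7)}
sse_by_seed = {seed: kmeans(points, 3, seed=seed)[1] for seed in range(1, 6)}

for k, sse in sse_by_k.items():
    print(f"K={k}: SSE={sse:.1f}")
for seed, sse in sse_by_seed.items():
    print(f"seed={seed}: SSE={sse:.1f}")
```

A sharp drop in SSE followed by a flat tail as K grows is the usual informal sign of a natural number of clusters. If different seeds give noticeably different SSE values for the same K, the algorithm is settling into different local optima depending on its starting centroids.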
Run the EM algorithm on this data with the default parameters and describe the output and your analysis.
The EM algorithm can be quite sensitive to whether the data is normalized or not. Use the Weka normalize filter (Preprocess > Filter > unsupervised > normalize) to normalize the numeric attributes. What difference does this make to the clustering runs? Provide your reasoning.
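To see why normalization can matter, here is a minimal sketch of min-max scaling, which rescales each numeric column to the range [0, 1]. It is written in plain Python rather than with the Weka filter, the function name `min_max_normalize` is hypothetical, and the three rows are invented values standing in for duration, age and credit amount.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def min_max_normalize(rows):
    """Rescale every column to [0, 1]; a constant column becomes all zeros."""
    columns = list(zip(*rows))
    lows = [min(c) for c in columns]
    highs = [max(c) for c in columns]
    return [
        tuple(
            (v - lo) / (hi - lo) if hi > lo else 0.0
            for v, lo, hi in zip(row, lows, highs)
        )
        for row in rows
    ]

# Invented rows (not taken from credit-g): (duration, age, credit amount)
rows = [(12, 30, 1000), (24, 45, 5000), (36, 60, 9000)]
scaled = min_max_normalize(rows)

# Before scaling, the credit amount dominates the distance between rows.
raw_distance = sq_dist(rows[0], rows[1])
scaled_distance = sq_dist(scaled[0], scaled[1])
print(raw_distance, scaled_distance)
```

On the raw values almost all of the distance between two rows comes from the column with the largest numeric range, so a distance-based or variance-based clusterer effectively ignores the other attributes. After scaling, each attribute contributes on a comparable footing.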
The EM algorithm can also be quite sensitive to the values of minLogLikelihoodImprovementCV, minStdDev and minLogLikelihoodImprovementIterating. Explore the effect of changing these values. What do you conclude?
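As a rough illustration of what two of these settings control, the sketch below fits a one-dimensional Gaussian mixture with EM in plain Python. It is not Weka's EM implementation. The parameters `min_std_dev` and `min_ll_improvement` are hypothetical stand-ins that play the roles of a floor on each cluster's standard deviation and a stopping threshold on the log-likelihood gain per iteration. Cross-validated selection of the number of clusters is not modelled, and the data is synthetic.

```python
import math
import random

def normal_pdf(x, mean, std):
    """Density of a normal distribution at x."""
    return math.exp(-0.5 * ((x - mean) / std) ** 2) / (std * math.sqrt(2 * math.pi))

def em_1d(xs, k, seed, min_std_dev=1e-6, min_ll_improvement=1e-6, max_iter=200):
    """EM for a 1-D Gaussian mixture; returns (weights, means, stds, log_likelihood)."""
    rng = random.Random(seed)
    means = rng.sample(xs, k)  # the seed decides the starting means
    overall_mean = sum(xs) / len(xs)
    spread = math.sqrt(sum((x - overall_mean) ** 2 for x in xs) / len(xs))
    stds = [max(spread, min_std_dev)] * k
    weights = [1.0 / k] * k
    prev_ll = -math.inf
    ll = -math.inf
    for _ in range(max_iter):
        # E-step: responsibility of each component for each point.
        resp = []
        ll = 0.0
        for x in xs:
            dens = [w * normal_pdf(x, m, s) for w, m, s in zip(weights, means, stds)]
            total = sum(dens)
            ll += math.log(total)
            resp.append([d / total for d in dens])
        # Stop once the log-likelihood gain falls below the threshold.
        if ll - prev_ll < min_ll_improvement:
            break
        prev_ll = ll
        # M-step: re-estimate weights, means and (floored) standard deviations.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            weights[j] = nj / len(xs)
            means[j] = sum(r[j] * x for r, x in zip(resp, xs)) / nj
            var = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, xs)) / nj
            stds[j] = max(math.sqrt(var), min_std_dev)
    return weights, means, stds, ll

# Synthetic data (an assumption for illustration): two overlapping-free groups.
data_rng = random.Random(0)
xs = [data_rng.gauss(20, 3) for _ in range(100)] + [data_rng.gauss(50, 5) for _ in range(100)]

default_fit = em_1d(xs, 2, seed=1)
floored_fit = em_1d(xs, 2, seed=1, min_std_dev=10.0)
print("default stds:", default_fit[2])
print("floored stds:", floored_fit[2])
```

A large standard-deviation floor forces every cluster to stay wide, which blurs the boundaries between clusters. A loose stopping threshold ends the iterations earlier, possibly before the estimates have settled.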
How many clusters do you think are in the data? Give a plain English language description of one of them.
Compare the use of K-means and EM for these clustering tasks. Which do you think is better? Why?
What golden nuggets did you find, if any?
Report length: up to one page.
