
Consider the following clustering algorithm, which is similar to the k-means algorithm but definitely different.
Step 1. Randomly select k objects as initial representative objects.
Step 2. For each of the non-representative (unselected) objects, compute the distances to the k
representative (selected) objects and assign it to the closest one to obtain a clustering
result.
Step 3. For each cluster, find a new representative object: the object that minimizes the sum of
the distances to the other objects in its cluster. Update the current representative object of
each cluster by replacing it with the new one.
Step 4. If the newly updated representative objects are the same as the previous ones,
then stop. Otherwise, go to Step 2.
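The steps above can be sketched in Python as follows (a minimal sketch for illustration only; the toy 2-D data, Euclidean distance, and fixed random seed are assumptions, not part of the question):

```python
import random

def cluster(points, k, dist, seed=0):
    """K-representative clustering following Steps 1-4 above."""
    rng = random.Random(seed)
    reps = rng.sample(range(len(points)), k)  # Step 1: random initial representatives
    while True:
        # Step 2: assign every object to its closest representative
        clusters = {r: [] for r in reps}
        for i in range(len(points)):
            nearest = min(reps, key=lambda r: dist(points[i], points[r]))
            clusters[nearest].append(i)
        # Step 3: each cluster's new representative is the member that
        # minimizes the sum of distances to the other members
        new_reps = [
            min(members, key=lambda c: sum(dist(points[c], points[m]) for m in members))
            for members in clusters.values()
        ]
        # Step 4: stop when the representatives no longer change
        if set(new_reps) == set(reps):
            return clusters
        reps = new_reps

euclid = lambda a, b: ((a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2) ** 0.5
data = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
result = cluster(data, 2, euclid)  # two well-separated groups of three points
```

Note that, unlike k-means, the representatives are always actual data objects, which is the key to answering parts (1) and (2) below.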
(1)(3pts) What would be the strength(s) of this algorithm over the original k-means
algorithm? Explain why.
(2)(3pts) What would be the strength(s) of this algorithm over the PAM (Partitioning
Around Medoids) algorithm? Explain why.
(3pts) Suppose that we perform PCA using the five-dimensional dataset shown below.
X1   X2   X3    X4    X5
 2    4   0.4   0.2   0.02
 5   10   1.0   0.5   0.05
 1    2   0.2   0.1   0.01
 6   12   1.2   0.6   0.06
 8   16   1.6   0.8   0.08
 3    6   0.6   0.3   0.03
 4    8   0.8   0.4   0.04
 7   14   1.4   0.7   0.07
 9   18   1.8   0.9   0.09
10   20   2.0   1.0   0.10
How much variability of the dataset can be explained by the first principal component? Explain why.
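The explained-variance proportion can be checked numerically from the eigenvalues of the covariance matrix (a sketch using NumPy; observe that every column of the table is a constant multiple of X1):

```python
import numpy as np

# The ten observations from the table above
X = np.array([
    [2, 4, 0.4, 0.2, 0.02],
    [5, 10, 1.0, 0.5, 0.05],
    [1, 2, 0.2, 0.1, 0.01],
    [6, 12, 1.2, 0.6, 0.06],
    [8, 16, 1.6, 0.8, 0.08],
    [3, 6, 0.6, 0.3, 0.03],
    [4, 8, 0.8, 0.4, 0.04],
    [7, 14, 1.4, 0.7, 0.07],
    [9, 18, 1.8, 0.9, 0.09],
    [10, 20, 2.0, 1.0, 0.10],
])

# Eigenvalues of the covariance matrix give the variance along each
# principal component; the largest over the total is PC1's share.
eigvals = np.linalg.eigvalsh(np.cov(X, rowvar=False))[::-1]  # descending
ratio = eigvals[0] / eigvals.sum()
```

Because the five variables are perfectly linearly dependent, the covariance matrix has rank one and the ratio comes out as essentially 1.0.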
(6pts) Consider the similarity matrix of four data points (A,B,C,D) shown below.
(1)(3pts) Find the optimal clustering result that maximizes the following quantity,
Z = \sum_{k=1}^{3} \operatorname{avg}_{i,j \in C_k} s(i,j),
where s(i,j) is the similarity between object i and j, and Ck indicates the k th cluster.
Notice that the number of clusters is 3. If there are multiple optimal results, find them all.
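With four points and three clusters, the partition space is tiny and Z can be maximized by brute force (a sketch; the similarity values below are hypothetical placeholders, since the actual matrix appears in a figure not reproduced here, and singleton clusters are assumed to contribute 0 to Z):

```python
# Hypothetical symmetric similarity matrix over points A, B, C, D
sim = {
    ('A', 'B'): 0.9, ('A', 'C'): 0.2, ('A', 'D'): 0.4,
    ('B', 'C'): 0.3, ('B', 'D'): 0.9, ('C', 'D'): 0.1,
}

# Partitioning 4 points into 3 non-empty clusters forces exactly one
# two-point cluster plus two singletons, so Z reduces to the average
# (i.e. the value) of the single within-cluster similarity: the best
# clustering pairs up a maximum-similarity pair.
best = max(sim.values())
optimal = [pair for pair, s in sim.items() if s == best]
```

With these placeholder values two pairs tie at 0.9, illustrating how multiple optimal results can arise.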
(2)(3pts) Convert the similarities to distances and cluster the four points using complete linkage.
Draw a dendrogram.
(5pts) Answer the following questions using the datasets in the figure shown below. Note that each dataset contains 1,000 items and 10,000 transactions. Dark cells indicate ones (presence of items) and white cells indicate zeros (absence of items). We will apply the apriori algorithm to extract frequent itemsets with minsup = 10% (i.e., an itemset must be contained in at least 1,000 transactions).
(1)(1pt) Which dataset(s) will produce the largest number of frequent itemsets? Explain why.
(2)(1pt) Which dataset(s) will produce the fewest frequent itemsets? Explain why.
(3)(1pt) Which dataset(s) will produce the longest frequent itemset? Explain why.
(4)(1pt) Which dataset(s) will produce the frequent itemset with the highest support? Explain
why.
(5)(1pt) Which dataset(s) will produce frequent itemsets with widely varying support levels?
Explain why.
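For reference, the apriori support test behind all five sub-questions can be sketched on a tiny hypothetical transaction set (the transactions and the 50% threshold below are illustrative assumptions, not the 10,000-transaction datasets from the figure; candidate generation here is the simplified union-of-frequent-sets variant):

```python
# Hypothetical toy transactions standing in for one dataset
transactions = [
    {'a', 'b', 'c'}, {'a', 'b'}, {'a', 'c'},
    {'b', 'c'}, {'a', 'b', 'c'},
]
minsup = 0.5  # the exam uses 10%; 50% here so the toy example prunes the triple

def support(itemset):
    """Fraction of transactions containing every item in itemset."""
    return sum(itemset <= t for t in transactions) / len(transactions)

# Level-wise (apriori) search: grow candidates one item at a time and
# keep only those whose support clears the threshold
items = sorted({i for t in transactions for i in t})
frequent = [frozenset([i]) for i in items if support({i}) >= minsup]
level, k = frequent[:], 2
while level:
    candidates = {a | b for a in level for b in level if len(a | b) == k}
    level = [c for c in candidates if support(c) >= minsup]
    frequent += level
    k += 1
```

Here {a, b, c} has support 2/5 = 0.4 and is pruned, while all singletons and pairs survive, which is the same monotonicity argument the sub-questions above rely on.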