
Consider the following clustering algorithm, which is similar to the k-means algorithm but definitely different.
Step 1. Randomly select k objects as initial representative objects.
Step 2. For each of the non-representative (unselected) objects, compute the distances to the k
representative (selected) objects and assign it to the closest one to obtain a clustering
result.
Step 3. For each cluster, find a new representative object: the object that minimizes the sum of
the distances to the other objects in its cluster. Update the current representative object of
each cluster by replacing it with the new one.
Step 4. If the newly updated representative objects are the same as the previous ones,
then stop. Otherwise, go to Step 2.
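The steps above can be sketched in Python as follows (a minimal sketch for illustration only; the toy 2-D data, Euclidean distance, and fixed random seed are assumptions, not part of the question):

```python
import random

def cluster(points, k, dist, seed=0):
    """K-representative clustering following Steps 1-4 above."""
    rng = random.Random(seed)
    reps = rng.sample(range(len(points)), k)  # Step 1: random initial representatives
    while True:
        # Step 2: assign every object to its closest representative
        clusters = {r: [] for r in reps}
        for i in range(len(points)):
            nearest = min(reps, key=lambda r: dist(points[i], points[r]))
            clusters[nearest].append(i)
        # Step 3: each cluster's new representative is the member that
        # minimizes the sum of distances to the other members
        new_reps = [
            min(members, key=lambda c: sum(dist(points[c], points[m]) for m in members))
            for members in clusters.values()
        ]
        # Step 4: stop when the representatives no longer change
        if set(new_reps) == set(reps):
            return clusters
        reps = new_reps

euclid = lambda a, b: ((a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2) ** 0.5
data = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
result = cluster(data, 2, euclid)  # two well-separated groups of three points
```

Note that, unlike k-means, the representatives are always actual data objects, which is the key to answering parts (1) and (2) below.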
(1)(3pts) What would be the strength(s) of this algorithm over the original k-means
algorithm? Explain why.
(2)(3pts) What would be the strength(s) of this algorithm over the PAM (Partitioning
Around Medoids) algorithm? Explain why.
(3pts) Suppose that we perform PCA using the five-dimensional dataset shown below.
X1   X2   X3    X4    X5
 2    4   0.4   0.2   0.02
 5   10   1.0   0.5   0.05
 1    2   0.2   0.1   0.01
 6   12   1.2   0.6   0.06
 8   16   1.6   0.8   0.08
 3    6   0.6   0.3   0.03
 4    8   0.8   0.4   0.04
 7   14   1.4   0.7   0.07
 9   18   1.8   0.9   0.09
10   20   2.0   1.0   0.10
How much variability of the dataset can be explained by the first principal component? Explain why.
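The explained-variance proportion can be checked numerically from the eigenvalues of the covariance matrix (a sketch using NumPy; observe that every column of the table is a constant multiple of X1):

```python
import numpy as np

# The ten observations from the table above
X = np.array([
    [2, 4, 0.4, 0.2, 0.02],
    [5, 10, 1.0, 0.5, 0.05],
    [1, 2, 0.2, 0.1, 0.01],
    [6, 12, 1.2, 0.6, 0.06],
    [8, 16, 1.6, 0.8, 0.08],
    [3, 6, 0.6, 0.3, 0.03],
    [4, 8, 0.8, 0.4, 0.04],
    [7, 14, 1.4, 0.7, 0.07],
    [9, 18, 1.8, 0.9, 0.09],
    [10, 20, 2.0, 1.0, 0.10],
])

# Eigenvalues of the covariance matrix give the variance along each
# principal component; the largest over the total is PC1's share.
eigvals = np.linalg.eigvalsh(np.cov(X, rowvar=False))[::-1]  # descending
ratio = eigvals[0] / eigvals.sum()
```

Because the five variables are perfectly linearly dependent, the covariance matrix has rank one and the ratio comes out as essentially 1.0.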
(6pts) Consider the similarity matrix of four data points (A,B,C,D) shown below.
(1)(3pts) Find the optimal clustering result that maximizes the following quantity,
Z = \sum_{k=1}^{3} \operatorname{avg}_{i,j \in C_k} s(i,j),
where s(i,j) is the similarity between object i and j, and Ck indicates the k th cluster.
Notice that the number of clusters is 3. If there are multiple optimal results, find them all.
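With four points and three clusters, the partition space is tiny and Z can be maximized by brute force (a sketch; the similarity values below are hypothetical placeholders, since the actual matrix appears in a figure not reproduced here, and singleton clusters are assumed to contribute 0 to Z):

```python
# Hypothetical symmetric similarity matrix over points A, B, C, D
sim = {
    ('A', 'B'): 0.9, ('A', 'C'): 0.2, ('A', 'D'): 0.4,
    ('B', 'C'): 0.3, ('B', 'D'): 0.9, ('C', 'D'): 0.1,
}

# Partitioning 4 points into 3 non-empty clusters forces exactly one
# two-point cluster plus two singletons, so Z reduces to the average
# (i.e. the value) of the single within-cluster similarity: the best
# clustering pairs up a maximum-similarity pair.
best = max(sim.values())
optimal = [pair for pair, s in sim.items() if s == best]
```

With these placeholder values two pairs tie at 0.9, illustrating how multiple optimal results can arise.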
(2)(3pts) Convert the similarities to distances and cluster the four points using complete linkage.
Draw a dendrogram.
(5pts) Answer the following questions using the datasets in the figure shown below. Note that each dataset contains 1,000 items and 10,000 transactions. Dark cells indicate ones (presence of items) and white cells indicate zeros (absence of items). We will apply the apriori algorithm to extract frequent itemsets with minsup = 10% (i.e., an itemset must be contained in at least 1,000 transactions).
(1)(1pt) Which dataset(s) will produce the largest number of frequent itemsets? Explain why.
(2)(1pt) Which dataset(s) will produce the fewest frequent itemsets? Explain why.
(3)(1pt) Which dataset(s) will produce the longest frequent itemset? Explain why.
(4)(1pt) Which dataset(s) will produce the frequent itemset with the highest support? Explain
why.
(5)(1pt) Which dataset(s) will produce frequent itemsets with widely varying support levels?
Explain why.
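For reference, the apriori support test behind all five sub-questions can be sketched on a tiny hypothetical transaction set (the transactions and the 50% threshold below are illustrative assumptions, not the 10,000-transaction datasets from the figure; candidate generation here is the simplified union-of-frequent-sets variant):

```python
# Hypothetical toy transactions standing in for one dataset
transactions = [
    {'a', 'b', 'c'}, {'a', 'b'}, {'a', 'c'},
    {'b', 'c'}, {'a', 'b', 'c'},
]
minsup = 0.5  # the exam uses 10%; 50% here so the toy example prunes the triple

def support(itemset):
    """Fraction of transactions containing every item in itemset."""
    return sum(itemset <= t for t in transactions) / len(transactions)

# Level-wise (apriori) search: grow candidates one item at a time and
# keep only those whose support clears the threshold
items = sorted({i for t in transactions for i in t})
frequent = [frozenset([i]) for i in items if support({i}) >= minsup]
level, k = frequent[:], 2
while level:
    candidates = {a | b for a in level for b in level if len(a | b) == k}
    level = [c for c in candidates if support(c) >= minsup]
    frequent += level
    k += 1
```

Here {a, b, c} has support 2/5 = 0.4 and is pruned, while all singletons and pairs survive, which is the same monotonicity argument the sub-questions above rely on.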