Question:

3. In this problem, you will generate simulated data, and then perform K-means clustering on the data.

(a) Generate a simulated data set with 20 observations in each of K = 5 well-separated clusters, with p = 2 variables describing each observation. Do it in similar fashion to the K = 3 case in the "K-means: Selecting K" lecture slides. Plot the resulting data points.
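
For part (a), one possible way to simulate the data in R is sketched below. The lecture slides are not reproduced here, so the cluster centres, the unit standard deviation, and the seed are all assumptions chosen only to keep the five groups far apart. The names `centers`, `X`, and `true_cluster` are illustrative.

```r
set.seed(1)  # arbitrary seed, used only for reproducibility

K <- 5       # number of clusters
n_per <- 20  # observations per cluster
p <- 2       # variables per observation

# Hypothetical cluster centres, spaced about 8 units apart (noise sd is 1)
centers <- matrix(c(0, 0,
                    8, 0,
                    0, 8,
                    8, 8,
                    4, 16), ncol = p, byrow = TRUE)

# Repeat each centre 20 times, then add standard normal noise
X <- centers[rep(1:K, each = n_per), ] + matrix(rnorm(K * n_per * p), ncol = p)
X <- data.frame(X1 = X[, 1], X2 = X[, 2])
true_cluster <- rep(1:K, each = n_per)

# Scatter plot, coloured by the cluster each point was generated from
plot(X$X1, X$X2, col = true_cluster, pch = 19,
     xlab = "X1", ylab = "X2",
     main = "Simulated data: 5 clusters of 20 points")
```

Any set of centres whose spacing is large relative to the noise would serve the same purpose.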

(b) Run the K-means algorithm on your simulated data for K = 4, 5, 6. Use 50 random starts in each case. Provide the code and report the total within-cluster sum of squares (WSS) for each solution. Which transition caused the bigger drop in total WSS, from K = 4 to K = 5, or from K = 5 to K = 6?
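
For part (b), a minimal sketch using base R's `kmeans()` with `nstart = 50` might look like this. The data are regenerated with the same hypothetical centres and seed so the block runs on its own. The reported WSS values depend on that simulated data, so no specific numbers are quoted here.

```r
set.seed(1)  # arbitrary seed

# Simulated data: 5 assumed centres, 20 points each, sd = 1 noise
centers <- matrix(c(0, 0,  8, 0,  0, 8,  8, 8,  4, 16), ncol = 2, byrow = TRUE)
X <- centers[rep(1:5, each = 20), ] + matrix(rnorm(5 * 20 * 2), ncol = 2)
X <- data.frame(X1 = X[, 1], X2 = X[, 2])

# Total within-cluster sum of squares for K = 4, 5, 6 (50 random starts each)
wss <- sapply(4:6, function(k) kmeans(X, centers = k, nstart = 50)$tot.withinss)
names(wss) <- paste0("K=", 4:6)
print(wss)

# Size of each drop: WSS(4) - WSS(5) and WSS(5) - WSS(6)
drops <- -diff(wss)
names(drops) <- c("4 to 5", "5 to 6")
print(drops)
```

With five well-separated groups, the drop from K = 4 to K = 5 should be much larger than the drop from K = 5 to K = 6, since K = 4 has to merge two distinct groups while K = 6 only splits one compact group.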

(c) Proceed to plot the clustering solutions for K = 4, 5, 6 (just use the plots automatically generated by the eclust() function). Judging by the plots, explain why the respective sizes of the WSS drops in part (b) were expected. Use the lecture slides (the K = 3 simulation study) for reference.
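
For part (c), the plots can be produced roughly as follows, assuming the `factoextra` package (which provides `eclust()`) is installed. The simulated data again use hypothetical centres and an arbitrary seed. Extra arguments such as `nstart` are passed through to `kmeans()`.

```r
library(factoextra)  # provides eclust(); assumed to be installed

set.seed(1)  # arbitrary seed

# Simulated data: 5 assumed centres, 20 points each, sd = 1 noise
centers <- matrix(c(0, 0,  8, 0,  0, 8,  8, 8,  4, 16), ncol = 2, byrow = TRUE)
X <- centers[rep(1:5, each = 20), ] + matrix(rnorm(5 * 20 * 2), ncol = 2)
X <- data.frame(X1 = X[, 1], X2 = X[, 2])

# Fit K-means for K = 4, 5, 6; graph = TRUE draws the cluster plot each time
fits <- lapply(4:6, function(k) {
  eclust(X, FUNcluster = "kmeans", k = k, nstart = 50, graph = TRUE)
})

# Cluster sizes per solution, to read alongside the plots
lapply(fits, function(f) f$size)
```

If K-means behaves as expected on well-separated data, the K = 4 plot should show two true clusters forced into one group, which inflates WSS. The K = 6 plot should show one true cluster split in two, which removes only a little within-cluster variation. That would account for a large WSS drop from K = 4 to K = 5 and a small one from K = 5 to K = 6.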

(d) Run K-means clustering on your simulated data for K = 1, 2, ..., 10, and record the total WSS for each K value. Plot the progression of WSS values. According to elbow method logic, which K value appears as the optimal one?
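
For part (d), a sketch of the elbow plot is shown below, again on data simulated from hypothetical centres. Using `nstart = 50` here is an assumption carried over from part (b), since part (d) does not state the number of random starts.

```r
set.seed(1)  # arbitrary seed

# Simulated data: 5 assumed centres, 20 points each, sd = 1 noise
centers <- matrix(c(0, 0,  8, 0,  0, 8,  8, 8,  4, 16), ncol = 2, byrow = TRUE)
X <- centers[rep(1:5, each = 20), ] + matrix(rnorm(5 * 20 * 2), ncol = 2)
X <- data.frame(X1 = X[, 1], X2 = X[, 2])

# Total WSS for K = 1, ..., 10
k_values <- 1:10
wss <- sapply(k_values, function(k) {
  kmeans(X, centers = k, nstart = 50)$tot.withinss
})

# Elbow plot: look for the K after which the curve flattens
plot(k_values, wss, type = "b", pch = 19,
     xlab = "Number of clusters K", ylab = "Total WSS",
     main = "Elbow plot")
```

The elbow is the point where adding another cluster stops producing a sizeable reduction in WSS. For data built from five well-separated groups, the curve would be expected to flatten after K = 5.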

(e) For a more formal approach (an alternative to the elbow method from part (d)), run the gap statistic calculations for K = 1, ..., 10, with 50 replicates each. Which K value is optimal? Does it match the number of well-separated clusters in our simulated data? Provide the plot of gap statistic values.
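
For part (e), one way to compute the gap statistic is `clusGap()` from the `cluster` package, with `B = 50` playing the role of the 50 replicates. This is a sketch under assumptions: the course may use a different wrapper, the `"firstSEmax"` selection rule is simply the default in `clusGap()`'s print method, and the data come from hypothetical centres.

```r
library(cluster)  # provides clusGap() and maxSE()

set.seed(1)  # arbitrary seed

# Simulated data: 5 assumed centres, 20 points each, sd = 1 noise
centers <- matrix(c(0, 0,  8, 0,  0, 8,  8, 8,  4, 16), ncol = 2, byrow = TRUE)
X <- centers[rep(1:5, each = 20), ] + matrix(rnorm(5 * 20 * 2), ncol = 2)
X <- data.frame(X1 = X[, 1], X2 = X[, 2])

# Gap statistic for K = 1, ..., 10 with B = 50 reference data sets;
# nstart = 50 is passed through to kmeans()
gap <- clusGap(X, FUNcluster = kmeans, nstart = 50, K.max = 10, B = 50,
               verbose = FALSE)

# Choose K with the "firstSEmax" rule (assumed selection rule)
k_opt <- maxSE(gap$Tab[, "gap"], gap$Tab[, "SE.sim"], method = "firstSEmax")
print(k_opt)

# Gap values with standard-error bars
plot(gap, main = "Gap statistic for K = 1, ..., 10")
```

The selected K can then be compared with the five groups used to simulate the data. Other selection rules, such as `"Tibs2001SEmax"`, may pick a different K on the same gap curve.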

Please give the answer to this question in RStudio.

