Question: Part 3: Clustering (50 pts total)

For Part 3, you will apply k-means clustering to two different simulated datasets and compare its performance. Both datasets are generated with N = 500 observations, but each was generated with a different number of "clusters." These datasets, ps2-1.csv and ps2-2.csv, will be labeled as "one" and "two," respectively.

3.1 Vanilla k-means (25 pts total)

3.1.1 Visualizing the data (5 pts)

Make a scatterplot of each dataset. Visually, guess at how many groups were used to simulate the data and include the name of the dataset and your guess in the title of each.

# TODO: Two plots
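
The assignment does not say which language to use, so what follows is only a minimal Python sketch of the plotting step. It assumes each CSV has a header row and stores the two coordinates in its first two columns; the helper names `load_xy`, `scatter_title`, and `plot_dataset` are hypothetical, and matplotlib is assumed to be installed for the plotting call.

```python
import csv


def load_xy(path):
    """Read the first two columns of a CSV as floats.

    Assumption: the file has one header row and x, y in columns 0 and 1.
    """
    xs, ys = [], []
    with open(path, newline="") as handle:
        reader = csv.reader(handle)
        next(reader, None)  # skip the assumed header row
        for row in reader:
            if len(row) < 2:
                continue  # ignore blank or malformed lines
            xs.append(float(row[0]))
            ys.append(float(row[1]))
    return xs, ys


def scatter_title(name, guess):
    """Build a title holding the dataset name and the guessed group count."""
    return f"Dataset {name}: visual guess of {guess} groups"


def plot_dataset(path, name, guess):
    """Scatterplot one dataset; matplotlib is imported only when plotting."""
    import matplotlib.pyplot as plt

    xs, ys = load_xy(path)
    fig, ax = plt.subplots()
    ax.scatter(xs, ys, s=10)
    ax.set_title(scatter_title(name, guess))
    ax.set_xlabel("x")
    ax.set_ylabel("y")
    return fig
```

A call such as `plot_dataset("ps2-1.csv", "one", guess)` would produce the first plot, where `guess` is whatever number of groups you read off the figure yourself.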

3.2 Implementing k-means with Manhattan distance (25 pts total)

3.2.1 Computing Manhattan distance (5 pts)

Implement the Manhattan distance metric in a function that you can use in your k-means implementation. Write three test cases to be sure your function works correctly.

# TODO: Manhattan distance
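
One possible shape for this function, sketched in Python (the assignment does not fix a language). The Manhattan distance between two points is the sum of the absolute differences of their coordinates. The function name `manhattan_distance` and the three test points are illustrative choices, not part of the assignment.

```python
def manhattan_distance(p, q):
    """Return the L1 distance: the sum of absolute coordinate differences."""
    if len(p) != len(q):
        raise ValueError("points must have the same number of coordinates")
    return sum(abs(a - b) for a, b in zip(p, q))


# Three hand-checkable cases.
assert manhattan_distance((0, 0), (3, 4)) == 7        # |0-3| + |0-4|
assert manhattan_distance((1.5, -2), (1.5, -2)) == 0  # identical points
assert manhattan_distance((-1, 2), (2, -2)) == 7      # |-3| + |4|, negatives
```

The cases cover a plain positive example, the zero-distance case, and negative coordinates, which is where a missing `abs` would show up.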

3.2.2 Running k-means again (5 pts)

Using your implementation from 3.1.2, implement the k-means algorithm with Manhattan distance. Recall that you will be assigning points to clusters based on minimum Manhattan distance. Be sure to produce the within-cluster sum of squared Manhattan distance (WCSS).

For each dataset, run the algorithm M = 10 times using a for() loop for your optimal values of k from 3.1.3. Plot the best solution for that value of k, with points colored by cluster assignment. Include the dataset, k, and the final WCSS in the title of each plot.
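
The procedure described here can be sketched as follows, in Python and under several assumptions that the assignment leaves open: initial centers are drawn at random from the data, centers are still updated as coordinate-wise means (only the assignment step and the WCSS use Manhattan distance), and an empty cluster keeps its previous center. All function names are hypothetical, and this is not the implementation from 3.1.2.

```python
import random


def manhattan(p, q):
    """Sum of absolute coordinate differences."""
    return sum(abs(a - b) for a, b in zip(p, q))


def assign(points, centers):
    """Label each point with the index of its nearest center (Manhattan)."""
    return [
        min(range(len(centers)), key=lambda j: manhattan(p, centers[j]))
        for p in points
    ]


def kmeans_manhattan(points, k, max_iter=100, rng=None):
    """One run: returns (labels, centers, wcss)."""
    rng = rng or random.Random()
    centers = [tuple(p) for p in rng.sample(points, k)]  # assumed init
    labels = None
    for _ in range(max_iter):
        new_labels = assign(points, centers)
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous center
                dim = len(members[0])
                centers[j] = tuple(
                    sum(m[d] for m in members) / len(members)
                    for d in range(dim)
                )
    labels = assign(points, centers)  # keep labels consistent with centers
    wcss = sum(manhattan(p, centers[lab]) ** 2
               for p, lab in zip(points, labels))
    return labels, centers, wcss


def best_of_runs(points, k, m=10, seed=0):
    """Repeat the algorithm m times and keep the run with the lowest WCSS."""
    rng = random.Random(seed)
    best = None
    for _ in range(m):
        result = kmeans_manhattan(points, k, rng=rng)
        if best is None or result[2] < best[2]:
            best = result
    return best


def result_title(name, k, wcss):
    """Title holding the dataset name, k, and the final WCSS."""
    return f"Dataset {name}: k = {k}, WCSS = {wcss:.2f}"
```

With matplotlib, the best run can be drawn by passing the labels as colors, for example `ax.scatter(xs, ys, c=labels)`, and using `result_title` for the title. Because the mean is not the minimizer of summed Manhattan distance, this variant is not guaranteed to lower the WCSS at every step, which is why the loop is capped by `max_iter`; updating centers with the coordinate-wise median is a common alternative.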

