
Task: Clustering

Using Titanic train.csv dataset - same as previous HWs:

[15pt] Prepare the dataset for analysis. In this analysis, we use "pclass", "fare", "age", "sex", "embarked" as input variables. Please perform necessary cleaning and transformation.
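One possible way to approach the preparation step is sketched below. It is not the instructor's code. It assumes the Kaggle-style train.csv, whose headers are capitalized ("Pclass", "Fare", ...), so the column names are lowercased first. The choices of median imputation for "age" and "fare", mode imputation for "embarked", a 0/1 encoding for "sex", one-hot encoding for "embarked", and standard scaling are all assumptions, and the helper name prepare_features is hypothetical.

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler

FEATURES = ["pclass", "fare", "age", "sex", "embarked"]


def prepare_features(df):
    """Return a cleaned, encoded, and scaled feature table (hypothetical helper)."""
    data = df.copy()
    # Assumption: headers may be capitalized as in the Kaggle file.
    data.columns = [c.lower() for c in data.columns]
    data = data[FEATURES].copy()

    # Assumed cleaning choices: median for numeric gaps, mode for the port.
    data["age"] = data["age"].fillna(data["age"].median())
    data["fare"] = data["fare"].fillna(data["fare"].median())
    data["embarked"] = data["embarked"].fillna(data["embarked"].mode()[0])

    # Encode categoricals: sex as 0/1, embarked as one-hot columns.
    data["sex"] = (data["sex"] == "female").astype(int)
    data = pd.get_dummies(data, columns=["embarked"], dtype=float)

    # Scale so that fare and age do not dominate the distance computations.
    scaled = StandardScaler().fit_transform(data)
    return pd.DataFrame(scaled, columns=data.columns, index=df.index)
```

With the real file, this would be called as X = prepare_features(pd.read_csv("train.csv")), assuming train.csv is in the working directory.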

[45pt] Implement two clustering algorithms, K-means and Hierarchical Clustering, and add each algorithm's cluster labels to the dataset as a new column. Please set the number of clusters to 3 for both algorithms.
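A minimal sketch of this step with scikit-learn follows. It assumes X is an already cleaned and scaled numeric feature table with one row per row of the dataset. The helper name add_cluster_labels, the column names "kmeans_cluster" and "hier_cluster", the random seed, and Ward linkage for the hierarchical model are illustrative assumptions, not requirements stated in the task.

```python
import pandas as pd
from sklearn.cluster import AgglomerativeClustering, KMeans


def add_cluster_labels(df, X, n_clusters=3, random_state=42):
    """Return a copy of df with one label column per algorithm (hypothetical helper)."""
    out = df.copy()

    # K-means: n_init=10 restarts from several random initializations.
    kmeans = KMeans(n_clusters=n_clusters, n_init=10, random_state=random_state)
    out["kmeans_cluster"] = kmeans.fit_predict(X)

    # Agglomerative (bottom-up hierarchical) clustering; Ward linkage is an assumed choice.
    hier = AgglomerativeClustering(n_clusters=n_clusters, linkage="ward")
    out["hier_cluster"] = hier.fit_predict(X)
    return out
```

The label values themselves are arbitrary: cluster 0 from K-means need not correspond to cluster 0 from the hierarchical model, so the two columns are best compared with a cross-tabulation such as pd.crosstab rather than by direct equality.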

[40pt] For the K-means algorithm, make an Elbow plot using K from 1 to 20 and calculate the WCSS (within-cluster sum of squares). Please show the plot and answer the question "which K should we choose". You can modify the code I provided in class - no need to write your own from scratch.
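The class code mentioned above is not reproduced here, so the following is only a sketch of how the WCSS values and the Elbow plot could be produced with scikit-learn and matplotlib. It relies on the fact that a fitted KMeans model exposes the WCSS as its inertia_ attribute. The function names compute_wcss and plot_elbow and the random seed are hypothetical.

```python
from sklearn.cluster import KMeans


def compute_wcss(X, k_values=range(1, 21), random_state=42):
    """Fit K-means once per K and collect the WCSS (inertia_) of each fit."""
    wcss = []
    for k in k_values:
        model = KMeans(n_clusters=k, n_init=10, random_state=random_state)
        model.fit(X)
        wcss.append(model.inertia_)
    return wcss


def plot_elbow(wcss, k_values=range(1, 21)):
    """Draw WCSS against K; the bend in the curve suggests a value for K."""
    import matplotlib.pyplot as plt  # imported here so compute_wcss works without it

    plt.plot(list(k_values), wcss, marker="o")
    plt.xlabel("Number of clusters K")
    plt.ylabel("WCSS")
    plt.title("Elbow plot")
    plt.show()
```

WCSS generally keeps shrinking as K grows, so the smallest value is not the answer. The usual reading is to choose the K at which the curve bends and further increases in K yield only small reductions; which K that is depends on the plot obtained from the prepared data.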
