Question:

Google Colab Clustering Assignment Instructions
Objective: Perform unsupervised learning on a vehicular dataset using k-means clustering to identify cluster centroids for three different ECU signatures, namely steering, speed, and RPM.
Note: This data set was obtained from three sedan vehicles of a single make (Nissan). It has been preprocessed to keep only the columns relevant to the signatures you will use as inputs. These columns are ECU300 (steering), ECU1F9 (tachometer), and ECU280 (speed). They contain the physical (actual) values of these signatures at different time instants. For those interested, speed is in miles per hour (mph) and the tachometer reading is in revolutions per minute (RPM).
Instructions:
1) Navigate to colab.research.google.com in your browser and open the ECU Clustering.ipynb Python notebook file using Google Colab (File > Open Notebook (or ctrl+o)).
2) Execute cells individually by clicking on the Run cell icon. Alternatively, after you select a cell, you can hit (ctrl+enter) to execute it.
3) The notebook is divided into three sections: Sections 1, 2, and 3 contain the k-means clustering implementations for the speed, tachometer, and steering ECU signatures, respectively.
4) The following are the cells you need to modify to complete the tables.
a) Code cells 3, 7, and 11 need to be modified to include a MinMax scaling function that normalizes the input data (ECU signatures). Use the same scaling function for all the ECU signatures.
b) Identify the optimal number of clusters (K) and the sum of squared errors (SSE) using the elbow method for each of the three ECU signatures.
c) Verify your choice of clusters by comparing a clustering metric called the Calinski-Harabasz (CH) score across different numbers of clusters.
d) Provide descriptive statistics, that is, the minimum, maximum, and mean, for each cluster and for each ECU signature. Use subscripts to designate the statistics for that particular cluster. For instance, the mean value for cluster 1 could be written as Mean1.
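Steps a) through c) can be sketched as follows. This is a minimal sketch, not the notebook's actual code: it assumes scikit-learn is available in the Colab runtime, and it uses synthetic values in a hypothetical `speed` array in place of the ECU280 column, since the notebook's own variable names are not shown here.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import calinski_harabasz_score
from sklearn.preprocessing import MinMaxScaler

# Synthetic stand-in for one ECU signature (assumption: the real
# values come from the notebook's data set, not from this generator).
rng = np.random.default_rng(0)
speed = np.concatenate([
    rng.normal(10, 2, 200),   # low-speed samples
    rng.normal(35, 3, 200),   # mid-speed samples
    rng.normal(65, 4, 200),   # high-speed samples
]).reshape(-1, 1)             # scikit-learn expects a 2-D array

# a) MinMax scaling maps the feature onto the range [0, 1].
scaler = MinMaxScaler()
speed_scaled = scaler.fit_transform(speed)

# b) and c) Fit k-means for several K, recording the SSE
# (exposed by scikit-learn as inertia_) and the CH score.
results = {}
for k in range(2, 7):
    model = KMeans(n_clusters=k, init="k-means++", n_init=10, random_state=0)
    labels = model.fit_predict(speed_scaled)
    results[k] = {
        "sse": model.inertia_,
        "ch": calinski_harabasz_score(speed_scaled, labels),
    }

for k, r in results.items():
    print(f"K={k}  SSE={r['sse']:.4f}  CH={r['ch']:.1f}")
```

On the real data, the elbow is the K after which the SSE stops dropping sharply. SSE depends on the scale of the inputs, so values computed on scaled and unscaled data are not comparable.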
Populate the following three tables with your observations.
Table I: Evaluating number of clusters for speed ECU signature
Number of Clusters (K) | SSE (Elbow Method) | CH Score | Min, Max, Mean
K = 3 | | |
K = 4 | | |
K = 5 | | |
Table II: Evaluating number of clusters for RPM ECU signature
Number of Clusters (K) | SSE (Elbow Method) | CH Score | Min, Max, Mean
K = 3 | | |
K = 4 | | |
K = 5 | | |
Table III: Evaluating number of clusters for steering ECU signature
Number of Clusters (K) | SSE (Elbow Method) | CH Score | Min, Max, Mean
K = 3 | | |
K = 4 | | |
K = 5 | | |
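The Min, Max, Mean column in the tables above could be filled in with a sketch like the one below. It assumes pandas and scikit-learn, and it uses a small made-up `rpm` array rather than the ECU1F9 data. It reports the statistics in the original units even though clustering runs on the scaled values; the assignment does not say whether scaled or physical values are expected, so treat that choice as an assumption.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import MinMaxScaler

# Made-up RPM readings (assumption: a stand-in for the notebook's
# tachometer column, chosen so the three groups are easy to see).
rpm = np.array([800, 850, 900, 2000, 2100, 2200, 3900, 4000, 4100], dtype=float)

# Cluster on the MinMax-scaled values.
scaled = MinMaxScaler().fit_transform(rpm.reshape(-1, 1))
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(scaled)

# Group the physical values by cluster label and summarize each cluster.
df = pd.DataFrame({"rpm": rpm, "cluster": labels})
stats = df.groupby("cluster")["rpm"].agg(["min", "max", "mean"])
print(stats)
```

Cluster labels returned by k-means are arbitrary integers, so sorting the rows by mean before writing Mean1, Mean2, and so on keeps the numbering consistent across runs.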
Answer the following questions based on your findings.
1. How does the CH score change as the number of clusters (K) is increased? Provide a justification for your answer.
2. Why can't metrics such as precision or recall be used to evaluate the performance of clustering algorithms like k-means++?
3. What is the optimal number of clusters (K) that shows consensus among the elbow evaluation method and the CH score?
