Question: I have a data set named Air Traffic Passenger Statistics, which is a CSV file of 15006 rows and 17 columns

I have a data set named Air Traffic Passenger Statistics, which is a CSV file of 15006 rows and 17 columns. The columns are:
Index (integer, ex. 0)
Activity Period (integer, ex. 200507)
Operating Airline (string, ex. ATA Airlines)
Operating Airline IATA Code (string, ex. TZ)
Published Airline (string, ex. ATA Airlines)
Published Airline IATA Code (string, ex. TZ)
GEO Summary (string, ex. Domestic)
GEO Region (string, ex. US)
Activity Type Code (string, ex. Deplaned)
Price Category Code (string, ex. Low fare)
Terminal (string + integer, ex. Terminal 1)
Boarding Area (char, ex. B)
Passenger Count (integer, ex. 27271)
Adjusted Activity Type Code (string + integer, ex. Thru/Transit*2)
Adjusted Passenger Count (integer, ex. 27271)
Year (integer, ex. 2005)
Month (string, ex. july)
Now I want to perform clustering on this dataset. You can choose any 4 different clustering methods/algorithms, preferably recent ones that give the best results.
First, select the 4-5 features that are most appropriate, and justify with a mathematical evaluation expression why you chose them over the other columns.
Second, find the number of clusters using the elbow method, or whichever method you consider best for choosing the cluster count (do not use 2 clusters; take more than that), and plot it in a well-defined, labelled graph.
Then perform the clustering with the 4 different algorithms and draw a scatter plot for each showing the clusters formed, the data points, and the centroids. The graphs should be high quality with proper labelling.
Lastly, compute the Silhouette Score, Calinski-Harabasz index, and Davies-Bouldin index. Create a table comparing the 4 algorithms, with the algorithm names as rows and the score names as columns, and explain and justify why a particular algorithm is the best.
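As an illustration of the elbow step described above, here is a minimal sketch, not a full solution. It assumes scikit-learn and NumPy are available and uses synthetic blobs as a stand-in for the 4-5 scaled numeric features, since the actual feature choice is left open in the question. The helper names `elbow_inertias` and `pick_elbow` are hypothetical, and the "largest second difference" rule is only one simple heuristic for locating the elbow.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.preprocessing import StandardScaler

# Assumption: synthetic data standing in for the selected, numeric features
# of the real CSV (which is not loaded here).
X, _ = make_blobs(n_samples=600, centers=4, n_features=4, random_state=0)
X_scaled = StandardScaler().fit_transform(X)


def elbow_inertias(data, k_values, seed=0):
    """Fit k-means for each k and return the within-cluster sum of squares."""
    return [
        KMeans(n_clusters=k, n_init=10, random_state=seed).fit(data).inertia_
        for k in k_values
    ]


def pick_elbow(k_values, inertias):
    """Return the k where the inertia curve bends most (largest second difference)."""
    second_diff = np.diff(inertias, n=2)
    # second_diff[i] is centred on k_values[i + 1]
    return k_values[int(np.argmax(second_diff)) + 1]


k_values = list(range(3, 11))  # the question asks for more than 2 clusters
inertias = elbow_inertias(X_scaled, k_values)
best_k = pick_elbow(k_values, inertias)
print("inertias:", [round(v, 1) for v in inertias])
print("suggested k:", best_k)
```

Plotting `k_values` against `inertias` (for example with matplotlib, with labelled axes and a marker at `best_k`) would give the labelled elbow graph the question asks for.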
Tip: You can access the whole dataset on Kaggle under the name "Air Traffic Passenger statistics A New Look at an Old Problem".
Please perform high-level clustering for all 4 algorithms. The clustering code available on the internet is simple and I have already implemented it; I need an enhanced version of all 4 algorithms that produces better clustering and output.
Note 1: Please do not copy and paste online code or AI/GPT code. Write your own logic and enhance all 4 algorithms.
Note 2: Provide all the explanation, all the code, the table, and its output.
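The comparison table requested above (algorithms as rows, scores as columns) could be assembled roughly as follows. This is a sketch under stated assumptions: it uses scikit-learn and pandas, synthetic data in place of the real features, a fixed `k = 4` as a placeholder for whatever the elbow step suggests, and four algorithms (KMeans, AgglomerativeClustering, Birch, GaussianMixture) picked here only for illustration; the question does not name which four to use.

```python
import pandas as pd
from sklearn.cluster import AgglomerativeClustering, Birch, KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import (
    calinski_harabasz_score,
    davies_bouldin_score,
    silhouette_score,
)
from sklearn.mixture import GaussianMixture
from sklearn.preprocessing import StandardScaler

# Assumption: synthetic stand-in for the scaled features chosen from the CSV.
X, _ = make_blobs(n_samples=600, centers=4, n_features=4, random_state=0)
X_scaled = StandardScaler().fit_transform(X)
k = 4  # placeholder; in practice taken from the elbow analysis

# Illustrative choice of four algorithms, not prescribed by the question.
models = {
    "KMeans": KMeans(n_clusters=k, n_init=10, random_state=0),
    "Agglomerative": AgglomerativeClustering(n_clusters=k),
    "Birch": Birch(n_clusters=k),
    "GaussianMixture": GaussianMixture(n_components=k, random_state=0),
}

rows = {}
for name, model in models.items():
    labels = model.fit_predict(X_scaled)
    rows[name] = {
        "Silhouette": silhouette_score(X_scaled, labels),
        "Calinski-Harabasz": calinski_harabasz_score(X_scaled, labels),
        "Davies-Bouldin": davies_bouldin_score(X_scaled, labels),
    }

# One row per algorithm, one column per score.
scores = pd.DataFrame.from_dict(rows, orient="index")
print(scores.round(3))
```

When reading the table, higher is better for the Silhouette Score and the Calinski-Harabasz index, while lower is better for the Davies-Bouldin index, so the "best" algorithm is the one that does well on all three rather than on any single score.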
