Please provide Python code and explanations:
Clustering
Use the file umap.csv to answer the following questions. The file umap.csv
contains the UMAP embeddings of the documents.
Question
How many features do the UMAP embeddings have? What parameter was
specified when performing UMAP to obtain the required number of features?
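A minimal sketch for this question, assuming umap.csv stores one numeric column per embedding dimension and no extra ID column (the file layout is not given in the question). If the embeddings were produced with the umap-learn library, the output dimensionality is set by the n_components argument of umap.UMAP; the helper name count_features is hypothetical.

```python
import pandas as pd


def count_features(embeddings: pd.DataFrame) -> int:
    """Return the number of numeric columns, taken here as embedding dimensions."""
    return embeddings.select_dtypes(include="number").shape[1]


# Usage (assumes umap.csv is in the working directory):
# embeddings = pd.read_csv("umap.csv")
# print(embeddings.shape)            # (instances, columns)
# print(count_features(embeddings))  # number of UMAP features
```

If the file also carries an index or document-ID column, drop it before counting.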
Question
Create a scatterplot matrix (SPLOM) of the UMAP embeddings. Comment on
the clustering tendency of the data.
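One possible way to draw the SPLOM is pandas' scatter_matrix, sketched below. The function name plot_splom and the figure settings are illustrative choices, not part of the assignment. Dense, well-separated blobs in the off-diagonal panels would suggest a clustering tendency; a single diffuse cloud would suggest little structure.

```python
import matplotlib.pyplot as plt
import pandas as pd
from pandas.plotting import scatter_matrix


def plot_splom(embeddings: pd.DataFrame):
    """Draw every pairwise scatterplot, with histograms on the diagonal."""
    axes = scatter_matrix(
        embeddings, figsize=(8, 8), diagonal="hist", s=5, alpha=0.5
    )
    plt.suptitle("SPLOM of UMAP embeddings")
    return axes


# Usage (assumes umap.csv holds only the embedding columns):
# plot_splom(pd.read_csv("umap.csv"))
# plt.show()
```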
Question
Create a k-dist graph using the UMAP embeddings, where the k-distance
is defined as the Euclidean distance from each instance to its k-th nearest
neighbour. Based on the graph, suggest an appropriate range for the k-distance.
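The k-dist graph can be sketched with scikit-learn's NearestNeighbors. The value of k is not recoverable from the question text, so it is left as a parameter; the function names are hypothetical. Calling kneighbors() without arguments returns neighbours of the fitted points with each point excluded from its own neighbour list, so the last column is the distance to the k-th nearest neighbour.

```python
import matplotlib.pyplot as plt
import numpy as np
from sklearn.neighbors import NearestNeighbors


def k_distances(X, k):
    """Sorted Euclidean distance from each instance to its k-th nearest neighbour."""
    nn = NearestNeighbors(n_neighbors=k, metric="euclidean").fit(X)
    distances, _ = nn.kneighbors()  # no argument: a point is not its own neighbour
    return np.sort(distances[:, -1])


def plot_k_dist(X, k):
    """Plot the sorted k-distances; the 'elbow' hints at a range for eps."""
    fig, ax = plt.subplots()
    ax.plot(k_distances(X, k))
    ax.set_xlabel("instances sorted by k-distance")
    ax.set_ylabel(f"distance to neighbour {k}")
    return ax


# Usage (k chosen by the reader, e.g. equal to DBSCAN's min_samples):
# X = pd.read_csv("umap.csv").to_numpy()
# plot_k_dist(X, k=5); plt.show()
```

A reasonable range for the k-distance is the band of values around the point where the curve bends sharply upward.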
Question
Apply density-based spatial clustering of applications with noise (DBSCAN) to
the UMAP embeddings. Select an eps value that will result in approximately
thirteen clusters (excluding noise). Display the number of instances assigned to
each cluster using an appropriate visualization.
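A hedged sketch of the eps search: run scikit-learn's DBSCAN over a list of candidate eps values (ideally taken from the elbow region of the k-dist graph) and keep the one whose cluster count is closest to the target. The helper names, the candidate grid and min_samples=5 are assumptions made for illustration; DBSCAN marks noise with the label -1.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import DBSCAN


def cluster_sizes(X, eps, min_samples=5):
    """Fit DBSCAN and return (labels, instances per cluster without noise)."""
    labels = DBSCAN(eps=eps, min_samples=min_samples).fit_predict(X)
    sizes = pd.Series(labels).value_counts().sort_index()
    return labels, sizes.drop(index=-1, errors="ignore")  # -1 is the noise label


def pick_eps(X, candidates, target=13, min_samples=5):
    """Return the candidate eps whose cluster count is closest to target."""
    def gap(eps):
        return abs(len(cluster_sizes(X, eps, min_samples)[1]) - target)
    return min(candidates, key=gap)


# Usage (the candidate range is a guess; refine it from the k-dist graph):
# X = pd.read_csv("umap.csv").to_numpy()
# eps = pick_eps(X, np.linspace(0.1, 1.0, 19), target=13)
# labels, sizes = cluster_sizes(X, eps)
# sizes.plot.bar(xlabel="cluster", ylabel="instances")
```

A bar chart of the sizes is one appropriate visualization because cluster labels are categorical.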
Question
Create a scatterplot matrix (SPLOM) of the UMAP embeddings where the
colour of an instance should be based on the cluster label of the instance. From
the SPLOM, suggest a rule that can be used to assign instances to one cluster.
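The coloured SPLOM can be sketched by passing the cluster labels as the colour argument of pandas' scatter_matrix. The second helper, cluster_ranges, is a hypothetical aid that tabulates the per-feature minimum and maximum of each cluster, which can help phrase a rule such as "feature A above some threshold implies cluster C". Both function names and the colour map are assumptions, and the sketch assumes the embeddings contain no missing values and no column already named "cluster".

```python
import numpy as np
import pandas as pd
from pandas.plotting import scatter_matrix


def plot_cluster_splom(embeddings: pd.DataFrame, labels):
    """SPLOM in which each instance is coloured by its cluster label."""
    return scatter_matrix(
        embeddings, c=np.asarray(labels), cmap="tab20",
        figsize=(8, 8), diagonal="hist", s=5, alpha=0.6,
    )


def cluster_ranges(embeddings: pd.DataFrame, labels) -> pd.DataFrame:
    """Per-cluster min and max of every feature, with noise (-1) removed."""
    clustered = embeddings.assign(cluster=np.asarray(labels))
    clustered = clustered[clustered["cluster"] != -1]
    return clustered.groupby("cluster").agg(["min", "max"])


# Usage (labels come from a fitted DBSCAN model):
# plot_cluster_splom(embeddings, labels)
# print(cluster_ranges(embeddings, labels))
```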
Question
Select the largest cluster (excluding the noise cluster). For all the instances
assigned to this cluster:
sum the values of the TF-IDF embeddings per column,
sort the values from large to small,
select the largest ten values and
display the ten associated tokens.
Does the cluster correspond to a specific infrastructure category? Motivate your
answer.
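The four steps above can be sketched as follows. The question does not say where the TF-IDF embeddings are stored, so this assumes a DataFrame tfidf with one column per token and rows in the same order as the UMAP embeddings; the function name top_tokens is hypothetical.

```python
import numpy as np
import pandas as pd


def top_tokens(tfidf: pd.DataFrame, labels, n=10):
    """Return (largest non-noise cluster, its n tokens with the highest TF-IDF sums)."""
    labels = pd.Series(np.asarray(labels), index=tfidf.index)
    largest = labels[labels != -1].value_counts().idxmax()  # skip noise (-1)
    totals = tfidf[labels == largest].sum(axis=0)            # sum per column
    return largest, totals.sort_values(ascending=False).head(n)


# Usage (tfidf loading is an assumption; labels come from DBSCAN):
# cluster, tokens = top_tokens(tfidf, labels, n=10)
# print(cluster, list(tokens.index))
```

Whether the cluster matches a specific infrastructure category is then argued from the tokens: if most of the ten tokens belong to one theme, that supports a single category.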
