Question: Please provide python code and explanations for the following: The current approach to finding topics relied on tf - idf, UMAP, DBSCAN, and a custom

Please provide python code and explanations for the following: The current approach to finding topics relied on tf-idf, UMAP, DBSCAN, and
a custom tf-idf approach to explain each topic. In this section, you need to
compare the original method with alternative methods by making a maximum
of one change. In other words, you will always modify the original model.
Question 6.1[6]
Replace with a pretrained sentence embedding model. Discuss
whether the sentence embedding model provides better embeddings by com-
paring the similarity of documents using cosine distance.
Question 6.2[6]
Replace umap with PCA. Discuss the impact of using PCA on the quality
of clusters and the overall topic modeling results. In your answer specifically
discuss what properties each technique tries to observe and whether it is relevant
for the subsequent steps.
Question 6.3[6]
Use K-Means instead of DBSCAN for clustering. Compare the clustering results
and specifically discuss the appropriateness of K-Means for this task.
Question 6.4[10]
Train then use a decision tree (surrogate model) instead of a custom tf-idf to
globally explain why instances are assigned to the largest cluster obtained. Your
decision tree should be interpretable. In your answer clearly describe the steps
you have taken to train the decision tree and motivate how the trained decision
tree is interpretable.
Please provide python code and explanations for

Step by Step Solution

There are 3 Steps involved in it

1 Expert Approved Answer
Step: 1 Unlock blur-text-image
Question Has Been Solved by an Expert!

Get step-by-step solutions from verified subject matter experts

Step: 2 Unlock
Step: 3 Unlock

Students Have Also Explored These Related Programming Questions!