Question: Please provide python code and explanations for the following: The current approach to finding topics relied on tf - idf, UMAP, DBSCAN, and a custom
Please provide python code and explanations for the following: The current approach to finding topics relied on tfidf, UMAP, DBSCAN, and
a custom tfidf approach to explain each topic. In this section, you need to
compare the original method with alternative methods by making a maximum
of one change. In other words, you will always modify the original model.
Question
Replace with a pretrained sentence embedding model. Discuss
whether the sentence embedding model provides better embeddings by com
paring the similarity of documents using cosine distance.
Question
Replace umap with PCA. Discuss the impact of using PCA on the quality
of clusters and the overall topic modeling results. In your answer specifically
discuss what properties each technique tries to observe and whether it is relevant
for the subsequent steps.
Question
Use KMeans instead of DBSCAN for clustering. Compare the clustering results
and specifically discuss the appropriateness of KMeans for this task.
Question
Train then use a decision tree surrogate model instead of a custom tfidf to
globally explain why instances are assigned to the largest cluster obtained. Your
decision tree should be interpretable. In your answer clearly describe the steps
you have taken to train the decision tree and motivate how the trained decision
tree is interpretable.
Step by Step Solution
There are 3 Steps involved in it
1 Expert Approved Answer
Step: 1 Unlock
Question Has Been Solved by an Expert!
Get step-by-step solutions from verified subject matter experts
Step: 2 Unlock
Step: 3 Unlock
