
Q2: Improving LDA through better TFIDF model [6 points]
Now, looking at your answers from Q1, you will see that the topics printed by print_top_words don't make much sense. Let's look at a few ways we can improve this. The first challenge is that the number of tokens in the full data is large. We also have a lot of stop words.
Task:
Limit the TfidfVectorizer to about 5000 tokens (max_features) and set the TfidfVectorizer to remove English stop words.
Save the new vectorizer as vectorizer_tfidf_limit.
# 2 Points
# YOUR CODE HERE
raise NotImplementedError()
vectorizer_tfidf_limit.fit(documents)
tfidf_feature_names_limit = vectorizer_tfidf_limit.get_feature_names_out()
lda_tfidf_limit = fit_LDA(X_dtm_tfidf_limit, n_components=...)