Question: The language is Python. Please help with 2b and 2c.

2b) Numerical Labels

We need to convert the category labels to numerical labels. Use the following mapping to convert the values in category into numerical labels. Store the numeric values in a new column called 'category_num'.

politics -> 1
recreational -> 2
computer -> 3
religion -> 4
science -> 5
misc -> 6

Hint: you can use the .replace() method from pandas.

In [ ]: # YOUR CODE HERE
        raise NotImplementedError()

In [ ]: assert set(np.unique(news_df['category_num'])) == {1, 2, 3, 4, 5, 6}
        assert sum(news_df['category_num'] == 2) == 3956

2c) Convert Text data into vector

We will now create a CountVectorizer object to transform the text data into vectors with numerical values. To do so, we will initialize a CountVectorizer object and store this object in vectorizer.

We need to pass 4 arguments to initialize a CountVectorizer:

1. analyzer: 'word'
   Specify to analyze data at the word level.
2. max_features: 2000
   Set a max number of unique words.
3. tokenizer: word_tokenize
   Set to tokenize the text data by using the word_tokenize function from NLTK.
4. stop_words: stopwords.words('english')
   Set to remove all stopwords in English. We do this since they generally don't provide useful discriminative information.

In [ ]: # YOUR CODE HERE
        raise NotImplementedError()

In [ ]: assert vectorizer.analyzer == 'word'
        assert vectorizer.max_features == 2000
        assert vectorizer.tokenizer == word_tokenize
        assert vectorizer.stop_words == stopwords.words('english')
        assert hasattr(vectorizer, "fit_transform")
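
For 2b, one possible approach is to put the mapping in a dictionary and pass it to .replace(), as the hint suggests. The sketch below is not the assignment's official solution. The real news_df is not shown in the question, so a small hypothetical stand-in with a 'category' column is built here purely for illustration, and the name category_to_num is also an assumption.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the assignment's news_df (the real data is not shown).
news_df = pd.DataFrame({
    "category": ["politics", "recreational", "computer", "religion",
                 "science", "misc", "recreational"]
})

# Mapping taken from the question text.
category_to_num = {
    "politics": 1,
    "recreational": 2,
    "computer": 3,
    "religion": 4,
    "science": 5,
    "misc": 6,
}

# Replace each label with its number and store the result in a new column.
# astype(int) makes the integer dtype explicit instead of relying on inference.
news_df["category_num"] = news_df["category"].replace(category_to_num).astype(int)
```

In the notebook, only the dictionary and the final assignment would go in the "YOUR CODE HERE" cell, since news_df already exists there. Series.map(category_to_num) is a similar alternative, with the difference that labels missing from the dictionary become NaN instead of being left unchanged.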
