Part 2: Removing stopwords and Frequency Counts
Import the Gutenberg collection and the stopwords for the English language as part of a program that counts the frequencies of the words in Shakespeare's Macbeth. The steps are as follows:
- Import the necessary modules
- Read in the words in Macbeth. This will include all stopwords
- Step through the list of words in Macbeth, appending those that are not stopwords to a list
- For the resulting list, you can obtain the frequencies using one of the nltk functions
Submit a screenshot of the most common words in that list
This is what I have so far, but I am getting an error:
import nltk
from collections import Counter, OrderedDict
import operator
from nltk.corpus import gutenberg
import re
print(len(gutenberg.raw('shakespeare-macbeth.txt')))
gb = nltk.corpus.gutenberg
sw = set(nltk.corpus.stopwords.words('english'))
print('----Excluding the stopwords-----')
text_sent = gb.sents("shakespeare-macbeth.txt")[:100351]
filtered_i=[]
for sent in text_sent:
    filter=[w for w in sent if w.lower() not in sw]
    filtered_i.append(filter)
#print(filtered)
result= Counter(filtered_i).most_common(10)
result_dict= OrderedDict(result)
print(result_dict)
Step by Step Solution
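The error in the question's code most likely comes from Counter(filtered_i): gb.sents() returns a list of sentences, so filtered_i ends up as a list of lists, and Counter cannot count lists because they are not hashable. Reading a flat list of words with gutenberg.words() avoids this. The sketch below is one possible way to follow the listed steps, not the only one. The function names remove_stopwords and macbeth_frequencies are made up for illustration, and lowercasing the words and dropping punctuation tokens with isalpha() are assumptions that the assignment does not require. It also assumes the gutenberg and stopwords corpora have already been downloaded.

```python
from collections import Counter


def remove_stopwords(words, stopword_set):
    """Return the alphabetic words that are not stopwords, lowercased.

    Lowercasing and the isalpha() check are assumptions; drop them to
    keep the original casing and punctuation tokens.
    """
    return [w.lower() for w in words
            if w.isalpha() and w.lower() not in stopword_set]


def macbeth_frequencies(top_n=10):
    """Count non-stopword frequencies in Macbeth using nltk.FreqDist."""
    # Imported here so remove_stopwords can be tried without NLTK installed.
    import nltk
    from nltk.corpus import gutenberg, stopwords

    # Assumes the corpora are present; if not, run
    # nltk.download('gutenberg') and nltk.download('stopwords') first.
    stopword_set = set(stopwords.words('english'))
    # words() gives a flat list of tokens, unlike sents().
    words = gutenberg.words('shakespeare-macbeth.txt')
    filtered = remove_stopwords(words, stopword_set)
    return nltk.FreqDist(filtered).most_common(top_n)


# Small demonstration of the original problem: lists are unhashable,
# so Counter raises TypeError when given a list of lists.
try:
    Counter([['fair', 'foul'], ['foul', 'fair']])
except TypeError as err:
    print('Counter on a list of lists fails:', err)
```

Calling print(macbeth_frequencies()) should then print the ten most common remaining words with their counts, which is the output the assignment asks you to screenshot.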
