Question: Part 2: Removing Stopwords and Frequency Counts

Import the Gutenberg collection and the English stopwords as part of a program that counts the frequencies of the words in Shakespeare's Macbeth. The steps are as follows:

- Import the necessary modules

- Read in the words in Macbeth. This will include all stopwords

- Step through the list of words in Macbeth, appending those that are not stopwords to a list

- For the resulting list, you can obtain the frequencies using one of the nltk functions

Submit a screenshot of the most common words in that list

This is what I have so far, but I am getting an error:

import nltk
from collections import Counter, OrderedDict
import operator
from nltk.corpus import gutenberg
import re

print(len(gutenberg.raw('shakespeare-macbeth.txt')))

gb = nltk.corpus.gutenberg
sw = set(nltk.corpus.stopwords.words('english'))

print('----Excluding the stopwords-----')
text_sent = gb.sents("shakespeare-macbeth.txt")[:100351]

filtered_i=[]

for sent in text_sent:
    filter=[w for w in sent if w.lower() not in sw]
    filtered_i.append(filter)

#print(filtered)

result= Counter(filtered_i).most_common(10)

result_dict= OrderedDict(result)

print(result_dict)

Step by Step Solution
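
The error is most likely `TypeError: unhashable type: 'list'`. `gb.sents(...)` returns one list of tokens per sentence, so `filtered_i` ends up as a list of lists, and `Counter` cannot use a list as a key. One possible fix is to read the text with `gutenberg.words(...)` and build a single flat list before counting. The sketch below is a minimal version under a few assumptions: the helper names `remove_stopwords` and `macbeth_top_words` are made up for illustration, tokens are lowercased so that "Macbeth" and "macbeth" count together, and punctuation tokens are dropped with `isalpha()`. The sample data at the bottom is hand-made, only there to show the filtering without needing the corpus.

```python
from collections import Counter


def remove_stopwords(words, stop_words):
    """Return a flat list of lowercased alphabetic tokens that are not stopwords."""
    # isalpha() also drops punctuation tokens such as ',' and '.', which
    # would otherwise dominate the counts (an assumption about what
    # "words" should mean here; remove the check to keep them).
    return [w.lower() for w in words if w.isalpha() and w.lower() not in stop_words]


def macbeth_top_words(n=10):
    """Count the most common non-stopwords in Macbeth using NLTK."""
    import nltk
    from nltk.corpus import gutenberg, stopwords

    # One-time corpus downloads (no-ops if the data is already present).
    nltk.download("gutenberg", quiet=True)
    nltk.download("stopwords", quiet=True)

    stop_words = set(stopwords.words("english"))
    # words() yields a flat sequence of tokens; sents() yields lists of
    # tokens, and a list cannot be used as a Counter/FreqDist key.
    words = gutenberg.words("shakespeare-macbeth.txt")
    filtered = remove_stopwords(words, stop_words)
    return nltk.FreqDist(filtered).most_common(n)


# Small self-contained check with a hand-made stopword set (hypothetical data).
sample = ["The", "king", "and", "the", "Queen", ",", "the", "king", "!"]
sample_counts = Counter(remove_stopwords(sample, {"the", "and"}))
print(sample_counts.most_common(2))  # [('king', 2), ('queen', 1)]
```

With NLTK installed, `print(macbeth_top_words(10))` should print the ten most common remaining words. `nltk.FreqDist` is used there because the assignment asks for an nltk function; it behaves like `Counter`, so `Counter(filtered).most_common(10)` would work as well.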
