Question: Part 2: Removing stopwords and Frequency Counts

Import the Gutenberg collection and the stopwords for the English language as part of a program that counts the frequencies of the words in Shakespeare's Macbeth. The steps are as follows:

- Import the necessary modules

- Read in the words in Macbeth. This will include all stopwords

- Step through the list of words in Macbeth, appending those that are not stopwords to a list

- For the resulting list, you can obtain the frequencies using one of the nltk functions

Submit a screenshot of the most common words in that list
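
The steps listed above can be sketched roughly as follows. This is one possible reading of the assignment, not a reference solution. The helper names `remove_stopwords` and `macbeth_most_common` are made up for illustration, and lowercasing the words and dropping punctuation tokens are assumptions that the assignment does not state. `nltk.FreqDist` is used here as the NLTK function for frequencies.

```python
def remove_stopwords(words, stopwords):
    """Return the alphabetic words that are not stopwords, lowercased."""
    kept = []
    for word in words:
        lowered = word.lower()
        # Assumption: isalpha() is used to drop punctuation tokens such as "," and "."
        if lowered.isalpha() and lowered not in stopwords:
            kept.append(lowered)
    return kept


def macbeth_most_common(n=10):
    """Count non-stopword frequencies in Macbeth with nltk.FreqDist."""
    # Imported here so remove_stopwords() can be tried without NLTK installed.
    import nltk
    from nltk.corpus import gutenberg, stopwords

    # Fetch the corpus data if it is not already available locally.
    nltk.download("gutenberg", quiet=True)
    nltk.download("stopwords", quiet=True)

    words = gutenberg.words("shakespeare-macbeth.txt")  # still includes stopwords
    english_stopwords = set(stopwords.words("english"))
    filtered = remove_stopwords(words, english_stopwords)
    return nltk.FreqDist(filtered).most_common(n)
```

Calling `print(macbeth_most_common(10))` should then print the ten most frequent remaining words as `(word, count)` pairs, provided NLTK is installed and the corpus data can be downloaded.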

This is what I have so far, but I am getting an error:

import nltk
from collections import Counter, OrderedDict
import operator
from nltk.corpus import gutenberg
import re

print(len(gutenberg.raw('shakespeare-macbeth.txt')))

gb = nltk.corpus.gutenberg
sw = set(nltk.corpus.stopwords.words('english'))

print('----Excluding the stopwords-----')
text_sent = gb.sents("shakespeare-macbeth.txt")[:100351]

filtered_i = []

for sent in text_sent:
    filter = [w for w in sent if w.lower() not in sw]
    filtered_i.append(filter)

#print(filtered)

result = Counter(filtered_i).most_common(10)

result_dict = OrderedDict(result)

print(result_dict)
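
The post does not quote the error, but the most likely cause is the `Counter(filtered_i)` call: `filtered_i` is a list of lists (one list per sentence), and `Counter` needs hashable items, so it raises a `TypeError`. The sketch below reproduces this with a toy list standing in for the Macbeth sentences (the data is made up for illustration) and shows one way around it, flattening the nested list before counting.

```python
from collections import Counter
from itertools import chain

# Toy stand-in for the filtered sentences (a list of lists of words).
filtered_sentences = [["macbeth", "king"], ["king", "duncan"], ["king"]]

# Counter hashes every element, and lists are unhashable, so this fails.
try:
    Counter(filtered_sentences)
    error_message = None
except TypeError as exc:
    error_message = str(exc)

# Flatten the nested list into a single sequence of words before counting.
flat_words = list(chain.from_iterable(filtered_sentences))
top_words = Counter(flat_words).most_common(2)

print(error_message)
print(top_words)
```

In the code from the question, the same effect could be had by replacing `filtered_i.append(filter)` with `filtered_i.extend(filter)`, or by reading the text with `gb.words(...)` instead of `gb.sents(...)` so that the list is flat from the start. Renaming the variable `filter` would also avoid shadowing the Python built-in of the same name.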
