Question: The Python code below does not display everything the requirements ask for

The Python code below does not display everything the requirements ask for. The requirements are:

1. Remove all punctuation and non-English words, then count the remaining words in the file.
2. Use the words left after step 1 to build a word dictionary in which every word is unique (e.g. the words "But" and "but" should be considered the same word).
   - Count the number of distinct words in your dictionary.
   - Display the words in the dictionary in alphabetical order.
3. Select three sentences from the file, then use any POS tagging tool to identify the POS tags in the selected sentences.

Code:

import string
from collections import OrderedDict

import nltk

# Download necessary NLTK data
nltk.download('averaged_perceptron_tagger')
nltk.download('words')

# Define function to check if a word is English
english_vocab = set(w.lower() for w in nltk.corpus.words.words())

def is_english(word):
    return word.lower() in english_vocab

try:
    with open(r'file_path', 'r') as f:
        text = f.read()

    # Preprocess: Remove punctuation and non-English words
    exclude = set(string.punctuation)
    text = ''.join(ch for ch in text if ch not in exclude and ch.isascii())
    words = text.split()
    words = [word for word in words if is_english(word)]

    # Count processed words and print
    total_processed_words = len(words)
    print(f"Total processed words: {total_processed_words}")

    # Build dictionary of unique words
    word_dict = OrderedDict()
    for word in words:
        word_lower = word.lower()
        if word_lower not in word_dict:
            word_dict[word_lower] = 1
        else:
            word_dict[word_lower] += 1

    # Count distinct words and print
    distinct_word_count = len(word_dict)
    print(f"Number of distinct words: {distinct_word_count}")

    # Print words in alphabetical order
    print("\nWords in alphabetical order:")
    for word in sorted(word_dict):
        print(word)

    # Select sentences and POS tag
    # Replace these sentences with ones from your file if necessary
    sentences = [
        "from fairest creatures we desire increase that thereby beautys rose might never die",
        "when forty winters shall besiege thy brow and dig deep trenches in thy beautys field",
        "for where is she so fair whose uneared womb disdains the tillage of thy husbandry"
    ]

    for sentence in sentences:
        pos_tags = nltk.pos_tag(sentence.split())
        print("\nSentence:", sentence)
        print("POS Tags:", pos_tags)

except Exception as e:
    print("An error occurred:", e)
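
One possible reason the output falls short of the requirements is that the sentences passed to the tagger are hard-coded rather than selected from the file, and the broad except clause can hide a failed file read. The sketch below restructures the same steps as small functions so that each requirement can be checked on its own. It is a minimal sketch under stated assumptions, not a definitive fix: the names clean_words, build_dictionary and select_sentences are hypothetical, the vocabulary is passed in as a plain set (with the code above it would be english_vocab), and the sentence splitter is a naive regular expression rather than a real sentence tokenizer.

```python
import re
import string
from collections import Counter

# Translation table that deletes every ASCII punctuation character.
_PUNCT_TABLE = str.maketrans("", "", string.punctuation)


def clean_words(text, vocab):
    """Return the lowercased words of `text` that appear in `vocab`.

    `vocab` is assumed to be a set of lowercase English words.
    Words containing non-ASCII characters are dropped simply because
    they are not in the vocabulary.
    """
    stripped = text.translate(_PUNCT_TABLE).lower()
    return [w for w in stripped.split() if w in vocab]


def build_dictionary(words):
    """Map each distinct word to the number of times it occurs."""
    return Counter(words)


def select_sentences(text, n=3):
    """Return the first `n` sentences of the raw text.

    Naive assumption: a sentence ends with '.', '!' or '?' followed
    by whitespace. Punctuation and capitalisation are preserved.
    """
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p][:n]


if __name__ == "__main__":
    # Hypothetical sample text and vocabulary, for illustration only.
    sample = "But soft! What light breaks? It is the east, but zzxq is not a word."
    vocab = {"but", "soft", "what", "light", "breaks", "it", "is",
             "the", "east", "not", "a", "word"}

    words = clean_words(sample, vocab)
    counts = build_dictionary(words)

    print("Total processed words:", len(words))      # 14
    print("Number of distinct words:", len(counts))  # 12
    print("Words in alphabetical order:")
    for word in sorted(counts):
        print(word)

    print("Selected sentences:", select_sentences(sample))
```

Because select_sentences works on the raw text, the selected sentences keep the punctuation and capitalisation that a tagger can use. Each one could then be passed to nltk.pos_tag(nltk.word_tokenize(sentence)), assuming the tokenizer and tagger data for the installed NLTK version have been downloaded.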
