Question: Please provide the code to find out how many word types account for a third of all word tokens in Brown


>>> import nltk
>>> from nltk.corpus import brown

Now, to get the list of word tokens in Brown, do:

>>> bw = brown.words()

The name bw has now been set to the list of all the word tokens in Brown. To get a data structure with all the word counts, do:

>>> fd = nltk.FreqDist(bw)

Having done all that, you can now execute the following to find the top 10 most frequent words and their counts:

>>> fd.most_common(10)
[(u'the', 62713), (u',', 58334), (u'.', 49346), (u'of', 36080), (u'and', 27915), (u'to', 25732), (u'a', 21881), (u'in', 19536), (u'that', 10237), (u'is', 10011)]

The frequency distribution fd behaves like a Python dictionary containing the count of every word type in Brown. To find the number of word tokens in Brown, do:

>>> len(bw)

To find the number of word types in Brown, just get the length of the dictionary:

>>> len(fd)

To get the count of the word type 'computer', do:

>>> fd['computer']
13

So there are 13 tokens of the word type 'computer' in the Brown corpus. This is the unigram count for 'computer'. Unigram counts are just word counts. Bigram counts are counts of word pairs, trigram counts are counts of word triples, and so on. You will find it useful to review Chapter One of the NLTK book.

It is handy to ignore case for purposes of this exercise. To do that, just lowercase all the words in Brown before making the frequency distribution. This changes the number of word types, but not the number of word tokens:

>>> bw = [w.lower() for w in bw]
>>> len(bw)
>>> fd = nltk.FreqDist(bw)
>>> len(fd)
49815

To hand in
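One possible way to approach the question itself, offered as a sketch rather than the assignment's intended solution: walk through the word types from most to least frequent, keep a running total of their counts, and stop as soon as that total reaches one third of all tokens. The helper name `types_covering_fraction` and its parameters are hypothetical and do not come from the exercise. The sketch uses `collections.Counter` (which `nltk.FreqDist` subclasses) and a toy token list so that it runs without the corpus installed.

```python
from collections import Counter
from fractions import Fraction


def types_covering_fraction(tokens, fraction=Fraction(1, 3), lowercase=True):
    """Return how many of the most frequent word types are needed
    to cover at least `fraction` of all word tokens.

    Hypothetical helper for illustration; not part of NLTK.
    """
    # Optionally ignore case, as the exercise suggests.
    tokens = [t.lower() for t in tokens] if lowercase else list(tokens)
    counts = Counter(tokens)             # word type -> token count
    target = fraction * len(tokens)      # number of tokens to cover

    covered = 0
    # most_common() yields (type, count) pairs, most frequent first.
    for n_types, (_, count) in enumerate(counts.most_common(), start=1):
        covered += count
        if covered >= target:
            return n_types
    return 0  # only reached for an empty token list


# Toy example: 'the' alone covers 3 of 8 tokens, which is more than a third.
toy_tokens = "The cat sat on the mat the end".split()
toy_result = types_covering_fraction(toy_tokens)

# With NLTK and the Brown corpus downloaded, the call would look like:
#   from nltk.corpus import brown
#   types_covering_fraction(brown.words())
```

Using `Fraction` rather than a float keeps the comparison against the target exact; passing `lowercase=False` would count 'The' and 'the' as separate word types.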

