
Please help with this problem in Python.

Example:

ROMEO_SOLILOQUY = """ But, soft! what light through yonder window breaks? It is the east, and Juliet is the sun. Arise, fair sun, and kill the envious moon, who is already sick and pale with grief, That thou her maid art far more fair than she: be not her maid, since she is envious; her vestal livery is but sick and green and none but fools do wear it; cast it off. It is my lady, O, it is my love! O, that she knew she were! She speaks yet she says nothing: what of that? Her eye discourses; I will answer it. I am too bold, 'tis not to me she speaks: two of the fairest stars in all the heaven, having some business, do entreat her eyes to twinkle in their spheres till they return. What if her eyes were there, they in her head? The brightness of her cheek would shame those stars, as daylight doth a lamp; her eyes in heaven would through the airy region stream so bright that birds would sing and think it were not night. See, how she leans her cheek upon her hand! O, that I were a glove upon that hand, that I might touch that cheek!"""

Using the string's built-in split method (previously mentioned in class) along with lower, we can derive a list of tokens from the passage.

In [ ]:

toks = [t.lower() for t in ROMEO_SOLILOQUY.split()]
toks[:8]
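To see why tokens such as 'but,' and 'breaks?' keep their punctuation, here is a small standalone illustration of how split() with no argument behaves compared with split(' '). It uses only a short excerpt of the passage, so the variable names are illustrative.

```python
# With no argument, str.split() splits on runs of any whitespace and
# ignores leading/trailing whitespace; punctuation stays attached to words.
excerpt = " But, soft! what light through yonder window breaks?"
toks = [t.lower() for t in excerpt.split()]
print(toks)
# ['but,', 'soft!', 'what', 'light', 'through', 'yonder', 'window', 'breaks?']

# Splitting on a single space instead keeps empty strings wherever
# spaces are leading or repeated, which would pollute the token list.
print(" a  b".split(' '))
# ['', 'a', '', 'b']
```

Because punctuation is not stripped, 'but' and 'but,' end up as different tokens, which is why the tests below distinguish them.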

compute_ngrams

Your first task is to write compute_ngrams, which takes a list of tokens and a value n indicating the n-gram length (e.g., 3 for 3-grams) and returns an n-gram dictionary. Every key in the returned dictionary is a token (a string), and its value is a list of one or more tuples, each holding the n-1 tokens that follow that key in the text. Note that even when n=2 (the minimum value), the dictionary should map strings to lists of 1-tuples rather than to lists of individual tokens.

In [ ]:

def compute_ngrams(toks, n=2):
    """Returns an n-gram dictionary based on the provided list of tokens."""
    # YOUR CODE HERE
    raise NotImplementedError()
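One possible way to fill in compute_ngrams, offered as a sketch rather than the official solution: slide a window of n tokens across the list, use the first token of each window as the key, and append the remaining n-1 tokens as a tuple. It assumes repeated continuations are kept (not de-duplicated) in the order they occur in the text; that is consistent with the tests below but not stated explicitly in the assignment.

```python
def compute_ngrams(toks, n=2):
    """Returns an n-gram dictionary based on the provided list of tokens."""
    ngrams = {}
    # Each window toks[i:i+n] contributes one entry; stop when fewer than
    # n tokens remain so every tuple has exactly n-1 elements.
    for i in range(len(toks) - n + 1):
        key = toks[i]
        follow = tuple(toks[i + 1:i + n])
        # setdefault creates the list the first time a key is seen
        ngrams.setdefault(key, []).append(follow)
    return ngrams


simple_toks = [t.lower() for t in 'I really really like cake.'.split()]
print(compute_ngrams(simple_toks))
# {'i': [('really',)], 'really': [('really',), ('like',)], 'like': [('cake.',)]}
```

Because the tuples are built by slicing, n=2 naturally yields 1-tuples such as ('really',), as the assignment requires, and the final token ('cake.') never becomes a key because nothing follows it.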

And now for some simple tests:

In [ ]:

# (5 points)
from unittest import TestCase
tc = TestCase()

simple_toks = [t.lower() for t in 'I really really like cake.'.split()]
compute_ngrams(simple_toks)

tc.assertEqual(compute_ngrams(simple_toks),
               {'i': [('really',)], 'like': [('cake.',)], 'really': [('really',), ('like',)]})
tc.assertEqual(compute_ngrams(simple_toks, n=3),
               {'i': [('really', 'really')],
                'really': [('really', 'like'), ('like', 'cake.')]})

romeo_toks = [t.lower() for t in ROMEO_SOLILOQUY.split()]
dct = compute_ngrams(romeo_toks, n=4)
tc.assertEqual(dct['but'], [('sick', 'and', 'green'), ('fools', 'do', 'wear')])
tc.assertEqual(dct['it'],
               [('is', 'the', 'east,'),
                ('off.', 'it', 'is'),
                ('is', 'my', 'lady,'),
                ('is', 'my', 'love!'),
                ('were', 'not', 'night.')])

I've also placed the entire text of Peter Pan (courtesy of Project Gutenberg) on the server, to be used to stress test your function just a bit. Evaluate the following cell to read the text of the book into peter_pan_text.

If you're not on the course server, uncomment the line that reads the text directly from the Project Gutenberg website, and comment out the lines that read the local file.

In [ ]:

import urllib.request

PETER_PAN_FILENAME = '/srv/cs331/peterpan.txt'
PETER_PAN_URL = 'https://www.gutenberg.org/files/16/16-0.txt'

# if you're not on the course server, uncomment the line below to read the text over the web
# peter_pan_text = urllib.request.urlopen(PETER_PAN_URL).read().decode()

# if you uncommented the line above, comment out the two lines below
with open(PETER_PAN_FILENAME) as infile:
    peter_pan_text = infile.read()

chapt1_start = peter_pan_text.index('All children')
print(peter_pan_text[chapt1_start:chapt1_start+1000])
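Rather than commenting and uncommenting lines, you could let the notebook fall back to the web copy automatically when the local file is missing. The helper name load_text below is hypothetical (it is not part of the assignment), and it assumes both the local file and the Gutenberg download are UTF-8 encoded.

```python
import urllib.request


def load_text(path, url):
    """Read text from a local file, falling back to downloading url.

    Hypothetical helper; assumes both sources are UTF-8 encoded.
    """
    try:
        with open(path, encoding='utf-8') as infile:
            return infile.read()
    except FileNotFoundError:
        # Not on the course server: fetch the same book over HTTP instead.
        with urllib.request.urlopen(url) as response:
            return response.read().decode('utf-8')


# Usage, with the constants defined in the cell above:
# peter_pan_text = load_text(PETER_PAN_FILENAME, PETER_PAN_URL)
```

The network is touched only when the local file does not exist, so on the course server this behaves exactly like the original open() call.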

Time for some larger test cases!

In [ ]:

# (5 points)
from unittest import TestCase
tc = TestCase()

pp_toks = [t.lower() for t in peter_pan_text.split()]
dct = compute_ngrams(pp_toks, n=3)
tc.assertEqual(dct['crocodile'],
               [('passes,', 'but'),
                ('that', 'happened'),
                ('would', 'have'),
                ('was', 'in'),
                ('passed', 'him,'),
                ('is', 'about'),
                ('climbing', 'it.'),
                ('that', 'was'),
                ('pass', 'by'),
                ('and', 'let'),
                ('was', 'among'),
                ('was', 'waiting')])
tc.assertEqual(len(dct['wendy']), 202)
tc.assertEqual(len(dct['peter']), 243)

Random selection

One more thing before you start work on generating passages from an n-gram dictionary: we need a way to choose a random item from a sequence.

The random.choice function provides just this functionality. Consider (and feel free to play with) the following examples; at the very least, evaluate the cell a few separate times to see how the results change:

In [ ]:

import random

print(random.choice(['lions', 'tigers', 'bears']))
print(random.choice(range(100)))
print(random.choice([('really', 'like'), ('like', 'cake')]))
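The grading tests further down rely on random.seed to make random.choice repeatable. This sketch (variable names are illustrative) shows that reseeding with the same value reproduces the same sequence of choices, and why keys are sorted before choosing: two dictionaries with the same keys can iterate in different orders, but their sorted keys are identical.

```python
import random

animals = ['lions', 'tigers', 'bears']

random.seed(1234)
first_run = [random.choice(animals) for _ in range(5)]

random.seed(1234)
second_run = [random.choice(animals) for _ in range(5)]

# Same seed and same sequence of calls -> identical results.
print(first_run == second_run)  # True

# Same keys inserted in different orders iterate differently,
# but sorting them yields identical lists to choose from.
d1 = {'really': 1, 'i': 2, 'like': 3}
d2 = {'like': 3, 'i': 2, 'really': 1}
print(list(d1) == list(d2))                    # False
print(sorted(d1.keys()) == sorted(d2.keys()))  # True
```

Reproducibility also depends on making the same number of calls in the same order, which is why the assignment insists that random.choice be invoked only where the steps require it.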

gen_passage

Finally, you're ready to implement gen_passage, which will take an n-gram dictionary and a length for the passage to generate (as a token count).

As described earlier, it will work as follows:

1. Select a random key from the dictionary and use it as the start token of the passage. It will also serve as the current token for the next step.

2. Select a random tuple from the list associated with the current token and append the sequence to the passage. The last token of the selected sequence becomes the new current token.

3. If the current token is a key in the dictionary, simply repeat step 2; otherwise, select another random key from the dictionary as the current token and append it to the passage before repeating step 2.

Steps 2 and 3 repeat until the passage reaches the requested length.

You will use random.choice whenever a random selection needs to be made. For your results to be reproducible, be sure to sort the dictionary's keys before selecting a random one, since their iteration order depends on how the dictionary was built. Assuming ngram_dict is the dictionary, that looks like this:

random.choice(sorted(ngram_dict.keys()))

In [ ]:

# (5 points)
def gen_passage(ngram_dict, length=100):
    # YOUR CODE HERE
    raise NotImplementedError()
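A sketch of gen_passage that follows the three steps above literally, calling random.choice only when a step requires it. Two details are assumptions not spelled out in the assignment: the loop stops as soon as the passage holds length tokens, and, because a chosen tuple can add several tokens at once when n > 2, the result is trimmed to exactly length tokens before being joined with spaces. A compact copy of the compute_ngrams sketch is included only so the block runs on its own.

```python
import random


def compute_ngrams(toks, n=2):
    """Sliding-window n-gram dictionary (compact version of the earlier sketch)."""
    ngrams = {}
    for i in range(len(toks) - n + 1):
        ngrams.setdefault(toks[i], []).append(tuple(toks[i + 1:i + n]))
    return ngrams


def gen_passage(ngram_dict, length=100):
    """Generate a passage of `length` tokens by walking the n-gram dictionary."""
    # Sorting makes the random choices reproducible for a given seed.
    keys = sorted(ngram_dict.keys())

    # Step 1: a random key starts the passage and becomes the current token.
    current = random.choice(keys)
    passage = [current]

    while len(passage) < length:
        if current in ngram_dict:
            # Step 2: append a random continuation; its last token becomes current.
            seq = random.choice(ngram_dict[current])
            passage.extend(seq)
            current = seq[-1]
        else:
            # Step 3: dead end, so restart from another random key.
            current = random.choice(keys)
            passage.append(current)

    # Assumption: trim any overshoot (possible when n > 2).
    return ' '.join(passage[:length])


random.seed(1234)
toks = [t.lower() for t in 'I really really like cake.'.split()]
print(gen_passage(compute_ngrams(toks), 10))
```

Checking the length before every action means no extra random.choice call is made once the passage is long enough, which matters because each call advances the seeded random sequence that the tests depend on.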

For the following test cases to work, it is critical that you do not invoke random.choice more than is absolutely necessary, and only as prescribed in the steps described above!

Note that in addition to the automated test cases, we'll also be manually grading your code above.

In [ ]:

# (5 points)
tc = TestCase()

random.seed(1234)
simple_toks = [t.lower() for t in 'I really really like cake.'.split()]
tc.assertEqual(gen_passage(compute_ngrams(simple_toks), 10),
               'like cake. i really really really really like cake. i')

random.seed(1234)
romeo_toks = [t.lower() for t in ROMEO_SOLILOQUY.split()]
tc.assertEqual(gen_passage(compute_ngrams(romeo_toks), 10),
               'too bold, \'tis not night. see, how she leans her')

