Question:

In the first part of this assignment you will implement a first-order Markov text generator. Writing this generator will involve two functions: (1) one to process a file and create a dictionary of legal word transitions and (2) another to actually generate the new text.

First function to create: createDictionary( filename )

createDictionary( filename ) takes in a string, the name of a text file containing some sample text. It should return a dictionary whose keys are words encountered in the text file and whose values are lists of the words that may legally follow each key word. Note that you should determine a way to keep track of frequency information. That is, if the word "cheese" is followed by the word "pizza" twice as often as it is followed by the word "sandwich", your dictionary should reflect this trend. For example, you might keep multiple copies of a word in the list.
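To see why keeping duplicate copies is enough to record frequency, here is a small sketch. The list followers is made up for illustration (it stands for the list stored under the key "cheese"), and the seed and the number of draws are arbitrary choices.

```python
import random
from collections import Counter

# Hypothetical follower list for the key "cheese":
# "pizza" is stored twice because it followed "cheese" twice as often.
followers = ["pizza", "pizza", "sandwich"]

random.seed(1)  # arbitrary seed, only so repeated runs match
# random.choice picks each list position with equal probability,
# so a word stored twice is picked about twice as often.
draws = Counter(random.choice(followers) for _ in range(3000))
print(draws)
```

Because random.choice treats every position in the list equally, no separate count table is needed: the duplicates are the frequency information.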

The dictionary returned by createDictionary will allow you to choose word t+1 given a word at time t. But how do you choose the first word, when there is no preceding word to use to index into your dictionary?

To handle this case, your dictionary should include the string "$" representing the sentence-start symbol. The first word in the file should follow this string. In addition, each word in the file that follows a sentence-ending word should follow this string. A sentence-ending word is defined to be any raw, space-separated word whose last character is a period ., a question mark ?, or an exclamation point !.

How do I determine if a word ends in a punctuation mark? The easiest way is to check w[-1]. We will only worry about '.', '?', and '!'.
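The w[-1] check can be wrapped in a tiny helper, sketched below. The name endsSentence is hypothetical and not part of the assignment; the sketch assumes w is a non-empty string, which holds for words produced by str.split().

```python
def endsSentence(w):
    """Return True if the raw word w ends a sentence.

    Assumes w is non-empty (w[-1] raises IndexError on an empty string).
    """
    # Only '.', '?', and '!' count as sentence-ending punctuation.
    return w[-1] in ".?!"


print(endsSentence("spam."))      # a word ending in a period
print(endsSentence("poptarts"))   # a word with no final punctuation
```

An equivalent test that also tolerates empty strings is w.endswith(('.', '?', '!')).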

Checking your code... To check your code, paste the following text into a plain-text file (for example, into a new file window in Sublime):

    A B A. A B C. B A C. C C C.

Save this file as t.txt in the same directory where your your_name_project02.py lives. Then, see if your dictionary d matches the sample below:

    >>> d = createDictionary( 't.txt' )
    >>> d
    {'A': ['B', 'B', 'C.'], 'C': ['C', 'C.'], 'B': ['A.', 'C.', 'A'], '$': ['A', 'A', 'B', 'C']}

The elements within each list need not be in the same order, but they should appear in the quantities shown above for each of the four keys, 'A', 'C', 'B', and '$'.

Here are the contents of the poptarts file, named a.txt, from class:

    I like poptarts and 42 and spam. Will I get spam and poptarts for the holidays? I like spam poptarts!

You'll want to be sure that the output dictionary from this file is the same as the one in the class notes (note that the order of the keys can vary and they won't be separated line-by-line):

    >>> d = cd( 'a.txt' )
    >>> d
    {'and': ['42', 'spam.', 'poptarts'], '$': ['I', 'Will', 'I'], 'for': ['the'], 'get': ['spam'], 'I': ['like', 'get', 'like'], 'spam': ['and', 'poptarts!'], '42': ['and'], 'Will': ['I'], 'poptarts': ['and', 'for'], 'the': ['holidays?'], 'like': ['poptarts', 'spam']}
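One possible way to build such a dictionary is sketched below. It is a minimal sketch rather than the reference solution: it assumes the whole file fits in memory, that words are separated by whitespace (str.split), and that a sentence-ending word itself never becomes a key, which is consistent with the sample outputs above. The variable names are illustrative.

```python
def createDictionary(filename):
    """Map each word (and the start symbol "$") to the list of words
    that follow it in the file, keeping duplicates to record frequency."""
    with open(filename) as f:
        words = f.read().split()  # raw words; punctuation is left attached

    d = {}
    prev = "$"  # the first word of the file follows the start symbol
    for w in words:
        # Record w as a legal follower of the previous key.
        d.setdefault(prev, []).append(w)
        # After a sentence-ending word, the next word follows "$" instead.
        prev = "$" if w[-1] in ".?!" else w
    return d
```

Run on t.txt, this sketch should give the four keys 'A', 'B', 'C', and '$' with the quantities listed in the sample above.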

Second function to create: generateText( d, n )

generateText( d, n ) will take in a dictionary of word transitions d (generated in your createDictionary function, above) and a positive integer, n. Then, generateText should print a string of n words.

The first word should be randomly chosen from among those that can follow the sentence-starting string "$". Remember that random.choice will choose one item randomly from a list! The second word will be randomly chosen from the list of words that could possibly follow the first, and so on. When a chosen word ends in a period ., a question mark ?, or an exclamation point !, the generateText function should detect this and start a new sentence by again choosing a random word from among those that follow "$".

Don't include the '$' in the output text itself -- it will be a marker internal to your function.

For this problem, you should not strip the punctuation from the raw words of the text file. Leave the punctuation as it appears in the text -- and when you generate words, don't worry if your generated text does not end with legal punctuation, i.e., you might end without a period, which is ok. The text you generate won't be perfect, but you might be surprised how good it is!

Here are two examples that use the dictionary d, from above. Yours will differ because of the randomness, but should be similar in spirit.

    >>> generateText( d, 20 )
    B C. C C C. C C C C C C C C C C C. C C C. A

    >>> generateText( d, 20 )
    A B A. C C C. B A B C. A C. B A. C C C C C C.
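A minimal sketch of the generation loop follows. It is one possible approach, not the official solution. The helper chooseWords is a hypothetical name introduced only so the word list can be inspected before printing. The fallback to "$" when a word has no recorded followers (for example, a final word in the file with no ending punctuation) is an assumption that the assignment does not specify.

```python
import random


def chooseWords(d, n):
    """Hypothetical helper: return a list of n randomly chosen words."""
    words = []
    prev = "$"  # start a sentence; "$" itself is never added to the output
    for _ in range(n):
        # Assumption: if prev has no followers recorded, restart from "$".
        followers = d.get(prev) or d["$"]
        w = random.choice(followers)
        words.append(w)
        # A sentence-ending word sends the next choice back to "$".
        prev = "$" if w[-1] in ".?!" else w
    return words


def generateText(d, n):
    """Print n words separated by single spaces."""
    print(" ".join(chooseWords(d, n)))


# The sample dictionary built from t.txt.
d = {'A': ['B', 'B', 'C.'], 'C': ['C', 'C.'],
     'B': ['A.', 'C.', 'A'], '$': ['A', 'A', 'B', 'C']}
generateText(d, 20)
```

Each run prints a different line of 20 words, and the output may stop mid-sentence, which the assignment allows.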
