Question:

from math import inf, log
from typing import List, Optional

from utils import max_word_length, is_valid, word_prob


# Part B: Probabilistic reconstruction
def likely_reconstruct(document: str) -> Optional[str]:
    """
    Finds the **most likely** reconstruction of a string with no whitespace.

    :param document: A nonempty string of letters, stripped of all whitespace and punctuation.
    :return: A string which is the most likely reconstruction of the input, or None if all reconstructions have zero probability.
    """
    if len(document.split()) > 1:
        raise ValueError('Document must not contain any whitespace.')
    # todo

Given a string of n characters with no whitespace, your goal is to reconstruct it by splitting it into valid words separated by spaces (if possible). For example, "it was the best of times" is a reconstruction of "itwasthebestoftimes". Words may also contain punctuation: "I'llhavewhatshe'shaving" can be reconstructed as "I'll have what she's having". Some strings, such as "qwertyuiop", cannot be reconstructed.

Install dependencies

We need an English dictionary, so install the wordfreq library by running pip install wordfreq (Windows) or pip install wordfreq (Mac/Linux).

Part B

There is a fairly easy way to improve the output of the naive algorithm. Rather than settling for any reconstruction of the string, we can look for the most likely reconstruction. To do this, we assume that each word w is picked independently with probability P(w). The probability of a reconstruction is then the product of the probabilities of its individual words. For example, the probability of the sentence "This is good" is equal to P("This") * P("is") * P("good"). We compute the reconstruction with the maximum probability.

Implement this strategy in the likely_reconstruct function in hw3.py. Once again, your algorithm should run in O(nk) time.

Use the provided word_prob function to compute a word's probability, as shown below. It ignores whitespace and leading/trailing punctuation.

>>> word_prob("the")
0.0588843655955589
>>> word_prob("end.'")
0.00024897788193684461
>>> word_prob("zxcvbnm")
0.0
>>> word_prob("not a word")
Traceback (most recent call last):
  ...
ValueError: Invalid argument: not a word
Words must not contain whitespace

Words with zero probability are considered invalid (not in the dictionary).
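As a small illustration of the independence assumption above, the sketch below scores two candidate reconstructions by multiplying per-word probabilities. The TOY_PROB table and the reconstruction_prob helper are hypothetical stand-ins invented for this example; in the assignment, the provided word_prob function would supply the real values.

```python
from math import prod

# Hypothetical toy probabilities, not taken from wordfreq.
TOY_PROB = {"this": 0.005, "is": 0.01, "good": 0.001, "go": 0.002}


def reconstruction_prob(words):
    """Product of per-word probabilities; unknown words count as 0.0."""
    return prod(TOY_PROB.get(w.lower(), 0.0) for w in words)


# "od" is not in the toy table, so the second candidate scores zero.
print(reconstruction_prob(["This", "is", "good"]))
print(reconstruction_prob(["This", "is", "go", "od"]))
```

A single zero-probability word makes the whole reconstruction impossible, which matches the rule that such words are treated as invalid.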
Multiplying probabilities is not a great idea, because it leads to underflow:

>>> .00001 ** 60
1.0000000000000048e-300
>>> .00001 ** 60 * .00003 ** 10
0.0
>>> from sys import float_info
>>> float_info.min
2.2250738585072014e-308

To avoid this, you can use the following trick:

>>> from math import log
>>> -log(.00001 ** 60 * .00003 ** 10)
Traceback (most recent call last):
  ...
ValueError: math domain error
>>> -log(.00001 ** 60)
690.7755278982137
>>> -log(.00003 ** 10)
104.14313176382119
>>> -log(.00001 ** 60) + -log(.00003 ** 10)
794.9186596620349

Note that -log(p) increases as the probability p of an event decreases, and that the -log of a product equals the sum of the -logs of its factors. The most likely reconstruction is therefore the one which minimizes this cost function.
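The trick can be wrapped in a small helper that sums -log(p) one word at a time, so the underflowing product is never formed. This is only a sketch: the name cost and the choice to return inf for a zero probability are assumptions made for this example, not part of the provided utils module.

```python
from math import inf, log


def cost(probs):
    """Sum of -log(p) over the probabilities; inf if any p is zero."""
    total = 0.0
    for p in probs:
        if p <= 0.0:
            return inf  # a zero-probability word rules out the whole split
        total += -log(p)
    return total


# Sixty words of probability 1e-5 followed by ten of probability 3e-5.
probs = [1e-5] * 60 + [3e-5] * 10

product = 1.0
for p in probs:
    product *= p

print(product)      # the running product underflows to 0.0
print(cost(probs))  # the summed cost stays finite, roughly 794.92
```

Because -log is strictly decreasing, the split with the smallest summed cost is also the split with the largest probability.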
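One possible way to combine these ideas is a dynamic program over prefixes of the document, sketched below. It is not the assignment's reference solution. To stay self-contained it takes the probability function and the maximum word length as parameters (word_prob and max_len here are stand-ins for whatever the provided utils module exposes), and the TOY table is hypothetical data invented for the example.

```python
from math import inf, log
from typing import Callable, List, Optional


def likely_reconstruct_sketch(document: str,
                              word_prob: Callable[[str], float],
                              max_len: int) -> Optional[str]:
    n = len(document)
    # best[i]: minimum summed -log cost of splitting document[:i] into words.
    best: List[float] = [inf] * (n + 1)
    # prev[i]: start index of the last word in that cheapest split.
    prev: List[int] = [-1] * (n + 1)
    best[0] = 0.0
    for i in range(1, n + 1):
        # Only the last max_len start positions can begin a valid final word.
        for j in range(max(0, i - max_len), i):
            if best[j] == inf:
                continue
            p = word_prob(document[j:i])
            if p <= 0.0:
                continue  # zero probability means the word is invalid
            candidate = best[j] - log(p)
            if candidate < best[i]:
                best[i] = candidate
                prev[i] = j
    if best[n] == inf:
        return None
    # Walk the back-pointers from the end to recover the words.
    words: List[str] = []
    i = n
    while i > 0:
        j = prev[i]
        words.append(document[j:i])
        i = j
    return " ".join(reversed(words))


# Hypothetical toy dictionary; real values would come from word_prob.
TOY = {"it": 0.01, "i": 0.01, "t": 0.00001, "was": 0.005, "the": 0.05,
       "best": 0.001, "of": 0.03, "times": 0.0005, "time": 0.001, "s": 0.00001}


def toy_prob(word: str) -> float:
    return TOY.get(word, 0.0)


print(likely_reconstruct_sketch("itwasthebestoftimes", toy_prob, 5))
print(likely_reconstruct_sketch("qwertyuiop", toy_prob, 5))
```

With n characters and a maximum word length of k, the two loops make at most n * k calls to the probability function, which is where the O(nk) bound comes from if each call is treated as constant time.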
