Question: corpus_for_language_models.txt: Richard W. Lock , retired vice president and treasurer of Owens-Illinois

corpus_for_language_models.txt:
Richard W. Lock , retired vice president and treasurer of Owens-Illinois Inc. , was named a director of this transportation industry supplier , increasing its board to six members .
John J. Phelan Jr., chairman of the Big Board , asserts that ``1988 and 1989 have been two of the least volatile years in the last 30 or 40 years .''
In January , American Medical brought in a new chief executive officer , Richard A. Gilleland ,45, who will remain as chairman , president and chief executive .
Ralph T. Linsley , vice chairman of Eagle , will become vice chairman of Webster\/Eagle .
Orkem declined to give details of its offer , saying only that the bid will be submitted for approval by the board of the British company .
But in 1935, when Congress was trying to find someone or something to blame for the Great Depression , it decided to drop both the secretary and the comptroller from the board .
Treasury Undersecretary David Mulford , for instance , was at a meeting of the Business Council in Hot Springs , Va., when the stock market fell , and remained there through the following day .
He will remain chairman .
The memo attempts to remove the tourist board as far as possible from the agency , which pleaded innocent to the charges .
J. Michael Cook , chairman of Deloitte , Haskins & Sells International , said he believes the legal action by the British firm `` to be without merit .''
Other conservatives thought to be on the administration 's short list include Washington lawyer Michael Uhlmann , who was passed over for the No.2 job at the Justice Department , and Marshall Breger , chairman of a U.S. agency on administration .
The new products allow customers to add Convex machines to established systems made by other manufacturers , which `` opens up a phenomenal market for us ,'' said Robert J. Paluck , Convex 's chairman , president and chief executive .
Wilson H. Taylor , president and chief executive officer of this insurance and financial services concern , was elected to the additional post of chairman .
The Fed chairman 's caution was apparent again on the Monday morning after the market 's plunge , when the central bank took only modest steps to aid the markets .
Cipher Data said Mr. Marinaro consequently has resigned from those posts and from the company 's board .
Most notably , several of the regulatory steps recommended by the Brady Task Force , which analyzed the 1987 crash , would be revived -- especially because that group 's chairman is now the Treasury secretary .
Mr. Gaubert , who was chairman and the majority stockholder of Independent American , had relinquished his control in exchange for federal regulators ' agreement to drop their inquiry into his activities at another savings and loan .
And Bill Konopnicki , a Safford , Ariz. , licensee of McDonald 's Corp. who is chairman of the company 's Nationa
QUESTION 4
Trigram Language Models
Compute the probability of the following two sentences:
S1: `` We wanted them to build a road here ,'' he says .
S2: Edward L. Kane succeeded Mr. Taylor as chairman .
Build a trigram language model trained on the corpus that is provided on eLearning:
corpus_for_language_models.txt
Treat every line in the corpus file as a sentence.
Hint: Initialize a list and open the file in read mode. For each line in the file, strip it, tokenize it, and append the resulting token list to the list.
Use the NLTK library to tokenize the sentences:
from nltk.tokenize import wordpunct_tokenize
sentence = wordpunct_tokenize(line)
Also use the NLTK library to build the inputs for the trigram model:
from nltk.lm.preprocessing import padded_everygram_pipeline
train, vocab = padded_everygram_pipeline(3, sentences)
where sentences is a list of sentences, and each sentence is a list of tokens. These are the sentences you read from the corpus file.
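The preprocessing described above can be pictured with a small dependency-free sketch. It is not a replacement for `padded_everygram_pipeline`; it only mimics, in plain Python, what the padded trigram input looks like for one sentence. The helper names `read_sentences`, `pad_both_ends_sketch` and `trigrams_sketch` are hypothetical, and the padding convention (n-1 copies of `<s>` and `</s>` on each side for n=3) is an assumption about NLTK's default behavior.

```python
def read_sentences(path, tokenize):
    """Read one sentence per line and return a list of token lists.

    `tokenize` is any callable that maps a string to a list of tokens;
    in the assignment it would be NLTK's wordpunct_tokenize.
    """
    sentences = []
    # utf-8 is an assumption about the corpus file's encoding.
    with open(path, "r", encoding="utf-8") as handle:
        for line in handle:
            sentences.append(tokenize(line.strip()))
    return sentences


def pad_both_ends_sketch(tokens, n=3):
    """Add n-1 start symbols and n-1 end symbols around a token list."""
    return ["<s>"] * (n - 1) + list(tokens) + ["</s>"] * (n - 1)


def trigrams_sketch(tokens):
    """Return every trigram of the padded sentence as a tuple."""
    padded = pad_both_ends_sketch(tokens, n=3)
    return [tuple(padded[i:i + 3]) for i in range(len(padded) - 2)]


if __name__ == "__main__":
    # One short corpus sentence, already tokenized.
    for trigram in trigrams_sketch(["He", "will", "remain", "chairman", "."]):
        print(trigram)
```

Each printed tuple is one (context, word) event: the first two items are the context and the third is the word being predicted.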
Find out which of the two sentences is more probable by computing the log probability of each sentence with the following two methods. Be sure to use NLTK's logscore, which computes log probabilities in base 2.
(A) Maximum Likelihood Estimation (MLE)
Use the trigram model without smoothing, with the following implementation from NLTK:
from nltk.lm import MLE
Compute the log probability, rounded to the nearest integer, of the two sentences using the MLE model.
For example, if the log probability is -3.14159, then the answer would be -3.
(1) The log probability of sentence S1:
(2) The log probability of sentence S2:
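As a rough illustration of what part (A) asks for, the sketch below computes an unsmoothed trigram log probability by hand on a two-sentence toy corpus (not the real corpus file). It assumes the sentence score is the sum of base-2 log scores over all padded trigrams. All function names are hypothetical, and unlike NLTK's `MLE` model the sketch does not map out-of-vocabulary tokens to `<UNK>`.

```python
import math
from collections import Counter


def pad(tokens, n=3):
    """Pad with n-1 start and end symbols (assumed NLTK convention)."""
    return ["<s>"] * (n - 1) + list(tokens) + ["</s>"] * (n - 1)


def train_counts(sentences):
    """Count trigrams and the two-token contexts that precede a word."""
    tri, ctx = Counter(), Counter()
    for sent in sentences:
        p = pad(sent)
        for i in range(len(p) - 2):
            tri[tuple(p[i:i + 3])] += 1
            ctx[tuple(p[i:i + 2])] += 1
    return tri, ctx


def mle_logscore(word, context, tri, ctx):
    """log2 of count(context, word) / count(context); -inf if unseen."""
    c_ctx = ctx[tuple(context)]
    c_tri = tri[tuple(context) + (word,)]
    if c_ctx == 0 or c_tri == 0:
        return float("-inf")
    return math.log2(c_tri / c_ctx)


def sentence_logprob(tokens, tri, ctx):
    """Sum the trigram log scores over the padded sentence."""
    p = pad(tokens)
    return sum(mle_logscore(p[i + 2], p[i:i + 2], tri, ctx)
               for i in range(len(p) - 2))


# Toy training data, for illustration only.
toy = [["He", "will", "remain", "chairman", "."],
       ["He", "will", "remain", "."]]
tri, ctx = train_counts(toy)
print(sentence_logprob(["He", "will", "remain", "chairman", "."], tri, ctx))
print(sentence_logprob(["He", "will", "resign", "."], tri, ctx))
```

A single unseen trigram drives the whole sentence score to negative infinity, which is the behavior to expect from an unsmoothed model. Python's `round()` raises an error on infinite values, so such a result has to be reported as-is rather than rounded.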
(B) Add-One (Laplace) Smoothing
Use the trigram model with add-one (Laplace) smoothing, with the following implementation from NLTK:
from nltk.lm import Laplace
Compute the log probability, rounded to the nearest integer, of the two sentences using the Laplace model.
For example, if the log probability is -3.14159, then the answer would be -3.
(1) The log probability of sentence S1:
(2) The log probability of sentence S2:
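The effect of add-one smoothing in part (B) can be seen from the formula alone: each trigram count is increased by one and each context count by the vocabulary size V. The sketch below is a hand-written version of that formula, not NLTK's `Laplace` class. The function name is hypothetical, and the value V = 8 is made up for illustration; in NLTK, V would be the size of the model's vocabulary, which is assumed here to include the padding symbols and `<UNK>`.

```python
import math


def laplace_logscore(tri_count, ctx_count, vocab_size):
    """log2 of (count(context, word) + 1) / (count(context) + V)."""
    return math.log2((tri_count + 1) / (ctx_count + vocab_size))


V = 8  # hypothetical vocabulary size, for illustration only

# Trigram seen once after a context seen twice.
print(laplace_logscore(1, 2, V))
# Unseen trigram after the same context: lower, but still finite.
print(laplace_logscore(0, 2, V))
# Context never seen at all: falls back to a uniform 1 / V.
print(laplace_logscore(0, 0, V))
```

Because every score is finite, the summed sentence log probabilities can always be rounded and compared, which is not the case when an unsmoothed model meets an unseen trigram.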
