Question: corpus_for_language_models.txt: Richard W. Lock, retired vice president and treasurer of Owens-Illinois
corpus_for_language_models.txt:
Richard W Lock retired vice president and treasurer of OwensIllinois Inc. was named a director of this transportation industry supplier increasing its board to six members
John J Phelan Jr chairman of the Big Board asserts that and have been two of the least volatile years in the last or years
In January American Medical brought in a new chief executive officer Richard A Gilleland who will remain as chairman president and chief executive
Ralph T Linsley vice chairman of Eagle will become vice chairman of WebsterEagle
Orkem declined to give details of its offer saying only that the bid will be submitted for approval by the board of the British company
But in when Congress was trying to find someone or something to blame for the Great Depression it decided to drop both the secretary and the comptroller from the board
Treasury Undersecretary David Mulford for instance was at a meeting of the Business Council in Hot Springs Va when the stock market fell and remained there through the following day
He will remain chairman
The memo attempts to remove the tourist board as far as possible from the agency which pleaded innocent to the charges
J Michael Cook chairman of Deloitte Haskins & Sells International said he believes the legal action by the British firm to be without merit
Other conservatives thought to be on the administration s short list include Washington lawyer Michael Uhlmann who was passed over for the No job at the Justice Department and Marshall Breger chairman of a US agency on administration
The new products allow customers to add Convex machines to established systems made by other manufacturers which opens up a phenomenal market for us said Robert J Paluck Convex s chairman president and chief executive
Wilson H Taylor president and chief executive officer of this insurance and financial services concern was elected to the additional post of chairman
The Fed chairman s caution was apparent again on the Monday morning after the market s plunge when the central bank took only modest steps to aid the markets
Cipher Data said Mr Marinaro consequently has resigned from those posts and from the company s board
Most notably several of the regulatory steps recommended by the Brady Task Force which analyzed the crash would be revived especially because that group s chairman is now the Treasury secretary
Mr Gaubert who was chairman and the majority stockholder of Independent American had relinquished his control in exchange for federal regulators agreement to drop their inquiry into his activities at another savings and loan
And Bill Konopnicki a Safford Ariz. licensee of McDonald s Corp. who is chairman of the company s Nationa
QUESTION
Trigram Language Models
Compute the probability of the following two sentences:
S1: We wanted them to build a road here he says
S2: Edward L Kane succeeded Mr Taylor as chairman
Build a trigram language model trained on the corpus that is provided on eLearning:
corpus_for_language_models.txt
Treat every line in the corpus file as a sentence.
Hint: Initialize a list and open the file in read mode. For each line in the file, strip it, tokenize it, and append the resulting token list to the list.
Use the NLTK library to tokenize the sentences:
from nltk.tokenize import wordpunct_tokenize
sentence = wordpunct_tokenize(line)
Also use the NLTK library to build the inputs for the trigram model:
from nltk.lm.preprocessing import padded_everygram_pipeline
train, vocab = padded_everygram_pipeline(3, sentences)
where sentences is a list of sentences, and each sentence is a list of tokens. These are the sentences you read from the corpus file.
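The reading and preprocessing steps above can be sketched as follows. This is a minimal sketch, not the assignment's reference solution: the helper name `read_sentences` is hypothetical, skipping blank lines is an assumption, and two in-memory lines stand in for the corpus file so the snippet runs on its own.

```python
from nltk.tokenize import wordpunct_tokenize
from nltk.lm.preprocessing import padded_everygram_pipeline


def read_sentences(lines):
    """Tokenize each non-empty line; every line counts as one sentence."""
    sentences = []
    for line in lines:
        line = line.strip()
        if line:  # assumption: blank lines are not sentences
            sentences.append(wordpunct_tokenize(line))
    return sentences


# Two lines stand in for the corpus file (illustration only). With the real file:
#   with open("corpus_for_language_models.txt") as f:
#       sentences = read_sentences(f)
sample_lines = ["He will remain chairman\n", "Mr Taylor was elected chairman\n"]
sentences = read_sentences(sample_lines)

# Order 3 yields padded unigrams, bigrams and trigrams for training,
# plus a flat stream of padded tokens used to build the vocabulary.
train, vocab = padded_everygram_pipeline(3, sentences)
```

Both `train` and `vocab` are lazy iterators, so they can be consumed only once. Rebuild them with another call to `padded_everygram_pipeline` before fitting a second model.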
Find out which of the two sentences is more probable by computing the log probability of each
sentence with the following methods. Be sure to use NLTK's logscore, which uses
log base 2 to compute log probabilities.
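For reference, the sentence log probability under a trigram model is a sum of per-token log scores. The form below assumes the sentence is padded with two start symbols and two end symbols, which is what padding for order 3 produces in NLTK; whether both end symbols should be scored is a choice the assignment does not spell out.

```latex
\log_2 P(w_1 \dots w_n) = \sum_{i=1}^{n+2} \log_2 P(w_i \mid w_{i-2}, w_{i-1}),
\qquad w_{-1} = w_0 = \texttt{<s>}, \quad w_{n+1} = w_{n+2} = \texttt{</s>}
```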
A. Maximum Likelihood Estimation (MLE)
Use the trigram model without smoothing, with the following implementation from NLTK:
from nltk.lm import MLE
Compute the log probability, rounded to the nearest integer, of the two sentences using the MLE model.
For example, if the log probability is then the answer would be
The log probability of sentence S1:
The log probability of sentence S2:
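One way to approach part A is sketched below. The helper `sentence_logscore` is hypothetical, and the toy training sentences are illustrative stand-ins for the tokenized corpus. The sketch assumes the test sentence is padded the same way as the training data, so the score includes both end-of-sentence trigrams.

```python
from nltk.lm import MLE
from nltk.lm.preprocessing import pad_both_ends, padded_everygram_pipeline
from nltk.util import ngrams

ORDER = 3  # trigram model


def sentence_logscore(model, tokens, order=ORDER):
    """Sum log2 P(word | two previous tokens) over the padded sentence."""
    padded = list(pad_both_ends(tokens, n=order))
    return sum(model.logscore(gram[-1], gram[:-1]) for gram in ngrams(padded, order))


# Toy training data standing in for the tokenized corpus (illustrative only).
toy_sentences = [["He", "will", "remain", "chairman"], ["He", "will", "resign"]]
train, vocab = padded_everygram_pipeline(ORDER, toy_sentences)

mle = MLE(ORDER)
mle.fit(train, vocab)

# Every trigram of this sentence was seen in training.
seen = sentence_logscore(mle, ["He", "will", "remain", "chairman"])
# "She" never occurs, so at least one trigram has probability zero.
unseen = sentence_logscore(mle, ["She", "will", "remain", "chairman"])
```

Without smoothing, a single unseen trigram drives the whole sentence score to negative infinity. Python's `round` raises `OverflowError` on infinite values, so check with `math.isinf` before rounding the MLE result.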
B. Add-One (Laplace) Smoothing
Use the trigram model with add-one (Laplace) smoothing, with the following implementation from NLTK:
from nltk.lm import Laplace
Compute the log probability, rounded to the nearest integer, of the two sentences using the Laplace model.
For example, if the log probability is then the answer would be
The log probability of sentence S1:
The log probability of sentence S2:
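For part B, the effect of add-one smoothing on a single trigram can be checked by hand. The sketch below uses the same illustrative toy sentences rather than the real corpus. It assumes the smoothed estimate is (trigram count + 1) / (context count + vocabulary size), with the vocabulary size taken from `len(laplace.vocab)`, which also counts the padding symbols and the unknown-word label.

```python
import math

from nltk.lm import Laplace
from nltk.lm.preprocessing import padded_everygram_pipeline

# Toy training data standing in for the tokenized corpus (illustrative only).
toy_sentences = [["He", "will", "remain", "chairman"], ["He", "will", "resign"]]
train, vocab = padded_everygram_pipeline(3, toy_sentences)

laplace = Laplace(3)
laplace.fit(train, vocab)

# Seen trigram: "He will remain" occurs once, the context "He will" twice.
p_seen = laplace.score("remain", ("He", "will"))
expected = (1 + 1) / (2 + len(laplace.vocab))

# Unseen trigram: the count is 0, but the smoothed probability stays positive,
# so its log score is finite.
p_unseen = laplace.score("chairman", ("He", "will"))
log_unseen = laplace.logscore("chairman", ("He", "will"))
```

Because every trigram now has a nonzero probability, summing `logscore` over the padded trigrams of S1 and S2 gives finite values that can be rounded and compared directly.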
