Question:

1. Explain intuitively:
a. What is tokenization?
b. What is the difference between stemming and lemmatization? Give an example.
2. Suppose we have the corpus: 5 low, 6 newer, 3 wider, 2 new (the number before each word is its count).
a. Perform subword tokenization using WordPiece for three iterations.
b. How does WordPiece tokenize the word "lower"?
3. We are given the following corpus, modified from the one in the chapter:
I am Sam
Sam I am
Sam like eggs
I do not like green eggs and Sam
a. Create the bigram count table and the bigram probability table.
b. If we use linear interpolation smoothing between a maximum-likelihood bigram model and a maximum-likelihood unigram model with λ1 = 1/4 and λ2 = 3/4, what is P(Sam|am)? Include <s> and </s> in your counts just like any other token.
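For question 2, a minimal sketch of WordPiece training may help check the hand computation. It assumes the commonly described scoring rule, score(a, b) = count(ab) / (count(a) * count(b)), with non-initial pieces marked by "##". The function names are hypothetical, and ties are broken by first occurrence, which is an assumption; a course that defines WordPiece differently (for example with a likelihood-based criterion) may expect other merges.

```python
from collections import Counter

# Word counts taken from the question: 5 low, 6 newer, 3 wider, 2 new.
word_freqs = {"low": 5, "newer": 6, "wider": 3, "new": 2}

def initial_split(word):
    # First character stays bare; later characters get the "##" prefix.
    return [word[0]] + ["##" + ch for ch in word[1:]]

def best_pair(splits, word_freqs):
    token_counts, pair_counts = Counter(), Counter()
    for word, freq in word_freqs.items():
        pieces = splits[word]
        for piece in pieces:
            token_counts[piece] += freq
        for a, b in zip(pieces, pieces[1:]):
            pair_counts[(a, b)] += freq
    # Assumed scoring rule; max() keeps the first pair seen when scores tie.
    return max(
        pair_counts,
        key=lambda p: pair_counts[p] / (token_counts[p[0]] * token_counts[p[1]]),
    )

def merge(pieces, a, b):
    merged, i = [], 0
    while i < len(pieces):
        if i + 1 < len(pieces) and pieces[i] == a and pieces[i + 1] == b:
            merged.append(a + b[2:])  # drop the "##" of the right-hand piece
            i += 2
        else:
            merged.append(pieces[i])
            i += 1
    return merged

def train(word_freqs, num_merges):
    splits = {w: initial_split(w) for w in word_freqs}
    vocab = sorted({p for pieces in splits.values() for p in pieces})
    merges = []
    for _ in range(num_merges):
        a, b = best_pair(splits, word_freqs)
        merges.append((a, b))
        vocab.append(a + b[2:])
        splits = {w: merge(p, a, b) for w, p in splits.items()}
    return vocab, merges

def tokenize(word, vocab):
    # Greedy longest-match-first lookup against the learned vocabulary.
    pieces, start = [], 0
    while start < len(word):
        end = len(word)
        while end > start:
            candidate = word[start:end] if start == 0 else "##" + word[start:end]
            if candidate in vocab:
                break
            end -= 1
        if end == start:
            return ["[UNK]"]
        pieces.append(candidate)
        start = end
    return pieces

vocab, merges = train(word_freqs, 3)
print(merges)                    # [('w', '##i'), ('wi', '##d'), ('l', '##o')]
print(tokenize("lower", vocab))  # ['lo', '##w', '##e', '##r']
```

Under these assumptions the three merges produce "wi", "wid", and "lo". The first iteration has a tie between (w, ##i) and (##i, ##d), both scoring 3/9, so a different tie-break would change the merge sequence.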
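For question 3, the counts and the interpolated probability can be checked with a short sketch. It assumes that each sentence is wrapped in <s> and </s>, that the weights are 1/4 and 3/4, and that the first weight applies to the bigram model and the second to the unigram model (the order in which the question lists them). If the course attaches the weights the other way round, swap the two arguments. The function names are hypothetical.

```python
from collections import Counter
from fractions import Fraction

sentences = [
    "I am Sam",
    "Sam I am",
    "Sam like eggs",
    "I do not like green eggs and Sam",
]

# Assumption: sentence boundaries are counted like any other token.
tokenized = [["<s>"] + s.split() + ["</s>"] for s in sentences]

unigrams = Counter(tok for sent in tokenized for tok in sent)
bigrams = Counter(pair for sent in tokenized for pair in zip(sent, sent[1:]))
total_tokens = sum(unigrams.values())

def p_unigram(word):
    # Maximum-likelihood unigram estimate: count(word) / total tokens.
    return Fraction(unigrams[word], total_tokens)

def p_bigram(word, prev):
    # Maximum-likelihood bigram estimate: count(prev word) / count(prev).
    return Fraction(bigrams[(prev, word)], unigrams[prev])

def p_interpolated(word, prev, lam_bigram, lam_unigram):
    # Linear interpolation of the two estimates; the weights should sum to 1.
    return lam_bigram * p_bigram(word, prev) + lam_unigram * p_unigram(word)

result = p_interpolated("Sam", "am", Fraction(1, 4), Fraction(3, 4))
print(p_bigram("Sam", "am"), p_unigram("Sam"), result)  # 1/2 4/25 49/200
```

With these assumptions, count(am Sam) = 1, count(am) = 2, count(Sam) = 4, and there are 25 tokens in total, so P(Sam|am) = 1/4 * 1/2 + 3/4 * 4/25 = 49/200 = 0.245. With the weights swapped the result would be 83/200 = 0.415. The `bigrams` counter also holds the entries needed for the count table in part (a).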
