Question:

Use Jupyter Notebook/Lab for all code, please.

1. [Points 20] Answer the following questions, given the training and test examples below.
Training Examples:
I love to watch movies
He loves to watch football
They love watching movies
He plays football every Sunday
Test Example:
I love watching football
Text normalization: apply case lowering and remove punctuation characters, if any.
a) [Points 10] Show all probability calculations for both the unigram and bigram models, with detailed computations. Apply the add-one smoothing technique.
b) [Points 5] Calculate the perplexity of both models on the test sentence.
c) [Points 5] Comment on the difference in perplexity between the unigram and bigram models and explain why one might be lower than the other.
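The smoothed probabilities and perplexities for parts a) and b) can be checked with a short notebook sketch like the one below. It assumes bigrams are counted within each sentence and without <s>/</s> boundary symbols, a choice the question leaves open; variable names are illustrative only.

```python
import math
from collections import Counter

train_sents = ["i love to watch movies",
               "he loves to watch football",
               "they love watching movies",
               "he plays football every sunday"]
test_sent = "i love watching football"

train_tokens = [w for s in train_sents for w in s.split()]
N = len(train_tokens)                       # total training tokens
V = len(set(train_tokens))                  # vocabulary size, used by add-one smoothing

uni = Counter(train_tokens)
bi = Counter(p for s in train_sents for p in zip(s.split(), s.split()[1:]))

def p_uni(w):                               # add-one smoothed unigram probability
    return (uni[w] + 1) / (N + V)

def p_bi(w1, w2):                           # add-one smoothed bigram probability
    return (bi[(w1, w2)] + 1) / (uni[w1] + V)

test = test_sent.split()
uni_probs = [p_uni(w) for w in test]
bi_probs = [p_bi(w1, w2) for w1, w2 in zip(test, test[1:])]

# Perplexity: inverse probability of the test sequence, normalized by its length
pp_uni = math.prod(uni_probs) ** (-1 / len(uni_probs))
pp_bi = math.prod(bi_probs) ** (-1 / len(bi_probs))
print("unigram perplexity:", round(pp_uni, 3), "| bigram perplexity:", round(pp_bi, 3))
```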
2. [Points 40] Given the training documents below, classify each test text as either "Positive" or "Negative" sentiment by answering the following questions.
Training Text:
D1: "I love this movie" (Positive)
D2: "This movie is great" (Positive)
D3: "I hate this movie" (Negative)
D4: "This movie is terrible" (Negative)
Testing Text:
D5: "I love this great movie"
D6: "I hate this terrible movie"
a. [Points 2] Tokenize the documents by splitting them into words. Apply case lowering and remove punctuation (if any). Create the vocabulary from the training documents.
b. [Points 3] Compute the prior class probability P(C), where C is the class label.
c. [Points 15] Compute the likelihood P(W|C) for each training word using the add-one smoothing approach. Show each calculation in detail.
d. [Points 10] Compute the class probability of each test document (use the log10 scale to avoid underflow issues). Compare the scores and decide the class label based on your computation.
e. [Points 10] Write a short program that implements steps (a) through (d) and shows the classification results for the given test set, printing the computation for each step in detail.
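As a reference point for part e., the following is a minimal sketch of the whole (a)-(d) pipeline with add-one smoothing and log10 scores; the variable names and the print format are illustrative, not prescribed by the question.

```python
import math
from collections import Counter

train = [("i love this movie", "Positive"),
         ("this movie is great", "Positive"),
         ("i hate this movie", "Negative"),
         ("this movie is terrible", "Negative")]
test = {"D5": "i love this great movie",
        "D6": "i hate this terrible movie"}

# (a) tokenize (the documents are already lowercase and punctuation-free) and build the vocabulary
vocab = sorted({w for text, _ in train for w in text.split()})
V = len(vocab)

# (b) prior class probabilities P(C)
labels = [c for _, c in train]
priors = {c: labels.count(c) / len(labels) for c in set(labels)}

# (c) add-one smoothed likelihoods P(w|C)
counts = {c: Counter(w for text, cls in train if cls == c for w in text.split())
          for c in priors}
totals = {c: sum(counts[c].values()) for c in priors}

def likelihood(w, c):
    return (counts[c][w] + 1) / (totals[c] + V)

# (d) log10 posterior scores and the predicted label for each test document
for name, text in test.items():
    scores = {c: math.log10(priors[c]) +
                 sum(math.log10(likelihood(w, c)) for w in text.split())
              for c in priors}
    print(name, {c: round(s, 4) for c, s in scores.items()}, "->", max(scores, key=scores.get))
```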
3. [Points 10] Given the following text documents, answer the questions below.
Document 1: "I enjoy watching movies on weekends."
Document 2: "The weather today is sunny and pleasant."
Document 3: "He plays football every Sunday with his friends."
a. [Points 5] Provide the tokenized version of the text for each document. Apply the text normalization steps: convert all words to lowercase and remove punctuation if necessary. Show each step in detail. What is the vocabulary size (unique words) for each document?
b. [Points 5] Generate all context-target word pairs for each document using window size W=2. Explain how the window size W impacts the number of context-target pairs generated.
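The normalization and pair generation can be sanity-checked with a short sketch like the one below; the (context, target) ordering and the handling of window edges are assumptions, since the question does not pin them down.

```python
import string

docs = {"Document 1": "I enjoy watching movies on weekends.",
        "Document 2": "The weather today is sunny and pleasant.",
        "Document 3": "He plays football every Sunday with his friends."}
W = 2  # window size

for name, text in docs.items():
    # (a) normalize: lowercase, strip punctuation, split on whitespace
    tokens = text.lower().translate(str.maketrans("", "", string.punctuation)).split()
    print(name, "tokens:", tokens, "| vocabulary size:", len(set(tokens)))

    # (b) pair each target word with every word at most W positions to its left or right
    pairs = [(tokens[j], tokens[i])            # (context, target)
             for i in range(len(tokens))
             for j in range(max(0, i - W), min(len(tokens), i + W + 1))
             if j != i]
    print(name, "->", len(pairs), "context-target pairs")
```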
4. [Points 40] Given the initial word embeddings from Question 3:
"watching": [0.3,0.1,0.2][0.3,0.1,0.2]
"movies": [0.2,0.4,0.6][0.2,0.4,0.6]
"sunny": [0.7,0.1,0.4][0.7,0.1,0.4]
"football": [0.5,0.3,0.2][0.5,0.3,0.2]
"friends": [0.6,0.3,0.1][0.6,0.3,0.1]
Instructions: Where applicable, follow these steps: i) compute the dot product between the context and target vectors; ii) apply gradient descent to update the word vectors (assume a learning rate of 0.01); iii) perform one iteration of the embedding update.
a) [Points 10] Show the word embedding updates after one iteration for the word "movies" when the context word is "watching." Show detailed computations in each step. Explain how the dot product helps capture word similarity during the training process.
b) [Points 15] Assume we are performing negative sampling for the word pair ("movies", "watching"). Randomly sample three negative words from the vocabulary: "sunny", "football", "friends".
Compute the dot product between "movies" and its negative samples. Show detailed computations. Explain the purpose of negative sampling and how it improves the efficiency of training Word2Vec models.
c) [Points 15] Calculate the cosine similarity between "movies" and "watching" using their updated embeddings from the previous question. Based on the cosine similarity result, explain whether these words are semantically close or not. What threshold would you consider when deciding if two words are similar?
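A minimal notebook sketch of these three parts is shown below. It assumes the standard skip-gram-with-negative-sampling objective, in which a positive pair is pushed toward a sigmoid score of 1, updates only the "movies" vector as part a) asks, and uses the original "movies" vector for the negative-sample dot products in part b), since the question leaves those choices open.

```python
import numpy as np

# Initial embeddings from the question
emb = {"watching": np.array([0.3, 0.1, 0.2]),
       "movies":   np.array([0.2, 0.4, 0.6]),
       "sunny":    np.array([0.7, 0.1, 0.4]),
       "football": np.array([0.5, 0.3, 0.2]),
       "friends":  np.array([0.6, 0.3, 0.1])}
lr = 0.01  # learning rate given in the question

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# (a) one gradient-descent step on the positive pair (target "movies", context "watching")
movies, watching = emb["movies"], emb["watching"]
score = np.dot(movies, watching)                 # raw similarity score
grad = (sigmoid(score) - 1.0) * watching         # d(loss)/d(movies) for a positive (label = 1) pair
movies_updated = movies - lr * grad
print("dot(movies, watching) =", score)
print("updated 'movies' vector =", movies_updated)

# (b) dot products between "movies" and its negative samples
# (the original "movies" vector is used here; the question leaves this choice open)
for neg in ["sunny", "football", "friends"]:
    print(f"dot(movies, {neg}) =", np.dot(movies, emb[neg]))

# (c) cosine similarity between the updated "movies" vector and "watching"
cos = np.dot(movies_updated, watching) / (np.linalg.norm(movies_updated) * np.linalg.norm(watching))
print("cosine(movies, watching) =", round(cos, 4))
```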
