
Part 2: Deep Averaging Network (75 points)
In this part, you'll implement a deep averaging network as discussed in lecture and in Iyyer et al. (2015). If our input is $s = (w_1, \ldots, w_n)$, then we use a feedforward neural network for prediction with input $\frac{1}{n}\sum_{i=1}^{n} e(w_i)$, where $e$ is a function that maps a word $w$ to its real-valued vector embedding.
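For concreteness, the averaging step can be sketched in PyTorch as follows. This is purely illustrative; the embedding table, vocabulary size, and indices here are made up and are not part of the provided code:

import torch
import torch.nn as nn

# Toy embedding table: vocabulary of 5 words, 50-dimensional vectors e(w).
emb = nn.Embedding(num_embeddings=5, embedding_dim=50)

# Indices for one sentence s = (w_1, ..., w_n) with n = 3.
word_indices = torch.tensor([2, 0, 4])

vectors = emb(word_indices)   # shape [n, 50]; rows are e(w_i)
avg = vectors.mean(dim=0)     # shape [50]; the input to the feedforward network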
Getting started Download the code and data; the data is the same as in Assignment 1. Expand the tgz file
and change into the directory. To confirm everything is working properly, run:
python neural_sentiment_classifier.py --model TRIVIAL --no_run_on_test
This loads the data, instantiates a TrivialSentimentClassifier that always returns 1 (positive),
and evaluates it on the training and dev sets. Compared to Assignment 1, this runs an extra word embedding
loading step.
Framework code The framework code you are given consists of several files. neural_sentiment_classifier.py
is the main class. As before, you cannot modify this file for your final submission, though it's okay to add
command line arguments or make changes during development. You should generally not need to modify
the paths. The --model argument controls the model specification. The main method loads in the data,
initializes the feature extractor, trains the model, evaluates it on the train, dev, and blind test sets, and writes
the blind test results to a file.
models.py is the file you'll be modifying for this part, and train_deep_averaging_network
is your entry point, similar to Assignment 1. Data reading in sentiment_data.py and the utilities in
utils.py are similar to Assignment 1. However, read_sentiment_examples now lowercases the
dataset; the GloVe embeddings do not distinguish case and only contain embeddings for lowercase words.
sentiment_data.py additionally contains a WordEmbeddings class and code for reading it
from a file. This class wraps a matrix of word vectors and an Indexer in order to index new words. The
Indexer contains two special tokens: PAD (index 0) and UNK (index 1). UNK can stand in for words that aren't
in the vocabulary, and PAD is useful for implementing batching later. Both are mapped to the zero vector by
default.
You'll want to use get_initialized_embedding_layer to get a torch.nn.Embedding layer
that can be used in your network. This layer is trainable if you set frozen to False (which will be slower),
but is initialized with the pre-trained embeddings.
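As a rough sketch (the WordEmbeddings object is called word_embeddings here, and the exact call signature may differ slightly from what is shown):

import torch

# Embedding layer initialized from the relativized GloVe vectors; frozen=True
# keeps the pre-trained vectors fixed during training.
embedding_layer = word_embeddings.get_initialized_embedding_layer(frozen=True)

# PAD (index 0) and UNK (index 1) map to the zero vector by default, so padded
# positions contribute nothing to a sum over the sequence. A padded batch of two
# index sequences might look like:
batch_indices = torch.tensor([[5, 17, 3, 0, 0],
                              [8, 2, 9, 4, 6]])
batch_vectors = embedding_layer(batch_indices)   # shape [2, 5, embedding_dim]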
Data You are given two sources of pretrained embeddings you can use: data/glove.6B.50d-relativized.txt
and data/glove.6B.300d-relativized.txt, the loading of which is controlled by the --word_vecs_path argument.
These are trained using GloVe (Pennington et al., 2014). These vectors have been relativized to your data,
meaning that they do not contain embeddings for words that don't occur in the train, dev, or test data. This
is purely a runtime and memory optimization.
Note that the 300-dimensional vectors are used by default. The 50-dimensional vectors will enable your
code to run much faster, particularly if you're not using frozen embeddings, so you may find them useful for
debugging.
PyTorch example ffnn_example.py (available from Exercise 2.3b FFNN Hands-on) implements the network
discussed in lecture for the synthetic XOR task. It shows a minimal example of the PyTorch network definition,
training, and evaluation loop. Feel free to refer to this code extensively and to copy-paste parts of it into your
solution as needed. Most of this code is self-documenting. The most unintuitive piece is calling zero_grad
before calling backward! Backward computation uses in-place gradient storage, and this must be zeroed out
before every gradient computation.
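In skeleton form, the per-example update order looks roughly like this (network, loss_function, num_epochs, and training_examples are placeholder names, not names from the provided code, and the optimizer choice is just one option):

import torch.optim as optim

optimizer = optim.Adam(network.parameters(), lr=1e-3)

for epoch in range(num_epochs):
    for word_indices, label in training_examples:
        network.zero_grad()                     # clear gradients from the previous step
        log_probs = network(word_indices)
        loss = loss_function(log_probs, label)
        loss.backward()                         # compute fresh gradients
        optimizer.step()                        # apply the update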
Implementation Following the example, the rough steps you should take are:
1. Define a subclass of nn.Module that does your prediction. This should return a log-probability
distribution over class labels. Your module should take a list of word indices as input and embed them
using an nn.Embedding layer initialized appropriately (a minimal sketch appears after this list).
2. Compute your classification loss based on the prediction. In lecture, we saw that the negative log
probability of the correct label can be used as the loss. You can do this directly, or you can use a built-in loss
function like NLLLoss or CrossEntropyLoss. Pay close attention to what these losses expect as
inputs (probabilities, log probabilities, or raw scores).
3. Call network.zero_grad() (zeroes out in-place gradient vectors), loss.backward() (runs the
backward pass to compute gradients), and optimizer.step() to update your parameters.
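Putting steps 1 and 2 together, one possible shape for the prediction module is sketched below. The class name, layer sizes, and the choice of NLLLoss over log-probabilities are illustrative assumptions, not requirements:

import torch
import torch.nn as nn
import torch.nn.functional as F

class DeepAveragingNetwork(nn.Module):
    """Averages word embeddings, then feeds the result through a small feedforward net."""
    def __init__(self, embedding_layer, hidden_size, num_classes):
        super().__init__()
        self.embedding = embedding_layer   # e.g. from get_initialized_embedding_layer
        self.hidden = nn.Linear(embedding_layer.embedding_dim, hidden_size)
        self.out = nn.Linear(hidden_size, num_classes)

    def forward(self, word_indices):
        # word_indices: 1-D tensor of word indices for a single sentence
        avg = self.embedding(word_indices).mean(dim=0)
        h = F.relu(self.hidden(avg))
        # Return log-probabilities so NLLLoss can consume them directly;
        # CrossEntropyLoss would instead expect the raw scores before log_softmax.
        return F.log_softmax(self.out(h), dim=-1)

loss_function = nn.NLLLoss()

With batching, forward would instead take a [batch_size, seq_len] tensor of padded indices and average over the sequence dimension.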
Implementation and Debugging Tips Come back to this section as you tackle the assignment!
You should print training loss over your model's epochs; this will give you an idea of how the learning
process is proceeding.
You should be able to do the vast majority of your parameter tuning in small-scale experiments. Try to
avoid running large experiments on the whole dataset in order to keep your development cycle fast.
