Part 2: Deep Averaging Network (75 points)
In this part, you'll implement a deep averaging network as discussed in lecture and in Iyyer et al. (2015). If our input is $w_1, \ldots, w_n$, then we use a feedforward neural network for prediction with input
$$\frac{1}{n} \sum_{i=1}^{n} e(w_i)$$
where $e$ is a function that maps a word $w$ to its real-valued vector embedding.
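As a concrete, hypothetical sketch of this averaging step (a PyTorch embedding table stands in for e; the names, sizes, and indices below are invented for illustration, not taken from the assignment code):

import torch

# Hypothetical illustration of the DAN input: embed each word and average.
embeddings = torch.nn.Embedding(num_embeddings=1000, embedding_dim=50)  # stand-in for e
word_indices = torch.tensor([4, 17, 256, 3])        # indices of w_1 ... w_n
avg_input = embeddings(word_indices).mean(dim=0)    # (1/n) * sum_i e(w_i)
print(avg_input.shape)                              # torch.Size([50])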
Getting started Download the code and data; the data is the same as in Assignment 1. Expand the tgz file and change into the directory. To confirm everything is working properly, run:
python neural_sentiment_classifier.py --model TRIVIAL --no_run_on_test
This loads the data, instantiates a TrivialSentimentClassifier that always returns the positive label, and evaluates it on the training and dev sets. Compared to Assignment 1, this runs an extra word embedding loading step.
Framework code The framework code you are given consists of several files. neural_sentiment_classifier.py is the main class. As before, you cannot modify this file for your final submission, though it's okay to add command line arguments or make changes during development. You should generally not need to modify the paths. The --model argument controls the model specification. The main method loads in the data, initializes the feature extractor, trains the model, evaluates it on train, dev, and blind test, and writes the blind test results to a file.
models.py is the file you'll be modifying for this part, and train_deep_averaging_network is your entry point, similar to Assignment 1. Data reading in sentiment_data.py and the utilities in utils.py are similar to Assignment 1. However, read_sentiment_examples now lowercases the dataset; the GloVe embeddings do not distinguish case and only contain embeddings for lowercase words.
sentiment_data.py also additionally contains a WordEmbeddings class and code for reading it from a file. This class wraps a matrix of word vectors and an Indexer in order to index new words. The Indexer contains two special tokens: PAD and UNK. UNK can stand in for words that aren't in the vocabulary, and PAD is useful for implementing batching later. Both are mapped to the zero vector by default.
You'll want to use get_initialized_embedding_layer to get a torch.nn.Embedding layer that can be used in your network. This layer is initialized with the pretrained embeddings, and it is trainable if you set frozen to False (which will be slower).
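As a rough sketch of what such a layer gives you and how PAD fits into batching (the vector matrix, index values, and sentence contents below are assumptions for illustration, not the assignment's actual data):

import torch
import torch.nn as nn

# Sketch: an Embedding layer built from a pretrained vector matrix, roughly
# what get_initialized_embedding_layer hands back. pretrained_vectors is fake here.
pretrained_vectors = torch.randn(5000, 50)
emb_layer = nn.Embedding.from_pretrained(pretrained_vectors, freeze=True)  # freeze=False to fine-tune

# Batching with PAD (the PAD index value is assumed): pad shorter sentences so
# the batch stacks into one tensor, then embed everything in one call.
PAD_IDX = 0
batch = [[12, 7, 431], [5, 99]]
max_len = max(len(sent) for sent in batch)
padded = torch.tensor([sent + [PAD_IDX] * (max_len - len(sent)) for sent in batch])
embedded = emb_layer(padded)    # shape: (batch_size, max_len, 50)
print(embedded.shape)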
Data You are given two sources of pretrained embeddings you can use: data/glove.6B.50d-relativized.txt and data/glove.6B.300d-relativized.txt, the loading of which is controlled by the --word_vecs_path argument. These are trained using GloVe (Pennington et al., 2014). These vectors have been relativized to your data, meaning that they do not contain embeddings for words that don't occur in the train, dev, or test data. This is purely a runtime and memory optimization.
Note that the 300-dimensional vectors are used by default. The 50-dimensional vectors will enable your code to run much faster, particularly if you're not using frozen embeddings, so you may find them useful for debugging.
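For example, to point the loader at the smaller vectors while debugging, an invocation along these lines should work (the exact file name depends on how the data is named in your download):
python neural_sentiment_classifier.py --model TRIVIAL --no_run_on_test --word_vecs_path data/glove.6B.50d-relativized.txt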
PyTorch example ffnn_example.py (available from the "FFNN Hands-on" exercise) implements the network discussed in lecture for the synthetic XOR task. It shows a minimal example of the PyTorch network definition, training, and evaluation loop. Feel free to refer to this code extensively and to copy-paste parts of it into your solution as needed. Most of this code is self-documenting. The most unintuitive piece is calling zero_grad before calling backward! Backward computation uses in-place storage, and this must be zeroed out before every gradient computation.
Implementation Following the example, the rough steps you should take are:
1. Define a subclass of nn.Module that does your prediction. This should return a log-probability distribution over class labels. Your module should take a list of word indices as input and embed them using a nn.Embedding layer initialized appropriately.
2. Compute your classification loss based on the prediction. In lecture, we saw using the negative log probability of the correct label as the loss. You can do this directly, or you can use a built-in loss function like NLLLoss or CrossEntropyLoss. Pay close attention to what these losses expect as inputs (probabilities, log probabilities, or raw scores).
3. Call network.zero_grad() (zeroes out in-place gradient vectors), loss.backward() (runs the backward pass to compute gradients), and optimizer.step() to update your parameters. A sketch tying these steps together follows this list.
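Here is a minimal sketch of those three steps. It is not the reference solution: the class name, hyperparameters, example data, and the randomly initialized embedding layer are placeholders, and in your code the embedding layer would come from get_initialized_embedding_layer.

import torch
import torch.nn as nn

class DANClassifier(nn.Module):
    """Sketch of a deep averaging network: embed, average, feed forward."""
    def __init__(self, emb_layer: nn.Embedding, hidden_size: int, num_classes: int = 2):
        super().__init__()
        self.emb = emb_layer
        self.ff = nn.Sequential(
            nn.Linear(emb_layer.embedding_dim, hidden_size),
            nn.ReLU(),
            nn.Linear(hidden_size, num_classes),
            nn.LogSoftmax(dim=-1),     # return log-probabilities (matches NLLLoss)
        )

    def forward(self, word_indices: torch.Tensor) -> torch.Tensor:
        avg = self.emb(word_indices).mean(dim=0)   # average the word embeddings
        return self.ff(avg)

# One hypothetical training step; all names and values below are placeholders.
emb_layer = nn.Embedding(5000, 50)                 # stand-in for the GloVe-initialized layer
model = DANClassifier(emb_layer, hidden_size=100)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.NLLLoss()                             # expects log-probabilities as input

word_indices = torch.tensor([12, 7, 431])          # one example's word indices
label = torch.tensor([1])                          # gold label (e.g. 1 = positive)

model.zero_grad()                                  # zero out in-place gradient storage
log_probs = model(word_indices).unsqueeze(0)       # shape (1, num_classes) for NLLLoss
loss = loss_fn(log_probs, label)
loss.backward()                                    # compute gradients
optimizer.step()                                   # update parameters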
Implementation and Debugging Tips Come back to this section as you tackle the assignment!
- You should print training loss over your model's epochs; this will give you an idea of how the learning process is proceeding (a sketch follows this list).
- You should be able to do the vast majority of your parameter tuning in small-scale experiments. Try to avoid running large experiments on the whole dataset in order to keep your development cycle fast.
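Continuing the placeholder sketch above (model, loss_fn, optimizer, word_indices, and label as defined there; num_epochs and train_examples are stand-ins for your real settings and data), the kind of per-epoch loss printout meant here might look like:

num_epochs = 10
train_examples = [(word_indices, label)]
for epoch in range(num_epochs):
    total_loss = 0.0
    for xs, y in train_examples:
        model.zero_grad()
        loss = loss_fn(model(xs).unsqueeze(0), y)
        loss.backward()
        optimizer.step()
        total_loss += loss.item()
    print(f"epoch {epoch}: total training loss = {total_loss:.4f}")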