
Part 2: Deep Averaging Network (75 points)
In this part, you'll implement a deep averaging network as discussed in lecture and in Iyyer et al. (2015). If our input is $s = (w_1, \ldots, w_n)$, then we use a feedforward neural network for prediction with input $\frac{1}{n}\sum_{i=1}^{n} e(w_i)$, where $e$ is a function that maps a word $w$ to its real-valued vector embedding.
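For concreteness, the averaging step can be sketched in PyTorch as follows. This is purely illustrative; the embedding table, vocabulary size, and indices here are made up and are not part of the provided code:

import torch
import torch.nn as nn

# Toy embedding table: vocabulary of 5 words, 50-dimensional vectors e(w).
emb = nn.Embedding(num_embeddings=5, embedding_dim=50)

# Indices for one sentence s = (w_1, ..., w_n) with n = 3.
word_indices = torch.tensor([2, 0, 4])

vectors = emb(word_indices)   # shape [n, 50]; rows are e(w_i)
avg = vectors.mean(dim=0)     # shape [50]; the input to the feedforward network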
Getting started Download the code and data; the data is the same as in Assignment 1. Expand the tgz file
and change into the directory. To confirm everything is working properly, run:
python neural_sentiment_classifier.py --model TRIVIAL --no_run_on_test
This loads the data, instantiates a TrivialSentimentClassifier that always returns 1 (positive),
and evaluates it on the training and dev sets. Compared to Assignment 1, this runs an extra word embedding
loading step.
Framework code The framework code you are given consists of several files. neural_sentiment_classifier.py
is the main class. As before, you cannot modify this file for your final submission, though it's okay to add
command line arguments or make changes during development. You should generally not need to modify
the paths. The --model argument controls the model specification. The main method loads in the data,
initializes the feature extractor, trains the model, evaluates it on the train, dev, and blind test sets, and writes
the blind test results to a file.
models.py is the file you'll be modifying for this part, and train_deep_averaging_network
is your entry point, similar to Assignment 1. Data reading in sentiment_data.py and the utilities in
utils.py are similar to Assignment 1. However, read_sentiment_examples now lowercases the
dataset; the GloVe embeddings do not distinguish case and only contain embeddings for lowercase words.
sentiment_data.py additionally contains a WordEmbeddings class and code for reading it
from a file. This class wraps a matrix of word vectors and an Indexer in order to index new words. The
Indexer contains two special tokens: PAD (index 0) and UNK (index 1). UNK can stand in for words that aren't
in the vocabulary, and PAD is useful for implementing batching later. Both are mapped to the zero vector by
default.
You'll want to use get_initialized_embedding_layer to get a torch.nn.Embedding layer
that can be used in your network. This layer is trainable if you set frozen to False (which will be slower),
but is initialized with the pre-trained embeddings.
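As a rough sketch (the WordEmbeddings object is called word_embeddings here, and the exact call signature may differ slightly from what is shown):

import torch

# Embedding layer initialized from the relativized GloVe vectors; frozen=True
# keeps the pre-trained vectors fixed during training.
embedding_layer = word_embeddings.get_initialized_embedding_layer(frozen=True)

# PAD (index 0) and UNK (index 1) map to the zero vector by default, so padded
# positions contribute nothing to a sum over the sequence. A padded batch of two
# index sequences might look like:
batch_indices = torch.tensor([[5, 17, 3, 0, 0],
                              [8, 2, 9, 4, 6]])
batch_vectors = embedding_layer(batch_indices)   # shape [2, 5, embedding_dim]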
Data You are given two sources of pretrained embeddings you can use: data/glove.6B.50d-relativized.txt
and data/glove.6B.300d-relativized.txt, the loading of which is controlled by the --word_vecs_path argument.
These are trained using GloVe (Pennington et al., 2014). These vectors have been relativized to your data,
meaning that they do not contain embeddings for words that don't occur in the train, dev, or test data. This
is purely a runtime and memory optimization.
Note that the 300-dimensional vectors are used by default. The 50-dimensional vectors will enable your
code to run much faster, particularly if you're not using frozen embeddings, so you may find them useful for
debugging.
PyTorch example ffnn_example.py (available from Exercise 2.3b FFNN Hands-on) implements the network
discussed in lecture for the synthetic XOR task. It shows a minimal example of the PyTorch network definition,
training, and evaluation loop. Feel free to refer to this code extensively and to copy-paste parts of it into your
solution as needed. Most of this code is self-documenting. The most unintuitive piece is calling zero_grad
before calling backward! Backward computation uses in-place gradient storage, and this must be zeroed out
before every gradient computation.
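In skeleton form, the per-example update order looks roughly like this (network, loss_function, num_epochs, and training_examples are placeholder names, not names from the provided code, and the optimizer choice is just one option):

import torch.optim as optim

optimizer = optim.Adam(network.parameters(), lr=1e-3)

for epoch in range(num_epochs):
    for word_indices, label in training_examples:
        network.zero_grad()                     # clear gradients from the previous step
        log_probs = network(word_indices)
        loss = loss_function(log_probs, label)
        loss.backward()                         # compute fresh gradients
        optimizer.step()                        # apply the update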
Implementation Following the example, the rough steps you should take are:
1. Define a subclass of nn.Module that does your prediction. This should return a log-probability
distribution over class labels. Your module should take a list of word indices as input and embed them
using an nn.Embedding layer initialized appropriately (a minimal sketch appears after this list).
2. Compute your classification loss based on the prediction. In lecture, we saw that the negative log
probability of the correct label can be used as the loss. You can do this directly, or you can use a built-in loss
function like NLLLoss or CrossEntropyLoss. Pay close attention to what these losses expect as
inputs (probabilities, log probabilities, or raw scores).
3. Call network.zero_grad() (zeroes out in-place gradient vectors), loss.backward() (runs the
backward pass to compute gradients), and optimizer.step() to update your parameters.
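Putting steps 1 and 2 together, one possible shape for the prediction module is sketched below. The class name, layer sizes, and the choice of NLLLoss over log-probabilities are illustrative assumptions, not requirements:

import torch
import torch.nn as nn
import torch.nn.functional as F

class DeepAveragingNetwork(nn.Module):
    """Averages word embeddings, then feeds the result through a small feedforward net."""
    def __init__(self, embedding_layer, hidden_size, num_classes):
        super().__init__()
        self.embedding = embedding_layer   # e.g. from get_initialized_embedding_layer
        self.hidden = nn.Linear(embedding_layer.embedding_dim, hidden_size)
        self.out = nn.Linear(hidden_size, num_classes)

    def forward(self, word_indices):
        # word_indices: 1-D tensor of word indices for a single sentence
        avg = self.embedding(word_indices).mean(dim=0)
        h = F.relu(self.hidden(avg))
        # Return log-probabilities so NLLLoss can consume them directly;
        # CrossEntropyLoss would instead expect the raw scores before log_softmax.
        return F.log_softmax(self.out(h), dim=-1)

loss_function = nn.NLLLoss()

With batching, forward would instead take a [batch_size, seq_len] tensor of padded indices and average over the sequence dimension.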
Implementation and Debugging Tips Come back to this section as you tackle the assignment!
You should print training loss over your model's epochs; this will give you an idea of how the learning
process is proceeding.
You should be able to do the vast majority of your parameter tuning in small-scale experiments. Try to
avoid running large experiments on the whole dataset in order to keep your development cycle fast.
