
Part 4- Training loop and running training (0 pts -- already built, but an important step)
Next, consider the function train(model, loader), as well as the main section in train_model.py. The train function will train the model for a single epoch. It has already been completed for you.
The function first defines the loss function, which is CrossEntropyLoss. Then, it specifies the optimizer algorithm used for training. In class, we only discussed stochastic gradient descent. Here, we are using Adagrad with a learning rate of 0.01.
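For reference, here is a minimal sketch of what these two definitions typically look like in PyTorch (the variable names loss_function and optimizer are illustrative; the provided train_model.py may name things differently):

import torch.nn as nn
import torch.optim as optim

loss_function = nn.CrossEntropyLoss()                   # cross-entropy over the 91 output labels
optimizer = optim.Adagrad(model.parameters(), lr=0.01)  # Adagrad with learning rate 0.01
# (model is the network instantiated in the main section, described below)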
The main loop iterates over the items obtained from the DataLoader. For each batch, it obtains the inputs (a (batch_size, 6) tensor) and the targets (a (batch_size, 91) tensor). It then calls the model to obtain the predictions for the inputs and computes the loss. Finally, it performs the backward pass as follows:
optimizer.zero_grad() # set the gradient for all parameters to 0
loss.backward() # recompute gradients based on current loss
optimizer.step() # update the parameters based on the error gradients
The method also computes various statistics: it reports the training loss every 1000 batches, as well as the accuracy at the end of the epoch.
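Putting these pieces together, the body of train might look roughly like the following sketch. This is a sketch only: it assumes a recent PyTorch version (1.10 or later) in which CrossEntropyLoss accepts the (batch_size, 91) probability-style targets directly, and the provided implementation may differ in details.

import torch.nn as nn
import torch.optim as optim

def train(model, loader):
    loss_function = nn.CrossEntropyLoss()
    optimizer = optim.Adagrad(model.parameters(), lr=0.01)
    total_loss, correct, total = 0.0, 0, 0
    for i, (inputs, targets) in enumerate(loader):
        predictions = model(inputs)                 # (batch_size, 91) scores
        loss = loss_function(predictions, targets)
        optimizer.zero_grad()                       # set the gradient for all parameters to 0
        loss.backward()                             # compute gradients based on current loss
        optimizer.step()                            # update the parameters
        total_loss += loss.item()
        correct += (predictions.argmax(1) == targets.argmax(1)).sum().item()
        total += targets.size(0)
        if (i + 1) % 1000 == 0:                     # report training loss every 1000 batches
            print(f"batch {i + 1}: avg. loss {total_loss / (i + 1):.4f}")
    print(f"epoch accuracy: {correct / total:.4f}")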
The main section at the bottom of train_model.py then runs the actual training. It first instantiates the model (passing the number of word types and the number of output labels). It then loads the training data set and wraps it in a DataLoader with a batch size of 16. Next, it runs five epochs of training. In my experiments, after 5 epochs I reached a training loss below 0.31 and a training accuracy of about 0.90.
Finally, the trained model parameters are saved to disk.
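A hedged sketch of that main section follows. DependencyModel stands in for the model class from the earlier parts, train is the function above, and inferring the number of word types from the input data is an assumption made here only to keep the sketch self-contained:

import sys
import numpy as np
import torch
from torch.utils.data import DataLoader, TensorDataset

if __name__ == "__main__":
    inputs = np.load(sys.argv[1])                    # e.g. data/input_train.npy
    targets = np.load(sys.argv[2])                   # e.g. data/target_train.npy

    word_types = int(inputs.max()) + 1               # assumption: infer vocabulary size from data
    model = DependencyModel(word_types, targets.shape[1])  # placeholder for the provided model class

    dataset = TensorDataset(torch.from_numpy(inputs).long(),
                            torch.from_numpy(targets).float())
    loader = DataLoader(dataset, batch_size=16, shuffle=True)

    for epoch in range(5):                           # five epochs of training
        train(model, loader)

    torch.save(model.state_dict(), sys.argv[3])      # e.g. data/model.pt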
To run the training, call the following:
python train_model.py data/input_train.npy data/target_train.npy data/model.pt
Note that this may take an hour or longer if you train on a CPU, depending on your hardware.
Part 5- Greedy Parsing Algorithm - Building and Evaluating the Parser (35 pts)
We will now use the trained model to construct a parser. In the file decoder.py, take a look at the class Parser. The class constructor takes the name of a PyTorch model file, loads the model, and stores it in the instance variable model. It also uses the feature extractor from part 2.
TODO: Your task will be to write the method parse_sentence(self, words, pos), which takes as parameters a list of words and POS tags in the input sentence. The method will return an instance of DependencyStructure.
The function first creates a State instance in the initial state, i.e. only word 0 is on the stack, the buffer contains all input words (or rather, their indices) and the deps structure is empty.
The algorithm is the standard transition-based algorithm discussed in class. As long as the buffer is not empty, we use the feature extractor to obtain a representation of the current state. We then call model.predict(features) and retrieve a softmax-activated vector of scores over the possible actions.
In principle, we would only have to select the highest-scoring transition and update the state accordingly. Unfortunately, it is possible that the highest-scoring transition is not legal in the current state: arc-left and arc-right are not permitted when the stack is empty; shifting the only word out of the buffer is also illegal, unless the stack is empty; and the root node must never be the target of a left-arc.
Instead of selecting the highest-scoring action, select the highest-scoring permitted transition. The easiest way to do this is to create a list of possible actions and sort it by output probability (make sure the largest probability comes first in the list). Then go through the list until you find a legal transition (see the sketch below).
The final step (which is already written for you) is to take the edges in state.deps and create a DependencyStructure object from them.
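One way the whole method could look is sketched below. This is a sketch, not the reference solution: State, get_input_representation, the transition names, the output_labels list, and the DependencyEdge constructor signature are all assumed from the earlier parts of the assignment and the starter code, and may be named differently there.

def parse_sentence(self, words, pos):
    state = State(range(1, len(words)))      # buffer holds the indices of all input words
    state.stack.append(0)                    # only word 0 (the root) starts on the stack

    while state.buffer:
        features = self.extractor.get_input_representation(words, pos, state)
        probs = self.model.predict(features)            # softmax-activated action scores,
                                                        # assumed here to be a flat vector

        # Rank all actions by probability, largest first, then take the first legal one.
        ranked = sorted(enumerate(probs), key=lambda x: x[1], reverse=True)
        for action_id, _ in ranked:
            transition, label = self.output_labels[action_id]
            if transition == "shift":
                # Shifting the only buffer word is illegal unless the stack is empty.
                if len(state.buffer) > 1 or not state.stack:
                    state.shift()
                    break
            elif transition == "left_arc":
                # Illegal if the stack is empty or the root would become a dependent.
                if state.stack and state.stack[-1] != 0:
                    state.left_arc(label)
                    break
            elif transition == "right_arc":
                if state.stack:                         # illegal if the stack is empty
                    state.right_arc(label)
                    break

    # Final step (already written in the starter code): build a DependencyStructure
    # from state.deps, assumed here to hold (head, dependent, relation) triples.
    result = DependencyStructure()
    for head, dep, rel in state.deps:
        result.add_deprel(DependencyEdge(dep, words[dep], pos[dep], head, rel))
    return result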
Running the decoder.py program like this should print CoNLL-formatted parse trees for the sentences in the input (note that the input contains dependency structures already, but these are ignored -- the output is generated by your parser).
python decoder.py data/model.pt data/dev.conll
To evaluate the parser, run the program evaluate.py, which will compare your parser output to the target dependency structures and compute labeled and unlabeled attachment accuracy.
python evaluate.py data/model.pt data/dev.conll
Labeled attachment score is the percentage of correct (parent, relation, child) predictions. Unlabeled attachment score is the percentage of correct (parent, child) predictions.
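To make the two definitions concrete, here is a small sketch (gold and predicted are hypothetical, token-aligned lists of (parent, relation, child) triples; evaluate.py computes these scores for you):

def attachment_scores(gold, predicted):
    # Labeled: the full (parent, relation, child) triple must match.
    las = sum(g == p for g, p in zip(gold, predicted)) / len(gold)
    # Unlabeled: only (parent, child) must match; the relation is ignored.
    uas = sum((g[0], g[2]) == (p[0], p[2]) for g, p in zip(gold, predicted)) / len(gold)
    return las, uas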
Even though the output looks reasonable for many sentences, the score for the parser is relatively low (~70 LAS). The current state of the art for dependency parsing is considerably higher.
