Part 4 - Training loop and running training (0 pts - already built, but an important step)

Next, consider the function train(model, loader), as well as the main section in train_model.py. The train function will train the model for a single epoch. It has already been completed for you.
The function first defines the loss function, which is CrossEntropyLoss. Then, it specifies the optimizer algorithm used for training. In class, we only discussed stochastic gradient descent. Here, we are using Adagrad with a learning rate of
The main loop iterates over the batches obtained from the DataLoader. For each batch, it obtains the inputs (a tensor of shape (batch_size, …)) and the targets (a tensor of shape (batch_size,)). It then calls the model to obtain the predictions for the inputs and computes the loss. Finally, it performs the backward pass as follows:
optimizer.zero_grad()  # set the gradients for all parameters to zero
loss.backward()        # compute gradients based on the current loss
optimizer.step()       # update the parameters based on the error gradients
The method also computes various statistics: it periodically reports the training loss during the epoch, as well as the accuracy after the epoch.
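Putting the pieces above together, the per-epoch training function could look roughly like the following sketch (the function name, the learning rate of 0.01, and the toy model and data at the bottom are illustrative assumptions, not the actual contents of train_model.py):

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

def train(model, loader):
    # CrossEntropyLoss expects raw (unnormalized) scores and integer class targets
    loss_fn = nn.CrossEntropyLoss()
    # the learning rate here is an arbitrary illustrative choice
    optimizer = torch.optim.Adagrad(model.parameters(), lr=0.01)
    total_loss, correct, seen = 0.0, 0, 0
    for inputs, targets in loader:
        predictions = model(inputs)           # forward pass
        loss = loss_fn(predictions, targets)
        optimizer.zero_grad()                 # set gradients for all parameters to zero
        loss.backward()                       # compute gradients based on the current loss
        optimizer.step()                      # update parameters based on the error gradients
        total_loss += loss.item()
        correct += (predictions.argmax(dim=1) == targets).sum().item()
        seen += targets.size(0)
    return total_loss / len(loader), correct / seen

# toy stand-ins for the real model and data, just to exercise the loop
torch.manual_seed(0)
model = nn.Linear(6, 3)
dataset = TensorDataset(torch.randn(16, 6), torch.randint(0, 3, (16,)))
loader = DataLoader(dataset, batch_size=4)
avg_loss, accuracy = train(model, loader)
```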
The main section at the bottom of train_model.py then runs the actual training. It first instantiates the model, passing the number of word types and the number of output labels. Then, it loads the training data set and wraps it in a DataLoader with a batch size of
Then, it runs five epochs of training. In my experiments, after five epochs I reached a training loss of and a training accuracy of about
Finally, the trained model parameters are saved to disk.
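Saving and restoring the parameters is done through the model's state dict; a minimal sketch (the stand-in model and the file name below are arbitrary, not the assignment's actual model or path):

```python
import os
import tempfile
import torch
import torch.nn as nn

model = nn.Linear(6, 3)  # stand-in for the trained model
path = os.path.join(tempfile.gettempdir(), "model_demo.pt")  # arbitrary file name

torch.save(model.state_dict(), path)        # save the trained parameters to disk

restored = nn.Linear(6, 3)                  # must rebuild the same architecture first
restored.load_state_dict(torch.load(path))  # then load the saved parameters into it
```

Note that `state_dict()` saves only the parameters, which is why the architecture has to be reconstructed before loading.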
To run the training, call the following:
python train_model.py data/input_train.npy data/target_train.npy data/model.pt
Note that this may take an hour or longer if you train on a CPU, depending on your hardware.
Part 5 - Greedy Parsing Algorithm - Building and Evaluating the Parser ( pts)
We will now use the trained model to construct a parser. In the file decoder.py, take a look at the class Parser. The class constructor takes the name of a PyTorch model file, loads the model, and stores it in the instance variable model. It also uses the feature extractor from the previous part.
TODO: Your task will be to write the method parse_sentence(self, words, pos), which takes as parameters a list of words and POS tags in the input sentence. The method will return an instance of DependencyStructure.
The function first creates a State instance in the initial state, i.e., only the root word is on the stack, the buffer contains all input words (or rather, their indices), and the deps structure is empty.
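The actual State class is provided with the assignment; purely as an illustration of what such an initial state contains, a hypothetical sketch (all names and details assumed, not the assignment's exact class):

```python
class State:
    """Hypothetical sketch of an arc-standard parser state."""
    def __init__(self, sentence_length):
        self.stack = [0]                               # only the root word is on the stack
        self.buffer = list(range(1, sentence_length))  # indices of all remaining input words
        self.deps = set()                              # no dependency edges built yet

initial = State(5)  # a sentence with the root plus 4 words
```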
The algorithm is the standard transition-based algorithm discussed in class. As long as the buffer is not empty, we use the feature extractor to obtain a representation of the current state. We then call the model on these features and retrieve a softmax-activated vector of scores for the possible actions.
In principle, we would only have to select the highest-scoring transition and update the state accordingly. Unfortunately, it is possible that the highest-scoring transition is not possible: arc-left or arc-right are not permitted if the stack is empty. Shifting the only word out of the buffer is also illegal, unless the stack is empty. Finally, the root node must never be the target of a left-arc.
Instead of selecting the highest-scoring action, select the highest-scoring permitted transition. The easiest way to do this is to create a list of possible actions and sort it according to their output probability (make sure the largest probability comes first in the list). Then go through the list until you find a legal transition.
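One way to sketch that selection step (the action names, the (transition, label) encoding, and the exact legality checks below are assumptions about the interface, not the assignment's actual one):

```python
def select_transition(probs, actions, stack, buffer):
    """Return the highest-scoring *legal* action, or None if no action is legal.

    probs   : list of probabilities, aligned with actions
    actions : list of (transition, label) pairs, e.g. ("left_arc", "nsubj")
    stack   : list of word indices; index 0 is the root
    buffer  : list of word indices still to be processed
    """
    # rank action indices by probability, largest first
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    for i in ranked:
        transition, label = actions[i]
        if transition in ("left_arc", "right_arc") and not stack:
            continue  # arcs are not permitted when the stack is empty
        if transition == "left_arc" and stack and stack[-1] == 0:
            continue  # the root must never be the target of a left-arc
        if transition == "shift" and len(buffer) == 1 and stack:
            continue  # may not shift the only buffer word unless the stack is empty
        return transition, label
    return None

actions = [("left_arc", "nsubj"), ("right_arc", "obj"), ("shift", None)]
```

For example, with an empty stack, both arc actions are skipped even if they score highest, and the parser falls back to shift.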
The final step, which is already written for you, is to take the edges in state.deps and create a DependencyStructure object from them.
Running the decoder.py program like this should print CoNLL-formatted parse trees for the sentences in the input (note that the input contains dependency structures already, but these are ignored; the output is generated by your parser):
python decoder.py data/model.pt data/dev.conll
To evaluate the parser, run the program evaluate.py, which will compare your parser output to the target dependency structures and compute labeled and unlabeled attachment accuracy.
python evaluate.py data/model.pt data/dev.conll
Labeled attachment score (LAS) is the percentage of correct (parent, relation, child) predictions. Unlabeled attachment score (UAS) is the percentage of correct (parent, child) predictions.
Even though the output looks reasonable for many sentences, the score for the parser is relatively low (~ LAS). The current state of the art for dependency parsing is considerably higher.