
Part 4- Training loop and running training (0 pts -- already built, but an important step)
Next, consider the function train(model, loader), as well as the main section in train_model.py. The train function will train the model for a single epoch. It has already been completed for you.
The function first defines the loss function, which is CrossEntropyLoss. Then, it specifies the optimizer algorithm used for training. In class, we only discussed stochastic gradient descent. Here, we are using Adagrad with a learning rate of 0.01.
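For reference, here is a minimal sketch of what these two definitions typically look like in PyTorch (the variable names loss_function and optimizer are illustrative; the provided train_model.py may name things differently):

import torch.nn as nn
import torch.optim as optim

loss_function = nn.CrossEntropyLoss()                   # cross-entropy over the 91 output labels
optimizer = optim.Adagrad(model.parameters(), lr=0.01)  # Adagrad with learning rate 0.01
# (model is the network instantiated in the main section, described below)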
The main loop iterates over the items obtained from the DataLoader. For each batch, it obtains the inputs (a (batch_size, 6) tensor) and the targets (a (batch_size, 91) tensor). It then calls the model to obtain the predictions for the inputs and computes the loss. Finally, it performs the backward pass as follows:
optimizer.zero_grad() # set the gradient for all parameters to 0
loss.backward() # recompute gradients based on current loss
optimizer.step() # update the parameters based on the error gradients
The method also computes various statistics: it reports the training loss every 1000 batches, as well as the accuracy at the end of the epoch.
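Putting these pieces together, the body of train might look roughly like the following sketch. This is a sketch only: it assumes a recent PyTorch version (1.10 or later) in which CrossEntropyLoss accepts the (batch_size, 91) probability-style targets directly, and the provided implementation may differ in details.

import torch.nn as nn
import torch.optim as optim

def train(model, loader):
    loss_function = nn.CrossEntropyLoss()
    optimizer = optim.Adagrad(model.parameters(), lr=0.01)
    total_loss, correct, total = 0.0, 0, 0
    for i, (inputs, targets) in enumerate(loader):
        predictions = model(inputs)                 # (batch_size, 91) scores
        loss = loss_function(predictions, targets)
        optimizer.zero_grad()                       # set the gradient for all parameters to 0
        loss.backward()                             # compute gradients based on current loss
        optimizer.step()                            # update the parameters
        total_loss += loss.item()
        correct += (predictions.argmax(1) == targets.argmax(1)).sum().item()
        total += targets.size(0)
        if (i + 1) % 1000 == 0:                     # report training loss every 1000 batches
            print(f"batch {i + 1}: avg. loss {total_loss / (i + 1):.4f}")
    print(f"epoch accuracy: {correct / total:.4f}")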
The main section at the bottom of train_model.py then runs the actual training. It first instantiates the model (passing the number of word types and the number of output labels). It then loads the training data set and wraps it in a DataLoader with a batch size of 16. Next, it runs five epochs of training. In my experiments, after 5 epochs I reached a training loss below 0.31 and a training accuracy of about 0.90.
Finally, the trained model parameters are saved to disk.
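A hedged sketch of that main section follows. DependencyModel stands in for the model class from the earlier parts, train is the function above, and inferring the number of word types from the input data is an assumption made here only to keep the sketch self-contained:

import sys
import numpy as np
import torch
from torch.utils.data import DataLoader, TensorDataset

if __name__ == "__main__":
    inputs = np.load(sys.argv[1])                    # e.g. data/input_train.npy
    targets = np.load(sys.argv[2])                   # e.g. data/target_train.npy

    word_types = int(inputs.max()) + 1               # assumption: infer vocabulary size from data
    model = DependencyModel(word_types, targets.shape[1])  # placeholder for the provided model class

    dataset = TensorDataset(torch.from_numpy(inputs).long(),
                            torch.from_numpy(targets).float())
    loader = DataLoader(dataset, batch_size=16, shuffle=True)

    for epoch in range(5):                           # five epochs of training
        train(model, loader)

    torch.save(model.state_dict(), sys.argv[3])      # e.g. data/model.pt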
To run the training, call the following:
python train_model.py data/input_train.npy data/target_train.npy data/model.pt
Note that this may take an hour or longer if you train on a CPU, depending on your hardware.
Part 5- Greedy Parsing Algorithm - Building and Evaluating the Parser (35 pts)
We will now use the trained model to construct a parser. In the file decoder.py, take a look at the class Parser. The class constructor takes the name of a PyTorch model file, loads the model, and stores it in the instance variable model. It also uses the feature extractor from part 2.
TODO: Your task will be to write the method parse_sentence(self, words, pos), which takes as parameters a list of words and POS tags in the input sentence. The method will return an instance of DependencyStructure.
The function first creates a State instance in the initial state, i.e. only word 0 is on the stack, the buffer contains all input words (or rather, their indices) and the deps structure is empty.
The algorithm is the standard transition-based algorithm discussed in class. As long as the buffer is not empty, we use the feature extractor to obtain a representation of the current state. We then call model.predict(features) and retrieve a softmax-activated vector of scores over the possible actions.
In principle, we would only have to select the highest-scoring transition and update the state accordingly. Unfortunately, it is possible that the highest-scoring transition is not legal in the current state: arc-left and arc-right are not permitted when the stack is empty; shifting the only word out of the buffer is also illegal, unless the stack is empty; and the root node must never be the target of a left-arc.
Instead of selecting the highest-scoring action, select the highest-scoring permitted transition. The easiest way to do this is to create a list of possible actions and sort it by output probability (make sure the largest probability comes first in the list). Then go through the list until you find a legal transition (see the sketch below).
The final step (which is already written for you) is to take the edges in state.deps and create a DependencyStructure object from them.
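One way the whole method could look is sketched below. This is a sketch, not the reference solution: State, get_input_representation, the transition names, the output_labels list, and the DependencyEdge constructor signature are all assumed from the earlier parts of the assignment and the starter code, and may be named differently there.

def parse_sentence(self, words, pos):
    state = State(range(1, len(words)))      # buffer holds the indices of all input words
    state.stack.append(0)                    # only word 0 (the root) starts on the stack

    while state.buffer:
        features = self.extractor.get_input_representation(words, pos, state)
        probs = self.model.predict(features)            # softmax-activated action scores,
                                                        # assumed here to be a flat vector

        # Rank all actions by probability, largest first, then take the first legal one.
        ranked = sorted(enumerate(probs), key=lambda x: x[1], reverse=True)
        for action_id, _ in ranked:
            transition, label = self.output_labels[action_id]
            if transition == "shift":
                # Shifting the only buffer word is illegal unless the stack is empty.
                if len(state.buffer) > 1 or not state.stack:
                    state.shift()
                    break
            elif transition == "left_arc":
                # Illegal if the stack is empty or the root would become a dependent.
                if state.stack and state.stack[-1] != 0:
                    state.left_arc(label)
                    break
            elif transition == "right_arc":
                if state.stack:                         # illegal if the stack is empty
                    state.right_arc(label)
                    break

    # Final step (already written in the starter code): build a DependencyStructure
    # from state.deps, assumed here to hold (head, dependent, relation) triples.
    result = DependencyStructure()
    for head, dep, rel in state.deps:
        result.add_deprel(DependencyEdge(dep, words[dep], pos[dep], head, rel))
    return result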
Running the decoder.py program like this should print CoNLL-formatted parse trees for the sentences in the input (note that the input contains dependency structures already, but these are ignored -- the output is generated by your parser).
python decoder.py data/model.pt data/dev.conll
To evaluate the parser, run the program evaluate.py, which will compare your parser output to the target dependency structures and compute labeled and unlabeled attachment accuracy.
python evaluate.py data/model.pt data/dev.conll
Labeled attachment score is the percentage of correct (parent, relation, child) predictions. Unlabeled attachment score is the percentage of correct (parent, child) predictions.
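To make the two definitions concrete, here is a small sketch (gold and predicted are hypothetical, token-aligned lists of (parent, relation, child) triples; evaluate.py computes these scores for you):

def attachment_scores(gold, predicted):
    # Labeled: the full (parent, relation, child) triple must match.
    las = sum(g == p for g, p in zip(gold, predicted)) / len(gold)
    # Unlabeled: only (parent, child) must match; the relation is ignored.
    uas = sum((g[0], g[2]) == (p[0], p[2]) for g, p in zip(gold, predicted)) / len(gold)
    return las, uas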
Even though the output looks reasonable for many sentences, the score for the parser is relatively low (~70 LAS). The current state of the art for dependency parsing is considerably higher.
