Semi-Gradient Update
This problem presents a brief glimpse of the problems that can arise in off-policy learning with function approximation, through the concepts that have been introduced so far. If
you would like a more detailed discussion on these issues, you may refer to Chapter 11. Let us now apply semi-gradient TD learning from Chapter 9 with batch updates (Section
6.3) to the following value-function approximation problem, based on a problem known as Baird's Counterexample:
[Figure: Baird's Counterexample diagram, not reproduced here.] In the diagram, each circle is a state, and the arrows represent some possible transitions between states.
The formula shown in each state gives its value in terms of parameters w_i, i = 0, 1, ..., 6, which together comprise the value-function approximator that we wish to learn.
Consider that we see each of the transitions shown above exactly once in a batch of data.
The reward for each transition is 0, and the discount factor is γ = 0.95.
We update the weights in the function approximation using semi-gradient TD(0).
Before the update, the weights are (w_0, ..., w_6) = (1, 1, 1, 1, 1, 1, 5); that is, all weights are 1 except w_6 = 5.
With a learning rate α = 0.1, what will the weights be after the update?
Treat the updates as a batch: compute the individual update for each of the six transitions, sum them, and apply the summed update to the weights only after all six have been computed. Enter your answers to 3 decimal places.
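Because the diagram is not reproduced above, the exact feature vectors have to be assumed. The sketch below computes one batch semi-gradient TD(0) update with linear function approximation, v(s, w) = w · x(s), using a hypothetical layout in the style of Baird's Counterexample: six upper states with value 2·w_i + w_6 (i = 0, ..., 5), each transitioning to a single lower state whose value is taken, as an assumption, to be 2·w_6. Per transition, δ = R + γ·v(S') − v(S) and Δw_j = α·δ·x_j(S); the six increments are summed and applied once.

```python
GAMMA = 0.95  # discount factor from the problem
ALPHA = 0.1   # learning rate from the problem

# Feature vectors: features[s][j] is the coefficient of w_j in v(s).
# States 0..5 are the upper states, state 6 the lower state.
# NOTE: this layout is an assumption -- the problem's diagram is not shown.
features = []
for i in range(6):
    x = [0.0] * 7
    x[i] = 2.0   # assumed: value of upper state i is 2*w_i + w_6
    x[6] = 1.0
    features.append(x)
lower = [0.0] * 7
lower[6] = 2.0   # assumed: value of the lower state is 2*w_6
features.append(lower)

def v(s, w):
    """Linear value estimate v(s, w) = w . x(s)."""
    return sum(f * wj for f, wj in zip(features[s], w))

w = [1.0] * 6 + [5.0]                      # initial weights (1,1,1,1,1,1,5)
transitions = [(i, 6) for i in range(6)]   # six transitions, reward 0 each

# Batch semi-gradient TD(0): sum the per-transition increments
#   delta = r + gamma * v(s') - v(s);  dw_j += alpha * delta * x_j(s)
# and apply the sum once, after all six transitions are processed.
dw = [0.0] * 7
for s, s_next in transitions:
    delta = 0.0 + GAMMA * v(s_next, w) - v(s, w)
    for j in range(7):
        dw[j] += ALPHA * delta * features[s][j]

w = [wj + dj for wj, dj in zip(w, dw)]
print([round(wj, 3) for wj in w])
```

With these assumed features the semi-gradient ignores the dependence of v(S') on w (that is why it is only a "semi"-gradient), and each upper state sees the same TD error, so all of w_0..w_5 move by the same amount while w_6 accumulates six increments. The specific numbers printed depend entirely on the assumed feature vectors and need not match the graded answer to this question.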