Consider the gridworld shown below. The left panel shows the name of each state, A through E. The middle panel shows the current estimate of the value function $V^\pi$ for each state. A transition is observed that takes the agent from state B, by taking action east, into state C, and the agent receives a reward of -2. Assuming $\gamma = 1$ and $\alpha = 1/2$, what are the value estimates after the TD learning update? (Note: the value will change for one of the states only.)

[Figure: gridworld panels showing the states, the current value estimates, and the observed transition]

$$V^\pi(s) \leftarrow (1 - \alpha)\, V^\pi(s) + \alpha \left[ R(s, \pi(s), s') + \gamma\, V^\pi(s') \right]$$
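The TD learning update in the question can be sketched in a few lines of Python. This is only an illustration: the function name `td_update` is hypothetical, and the starting values in `values` are placeholders chosen for the example, not the estimates from the figure's middle panel. Only the transition (B, east, C), the reward of -2, the discount of 1, and the learning rate of 1/2 come from the question.

```python
def td_update(values, state, reward, next_state, alpha, gamma):
    """Return a copy of `values` after one TD update for `state`."""
    # Sample of the return observed from this single transition.
    sample = reward + gamma * values[next_state]
    updated = dict(values)
    # Blend the old estimate with the new sample using learning rate alpha.
    updated[state] = (1 - alpha) * values[state] + alpha * sample
    return updated


# Placeholder estimates (assumed for illustration, NOT the figure's values).
values = {"A": 0.0, "B": 0.0, "C": 6.0, "D": 0.0, "E": 0.0}

# Observed transition from the question: B --east--> C with reward -2.
new_values = td_update(values, "B", reward=-2.0, next_state="C",
                       alpha=0.5, gamma=1.0)

print(new_values["B"])  # 2.0 with the placeholder values above
```

Only the entry for B changes, which matches the note in the question: the update touches the state the agent left, not the state it entered. Substituting the figure's actual estimates for B and C gives the requested answer.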
