Q9 Policy Iteration: Cycle 1
4 Points
Consider the following transition diagram, transition function and reward function for an MDP.
Discount Factor, \(\gamma=0.5\);
Suppose we are doing policy evaluation by following the policy given in the left-hand table below. Our current estimates (at the end of some iteration of policy evaluation) of the values of states under the current policy are given in the right-hand table.
We recommend you work out the solutions to the following questions on a sheet of scratch paper, and then enter your results into the answer boxes.
Part 1
What is \( V_{k+1}^{\pi}(A)\)?
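Since the transition diagram and tables are not reproduced here, the following is a minimal sketch of a single policy-evaluation backup, \( V_{k+1}^{\pi}(s) = \sum_{s'} T(s,\pi(s),s')\,[R(s,\pi(s),s') + \gamma V_k^{\pi}(s')] \), using hypothetical transition probabilities, rewards, and current value estimates (all numbers illustrative, not the ones from the problem figure):

```python
GAMMA = 0.5  # discount factor given in the problem

# Hypothetical MDP data (the actual diagram/tables are in the problem figure).
# T[(s, a)] is a list of (next_state, probability); R[(s, a, s2)] is the reward.
T = {("A", "clockwise"): [("B", 1.0)],
     ("B", "clockwise"): [("A", 1.0)]}
R = {("A", "clockwise", "B"): 1.0,
     ("B", "clockwise", "A"): 0.0}
policy = {"A": "clockwise", "B": "clockwise"}   # pi(s), as in the left-hand table
V_k = {"A": 0.0, "B": 2.0}                      # current estimates V_k^pi (illustrative)

def backup(s):
    """One policy-evaluation step for state s under the fixed policy pi."""
    a = policy[s]
    return sum(p * (R[(s, a, s2)] + GAMMA * V_k[s2]) for s2, p in T[(s, a)])

print(backup("A"))  # = 1.0 * (1.0 + 0.5 * 2.0) = 2.0
```

With the actual transition function and value table from the figure substituted in, `backup("A")` gives the requested \( V_{k+1}^{\pi}(A) \).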
Suppose that policy evaluation converges to the following value function, \( V_{\infty}^{\pi}\).
Now let's execute policy improvement.
Part 2
What is \( Q_{\infty}^{\pi}(A, \text{clockwise}) \)?
Part 3
What is \( Q_{\infty}^{\pi}(A, \text{counterclockwise}) \)?
Part 4
What is the updated action for state A?
Clockwise
Counterclockwise
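Parts 2 through 4 are one policy-improvement step: compute \( Q_{\infty}^{\pi}(A, a) = \sum_{s'} T(A,a,s')\,[R(A,a,s') + \gamma V_{\infty}^{\pi}(s')] \) for each action and keep the argmax. A sketch under hypothetical converged values and dynamics (the real numbers come from the problem's tables):

```python
GAMMA = 0.5  # discount factor given in the problem

# Hypothetical converged value function and one-step dynamics (illustrative).
V_inf = {"A": 2.0, "B": 4.0, "C": 1.0}
T = {("A", "clockwise"): [("B", 1.0)],
     ("A", "counterclockwise"): [("C", 1.0)]}
R = {("A", "clockwise", "B"): 1.0,
     ("A", "counterclockwise", "C"): 0.0}

def q_value(s, a):
    """Q^pi(s, a) = sum over s' of T(s,a,s') * [R(s,a,s') + gamma * V^pi(s')]."""
    return sum(p * (R[(s, a, s2)] + GAMMA * V_inf[s2]) for s2, p in T[(s, a)])

# Policy improvement at A: keep the action with the larger Q-value.
best = max(["clockwise", "counterclockwise"], key=lambda a: q_value("A", a))
print(q_value("A", "clockwise"),          # 1.0 + 0.5 * 4.0 = 3.0
      q_value("A", "counterclockwise"),   # 0.0 + 0.5 * 1.0 = 0.5
      best)                               # "clockwise"
```

With these illustrative numbers the clockwise action wins; the answer to Part 4 is whichever action has the larger Q-value under the actual \( V_{\infty}^{\pi} \) from the figure.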