Question:

8. (9 points) Dynamic Programming: Answer the questions based on the MDP below.

[Figure: MDP transition diagram with nodes A (r=0), B (r=0), and C (r=1). Each node has a "stay" self-loop; "move" edges are labeled 2/3 (to the next state) and 1/3 (remain in the current state).]

States: {A, B, C}

Actions and Transition Probabilities:
    stay: stays in the current state with probability 1.
    move: moves to the next state with probability 2/3, stays in the current state with probability 1/3.

Rewards: R(A) = 0, R(B) = 0, R(C) = 1

Discount Factor: γ = 0.6

(a) (6 points) Perform one step of value iteration and fill in the table below. Make sure to show your work below the table.

Iteration    V(A)    V(B)    V(C)
0            0       0.4     1.6
1

(b) (3 points) What is the policy extracted from the calculated Q-values?
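One way to check the arithmetic for parts (a) and (b) is a minimal Python sketch of a single value-iteration backup. It rests on two assumptions that the problem text does not state explicitly. First, the reward R(s) is collected in the current state, so Q(s, a) = R(s) + γ Σ P(s' | s, a) V(s'). Second, the "next state" order is cyclic (A to B, B to C, C to A). The second assumption only affects Q(C, move); because C already has the largest iteration-0 value, "stay" is at least as good as "move" in C whatever C's next state is. All names in the code (`NEXT`, `transitions`, `value_iteration_step`) are illustrative.

```python
from fractions import Fraction

GAMMA = Fraction(3, 5)  # discount factor 0.6, kept exact
STATES = ["A", "B", "C"]
ACTIONS = ["stay", "move"]
REWARD = {"A": 0, "B": 0, "C": 1}

# ASSUMPTION: the "next state" order is cyclic, A -> B -> C -> A.
# The problem text does not say where "move" leads from C.
NEXT = {"A": "B", "B": "C", "C": "A"}


def transitions(state, action):
    """Return {successor: probability} for taking `action` in `state`."""
    if action == "stay":
        return {state: Fraction(1)}
    # move: 2/3 to the next state, 1/3 remain in place
    return {NEXT[state]: Fraction(2, 3), state: Fraction(1, 3)}


def value_iteration_step(v):
    """One Bellman backup. ASSUMPTION: reward depends on the current state only."""
    q = {
        s: {
            a: REWARD[s]
            + GAMMA * sum(p * v[s2] for s2, p in transitions(s, a).items())
            for a in ACTIONS
        }
        for s in STATES
    }
    v_next = {s: max(q[s].values()) for s in STATES}
    # Greedy policy: the action with the largest Q-value in each state
    policy = {s: max(q[s], key=q[s].get) for s in STATES}
    return q, v_next, policy


# Iteration-0 values from the table in part (a)
v0 = {"A": Fraction(0), "B": Fraction(2, 5), "C": Fraction(8, 5)}
q, v1, policy = value_iteration_step(v0)

for s in STATES:
    q_str = ", ".join(f"Q({s},{a})={float(q[s][a])}" for a in ACTIONS)
    print(f"{s}: {q_str}; V1={float(v1[s])}; policy={policy[s]}")
```

Under these assumptions, the backup gives Q(A, stay) = 0 and Q(A, move) = 0.16; Q(B, stay) = 0.24 and Q(B, move) = 0.72; Q(C, stay) = 1.96 and Q(C, move) = 1.32. The iteration-1 row would then be V(A) = 0.16, V(B) = 0.72, V(C) = 1.96, and the greedy policy is move in A, move in B, stay in C. A different reward convention (for example, reward on arrival in the successor state) would change these numbers.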
