8. (9 points) Dynamic Programming: Answer the questions based on the MDP below.

[Figure: transition diagram with states A (r=0), B (r=0), and C (r=1), and stay/move edges labeled with probabilities 1/3 and 2/3]

States: (A, B, C)

Actions and Transition Probabilities:
stay: stays in the current state with probability 1
move: moves to the next state with probability 2/3, stays in the current state with probability 1/3

Rewards: R(A) = 0, R(B) = 0, R(C) = 1

Discount factor: γ = 0.6

(a) (6 points) Perform one step of value iteration and fill in the table below. Make sure to show your work below the table.

Iteration   V(A)   V(B)   V(C)
0           0      0.4    1.6
1

(b) (3 points) What is the policy extracted from the calculated Q-values?
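One way to check the hand calculation is a short script that performs a single Bellman backup. This is a minimal sketch under assumptions the question does not state outright: that "next state" follows the cycle A -> B -> C -> A, that the reward depends only on the current state, that the iteration-0 values are V(A) = 0, V(B) = 0.4, V(C) = 1.6, and that the update is V_{k+1}(s) = R(s) + γ max_a Σ_{s'} P(s' | s, a) V_k(s'). The names `transitions`, `q_values`, and `value_iteration_step` are illustrative only.

```python
# Sketch: one synchronous value-iteration step for the 3-state MDP.
# Assumed (not confirmed by the question): cyclic "next state" order
# A -> B -> C -> A, and reward R(s) received in the current state.

STATES = ["A", "B", "C"]
ACTIONS = ["stay", "move"]
NEXT_STATE = {"A": "B", "B": "C", "C": "A"}  # assumed ordering
REWARD = {"A": 0.0, "B": 0.0, "C": 1.0}
GAMMA = 0.6


def transitions(state, action):
    """Return {next_state: probability} for taking `action` in `state`."""
    if action == "stay":
        return {state: 1.0}
    if action == "move":
        return {NEXT_STATE[state]: 2 / 3, state: 1 / 3}
    raise ValueError(f"unknown action: {action}")


def q_values(values, state):
    """Q(s, a) = R(s) + gamma * sum over s' of P(s' | s, a) * V(s')."""
    return {
        action: REWARD[state]
        + GAMMA * sum(p * values[s2] for s2, p in transitions(state, action).items())
        for action in ACTIONS
    }


def value_iteration_step(values):
    """Apply one Bellman backup; return the new values and the greedy policy."""
    new_values, policy = {}, {}
    for state in STATES:
        q = q_values(values, state)
        best_action = max(q, key=q.get)  # ties resolve to the first action listed
        policy[state] = best_action
        new_values[state] = q[best_action]
    return new_values, policy


# Iteration-0 values as read from the table (an assumption, see above).
v0 = {"A": 0.0, "B": 0.4, "C": 1.6}
v1, policy = value_iteration_step(v0)

for s in STATES:
    print(s, q_values(v0, s), "->", policy[s], round(v1[s], 4))
```

Under these assumptions, the backup gives V(A) ≈ 0.16, V(B) ≈ 0.72, V(C) ≈ 1.96, with greedy actions move in A, move in B, and stay in C. If the intended state ordering or reward convention differs, only `NEXT_STATE` and `q_values` need to change.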
