Problem 2. (15 points) Consider the following deterministic Markov Decision Process (MDP), describing a simple robot grid world with 6 states and 4 actions RIGHT, LEFT, UP, and DOWN (not all actions can be taken in all states). State s5 is a terminal state; once the agent reaches it, it stays there. The values of the immediate rewards are written next to the transitions. Transitions with no value have an immediate reward of 0: the agent is rewarded only when it goes from s3 to s5 with the RIGHT action (reward 50) or from s6 to s5 with the UP action (reward 100). Assume the discount factor is γ = 0.8.
a) For each state s ∈ {s1, s2, …, s6}, compute the optimal value V*(s).
b) What action should be taken at each state under the optimal policy? Include this in your solution by marking the state-action transition arrows in the figure that correspond to one optimal policy. If there is a tie, always choose the successor state with the smallest index.
c) How many complete iterations of Value Iteration are sufficient to guarantee finding the optimal policy for this MDP? Assume that values are initialized to zero, and that states are considered in an arbitrary order on each iteration.
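The figure referenced by the problem is not reproduced here, so the exact grid layout is unknown. The following is a minimal value-iteration sketch under an assumed layout: a 2×3 grid with s5 in the top-right corner, s3 to its left, and s6 directly below s5, with s1, s2, and s4 filling the remaining cells. Only the two rewarded transitions (s3 →RIGHT→ s5 with reward 50, s6 →UP→ s5 with reward 100), the terminal state s5, and γ = 0.8 come from the problem statement; every other transition in the table below is a guess and would need to be checked against the actual figure.

```python
# Value iteration sketch. ASSUMPTION: the grid layout below is guessed,
# since the figure is missing; only the s3->s5 (+50) and s6->s5 (+100)
# transitions, the terminal state s5, and GAMMA = 0.8 are given.
GAMMA = 0.8

# transitions[state][action] = (next_state, immediate_reward)
transitions = {
    "s1": {"RIGHT": ("s3", 0), "DOWN": ("s2", 0)},
    "s2": {"UP": ("s1", 0), "RIGHT": ("s4", 0)},
    "s3": {"LEFT": ("s1", 0), "DOWN": ("s4", 0), "RIGHT": ("s5", 50)},
    "s4": {"LEFT": ("s2", 0), "UP": ("s3", 0), "RIGHT": ("s6", 0)},
    "s5": {},  # terminal: the agent stays here and collects no further reward
    "s6": {"LEFT": ("s4", 0), "UP": ("s5", 100)},
}

# Synchronous value iteration: V_{k+1}(s) = max_a [ r(s,a) + GAMMA * V_k(s') ]
V = {s: 0.0 for s in transitions}
for _ in range(100):  # far more sweeps than this tiny MDP needs to converge
    V = {s: max((r + GAMMA * V[ns] for ns, r in acts.values()), default=0.0)
         for s, acts in transitions.items()}

# Greedy policy read-off: the action maximizing r + GAMMA * V(s')
policy = {s: max(acts, key=lambda a: acts[a][1] + GAMMA * V[acts[a][0]])
          for s, acts in transitions.items() if acts}
```

Under this assumed layout the values settle after a handful of sweeps (new rewards stop propagating once every state's longest useful path to s5 has been backed up), which is the kind of argument part (c) asks for; with a different layout the numbers and the required iteration count would change.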