Question:

(20 points) Figure 4 shows the gridworld MDP and the transition function. The states are
grid squares, identified by their row and column number (row first). The agent always
starts in state (1,1), marked with the letter S. There are two terminal goal states, (2,3) with
reward +5 and (1,3) with reward -5. Rewards are 0 in non-terminal states. (The reward for
a state is received as the agent moves into the state.) The transition function is such that the
intended agent movement (North, South, West, or East) happens with probability 0.8. With
probability 0.1 each, the agent ends up in one of the states perpendicular to the intended
direction. If a collision with a wall happens, the agent stays in the same state. Table 1 is the
optimal policy for this grid.
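
The transition function described above can be sketched in Python as follows. This is only an illustrative sketch, not the course's reference implementation: the grid dimensions (`n_rows`, `n_cols`), the convention that North decreases the row index, and the absence of interior walls are all assumptions, because Figure 4 is not reproduced in the text. The names `ACTIONS`, `PERPENDICULAR`, `move`, and `transition` are hypothetical, and terminal states are not treated specially here.

```python
# Hypothetical sketch of the stochastic transition function.
# ASSUMPTIONS (Figure 4 is not available): rectangular grid with 1-based
# (row, column) states, no interior walls, North decreases the row index.
ACTIONS = {
    "North": (-1, 0),
    "South": (1, 0),
    "West": (0, -1),
    "East": (0, 1),
}

# The two directions perpendicular to each intended direction.
PERPENDICULAR = {
    "North": ("West", "East"),
    "South": ("West", "East"),
    "West": ("North", "South"),
    "East": ("North", "South"),
}


def move(state, direction, n_rows, n_cols):
    """Move one square; on a collision with the outer wall, stay in place."""
    row, col = state
    d_row, d_col = ACTIONS[direction]
    new_row, new_col = row + d_row, col + d_col
    if 1 <= new_row <= n_rows and 1 <= new_col <= n_cols:
        return (new_row, new_col)
    return state


def transition(state, action, n_rows, n_cols):
    """Return {next_state: probability} for taking `action` in `state`."""
    # Intended direction with 0.8, each perpendicular direction with 0.1.
    outcomes = [(action, 0.8)] + [(d, 0.1) for d in PERPENDICULAR[action]]
    probs = {}
    for direction, p in outcomes:
        nxt = move(state, direction, n_rows, n_cols)
        # Several outcomes can land in the same state (e.g. wall collisions).
        probs[nxt] = probs.get(nxt, 0.0) + p
    return probs
```

For example, `transition((1, 1), "East", n_rows=2, n_cols=3)` returns the distribution over next states for heading East from the start state on a grid assumed to be 2 by 3; probabilities of outcomes that collide with a wall are folded into the probability of staying put.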
Figure 4: (a) Gridworld MDP (b) Transition function
Table 1: Optimal policy
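
For reference, an optimal policy of the kind Table 1 lists is conventionally characterized by the Bellman optimality equation. The sketch below uses notational assumptions that do not appear in the original: T(s, a, s') for the transition function, R(s') for the reward received on entering s', and a discount factor γ, which the problem statement does not specify. The value of a terminal state is taken to be 0, since no further reward follows it.

```latex
V^*(s) = \max_{a} \sum_{s'} T(s, a, s')\,\bigl[ R(s') + \gamma V^*(s') \bigr],
\qquad
\pi^*(s) = \arg\max_{a} \sum_{s'} T(s, a, s')\,\bigl[ R(s') + \gamma V^*(s') \bigr]
```

Here a ranges over North, South, West, and East, and the sum runs over the states reachable from s under the 0.8 / 0.1 / 0.1 transition rule.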
6-1. (4 points) Write the optimal policy when the agent is in (1,1).
[Answer box]
6-2. (4 points) Write the optimal policy when the agent is in (1,2).
[Answer box]
6-3. (4 points) Write the optimal policy when the agent is in (2,2).
[Answer box]