Question: Markov Decision Process: You are given the Gridworld shown in the figure below. Assume a known Markov Decision Process (MDP) as follows: In all states,
Markov Decision Process: 
You are given the Gridworld shown in the figure below. Assume a known Markov Decision Process (MDP) as follows: In all states, your agent can perform 4 actions: Up, Down, Left, Right; with a living reward of -1. In squares D & E your agent can ONLY take the Exit action with an immediate reward of -10 & +10 respectively. Your agent's actions are successful 90% of the time; 10% of the time the agent moves in one of the two orthogonal (perpendicular) directions, with equal probability (5%). If the movement is blocked by a wall (the blue squares and the outer edges), the agent stays in place. For example, if your agent is in square B, and picks the Right action, 90% of the time, it ends up in C, and 10% of the time it ends up in the same square, B. Assume a discount factor of 0.9. ( = 0.9). Input Policy n A B| C|D E Given a policy, a, as specified by the arrows, evaluate it for two it- erations using the Policy Evaluation algorithm - fill in the values in the table below - note that the first iteration has already been done for you: C D E A VT -1 -10 +10 -1 V V3" You are given the Gridworld shown in the figure below. Assume a known Markov Decision Process (MDP) as follows: In all states, your agent can perform 4 actions: Up, Down, Left, Right; with a living reward of -1. In squares D & E your agent can ONLY take the Exit action with an immediate reward of -10 & +10 respectively. Your agent's actions are successful 90% of the time; 10% of the time the agent moves in one of the two orthogonal (perpendicular) directions, with equal probability (5%). If the movement is blocked by a wall (the blue squares and the outer edges), the agent stays in place. For example, if your agent is in square B, and picks the Right action, 90% of the time, it ends up in C, and 10% of the time it ends up in the same square, B. Assume a discount factor of 0.9. ( = 0.9). Input Policy n A B| C|D E Given a policy, a, as specified by the arrows, evaluate it for two it- erations using the Policy Evaluation algorithm - fill in the values in the table below - note that the first iteration has already been done for you: C D E A VT -1 -10 +10 -1 V V3
Step by Step Solution
There are 3 Steps involved in it
Get step-by-step solutions from verified subject matter experts
