Value Iteration (25 points) Consider the gridworld MDP shown to the right. The terminal state (3,2) has a reward of +20, and the non-terminal state to the left of it has a reward of -10. Rewards are -1 for all other states. The agent makes its intended move (up, down, left, or right) with probability 0.8, and moves in each of the two perpendicular directions with probability 0.1 (e.g., if intending to go right, the agent moves up or down with probability 0.1 each). If the agent runs into a wall, it stays in the same place. Calculate the utilities of the following states for the next two iterations of the value iteration algorithm using a discount factor of \gamma = 0.8. Write your answer in the table below, where columns are states and rows are iterations. Note that the initial iteration is provided and the next iteration is partially provided. Show your work.
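
The update described above can be sketched in Python. This is only an illustration under stated assumptions, not the expected solution: the figure and the table are not reproduced here, so the 3x3 grid size, the absence of interior walls, the 1-indexed (column, row) coordinates, and the all-zero starting utilities are hypothetical. The sketch also assumes the state-reward form of the Bellman update, U(s) = R(s) + \gamma max_a sum_s' P(s'|s,a) U(s'), with the terminal state's utility fixed at its reward. Substitute the real layout and the provided initial row before relying on any number it produces.

```python
# Sketch of synchronous value iteration for a gridworld like the one described.
# ASSUMPTIONS (not given in the problem text): a hypothetical 3x3 grid with no
# interior walls, 1-indexed (col, row) coordinates, and utilities starting at 0.

GAMMA = 0.8
COLS, ROWS = 3, 3                      # hypothetical size; the figure is missing
TERMINAL = (3, 2)
REWARDS = {(3, 2): 20.0, (2, 2): -10.0}
DEFAULT_REWARD = -1.0

MOVES = {"up": (0, 1), "down": (0, -1), "left": (-1, 0), "right": (1, 0)}
PERPENDICULAR = {
    "up": ("left", "right"),
    "down": ("left", "right"),
    "left": ("up", "down"),
    "right": ("up", "down"),
}


def reward(state):
    """State reward: +20 terminal, -10 for the cell left of it, -1 elsewhere."""
    return REWARDS.get(state, DEFAULT_REWARD)


def step(state, direction):
    """Move one cell; stay in place if the move would leave the grid."""
    dc, dr = MOVES[direction]
    col, row = state[0] + dc, state[1] + dr
    if 1 <= col <= COLS and 1 <= row <= ROWS:
        return (col, row)
    return state


def expected_utility(state, action, utilities):
    """0.8 for the intended move, 0.1 for each perpendicular slip."""
    side_a, side_b = PERPENDICULAR[action]
    return (
        0.8 * utilities[step(state, action)]
        + 0.1 * utilities[step(state, side_a)]
        + 0.1 * utilities[step(state, side_b)]
    )


def value_iteration_step(utilities):
    """One synchronous Bellman update over every state."""
    updated = {}
    for state in utilities:
        if state == TERMINAL:
            updated[state] = reward(state)  # terminal utility is its reward
        else:
            best = max(expected_utility(state, a, utilities) for a in MOVES)
            updated[state] = reward(state) + GAMMA * best
    return updated


# Hypothetical starting point: all utilities zero.
U0 = {(c, r): 0.0 for c in range(1, COLS + 1) for r in range(1, ROWS + 1)}
U1 = value_iteration_step(U0)
U2 = value_iteration_step(U1)
```

Each call to `value_iteration_step` reads only the previous iteration's utilities, which matches filling in the table one row at a time. Wall collisions are handled in `step` by returning the unchanged state, so a blocked move contributes the current cell's own utility to the expectation.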
