For a Markov Decision Problem, an agent is in the 5x5 environment shown in figure 1. The transition model of the environment is shown in figure 2: the intended outcome occurs with probability 0.8, and with probability 0.1 each the agent instead moves at right angles to the intended direction, to either side. In each state four actions (up, down, left, right) are available. A collision with a wall results in no movement. S04, S12, S23 and S42 are terminal states with values (rewards) 10, -2, 8 and 6 respectively. S11, S13, S14, S24, S31, S32, S33, S34, S43 and S44 are blocked states and cannot be reached. The reward for each action is r = -0.5. The discount factor is d = 0.9 (also written \gamma = 0.9).
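Since the figures are not reproduced here, the following is only a minimal sketch of how the setup above could be solved with value iteration. It is not the expert solution. It assumes that a state S<i><j> means row i, column j, both counted from 0 with row 0 at the top, so that "up" decreases the row index. It also assumes that blocked cells act like walls, that terminal states keep their stated values, and that each non-terminal state is updated as V(s) = max over actions of [r + \gamma * sum of P(s' | s, a) * V(s')]. All function and constant names are illustrative.

```python
# Value iteration sketch for the 5x5 grid described in the question.
# ASSUMPTIONS (the figures are missing): S<row><col> indexing from 0,
# row 0 at the top, blocked cells behave like walls, terminal values are fixed.

GAMMA = 0.9          # discount factor from the question
STEP_REWARD = -0.5   # reward for each action from the question
SIZE = 5

TERMINALS = {(0, 4): 10.0, (1, 2): -2.0, (2, 3): 8.0, (4, 2): 6.0}
BLOCKED = {(1, 1), (1, 3), (1, 4), (2, 4), (3, 1),
           (3, 2), (3, 3), (3, 4), (4, 3), (4, 4)}

ACTIONS = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}
# The two directions at right angles to each intended action.
SIDEWAYS = {"up": ("left", "right"), "down": ("left", "right"),
            "left": ("up", "down"), "right": ("up", "down")}


def move(state, action):
    """Deterministic move; hitting a wall or a blocked cell means no movement."""
    row, col = state
    d_row, d_col = ACTIONS[action]
    nxt = (row + d_row, col + d_col)
    inside = 0 <= nxt[0] < SIZE and 0 <= nxt[1] < SIZE
    if not inside or nxt in BLOCKED:
        return state
    return nxt


def transitions(state, action):
    """(probability, next_state) pairs: 0.8 intended, 0.1 for each side."""
    side_a, side_b = SIDEWAYS[action]
    return [(0.8, move(state, action)),
            (0.1, move(state, side_a)),
            (0.1, move(state, side_b))]


def q_value(values, state, action):
    """Step reward plus discounted expected value of the successor state."""
    expected = sum(p * values[s2] for p, s2 in transitions(state, action))
    return STEP_REWARD + GAMMA * expected


def value_iteration(tol=1e-9, max_iters=10_000):
    """Return (values, greedy policy) for all reachable, non-blocked states."""
    states = [(r, c) for r in range(SIZE) for c in range(SIZE)
              if (r, c) not in BLOCKED]
    values = {s: TERMINALS.get(s, 0.0) for s in states}
    for _ in range(max_iters):
        new_values = dict(values)
        delta = 0.0
        for s in states:
            if s in TERMINALS:
                continue  # terminal values stay fixed
            best = max(q_value(values, s, a) for a in ACTIONS)
            delta = max(delta, abs(best - values[s]))
            new_values[s] = best
        values = new_values
        if delta < tol:
            break
    policy = {s: max(ACTIONS, key=lambda a: q_value(values, s, a))
              for s in states if s not in TERMINALS}
    return values, policy


if __name__ == "__main__":
    values, policy = value_iteration()
    for state in sorted(policy):
        print(f"S{state[0]}{state[1]}: V = {values[state]:.3f}, action = {policy[state]}")
```

If figure 1 uses a different indexing (for example column first, or row 0 at the bottom), only the TERMINALS, BLOCKED and ACTIONS tables need to change; the update rule stays the same.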
