Question:

For the 4×4 grid world example we discussed in the lecture:
Consider γ = 1 (undiscounted MDP).
Non-terminal states: 1, 2, 3, …, 14
Two terminal states (shaded squares)
Actions leading out of the grid leave the state unchanged.
The reward is -1 for all transitions until a terminal state is reached.
The agent follows the policy given below (NOT the same as the one we discussed in the lecture):
π(North | s) = 30%, π(South | s) = 20%, π(West | s) = 40%, π(East | s) = 10%
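The sub-questions themselves are cut off here, so the following is only a sketch of how this MDP and policy could be encoded and evaluated. It assumes the usual layout for this example: cells numbered 0 to 15 row by row, with the shaded terminal squares at the top-left (0) and bottom-right (15) corners. The numbering, the helper names `step` and `evaluate_policy`, and the stopping threshold `theta` are illustrative assumptions, not part of the question. The code runs iterative policy evaluation, repeatedly applying the Bellman expectation backup v(s) ← Σ_a π(a|s) [-1 + γ v(s')] with γ = 1. Because every action has non-zero probability, the agent reaches a terminal state with probability 1, so the undiscounted values are finite and the iteration converges.

```python
# Assumed layout (hypothetical): 4x4 grid, cells numbered 0..15 row by row,
# shaded terminal cells at 0 (top-left) and 15 (bottom-right).
N = 4
TERMINALS = {0, N * N - 1}
GAMMA = 1.0      # undiscounted MDP
REWARD = -1.0    # reward on every transition until termination

# Policy from the question: pi(a | s), the same in every non-terminal state.
POLICY = {"North": 0.3, "South": 0.2, "West": 0.4, "East": 0.1}
MOVES = {"North": (-1, 0), "South": (1, 0), "West": (0, -1), "East": (0, 1)}


def step(s, action):
    """Return the next state; a move that would leave the grid keeps s unchanged."""
    row, col = divmod(s, N)
    dr, dc = MOVES[action]
    r, c = row + dr, col + dc
    if 0 <= r < N and 0 <= c < N:
        return r * N + c
    return s


def evaluate_policy(theta=1e-10, max_sweeps=100_000):
    """Synchronous iterative policy evaluation: apply
    v(s) <- sum_a pi(a|s) * (REWARD + GAMMA * v(s')) to every non-terminal s
    until the largest change in a sweep falls below theta."""
    v = [0.0] * (N * N)  # terminal values stay 0 throughout
    for _ in range(max_sweeps):
        new_v = v[:]
        for s in range(N * N):
            if s in TERMINALS:
                continue
            new_v[s] = sum(p * (REWARD + GAMMA * v[step(s, a)])
                           for a, p in POLICY.items())
        delta = max(abs(x - y) for x, y in zip(new_v, v))
        v = new_v
        if delta < theta:
            break
    return v


if __name__ == "__main__":
    values = evaluate_policy()
    for row in range(N):
        print(" ".join(f"{values[row * N + col]:8.2f}" for col in range(N)))
```

Running the script prints the 4×4 table of estimated state values v_π(s); the two terminal corners stay at 0, and each non-terminal value equals minus the expected number of steps to termination under this policy.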