Question:

Pacman is in a Gridworld environment E shown below. Black squares are walls. Valid deterministic actions at any non-wall square are {Up, Down, Right, Left, Exit}. Exit is the TERMINAL action, so Pacman remains there once it exits. If Pacman exits while on a square with a number written on it, it receives a reward equal to that number. It receives a reward of 0 for exiting on a blank square.

[Figure: Gridworld E. The grid image is missing; the exit rewards written on its squares were, in extraction order, 1, 1, 4, 1, and -1.]

Draw an arrow in each square in the grid above to indicate the optimal policy Pacman will calculate with the discount factor γ = 0.5. For example, if the policy tells Pacman to move Down from the square in the middle, draw a down arrow in that square. If the policy would be to exit from a particular square, draw an X in that square. Don't put values in the squares, just arrows. (3 pts)
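Because moves are deterministic and only Exit pays a reward, the optimal value of each non-wall square satisfies a simple Bellman equation (standard value-iteration reasoning, not taken from the original answer). Here R(s) is the exit reward of square s (0 on a blank square), and T(s, a) is the square reached by move a; T is an assumed notation, taken to leave Pacman in place when the move runs into a wall or the grid edge. Reaching a square worth R in k moves and then exiting is worth γ^k R, so with γ = 0.5 the comparison against the 4 works out as follows:

```latex
V^*(s) = \max\Big( R(s),\; \gamma \max_{a \in \{\mathrm{Up},\,\mathrm{Down},\,\mathrm{Left},\,\mathrm{Right}\}} V^*\big(T(s,a)\big) \Big)

\gamma^{1} \cdot 4 = 2 > 1, \qquad \gamma^{2} \cdot 4 = 1, \qquad \gamma^{3} \cdot 4 = 0.5 < 1
```

So a square labeled 1 that is one move from the 4 should move toward it, one exactly two moves away is a tie, and one three or more moves away should exit. Exiting on the -1 square is never optimal, because any move is worth γV*(s') ≥ 0. A blank square should always move toward the best reachable reward.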

Step by Step Solution


As a text-based AI model, I'm unable to draw images directly. However, I can describe how to determine th...
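A minimal value-iteration sketch in Python can help check a hand-drawn policy. The figure's layout is not recoverable from the text, so the 3×3 grid below is hypothetical: it reuses the rewards 1, 1, 4, 1, -1 but places them arbitrarily. The names `solve`, `next_state`, `MOVES`, and `ACTIONS` are illustrative. The sketch also assumes that a move into a wall or off the grid leaves Pacman where it is, and that moving earns no reward.

```python
from typing import Dict, List, Set, Tuple

Pos = Tuple[int, int]  # (row, col), row 0 at the top

# Deterministic moves; "Exit" is handled separately as the terminal action.
MOVES: Dict[str, Tuple[int, int]] = {
    "Up": (-1, 0),
    "Down": (1, 0),
    "Left": (0, -1),
    "Right": (0, 1),
}
# Order matters only for tie-breaking: max() keeps the first best action.
ACTIONS: List[str] = ["Exit", "Up", "Down", "Left", "Right"]


def next_state(s: Pos, action: str, rows: int, cols: int, walls: Set[Pos]) -> Pos:
    """Apply a move; hitting a wall or the grid edge leaves Pacman in place (assumption)."""
    dr, dc = MOVES[action]
    r, c = s[0] + dr, s[1] + dc
    if not (0 <= r < rows and 0 <= c < cols) or (r, c) in walls:
        return s
    return (r, c)


def solve(rows: int, cols: int, walls: Set[Pos], rewards: Dict[Pos, float],
          gamma: float = 0.5, tol: float = 1e-12, max_iters: int = 1000):
    """Value iteration; returns (values, policy) for every non-wall square."""
    cells = [(r, c) for r in range(rows) for c in range(cols) if (r, c) not in walls]

    def q(V: Dict[Pos, float], s: Pos, a: str) -> float:
        if a == "Exit":
            return rewards.get(s, 0.0)  # blank squares pay 0 on exit
        # A move earns nothing now; its worth is the discounted value of where it lands.
        return gamma * V[next_state(s, a, rows, cols, walls)]

    V = {s: 0.0 for s in cells}
    for _ in range(max_iters):
        new_V = {s: max(q(V, s, a) for a in ACTIONS) for s in cells}
        delta = max(abs(new_V[s] - V[s]) for s in cells)
        V = new_V
        if delta < tol:
            break

    # Greedy policy with respect to the converged values.
    policy = {s: max(ACTIONS, key=lambda a: q(V, s, a)) for s in cells}
    return V, policy


if __name__ == "__main__":
    # Hypothetical layout, NOT the (missing) figure from the question:
    #   row 0:   4   1   .
    #   row 1:   #   .   1
    #   row 2:   1   .  -1
    walls = {(1, 0)}
    rewards = {(0, 0): 4, (0, 1): 1, (1, 2): 1, (2, 0): 1, (2, 2): -1}
    V, policy = solve(3, 3, walls, rewards, gamma=0.5)
    arrow = {"Exit": "X", "Up": "^", "Down": "v", "Left": "<", "Right": ">"}
    for r in range(3):
        print(" ".join("#" if (r, c) in walls else arrow[policy[(r, c)]] for c in range(3)))
```

On this hypothetical grid the sketch reproduces the discount arithmetic. The 1 next to the 4 points Left toward it (γ·4 = 2 > 1), the 1 squares three or more moves away exit, and the -1 square moves Up rather than exiting. Where two actions tie, `max` keeps whichever comes first in `ACTIONS`, so a tied arrow is only one of several equally optimal choices.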
