Question 1 (50 points)

Consider a Pacman agent that uses MDPs to maximize his expected utility. In each environment: Pacman has the standard actions {North, East, South, West} unless blocked by an outer wall. There is a reward of 1 point when eating the dot (for example, in the grid below, $R(s, a, s') = 1$ for a transition from $s$ to $s'$ that eats the dot). The game ends when the dot (blue circle) is eaten.

Consider the following grid, where there is a single food pellet in the bottom right corner (B). The discount factor is 0.2. There is no living reward. The states are simply the grid locations.

a) What is the optimal policy for each state?

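\begin{tabular}{|c|c|}
\hline State & $\pi^*(\text{state})$ \\
\hline A & \\
\hline C & \\
\hline D & \\
\hline E & \\
\hline F & \\
\hline
\end{tabular}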

b) What is the optimal value for the state of being in the upper left corner (E)? Reminder: the discount factor is 0.2.

c) Using value iteration with the value of all states equal to zero at $k = 0$, for which iteration $k$ will $V_k(E) = V^*(E)$? Explain.

\begin{tabular}{|r|r|r|} \hline E & C & A \\ \hline F & D & B \\ \hline \end{tabular}
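One way to check answers to parts a) through c) is to run value iteration on this grid directly. The following is a minimal sketch, not an official solution. It assumes that moves are deterministic (the question does not say whether actions can fail), that B is a terminal state with value 0, and that the reward of 1 is received on the transition into B. All names here (`GRID`, `successors`, `value_iteration`, `greedy_policy`) are hypothetical helpers introduced for illustration.

```python
GAMMA = 0.2  # discount factor given in the question

# Grid layout from the question: top row E C A, bottom row F D B.
GRID = [["E", "C", "A"],
        ["F", "D", "B"]]
TERMINAL = "B"  # assumption: the game ends on entering B, so V(B) = 0

MOVES = {"North": (-1, 0), "East": (0, 1), "South": (1, 0), "West": (0, -1)}

# Map each state name to its (row, column) position.
POS = {name: (r, c) for r, row in enumerate(GRID) for c, name in enumerate(row)}


def successors(state):
    """Yield (action, next_state) for moves not blocked by an outer wall."""
    r, c = POS[state]
    for action, (dr, dc) in MOVES.items():
        nr, nc = r + dr, c + dc
        if 0 <= nr < len(GRID) and 0 <= nc < len(GRID[0]):
            yield action, GRID[nr][nc]


def reward(next_state):
    """Reward of 1 for the transition that eats the dot, 0 otherwise."""
    return 1.0 if next_state == TERMINAL else 0.0


def backup(next_state, values):
    """One-step lookahead value of moving into next_state (deterministic)."""
    return reward(next_state) + GAMMA * values[next_state]


def value_iteration(num_iters):
    """Return [V_0, V_1, ..., V_num_iters], each a dict from state to value."""
    values = {s: 0.0 for s in POS}  # V_0 is zero everywhere
    history = [dict(values)]
    for _ in range(num_iters):
        new_values = {}
        for s in POS:
            if s == TERMINAL:
                new_values[s] = 0.0
            else:
                new_values[s] = max(backup(s2, values) for _a, s2 in successors(s))
        values = new_values
        history.append(dict(values))
    return history


def greedy_policy(values):
    """Pick one best action per non-terminal state; ties go to the first
    action in MOVES order, so other actions may be equally optimal."""
    policy = {}
    for s in POS:
        if s != TERMINAL:
            best = max(successors(s), key=lambda pair: backup(pair[1], values))
            policy[s] = best[0]
    return policy


if __name__ == "__main__":
    history = value_iteration(5)
    for k, values in enumerate(history):
        print(k, {s: round(v, 4) for s, v in sorted(values.items())})
    print(greedy_policy(history[-1]))
```

Printing the per-iteration values makes it possible to see at which iteration each state's value stops changing, which is the quantity part c) asks about. Under a stochastic transition model the `max` over successors would need to become a max over expected values, so the numbers from this sketch apply only to the deterministic assumption stated above.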
