In this exercise we will consider two-player MDPs that correspond to zero-sum, turn- taking games like those

Question:

In this exercise we will consider two-player MDPs that correspond to zero-sum, turn- taking games like those in Chapter 6. Let the players he A and B, and let R (s) be the reward for player A in s. (The reward for B is always equal and opposite.)

a. Let UA (s) be the utility of state s when it is A’s turn to move in s, and let UB (S) he the utility of state s when it is B’s turn to move in s. All rewards and utilities are calculated from A’s point of view (just as in a mini-max game tree). Write down Bellman equations defining UA (S) and UB (S).

b. Explain how to do two-player value iteration with these equations, and define a suitable stopping criterion

c. Consider the game described. Draw the state space (rather than the game tree), showing the moves by A as solid lines and moves by B as dashed lines. Mark each state with R (s). You will find it helpful to arrange the states (s A, s B) on a two-dimensional grid, using s and as “coordinates.”

d. Now apply Iwo-player value iteration to solve this game, and derive the optimal policy.