Question: In this exercise we will consider two-player MDPs that correspond to zero-sum, turn-taking games like those in Chapter 6. Let the players be A and B, and let R(s) be the reward for player A in state s. (The reward for B is always equal and opposite.)
a. Let U_A(s) be the utility of state s when it is A's turn to move in s, and let U_B(s) be the utility of state s when it is B's turn to move in s. All rewards and utilities are calculated from A's point of view (just as in a minimax game tree). Write down Bellman equations defining U_A(s) and U_B(s).
b. Explain how to do two-player value iteration with these equations, and define a suitable stopping criterion.
c. Consider the game described. Draw the state space (rather than the game tree), showing the moves by A as solid lines and moves by B as dashed lines. Mark each state with R(s). You will find it helpful to arrange the states (s_A, s_B) on a two-dimensional grid, using s_A and s_B as "coordinates."
d. Now apply two-player value iteration to solve this game, and derive the optimal policy.
Step by Step Solution
a. For U_A we have

U_A(s) = R(s) + \max_a \sum_{s'} P(s' \mid s, a)\, U_B(s')

and for U_B we have

U_B(s) = R(s) + \min_a \sum_{s'} P(s' \mid s, a)\, U_A(s')

...
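For concreteness, here is a minimal Python sketch of the two-player value iteration asked for in part b, using the two Bellman equations above. It assumes deterministic moves, so the sum over s' collapses to the single successor state each move reaches. The function name and the inputs (states, moves_A, moves_B, R, eps) are illustrative assumptions, not part of the exercise statement.

# Two-player value iteration for a zero-sum, turn-taking MDP.
# A minimal sketch assuming deterministic moves, so the sum over s'
# in the Bellman equations collapses to the single successor state.
# `states`, `moves_A`, `moves_B`, and `R` are hypothetical inputs
# supplied by the caller, not given in the exercise.

def two_player_value_iteration(states, moves_A, moves_B, R, eps=1e-6):
    """Return (U_A, U_B), the utilities of each state from A's point
    of view when it is A's turn and B's turn to move, respectively.

    moves_A[s] / moves_B[s]: iterable of successor states reachable
    from s on A's / B's move (empty for terminal states).
    R[s]: reward to player A in state s.
    """
    U_A = {s: 0.0 for s in states}
    U_B = {s: 0.0 for s in states}
    while True:
        delta = 0.0
        for s in states:
            # A maximizes; after A moves it is B's turn, so use U_B.
            new_A = R[s] + max(U_B[t] for t in moves_A[s]) if moves_A[s] else R[s]
            # B minimizes (utilities are from A's point of view);
            # after B moves it is A's turn, so use U_A.
            new_B = R[s] + min(U_A[t] for t in moves_B[s]) if moves_B[s] else R[s]
            delta = max(delta, abs(new_A - U_A[s]), abs(new_B - U_B[s]))
            U_A[s], U_B[s] = new_A, new_B
        # Stopping criterion (part b): stop once no utility changes by
        # more than eps in a full sweep over the state space.
        if delta < eps:
            return U_A, U_B

One design note: for a finite game whose state graph is acyclic, the correct terminal values propagate backward roughly one ply per sweep, so the loop terminates exactly (delta reaches zero) after about as many sweeps as the longest path to a terminal state; "utilities stopped changing" is then a natural stopping criterion.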
