Question:

In this exercise we will consider two-player MDPs that correspond to zero-sum, turn-taking games like those in Chapter 6. Let the players be A and B, and let R(s) be the reward for player A in state s. (The reward for B is always equal and opposite.)

a. Let U_A(s) be the utility of state s when it is A's turn to move in s, and let U_B(s) be the utility of state s when it is B's turn to move in s. All rewards and utilities are calculated from A's point of view (just as in a minimax game tree). Write down Bellman equations defining U_A(s) and U_B(s).

b. Explain how to do two-player value iteration with these equations, and define a suitable stopping criterion.

c. Consider the game described. Draw the state space (rather than the game tree), showing the moves by A as solid lines and moves by B as dashed lines. Mark each state with R(s). You will find it helpful to arrange the states (s_A, s_B) on a two-dimensional grid, using s_A and s_B as "coordinates."

d. Now apply two-player value iteration to solve this game, and derive the optimal policy.

Step by Step Solution

a. For U_A we have

    U_A(s) = R(s) + \max_a \sum_{s'} P(s' \mid s, a) \, U_B(s')

and for U_B we have

    U_B(s) = R(s) + \min_a \sum_{s'} P(s' \mid s, a) \, U_A(s').
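To make parts (b) and (d) concrete, here is a minimal Python sketch of two-player value iteration built on these equations. It assumes a deterministic turn-taking game in which every play eventually reaches a terminal state, so the sum over successors collapses to the utility of the single resulting state; the function name and the tiny example game at the bottom are hypothetical illustrations, not the game from the exercise.

def two_player_value_iteration(states, moves_A, moves_B, R, epsilon=1e-6):
    """Alternating Bellman updates for U_A and U_B until convergence.

    states  -- iterable of all states
    moves_A -- dict: state -> list of successor states when A moves
    moves_B -- dict: state -> list of successor states when B moves
    R       -- dict: state -> reward for player A in that state
    """
    U_A = {s: 0.0 for s in states}
    U_B = {s: 0.0 for s in states}
    while True:
        delta = 0.0
        new_A, new_B = {}, {}
        for s in states:
            # A maximizes over successors (where it is then B's turn);
            # terminal states (no moves) keep utility R(s).
            new_A[s] = R[s] + (max(U_B[t] for t in moves_A[s]) if moves_A[s] else 0.0)
            # B minimizes over successors (where it is then A's turn).
            new_B[s] = R[s] + (min(U_A[t] for t in moves_B[s]) if moves_B[s] else 0.0)
            delta = max(delta, abs(new_A[s] - U_A[s]), abs(new_B[s] - U_B[s]))
        U_A, U_B = new_A, new_B
        # Stopping criterion: a full sweep changes no utility by more
        # than epsilon; in a finite game where every play terminates,
        # the values in fact converge exactly after finitely many sweeps.
        if delta < epsilon:
            return U_A, U_B

# Tiny hypothetical game: from s0 the mover can go straight to a
# terminal win for A (+1) or a terminal win for B (-1).
states = ["s0", "winA", "winB"]
moves_A = {"s0": ["winA", "winB"], "winA": [], "winB": []}
moves_B = {"s0": ["winA", "winB"], "winA": [], "winB": []}
R = {"s0": 0.0, "winA": 1.0, "winB": -1.0}
U_A, U_B = two_player_value_iteration(states, moves_A, moves_B, R)
print(U_A["s0"], U_B["s0"])  # 1.0 if A moves first, -1.0 if B moves first

The stopping criterion shown is the natural one for part (b): terminate when one full sweep changes no utility by more than epsilon. The optimal policy for part (d) then falls out of the converged tables: in each state, A moves to the successor that maximizes U_B, and B moves to the successor that minimizes U_A.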

