# Question: In this exercise we will consider two player MDPs that correspon

In this exercise we will consider two-player MDPs that correspond to zero-sum, turn-taking games like those in Chapter 6. Let the players be A and B, and let R(s) be the reward for player A in s. (The reward for B is always equal and opposite.)

a. Let U_A(s) be the utility of state s when it is A’s turn to move in s, and let U_B(s) be the utility of state s when it is B’s turn to move in s. All rewards and utilities are calculated from A’s point of view (just as in a minimax game tree). Write down Bellman equations defining U_A(s) and U_B(s).

b. Explain how to do two-player value iteration with these equations, and define a suitable stopping criterion.

c. Consider the game described. Draw the state space (rather than the game tree), showing the moves by A as solid lines and moves by B as dashed lines. Mark each state with R(s). You will find it helpful to arrange the states (s_A, s_B) on a two-dimensional grid, using s_A and s_B as “coordinates.”

d. Now apply two-player value iteration to solve this game, and derive the optimal policy.
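
As a rough illustration of parts (a), (b), and (d), the following is a minimal sketch of two-player value iteration, not a definitive solution. It assumes deterministic moves, so the Bellman updates reduce to U_A(s) = R(s) + gamma * max over A's successors of U_B(s') and U_B(s) = R(s) + gamma * min over B's successors of U_A(s'). The function name, the dictionary-based interface, the default `gamma` and `eps`, and the four-state toy game are all hypothetical and are not the game referred to in part (c). The stopping criterion used here is simply that the largest change in any utility falls below `eps`.

```python
def two_player_value_iteration(reward, moves_a, moves_b,
                               gamma=1.0, eps=1e-6, max_iters=1000):
    """Sketch of two-player value iteration (deterministic moves assumed).

    reward  : dict state -> R(s), from A's point of view
    moves_a : dict state -> list of successor states when A moves
    moves_b : dict state -> list of successor states when B moves
    A state with no moves for a player is treated as terminal for that player.
    """
    states = list(reward)
    u_a = {s: 0.0 for s in states}  # utility when it is A's turn
    u_b = {s: 0.0 for s in states}  # utility when it is B's turn

    for _ in range(max_iters):
        new_a, new_b = {}, {}
        for s in states:
            succ_a = moves_a.get(s, [])
            succ_b = moves_b.get(s, [])
            # A maximizes over the utilities B will face after A's move.
            new_a[s] = reward[s] + (gamma * max(u_b[t] for t in succ_a)
                                    if succ_a else 0.0)
            # B minimizes over the utilities A will face after B's move.
            new_b[s] = reward[s] + (gamma * min(u_a[t] for t in succ_b)
                                    if succ_b else 0.0)
        # Stopping criterion: largest change across both utility tables.
        delta = max(max(abs(new_a[s] - u_a[s]), abs(new_b[s] - u_b[s]))
                    for s in states)
        u_a, u_b = new_a, new_b
        if delta < eps:
            break
    return u_a, u_b


# Hypothetical toy game (an assumption for illustration only).
reward = {"s0": 0.0, "s1": 0.0, "win": 1.0, "lose": -1.0}
moves_a = {"s0": ["s1", "lose"], "s1": ["win", "s0"]}
moves_b = {"s0": ["s1"], "s1": ["s0", "lose"]}

u_a, u_b = two_player_value_iteration(reward, moves_a, moves_b)

# A's greedy policy: pick the successor with the highest U_B.
policy_a = {s: max(succ, key=lambda t: u_b[t]) for s, succ in moves_a.items()}
```

In this toy game A should move from `s1` straight to `win`, while from `s0` every move available to A leads to a loss. With `gamma=1.0` and cycles in the state space, convergence is not guaranteed in general, which is why the sketch caps the number of sweeps with `max_iters`.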
