Question:

Players MAX and MIN are playing a game with a finite depth of possible moves. MAX calculates the minimax value of the root to be M. Assume that each player has at least 2 possible actions at every turn and that every distinct sequence of moves leads to a distinct score. Which of the following are true?

a. Assume MIN is playing suboptimally, and MAX does not know this. The outcome of the game can be better than M (i.e., higher for MAX).

b. Assume MAX knows player MIN is playing randomly. There exists a policy for MAX such that MAX can guarantee a better outcome than M.

c. Assume MAX knows MIN is playing suboptimally on every move and knows the policy π_MIN that MIN is using (MAX knows exactly how MIN will play). There exists a policy for MAX such that MAX can guarantee a better outcome than M.

d. Assume MAX knows MIN is playing suboptimally at all times but does not know the policy π_MIN that MIN is using (MAX knows MIN will choose a suboptimal action at each turn, but does not know which suboptimal action). There exists a policy for MAX such that MAX can guarantee a better outcome than M.
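
One way to build intuition for these statements is to try them on a tiny game. The sketch below is only an illustration, not part of the original question: the tree `TREE`, its leaf scores, and all function names are hypothetical, and the game is assumed to be just two plies deep (MAX moves once, then MIN moves once). It computes the minimax value M and compares it with what MAX receives when MIN deliberately avoids its best reply.

```python
# Hypothetical depth-2 game: MAX picks a branch, then MIN picks a leaf.
# Leaves are scores from MAX's point of view; all scores are distinct.
TREE = [[3, 12], [2, 8]]


def minimax_value(tree):
    """Minimax value M at the root: MAX maximizes over MIN's best replies."""
    return max(min(leaves) for leaves in tree)


def minimax_action(tree):
    """Index of the branch that MAX chooses under ordinary minimax."""
    return max(range(len(tree)), key=lambda a: min(tree[a]))


def optimal_min(leaves):
    """MIN plays optimally: it picks the lowest score."""
    return min(leaves)


def least_bad_suboptimal_min(leaves):
    """MIN avoids its best reply but otherwise hurts MAX as much as it can."""
    return sorted(leaves)[1]


def play(tree, max_action, min_policy):
    """Outcome when MAX takes max_action and MIN answers with min_policy."""
    return min_policy(tree[max_action])


M = minimax_value(TREE)
a = minimax_action(TREE)

# MAX follows minimax without knowing anything about MIN.
outcome_vs_optimal = play(TREE, a, optimal_min)
outcome_vs_suboptimal = play(TREE, a, least_bad_suboptimal_min)

# MAX knows MIN always avoids its best reply and responds to that directly.
guaranteed_vs_suboptimal = max(least_bad_suboptimal_min(leaves) for leaves in TREE)

print(M, outcome_vs_optimal, outcome_vs_suboptimal, guaranteed_vs_suboptimal)
```

On this particular tree, a MIN that always avoids its best reply leaves MAX with a score above M. A single example cannot settle the general claims, though, and a randomly playing MIN may still happen to pick its optimal reply, so each statement still has to be argued for arbitrary trees.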
