Question:

In this exercise we explore the application of UCT to Tetris.

a. Create an implementation of the Tetris MDP as described in Figure 17.5. Each action simply places the current piece in any reachable location and orientation.

b. Estimate the reward for a purely random policy by running rollouts.

c. Implement a version of UCT (Section 5.4) suitable for MDPs.

d. Apply your algorithm to Tetris and measure its performance as a function of the number of rollouts per move, assuming a purely random policy for rollouts and a value of C = √2 for the parameter that controls the exploration/exploitation tradeoff.

e. Come up with a better rollout policy and measure its performance as a function of the number of rollouts and CPU time.
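
As a starting point for parts (b) through (d), the following is a minimal sketch and not a full solution. It shows a random-rollout estimator and the UCB1 score that UCT uses to pick an action at a tree node. The MDP interface (`actions`, `step`, `is_terminal`) and all function names are assumptions made for illustration, and a toy counting MDP stands in for Tetris so the block runs on its own.

```python
import math
import random

def ucb1_score(total_return, visits, parent_visits, c=math.sqrt(2)):
    """UCB1 score of one action: mean return plus an exploration bonus."""
    if visits == 0:
        return math.inf  # untried actions are always selected first
    mean = total_return / visits
    return mean + c * math.sqrt(math.log(parent_visits) / visits)

def random_rollout(state, actions, step, is_terminal, rng, max_steps=10_000):
    """Follow a uniformly random policy from `state`; return the summed reward."""
    total = 0.0
    for _ in range(max_steps):
        if is_terminal(state):
            break
        legal = actions(state)
        if not legal:
            break
        state, reward = step(state, rng.choice(legal), rng)
        total += reward
    return total

def estimate_random_policy(start, actions, step, is_terminal, n_rollouts, seed=0):
    """Monte Carlo estimate of the expected return of the random policy."""
    rng = random.Random(seed)
    returns = [random_rollout(start, actions, step, is_terminal, rng)
               for _ in range(n_rollouts)]
    return sum(returns) / len(returns)

# Hypothetical toy MDP (NOT Tetris): the state is a counter, each action
# adds 1 or 2 to it, every step pays reward 1, and the episode ends at 10.
def toy_actions(state):
    return [1, 2]

def toy_step(state, action, rng):
    return state + action, 1.0

def toy_is_terminal(state):
    return state >= 10

estimate = estimate_random_policy(0, toy_actions, toy_step, toy_is_terminal,
                                  n_rollouts=1000)
print(f"estimated return of the random policy: {estimate:.2f}")
```

For Tetris, `step` would need to sample the next piece, which is why it receives the random generator. Note also that the usual analysis of UCB1 assumes rewards in [0, 1], so with C = √2 it may be worth rescaling Tetris returns before computing the score.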

Figure 17.5