Question:
In this exercise we explore the application of UCT to Tetris.
a. Create an implementation of the Tetris MDP as described in Figure 17.5. Each action simply places the current piece in any reachable location and orientation.
b. Estimate the reward for a purely random policy by running rollouts.
c. Implement a version of UCT (Section 5.4) suitable for MDPs.
d. Apply your algorithm to Tetris and measure its performance as a function of the number of rollouts per move, assuming a purely random policy for rollouts and a value C = √2 for the parameter that controls the exploration/exploitation tradeoff.
e. Come up with a better rollout policy and measure its performance as a function of the number of rollouts and CPU time.
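As a starting point for parts (b) through (d), the two core pieces can be sketched as follows: a UCB1 selection rule with C = √2, and a Monte Carlo estimate of the reward under a uniformly random policy. This is only a minimal sketch, not a full solution. The function names and the `reset`/`actions`/`step` environment interface are hypothetical, introduced here for illustration; they are not taken from Figure 17.5, and the Tetris MDP itself is left to be supplied.

```python
import math
import random


def ucb1_score(total_return, visits, parent_visits, c=math.sqrt(2)):
    """UCB1 value of an action: mean return plus an exploration bonus."""
    if visits == 0:
        return math.inf  # untried actions are selected first
    mean = total_return / visits
    return mean + c * math.sqrt(math.log(parent_visits) / visits)


def select_action(stats, c=math.sqrt(2)):
    """Pick the UCB1-maximizing action.

    stats maps action -> (total_return, visits) at one tree node
    (a hypothetical bookkeeping layout, assumed for this sketch).
    """
    parent_visits = sum(n for _, n in stats.values())
    return max(
        stats,
        key=lambda a: ucb1_score(stats[a][0], stats[a][1], parent_visits, c),
    )


def estimate_random_policy(reset, actions, step, n_rollouts=1000, rng=None):
    """Mean total reward per episode under a uniformly random policy.

    Assumed interface (hypothetical):
      reset() -> initial state
      actions(state) -> non-empty list of legal actions
      step(state, action, rng) -> (next_state, reward, done)
    """
    rng = rng or random.Random(0)
    total = 0.0
    for _ in range(n_rollouts):
        state, done = reset(), False
        while not done:
            action = rng.choice(actions(state))
            state, reward, done = step(state, action, rng)
            total += reward
    return total / n_rollouts
```

With C = 0 the selection rule is purely greedy on the mean return. Larger values of C favor actions that have been tried less often, which is the tradeoff part (d) asks you to hold fixed at √2 while varying the number of rollouts per move.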
Figure 17.5
Step by Step Answer:
Artificial Intelligence A Modern Approach
ISBN: 9780134610993
4th Edition
Authors: Stuart Russell, Peter Norvig