Question:
In this exercise we explore the application of UCT to Tetris.
a. Create an implementation of the Tetris MDP as described in Figure 17.5. Each action simply places the current piece in any reachable location and orientation.
b. Estimate the reward for a purely random policy by running rollouts.
c. Implement a version of UCT (Section 5.4) suitable for MDPs.
d. Apply your algorithm to Tetris and measure its performance as a function of the number of rollouts per move, assuming a purely random policy for rollouts and a value C = √2 for the parameter that controls the exploration/exploitation tradeoff.
e. Come up with a better rollout policy and measure its performance as a function of the number of rollouts and CPU time.
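For parts c and d, the selection step of UCT is usually based on the UCB1 rule: pick the action that maximizes the average return plus an exploration bonus of C · sqrt(ln N(parent) / N(action)). The sketch below illustrates that rule with C = √2 as the default. The names `ucb1`, `select_action`, and the `stats` layout are hypothetical, chosen only for this example; the average-return term also assumes returns are on a scale comparable to C, which may require rescaling Tetris rewards.

```python
import math


def ucb1(total_return, visits, parent_visits, c=math.sqrt(2)):
    """UCB1 score: average return plus an exploration bonus."""
    if visits == 0:
        return math.inf  # untried actions are explored first
    exploit = total_return / visits
    explore = c * math.sqrt(math.log(parent_visits) / visits)
    return exploit + explore


def select_action(stats, c=math.sqrt(2)):
    """Pick the action with the highest UCB1 score.

    stats maps action -> (total_return, visits); this layout is an
    assumption made for the example.
    """
    parent_visits = sum(visits for _, visits in stats.values())
    return max(
        stats,
        key=lambda a: ucb1(stats[a][0], stats[a][1], parent_visits, c),
    )
```

With c set to 0 the rule becomes purely greedy, which gives a simple way to check how much the exploration term changes the chosen action.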
Figure 17.5 (figure not reproduced; surviving labels: S, B, A, E, R = 16, R = 64)
Step by Step Solution
a. To create an implementation of the Tetris MDP, we need to define the state space, action space, transition probabilities, and rewards. The state space can be ...
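One possible way to make these four components concrete is sketched below. It is a simplified illustration and not the definition from Figure 17.5: the board size, the reduced piece set (only O and I), the restriction to straight drops, the uniform choice of the next piece, and the reward of one point per cleared row are all assumptions made for this example. It also includes a random rollout of the kind part b asks for.

```python
import random

WIDTH, HEIGHT = 6, 10  # assumed board size; row 0 is the top

# Assumed reduced piece set: each piece lists its orientations as (row, col) cells.
PIECES = {
    "O": [((0, 0), (0, 1), (1, 0), (1, 1))],
    "I": [((0, 0), (0, 1), (0, 2), (0, 3)), ((0, 0), (1, 0), (2, 0), (3, 0))],
}


def empty_board():
    return tuple((0,) * WIDTH for _ in range(HEIGHT))


def fits(board, cells, row, col):
    """True if every cell of the piece lands on a free, in-bounds square."""
    for dr, dc in cells:
        r, c = row + dr, col + dc
        if r < 0 or r >= HEIGHT or c < 0 or c >= WIDTH or board[r][c]:
            return False
    return True


def drop_row(board, cells, col):
    """Row where a straight drop in this column comes to rest, or None."""
    if not fits(board, cells, 0, col):
        return None
    row = 0
    while fits(board, cells, row + 1, col):
        row += 1
    return row


def actions(state):
    """Legal actions as (orientation index, left column) pairs."""
    board, piece = state
    return [
        (o, col)
        for o, cells in enumerate(PIECES[piece])
        for col in range(WIDTH)
        if drop_row(board, cells, col) is not None
    ]


def step(state, action, rng):
    """Place the piece, clear full rows, draw the next piece uniformly.

    Returns (next_state, reward), where reward is the number of cleared rows.
    """
    board, piece = state
    o, col = action
    cells = PIECES[piece][o]
    row = drop_row(board, cells, col)
    grid = [list(r) for r in board]
    for dr, dc in cells:
        grid[row + dr][col + dc] = 1
    kept = [r for r in grid if not all(r)]
    cleared = HEIGHT - len(kept)
    grid = [[0] * WIDTH for _ in range(cleared)] + kept
    next_piece = rng.choice(sorted(PIECES))
    return (tuple(tuple(r) for r in grid), next_piece), cleared


def random_rollout(rng, max_steps=200):
    """Total reward of one episode under a uniformly random policy."""
    state = (empty_board(), rng.choice(sorted(PIECES)))
    total = 0
    for _ in range(max_steps):
        legal = actions(state)
        if not legal:  # no placement possible: the episode ends
            break
        state, reward = step(state, rng.choice(legal), rng)
        total += reward
    return total
```

Averaging `random_rollout(random.Random(seed))` over many seeds gives a Monte Carlo estimate of the random policy's expected reward under these assumptions. Keeping the state as an immutable tuple makes it usable as a dictionary key, which is convenient when a search tree stores visit counts per state.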
