Question:

In this exercise we explore the application of UCT to Tetris. 

a. Create an implementation of the Tetris MDP as described in Figure 17.5. Each action simply places the current piece in any reachable location and orientation. 

b. Estimate the reward for a purely random policy by running rollouts. 

c. Implement a version of UCT (Section 5.4) suitable for MDPs. 

d. Apply your algorithm to Tetris and measure its performance as a function of the number of rollouts per move, assuming a purely random policy for rollouts and a value C = √2 for the parameter that controls the exploration/exploitation tradeoff. 

e. Come up with a better rollout policy and measure its performance as a function of the number of rollouts and CPU time.
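
For reference, the constant C in part (d) enters UCT through the UCB1 selection rule. A common way to write it is shown below; the symbols U, N, and Parent are notation assumed here, not quoted from the exercise.

```latex
\mathrm{UCB1}(n) \;=\; \frac{U(n)}{N(n)} \;+\; C \sqrt{\frac{\log N(\mathrm{Parent}(n))}{N(n)}},
\qquad C = \sqrt{2}
```

Here U(n) is the total return of all rollouts that passed through node n, and N(n) is the number of such rollouts. The first term favors nodes that have performed well so far (exploitation); the second grows for nodes that have been tried rarely relative to their parent (exploration), and a larger C weights it more heavily. In an MDP such as Tetris, where the next piece is random, these statistics would typically be kept per state-action pair rather than per successor state.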

Figure 17.5 (image not reproduced; the labels that survive are S, B, A, and E, with rewards R = 16 and R = 64)

Step by Step Solution

a. To create an implementation of the Tetris MDP, we need to define the state space, action space, transition probabilities, and rewards. The state space can be ...
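
The four components named above can be sketched in Python as follows. This is a minimal illustration under several assumptions that do not come from the exercise: a small 6 x 10 board, only three piece shapes, a reward of one point per cleared row, a uniformly random next piece, and pieces that are dropped straight down (a simplification of "any reachable location"). All names (`rotations`, `actions`, `step`, `random_rollout`, `estimate_random_policy`) are hypothetical. The last two functions show one way to approach part (b).

```python
import random

WIDTH, HEIGHT = 6, 10  # assumed small board; standard Tetris is 10 x 20

# Each piece is a tuple of (row, col) cells. Only three shapes, for brevity.
PIECES = {
    "O": ((0, 0), (0, 1), (1, 0), (1, 1)),
    "I": ((0, 0), (0, 1), (0, 2), (0, 3)),
    "L": ((0, 0), (1, 0), (2, 0), (2, 1)),
}

def rotations(cells):
    """Return the distinct 90-degree rotations of a piece, shifted to the origin."""
    seen, out = set(), []
    for _ in range(4):
        r0 = min(r for r, c in cells)
        c0 = min(c for r, c in cells)
        norm = tuple(sorted((r - r0, c - c0) for r, c in cells))
        if norm not in seen:
            seen.add(norm)
            out.append(norm)
        cells = tuple((c, -r) for r, c in cells)  # rotate by 90 degrees
    return out

def fits(board, cells, row, col):
    """True if the piece, offset by (row, col), is on the board and overlaps nothing."""
    return all(
        0 <= r + row < HEIGHT and 0 <= c + col < WIDTH
        and (r + row, c + col) not in board
        for r, c in cells
    )

def actions(piece):
    """Action space: every (rotation, column) pair that stays inside the board."""
    return [
        (rot, col)
        for rot in rotations(PIECES[piece])
        for col in range(WIDTH - max(c for r, c in rot))
    ]

def step(board, piece, action, rng):
    """Transition model: return (next_board, next_piece, reward, done)."""
    rot, col = action
    if not fits(board, rot, 0, col):
        return board, piece, 0, True  # piece cannot enter the board: game over
    row = 0
    while fits(board, rot, row + 1, col):  # straight drop (assumption)
        row += 1
    filled = set(board) | {(r + row, c + col) for r, c in rot}
    full = [r for r in range(HEIGHT)
            if all((r, c) in filled for c in range(WIDTH))]
    # Remove completed rows and shift the cells above them down.
    kept = frozenset(
        (r + sum(1 for f in full if f > r), c)
        for r, c in filled if r not in full
    )
    # Assumed reward: one point per cleared row; next piece is uniform random.
    return kept, rng.choice(sorted(PIECES)), len(full), False

def random_rollout(rng, max_steps=10_000):
    """Play one game with uniformly random actions; return the total reward."""
    board, piece, total = frozenset(), rng.choice(sorted(PIECES)), 0
    for _ in range(max_steps):
        board, piece, reward, done = step(
            board, piece, rng.choice(actions(piece)), rng)
        total += reward
        if done:
            break
    return total

def estimate_random_policy(n_rollouts=200, seed=0):
    """Monte Carlo estimate of the expected return of the random policy."""
    rng = random.Random(seed)
    returns = [random_rollout(rng) for _ in range(n_rollouts)]
    return sum(returns) / len(returns)
```

The board is stored as a frozenset of filled cells so that states are hashable, which is convenient if the same model is later reused as the simulator inside a UCT search. Calling `estimate_random_policy()` averages the returns of 200 random games; more rollouts would tighten the estimate.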
