Question: Consider the tic-tac-toe example of Section 10.7.2. Implement the temporal difference learning algorithm in the language of your choice. If you designed the algorithm to take into account problem symmetries, what do you expect to happen? How might this limit your solution?

Data from section 10.7.2

We next demonstrate a reinforcement learning algorithm for tic-tac-toe, a problem we have already considered (Chapter 4), and one dealt with in the reinforcement learning literature by Sutton and Barto (1998). It is important to compare and contrast the reinforcement learning approach with other solution methods, for example, mini-max. As a reminder, tic-tac-toe is a two-person game played on a 3x3 grid, as in Figure II.5. The players, X and O, alternate putting their marks on the grid, with the first player that gets three marks in a row, either horizontal, vertical, or diagonal, the winner. As the reader is aware, when this game is played using perfect information and backed up values, Section 4.3, it is always a draw. With reinforcement learning we will be able to do something much more interesting, however. We will show how we can capture the performance of an imperfect opponent, and create a policy that allows us to maximize our advantage over this opponent. Our policy can also evolve as our opponent improves her game, and with the use of a model we will be able to generate forks and other attacking moves! First, we must set up a table of numbers, one for each possible state of the game. These numbers, that state's value, will reflect the current estimate of the probability of
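The table-of-values idea can be sketched in Python as follows. This is a minimal sketch, not the textbook's implementation, and it rests on several assumptions: a state's value is read, as in Sutton and Barto's formulation, as an estimate of the probability that X wins from it; the table is initialized to 1 for X wins, 0 for losses and draws, and 0.5 otherwise; the opponent plays uniformly at random; and the learner uses the update V(s) ← V(s) + α(V(s') − V(s)) after greedy moves only. The names `TDAgent`, `train_episode`, `alpha`, and `epsilon` are hypothetical.

```python
import random

WIN_LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8), (0, 3, 6),
             (1, 4, 7), (2, 5, 8), (0, 4, 8), (2, 4, 6)]

def winner(board):
    """Return 'X' or 'O' if that player has three in a row, else None."""
    for a, b, c in WIN_LINES:
        if board[a] != ' ' and board[a] == board[b] == board[c]:
            return board[a]
    return None

def moves(board):
    """Indices of the empty cells."""
    return [i for i, cell in enumerate(board) if cell == ' ']

def play(board, i, mark):
    """Return a new board (9-tuple) with mark placed at index i."""
    return board[:i] + (mark,) + board[i + 1:]

def initial_value(board):
    """Assumed initialization: 1 for an X win, 0 for a loss or draw, else 0.5."""
    w = winner(board)
    if w == 'X':
        return 1.0
    if w == 'O' or not moves(board):
        return 0.0
    return 0.5

class TDAgent:
    def __init__(self, alpha=0.1, epsilon=0.1, rng=None):
        self.alpha = alpha      # step size
        self.epsilon = epsilon  # probability of an exploratory move
        self.rng = rng or random.Random()
        self.values = {}        # the table: board -> estimated value

    def value(self, board):
        if board not in self.values:
            self.values[board] = initial_value(board)
        return self.values[board]

    def choose(self, board):
        """Pick X's move; return (next_board, was_greedy)."""
        options = [play(board, i, 'X') for i in moves(board)]
        if self.rng.random() < self.epsilon:
            return self.rng.choice(options), False
        return max(options, key=self.value), True

    def update(self, prev, nxt):
        """Temporal difference backup: V(prev) += alpha * (V(nxt) - V(prev))."""
        v = self.value(prev)
        self.values[prev] = v + self.alpha * (self.value(nxt) - v)

def train_episode(agent):
    """Play one game as X against a random O; return the winner or None."""
    board = (' ',) * 9
    prev = None  # the state X left on the board after its previous move
    while True:
        board, greedy = agent.choose(board)
        if prev is not None and greedy:
            agent.update(prev, board)  # no learning from exploratory moves
        prev = board
        if winner(board) or not moves(board):
            return winner(board)
        board = play(board, agent.rng.choice(moves(board)), 'O')
        if winner(board):
            agent.update(prev, board)  # design choice: back up the loss
            return 'O'

agent = TDAgent(rng=random.Random(0))
results = [train_episode(agent) for _ in range(2000)]
```

Because each update moves a value part of the way toward another value in the table, every entry stays between 0 and 1. Swapping the random opponent for a stronger or differently biased one should shift the learned values toward that opponent's weaknesses, which is the behavior the passage describes.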
