Re-implement one of the first machine-learning game-playing systems: a version of MENACE, the Machine Educable Noughts and Crosses Engine

Question:

Re-implement one of the first machine-learning game-playing systems: a version of MENACE, the Machine Educable Noughts and Crosses (Tic-Tac-Toe) Engine (Michie, 1963). The original was implemented with 304 matchboxes filled with colored beads. You will probably find it easier to use a computer.

a. Before it has learned anything, MENACE plays randomly: each possible move in each position starts with n chances (beads) of being selected. On the very first move there are 9 possible moves, so each has an n/(9n) = 1/9 chance of being selected. MENACE plays against an opponent (which could be an optimal minimax player), the two alternating moves until the game ends.
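The matchbox mechanism in (a) can be sketched as a dictionary mapping each board position to a bead count per legal move. This is a minimal Python sketch; the board encoding, the helper names, and the default n = 3 are assumptions, not part of the original exercise:

```python
import random

def legal_moves(board):
    # board: a 9-character string over 'X', 'O', '.' (row-major cells 0-8).
    return [i for i, c in enumerate(board) if c == '.']

def get_matchbox(boxes, board, n=3):
    # Lazily create the "matchbox" for this position: n beads per legal move.
    if board not in boxes:
        boxes[board] = {move: n for move in legal_moves(board)}
    return boxes[board]

def choose_move(boxes, board, n=3):
    # Draw a bead at random: move m is picked with probability beads[m]/total,
    # so an untrained position gives every move an equal chance.
    beads = get_matchbox(boxes, board, n)
    moves = list(beads)
    return random.choices(moves, weights=[beads[m] for m in moves])[0]
```

On the empty board the box holds 9 × 3 = 27 beads, so each opening move is drawn with probability 3/27 = 1/9, matching the n/(9n) figure above.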

b. If MENACE has won, each of the moves it made is rewarded by having 3 more chances added to its odds (i.e., an increase from n to n + 3). If the game is a draw, 1 chance is added, and if a loss, 1 chance is subtracted.
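The reward rule in (b) then adjusts the bead counts along the game's history. One detail the exercise leaves open is what happens when a move's count would reach zero; the sketch below clamps counts at 1 bead so every move keeps a nonzero probability (that clamp, like the function and parameter names, is an assumption — in Michie's physical version beads could be removed entirely):

```python
def reinforce(boxes, history, outcome, win=3, draw=1, loss=-1):
    # history: the (board, move) pairs MENACE played during one game.
    # outcome: 'win', 'draw', or 'loss' from MENACE's point of view.
    # boxes:   dict mapping board strings to {move: bead_count} dicts.
    delta = {'win': win, 'draw': draw, 'loss': loss}[outcome]
    for board, move in history:
        beads = boxes[board]
        # Clamp at 1 so no move's probability ever drops to zero.
        beads[move] = max(1, beads[move] + delta)
```

The `win`/`draw`/`loss` keyword arguments make the 3/1/-1 scheme easy to vary later.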

c. Record MENACE’s history of wins and losses over many games. How long does it take to reach an equilibrium where it avoids losing?
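For part (c), the pieces can be combined into a self-contained training loop that plays MENACE as 'X' against a uniformly random opponent and records each outcome. The opponent choice, the 5,000-game run length, and all names are assumptions; varying the `n` and `rewards` parameters here is also what part (d) asks for:

```python
import random

LINES = [(0,1,2), (3,4,5), (6,7,8), (0,3,6), (1,4,7), (2,5,8), (0,4,8), (2,4,6)]

def winner(b):
    # Return 'X' or 'O' if that side has three in a row, else None.
    return next((b[i] for i, j, k in LINES
                 if b[i] != '.' and b[i] == b[j] == b[k]), None)

def play_game(boxes, n=3, rewards=(3, 1, -1)):
    # One game: MENACE is 'X' and moves first; the opponent 'O' plays
    # uniformly at random. Afterwards MENACE's moves are reinforced.
    board, history, player = '.' * 9, [], 'X'
    while True:
        moves = [i for i, c in enumerate(board) if c == '.']
        if player == 'X':
            beads = boxes.setdefault(board, {m: n for m in moves})
            move = random.choices(moves, weights=[beads[m] for m in moves])[0]
            history.append((board, move))
        else:
            move = random.choice(moves)
        board = board[:move] + player + board[move + 1:]
        w = winner(board)
        if w is not None or '.' not in board:
            outcome = {'X': 'win', 'O': 'loss'}.get(w, 'draw')
            break
        player = 'O' if player == 'X' else 'X'
    delta = dict(zip(('win', 'draw', 'loss'), rewards))[outcome]
    for b, m in history:
        boxes[b][m] = max(1, boxes[b][m] + delta)  # clamp counts at 1 bead
    return outcome

boxes, record = {}, []
for _ in range(5000):
    record.append(play_game(boxes))
# Compare the loss rate early vs. late to see the learning curve.
print(record[:500].count('loss') / 500, record[-500:].count('loss') / 500)
```

Plotting a running loss rate over `record` shows when it flattens out, which is one concrete way to read off "equilibrium."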

d. Experiment with different values of n and of the +3/+1/-1 rewards. Which choices lead to faster winning performance?

e. When your program has reached equilibrium, compare its policy to the optimal policy computed by a minimax algorithm. Report on the results.
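Part (e) can be checked with an exhaustive minimax for tic-tac-toe plus a measure of how often MENACE's highest-bead move is minimax-optimal. This is a sketch under assumptions: the board is a 9-character string, MENACE is 'X' to move in every stored position, and `boxes` is the trained state-to-bead-count dictionary from the earlier parts:

```python
from functools import lru_cache

LINES = [(0,1,2), (3,4,5), (6,7,8), (0,3,6), (1,4,7), (2,5,8), (0,4,8), (2,4,6)]

def winner(b):
    return next((b[i] for i, j, k in LINES
                 if b[i] != '.' and b[i] == b[j] == b[k]), None)

@lru_cache(maxsize=None)
def value(board, player):
    # Minimax value from X's point of view: +1 X win, 0 draw, -1 O win.
    w = winner(board)
    if w is not None:
        return 1 if w == 'X' else -1
    if '.' not in board:
        return 0
    nxt = 'O' if player == 'X' else 'X'
    vals = [value(board[:m] + player + board[m + 1:], nxt)
            for m, c in enumerate(board) if c == '.']
    return max(vals) if player == 'X' else min(vals)

def optimal_moves(board, player='X'):
    # All moves achieving the minimax value for the side to move.
    nxt = 'O' if player == 'X' else 'X'
    scored = {m: value(board[:m] + player + board[m + 1:], nxt)
              for m, c in enumerate(board) if c == '.'}
    target = max(scored.values()) if player == 'X' else min(scored.values())
    return {m for m, v in scored.items() if v == target}

def agreement(boxes):
    # Fraction of visited positions whose most-probable MENACE move
    # (the move with the most beads) is minimax-optimal.
    ok = sum(max(beads, key=beads.get) in optimal_moves(board)
             for board, beads in boxes.items())
    return ok / len(boxes)
```

Note that perfect play draws tic-tac-toe, so on the empty board every opening move counts as "optimal"; the agreement figure is most informative when weighted toward positions MENACE visited often.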
