Question:
Consider the tic-tac-toe example of Section 10.7.2. Implement the temporal difference learning algorithm in the language of your choice. If you designed the algorithm to take into account problem symmetries, what do you expect to happen? How might this limit your solution?
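The symmetry part of the question can be made concrete with a small Python sketch. It assumes the board is stored as a 9-tuple in row-major order holding 'X', 'O' or ' ', and the helper names (`rotate`, `reflect`, `canonical`, `reachable_states`) are hypothetical. `canonical` maps every board to one representative of its class under the eight rotations and reflections of the square, so a value table keyed by `canonical(b)` would hold one entry per class instead of one per board.

```python
from typing import List, Optional, Set, Tuple

Board = Tuple[str, ...]  # 9 cells, row-major; each 'X', 'O' or ' ' (assumed encoding)
LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8), (0, 3, 6),
         (1, 4, 7), (2, 5, 8), (0, 4, 8), (2, 4, 6)]


def rotate(b: Board) -> Board:
    """Rotate the board 90 degrees clockwise: new cell (r, c) takes old cell (2 - c, r)."""
    return tuple(b[(2 - c) * 3 + r] for r in range(3) for c in range(3))


def reflect(b: Board) -> Board:
    """Mirror the board left to right."""
    return tuple(b[r * 3 + (2 - c)] for r in range(3) for c in range(3))


def symmetries(b: Board) -> List[Board]:
    """The 8 images of b under the rotations and reflections of the square."""
    images = []
    for start in (b, reflect(b)):
        cur = start
        for _ in range(4):
            images.append(cur)
            cur = rotate(cur)
    return images


def canonical(b: Board) -> Board:
    """One fixed representative (the lexicographic minimum) of b's symmetry class."""
    return min(symmetries(b))


def winner(b: Board) -> Optional[str]:
    for i, j, k in LINES:
        if b[i] != ' ' and b[i] == b[j] == b[k]:
            return b[i]
    return None


def reachable_states() -> Set[Board]:
    """Every board reachable from the empty board, with X moving first and play stopping at a win or a full board."""
    start: Board = (' ',) * 9
    seen = {start}
    stack = [(start, 'X')]
    while stack:
        b, player = stack.pop()
        if winner(b) is not None or ' ' not in b:
            continue  # terminal position: no further moves
        nxt = 'O' if player == 'X' else 'X'
        for i, cell in enumerate(b):
            if cell == ' ':
                child = b[:i] + (player,) + b[i + 1:]
                if child not in seen:
                    seen.add(child)
                    stack.append((child, nxt))
    return seen


states = reachable_states()
canon = {canonical(s) for s in states}
print(len(states), 'raw states,', len(canon), 'up to symmetry')
```

Keying the table by `canonical(b)` shrinks it and lets experience in one position transfer to all of its mirror images, so we would expect faster learning. The limitation is that symmetric positions are then forced to share one value: if the imperfect opponent does not itself play symmetrically (for example, it tends to mishandle one particular corner), positions that look equivalent on the board are not equivalent against that opponent, and a shared value cannot represent the difference.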
Data from section 10.7.2
We next demonstrate a reinforcement learning algorithm for tic-tac-toe, a problem we have already considered (Chapter 4), and one dealt with in the reinforcement learning literature by Sutton and Barto (1998). It is important to compare and contrast the reinforcement learning approach with other solution methods, for example, mini-max. As a reminder, tic-tac-toe is a two-person game played on a 3x3 grid, as in Figure II.5. The players, X and O, alternate putting their marks on the grid, with the first player that gets three marks in a row, either horizontal, vertical, or diagonal, the winner. As the reader is aware, when this game is played using perfect information and backed up values, Section 4.3, it is always a draw. With reinforcement learning we will be able to do something much more interesting, however. We will show how we can capture the performance of an imperfect opponent, and create a policy that allows us to maximize our advantage over this opponent. Our policy can also evolve as our opponent improves her game, and with the use of a model we will be able to generate forks and other attacking moves!

First, we must set up a table of numbers, one for each possible state of the game. These numbers, that state's value, will reflect the current estimate of the probability of
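The excerpt breaks off while describing the value table. As a minimal sketch in Python (our choice; the question leaves the language open), the following uses the standard TD(0) scheme Sutton and Barto describe for this example: each state's value estimates the probability of winning, won states start at 1, lost or drawn states at 0, and all other states at 0.5. After each greedy move, the value of the learner's previous afterstate is nudged toward the value of its new one, V(s) ← V(s) + alpha * (V(s') - V(s)); exploratory moves trigger no update. Assumptions: the learner plays X and moves first, the "imperfect opponent" is a uniformly random player, and the names `play_episode`, `value`, `alpha` (step size) and `epsilon` (exploration rate) are hypothetical.

```python
import random
from typing import Dict, Optional, Tuple

Board = Tuple[str, ...]  # 9 cells in row-major order: 'X', 'O' or ' ' (assumed encoding)
LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8), (0, 3, 6),
         (1, 4, 7), (2, 5, 8), (0, 4, 8), (2, 4, 6)]


def winner(b: Board) -> Optional[str]:
    for i, j, k in LINES:
        if b[i] != ' ' and b[i] == b[j] == b[k]:
            return b[i]
    return None


def is_over(b: Board) -> bool:
    return winner(b) is not None or ' ' not in b


def value(V: Dict[Board, float], b: Board, me: str) -> float:
    """Estimated probability that `me` wins from b; unseen states are initialised lazily."""
    if b not in V:
        if winner(b) == me:
            V[b] = 1.0
        elif is_over(b):
            V[b] = 0.0  # a loss or a draw both count as "not a win"
        else:
            V[b] = 0.5
    return V[b]


def play_episode(V: Dict[Board, float], rng: random.Random, alpha: float = 0.1,
                 epsilon: float = 0.1, me: str = 'X', opp: str = 'O') -> Optional[str]:
    """Play one game as `me` (moving first) against a uniformly random `opp`; return the winner or None."""
    b: Board = (' ',) * 9
    prev: Optional[Board] = None  # our afterstate from the previous turn
    while True:
        children = [b[:i] + (me,) + b[i + 1:] for i, c in enumerate(b) if c == ' ']
        greedy = rng.random() >= epsilon
        if greedy:
            b = max(children, key=lambda s: value(V, s, me))
        else:
            b = rng.choice(children)  # exploratory move: no learning update
        target = value(V, b, me)
        if prev is not None and greedy:
            # TD(0) backup: V(prev) <- V(prev) + alpha * (V(b) - V(prev))
            V[prev] += alpha * (target - V[prev])
        if is_over(b):
            return winner(b)
        prev = b
        # The (assumed) imperfect opponent picks an empty cell uniformly at random.
        i = rng.choice([j for j, c in enumerate(b) if c == ' '])
        b = b[:i] + (opp,) + b[i + 1:]
        if is_over(b):
            # The opponent ended the game: back the outcome up into our last afterstate.
            V[prev] += alpha * (value(V, b, me) - V[prev])
            return winner(b)


rng = random.Random(0)
V: Dict[Board, float] = {}
for _ in range(20000):
    play_episode(V, rng)

# Evaluate the learned greedy policy with learning and exploration switched off.
results = [play_episode(V, rng, alpha=0.0, epsilon=0.0) for _ in range(1000)]
print('win rate vs random opponent:', results.count('X') / len(results))
```

Because the table is keyed by the raw board, this version treats mirror-image positions as unrelated states; keying `V` by a canonical form of the board instead is the symmetry-aware variant the question asks about.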
Answered By
OTIENO OBADO
Related book: Artificial Intelligence: Structures and Strategies for Complex Problem Solving, 6th Edition, by George Luger (ISBN 9780321545893)