Question:
Consider the tic-tac-toe example of Section 10.7.2. Implement the temporal difference learning algorithm in the language of your choice. If you designed the algorithm to take into account problem symmetries, what do you expect to happen? How might this limit your solution?
Data from section 10.7.2
We next demonstrate a reinforcement learning algorithm for tic-tac-toe, a problem we have already considered (Chapter 4), and one dealt with in the reinforcement learning literature by Sutton and Barto (1998). It is important to compare and contrast the reinforcement learning approach with other solution methods, for example, mini-max. As a reminder, tic-tac-toe is a two-person game played on a 3×3 grid, as in Figure II.5. The players, X and O, alternate putting their marks on the grid, with the first player that gets three marks in a row, either horizontal, vertical, or diagonal, the winner. As the reader is aware, when this game is played using perfect information and backed up values, Section 4.3, it is always a draw. With reinforcement learning we will be able to do something much more interesting, however. We will show how we can capture the performance of an imperfect opponent, and create a policy that allows us to maximize our advantage over this opponent. Our policy can also evolve as our opponent improves her game, and with the use of a model we will be able to generate forks and other attacking moves!

First, we must set up a table of numbers, one for each possible state of the game. These numbers, that state's value, will reflect the current estimate of the probability of
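The value table described in the excerpt, and the symmetry question asked above, can be illustrated with a short Python sketch. This is not the textbook's code. It assumes the initialization used by Sutton and Barto (1.0 for a learner win, 0.0 for a loss or draw, 0.5 otherwise), a learner playing X against a uniformly random O, and the update V(s) ← V(s) + α(V(s′) − V(s)) between successive positions after the learner's moves. The names `canonical`, `value`, `play_episode`, and the parameters `alpha`, `epsilon`, and `use_symmetry` are hypothetical. As a simplification, the sketch also updates after exploratory moves.

```python
import random

LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8), (0, 3, 6),
         (1, 4, 7), (2, 5, 8), (0, 4, 8), (2, 4, 6)]

def winner(board):
    """Return 'X' or 'O' if that player has three in a row, else None."""
    for a, b, c in LINES:
        if board[a] != ' ' and board[a] == board[b] == board[c]:
            return board[a]
    return None

def canonical(board):
    """Pick one representative (the smallest string) of the 8 rotations/reflections."""
    def rot(b):   # rotate the grid 90 degrees clockwise
        return ''.join(b[i] for i in (6, 3, 0, 7, 4, 1, 8, 5, 2))
    def flip(b):  # mirror the grid left to right
        return ''.join(b[i] for i in (2, 1, 0, 5, 4, 3, 8, 7, 6))
    forms = []
    for _ in range(4):
        board = rot(board)
        forms += [board, flip(board)]
    return min(forms)

def value(V, board, use_symmetry):
    """Look up a state's value, creating the table entry on first visit."""
    key = canonical(board) if use_symmetry else board
    if key not in V:
        w = winner(board)
        if w == 'X':
            V[key] = 1.0          # assumed: learner win
        elif w == 'O' or ' ' not in board:
            V[key] = 0.0          # assumed: loss or draw
        else:
            V[key] = 0.5          # assumed: unknown, start at even odds
    return key, V[key]

def play_episode(V, alpha=0.1, epsilon=0.1, use_symmetry=True, rng=random):
    """Play one game (X learns, O moves at random); return the winner or None."""
    board, prev_key = ' ' * 9, None
    while True:
        # Learner: epsilon-greedy choice among the positions reachable in one move.
        succ = [board[:i] + 'X' + board[i + 1:]
                for i, c in enumerate(board) if c == ' ']
        if rng.random() < epsilon:
            board = rng.choice(succ)
        else:
            board = max(succ, key=lambda b: value(V, b, use_symmetry)[1])
        key, v = value(V, board, use_symmetry)
        if prev_key is not None:
            # Temporal difference update: move the earlier estimate toward the later one.
            V[prev_key] += alpha * (v - V[prev_key])
        prev_key = key
        if winner(board) or ' ' not in board:
            return winner(board)
        # Opponent: uniformly random legal move (an assumed "imperfect opponent").
        i = rng.choice([i for i, c in enumerate(board) if c == ' '])
        board = board[:i] + 'O' + board[i + 1:]
        if winner(board):
            # The game ended on O's move, so back up the terminal value.
            _, v = value(V, board, use_symmetry)
            V[prev_key] += alpha * (v - V[prev_key])
            return winner(board)

if __name__ == "__main__":
    for sym in (False, True):
        V, rng = {}, random.Random(0)
        results = [play_episode(V, use_symmetry=sym, rng=rng) for _ in range(2000)]
        print("symmetry:", sym, "table entries:", len(V),
              "wins in last 500 games:", results[-500:].count('X'))
```

With `use_symmetry=True`, each group of rotated or reflected boards shares one table entry, so the table should be smaller and each game should update more positions at once. The likely limitation is the one the question hints at: if the opponent does not play symmetric positions in the same way, merging them averages over different opponent behavior, and the learner can no longer exploit a weakness that appears in only one orientation.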
Related Book For
Artificial Intelligence Structures And Strategies For Complex Problem Solving
ISBN: 9780321545893
6th Edition
Authors: George Luger