Question: Use reinforcement learning to solve this problem.

1. Consider the 3x3 wumpus world shown below. The goal of this simplified game is to be collocated with the gold (where we get a +1000 reward) and not collocated with the wumpus (or we get a -1000 reward). All other states have a reward of [value not given]. As before, the agent starts in [1,1], but has only four possible actions: Up, Down, Left, Right (there is no orientation or turning). Each of these actions has only an 80% chance of moving the agent in that direction, a 10% chance of moving 90 degrees left, and a 10% chance of moving 90 degrees right. For example, executing Up in location [1,1] would have an 80% chance of moving up to location [1,2], a 10% chance of moving left and staying in location [1,1] (i.e., a bump), and a 10% chance of moving right to location [2,1]. We will use reinforcement learning to solve this problem.

[Figure: 3x3 grid showing the policy, with the +1000 (gold) and -1000 (wumpus) squares marked]

a. Compute the utility U(s) of each non-terminal state s given the policy shown above. Note that [1,2] and [1,3] are terminal states, where U([1,2]) = -1000 and U([1,3]) = +1000. You may assume γ = 1.

b. Compute the Q values for Q([1,1],Right), Q([2,1],Right), Q([3,1],Up), Q([3,2],Up), Q([3,3],Left), and Q([2,3],Left) after each of ten executions of the action sequence Right, Right, Up, Up, Left, Left (starting from [1,1] for each sequence). You may assume α = 1, γ = 1, and all Q values for non-terminal states are initially zero.
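The stochastic movement rule in the problem statement (80% intended direction, 10% each for a 90-degree slip left or right, with bumps leaving the agent in place) can be sketched in Python as below. This is only an illustrative sketch: the names `move`, `transition`, `LEFT_OF`, and `RIGHT_OF` are hypothetical, states are written as (x, y) tuples matching the [x,y] locations in the question, and terminal states and rewards are deliberately left out.

```python
# Sketch of the transition model only; helper names are assumptions,
# not part of the original problem.
SIZE = 3  # 3x3 grid, coordinates run from 1 to 3

# Displacement (dx, dy) for each action; y increases when moving Up.
MOVES = {"Up": (0, 1), "Down": (0, -1), "Left": (-1, 0), "Right": (1, 0)}

# Direction the agent actually moves when it slips 90 degrees left or right.
LEFT_OF = {"Up": "Left", "Left": "Down", "Down": "Right", "Right": "Up"}
RIGHT_OF = {"Up": "Right", "Right": "Down", "Down": "Left", "Left": "Up"}


def move(state, direction):
    """Deterministic move; bumping into a wall leaves the state unchanged."""
    x, y = state
    dx, dy = MOVES[direction]
    nx, ny = x + dx, y + dy
    if 1 <= nx <= SIZE and 1 <= ny <= SIZE:
        return (nx, ny)
    return state  # bump


def transition(state, action):
    """Return {next_state: probability} for taking `action` in `state`."""
    probs = {}
    outcomes = ((action, 0.8), (LEFT_OF[action], 0.1), (RIGHT_OF[action], 0.1))
    for direction, p in outcomes:
        nxt = move(state, direction)
        probs[nxt] = probs.get(nxt, 0.0) + p
    return probs


print(transition((1, 1), "Up"))
# {(1, 2): 0.8, (1, 1): 0.1, (2, 1): 0.1}
```

The printed distribution reproduces the worked example in the question: Up from [1,1] reaches [1,2] with probability 0.8, bumps and stays in [1,1] with probability 0.1, and slips right to [2,1] with probability 0.1.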
