Consider applying the Q-learning algorithm to the same grid world as in Problem 1. Assume that the table of q values is initialized to 0. Assume the agent begins in state S7 and then travels clockwise around the perimeter of the grid until it reaches the absorbing goal state, completing the first training episode. Assume that γ = 0.8 and that α = 1.

(a) Determine which q(s, a) values are modified as a result of this episode, and give their revised values.

(b) Assume that the agent now performs a second identical episode. Determine which q(s, a) values are modified as a result of this episode, and give their revised values.

(c) Assume that the agent now performs a third identical episode. Determine which q(s, a) values are modified as a result of this episode, and give their revised values.
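
The bookkeeping that parts (a) to (c) ask for can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it reads the 0.8 in the problem as the discount factor and the 1 as the learning rate, and it uses the standard tabular update q(s, a) ← q(s, a) + α [r + γ max q(s', a') − q(s, a)]. The grid from Problem 1 is not reproduced here, so the states A, B, C, G, the single action "right", and the goal reward of 10 are hypothetical stand-ins, not the actual problem data.

```python
from collections import defaultdict

GAMMA = 0.8  # assumed: the 0.8 in the problem is the discount factor
ALPHA = 1.0  # assumed: the 1 in the problem is the learning rate


def run_episode(q, transitions, actions, alpha=ALPHA, gamma=GAMMA):
    """Apply one Q-learning update per transition, in order.

    Returns a dict of the (state, action) entries whose value changed.
    """
    changed = {}
    for state, action, reward, next_state in transitions:
        # Greedy value of the next state. Entries for the absorbing goal
        # are never updated, so they stay at their initial value of 0.
        best_next = max(q[(next_state, a)] for a in actions)
        target = reward + gamma * best_next
        old_value = q[(state, action)]
        new_value = old_value + alpha * (target - old_value)
        if new_value != old_value:
            changed[(state, action)] = new_value
        q[(state, action)] = new_value
    return changed


# Hypothetical three-step corridor A -> B -> C -> G (goal), reward 10 on
# entering G. This is NOT the grid from Problem 1.
actions = ["right"]
path = [
    ("A", "right", 0.0, "B"),
    ("B", "right", 0.0, "C"),
    ("C", "right", 10.0, "G"),
]

q = defaultdict(float)  # q table initialized to 0
ep1 = run_episode(q, path, actions)
ep2 = run_episode(q, path, actions)
ep3 = run_episode(q, path, actions)
print(ep1, ep2, ep3)
```

In this toy corridor, each repeated episode changes exactly one entry: the reward information moves one step further back along the path, scaled by the discount factor each time. Substituting the real transitions and rewards from Problem 1 for `path` would give the entries requested in (a), (b), and (c).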
