Question: Exercise 11.9 Consider four different ways to derive the value of αk from k in Q-learning (note that for Q-learning with varying αk, there must be a different count k for each state–action pair).

i) Let αk = 1/k.

ii) Let αk = 10/(9 + k).

iii) Let αk = 0.1.

iv) Let αk = 0.1 for the first 10,000 steps, αk = 0.01 for the next 10,000 steps, αk = 0.001 for the next 10,000 steps, αk = 0.0001 for the next 10,000 steps, and so on.
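For experimenting with these schedules, a minimal Python sketch might look like the following. It assumes k starts at 1 and counts visits to a single state–action pair; the function names, the `visit_count` table, and the `next_alpha` helper are illustrative and not part of the exercise.

```python
from collections import defaultdict


def alpha_inverse(k: int) -> float:
    """Schedule (i): alpha_k = 1/k."""
    return 1.0 / k


def alpha_shifted(k: int) -> float:
    """Schedule (ii): alpha_k = 10/(9 + k)."""
    return 10.0 / (9.0 + k)


def alpha_constant(k: int) -> float:
    """Schedule (iii): alpha_k = 0.1 regardless of k."""
    return 0.1


def alpha_staged(k: int) -> float:
    """Schedule (iv): 0.1 for k = 1..10000, 0.01 for k = 10001..20000, and so on."""
    stage = (k - 1) // 10_000  # 0 for the first block of 10,000 steps
    return 0.1 ** (stage + 1)


# Assumption: one visit counter per (state, action) pair, as the exercise notes.
visit_count = defaultdict(int)


def next_alpha(state, action, schedule) -> float:
    """Increment the pair's counter and return the step size for this update."""
    visit_count[(state, action)] += 1
    return schedule(visit_count[(state, action)])
```

Each call to `next_alpha` advances only the counter of the pair being updated, so rarely visited pairs keep a large step size under schedules (i), (ii), and (iv).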

(a) Which of these will converge to the true Q-value in theory?

(b) Which converges to the true Q-value in practice (i.e., in a reasonable number of steps)? Try it for more than one domain.

(c) Which can adapt when the environment adapts slowly?
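One way to start exploring parts (b) and (c) empirically is sketched below. It is a simplified stand-in, not a full Q-learning agent: a single estimate q is updated by q ← q + αk (x − q), where the noisy sample x plays the role of the Q-learning target r + γ max Q(s', a'). The target value shifts from 1.0 to 2.0 halfway through to mimic a changing environment. The names `SCHEDULES`, `run_average`, and `demo`, and all parameter values, are assumptions chosen for illustration.

```python
import random

# The four step-size schedules from the exercise, with k starting at 1.
SCHEDULES = {
    "i": lambda k: 1.0 / k,
    "ii": lambda k: 10.0 / (9.0 + k),
    "iii": lambda k: 0.1,
    "iv": lambda k: 0.1 ** ((k - 1) // 10_000 + 1),
}


def run_average(schedule, samples, q0=0.0):
    """Apply q <- q + alpha_k * (x - q) to each sample and return the final q."""
    q = q0
    for k, x in enumerate(samples, start=1):
        q += schedule(k) * (x - q)
    return q


def demo(seed=0, n_before=5000, n_after=5000, noise=0.5):
    """Print each schedule's final estimate after the target shifts from 1.0 to 2.0."""
    rng = random.Random(seed)
    samples = [1.0 + rng.gauss(0.0, noise) for _ in range(n_before)]
    samples += [2.0 + rng.gauss(0.0, noise) for _ in range(n_after)]
    for name, schedule in SCHEDULES.items():
        q = run_average(schedule, samples)
        print(f"schedule {name}: final estimate {q:.3f} (current target 2.0)")


if __name__ == "__main__":
    demo()
```

Varying `n_before`, `n_after`, and `noise` shows how each schedule trades noise averaging against responsiveness to change. The exercise itself asks for this comparison on real Q-learning domains, which this sketch does not replace.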

Recommended Textbook

Artificial Intelligence: Foundations of Computational Agents

Authors: David L. Poole, Alan K. Mackworth

1st Edition

0521519004, 978-0521519007
