Question: Exercise 11.9 Consider four different ways to derive the value of αk from k in Q-learning (note that for Q-learning with varying αk, there must be a different count k for each state–action pair).

i) Let αk = 1/k.

ii) Let αk = 10/(9 + k).

iii) Let αk = 0.1.

iv) Let αk = 0.1 for the first 10,000 steps, αk = 0.01 for the next 10,000 steps, αk = 0.001 for the next 10,000 steps, αk = 0.0001 for the next 10,000 steps, and so on.
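
For experimenting with these schedules, the four rules can be written as plain functions of the count k. This is only a sketch: the function names are made up for illustration, k is assumed to start at 1, and schedule iv is assumed to be driven by the same per-pair count k (the exercise says "steps", which could also be read as global time steps).

```python
def alpha_i(k: int) -> float:
    """Schedule i: alpha_k = 1/k."""
    return 1.0 / k


def alpha_ii(k: int) -> float:
    """Schedule ii: alpha_k = 10/(9 + k)."""
    return 10.0 / (9 + k)


def alpha_iii(k: int) -> float:
    """Schedule iii: constant alpha_k = 0.1."""
    return 0.1


def alpha_iv(k: int) -> float:
    """Schedule iv: 0.1 for the first 10,000 steps, then a factor of 10
    smaller for each further block of 10,000 steps.
    Assumption: k counts from 1, so steps 1..10,000 form the first block."""
    block = (k - 1) // 10_000  # 0 for the first block, 1 for the second, ...
    return 10.0 ** -(1 + block)


if __name__ == "__main__":
    # Print each schedule at a few counts to compare how fast they decay.
    for k in (1, 10, 100, 10_000, 10_001, 20_001):
        print(k, alpha_i(k), alpha_ii(k), alpha_iii(k), alpha_iv(k))
```

Schedules i and ii both start at αk = 1 for k = 1, so the first observed target fully replaces the initial estimate in either case.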

(a) Which of these will converge to the true Q-value in theory?

(b) Which converges to the true Q-value in practice (i.e., in a reasonable number of steps)? Try it for more than one domain.

(c) Which can adapt when the environment adapts slowly?
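
For parts (b) and (c), one possible starting point is a tabular Q-learning update that keeps a separate count k for each state–action pair, as the exercise requires. This is a minimal sketch under stated assumptions, not a reference implementation: the name `q_update`, the dictionary-based tables, and the default discount `gamma=0.9` are all illustrative choices, and `alpha_fn` stands for any function mapping k to αk.

```python
from collections import defaultdict


def q_update(Q, counts, s, a, r, s_next, actions, alpha_fn, gamma=0.9):
    """Apply one Q-learning update for the experience (s, a, r, s_next).

    Q        -- dict mapping (state, action) to the current estimate
    counts   -- dict mapping (state, action) to its visit count k
    alpha_fn -- hypothetical step-size schedule: k -> alpha_k
    gamma    -- discount factor (0.9 is an arbitrary illustrative value)
    """
    counts[(s, a)] += 1                    # per-pair count k
    alpha = alpha_fn(counts[(s, a)])       # alpha_k for this pair only
    best_next = max(Q[(s_next, b)] for b in actions)
    Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])
    return Q[(s, a)]


if __name__ == "__main__":
    Q = defaultdict(float)      # unseen pairs start at 0.0
    counts = defaultdict(int)   # unseen pairs start at k = 0
    actions = ["left", "right"]
    # Two updates of the same pair with alpha_k = 1/k.
    q_update(Q, counts, 0, "left", 1.0, 1, actions, lambda k: 1.0 / k)
    q_update(Q, counts, 0, "left", 3.0, 1, actions, lambda k: 1.0 / k)
    print(Q[(0, "left")], counts[(0, "left")])
```

With αk = 1/k and the next-state values still at zero, the estimate after two updates is the plain average of the two rewards, which makes it easy to sanity-check the per-pair counting before running longer experiments.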
