Question:

Math: An alternative learning algorithm [10 points] Consider a learning algorithm which attempts to learn a Q-function, but instead of using the usual Q-learning target $R + \gamma \max_a Q(s',a)$, it uses as its target the mixture

$$R + \gamma\Big((1-\beta)\max_a Q(s',a) + \beta \sum_a \pi(a \mid s')\, Q(s',a)\Big),$$

where $\beta \in (0,1)$ is a hyper-parameter.
Assume that $\pi$ is an $\epsilon$-greedy policy derived from $Q$, and that the episodes used for training are collected using $\pi$ only.
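To make the target concrete, here is a minimal Python sketch of how the mixed target could be computed for a single transition in the tabular case. It assumes the $\epsilon$-greedy convention in which the exploration mass $\epsilon$ is spread uniformly over all $|A|$ actions (ties broken by the first maximiser), writes $\beta$ for the mixing hyper-parameter and $\gamma$ for the discount, and does not bootstrap from terminal states. The function names are illustrative, not taken from the assignment.

```python
from typing import List


def epsilon_greedy_probs(q_values: List[float], epsilon: float) -> List[float]:
    """Action probabilities of an epsilon-greedy policy over q_values.

    Assumption: epsilon is spread uniformly over all actions, and the first
    maximiser is treated as the greedy action when there are ties.
    """
    n = len(q_values)
    greedy = max(range(n), key=lambda a: q_values[a])
    probs = [epsilon / n] * n
    probs[greedy] += 1.0 - epsilon
    return probs


def mixed_target(reward: float, next_q: List[float], gamma: float,
                 beta: float, epsilon: float, terminal: bool = False) -> float:
    """R + gamma * ((1 - beta) * max_a Q(s',a) + beta * sum_a pi(a|s') Q(s',a))."""
    if terminal:
        return reward  # no bootstrapping from a terminal state
    q_learning_part = max(next_q)
    probs = epsilon_greedy_probs(next_q, epsilon)
    expected_sarsa_part = sum(p * q for p, q in zip(probs, next_q))
    return reward + gamma * ((1.0 - beta) * q_learning_part + beta * expected_sarsa_part)
```

For example, with next-state values `[1.0, 3.0, 2.0]`, $\epsilon = 0.3$, $\gamma = 0.9$ and $R = 0.5$, the target is approximately 3.2 in the limiting case $\beta = 0$ (the Q-learning target), approximately 2.93 in the limiting case $\beta = 1$ (the Expected SARSA target), and approximately 3.065 at $\beta = 0.5$.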
(a) [5 points] Recall that an on-policy control algorithm estimates $q_\pi(s,a)$ for the current behaviour policy $\pi$ and for all states $s$ and actions $a$. Is this algorithm on-policy or off-policy? Justify your answer.
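For part (a), one observation that may help (a sketch under the uniform $\epsilon$-greedy convention above, not an official solution): with $\pi$ being $\epsilon$-greedy over $|A|$ actions, the bootstrap term collapses to $(1-\beta\epsilon)\max_a Q(s',a) + \frac{\beta\epsilon}{|A|}\sum_a Q(s',a)$, which is exactly $\sum_a \mu(a \mid s')\,Q(s',a)$ for the $(\beta\epsilon)$-greedy policy $\mu$ with respect to $Q$. The target therefore evaluates a policy that differs from the behaviour policy $\pi$ whenever $\beta < 1$ and $\epsilon > 0$. The snippet below checks this identity numerically on random action values; `mixed_bootstrap` and `implied_policy_value` are hypothetical helper names.

```python
import random
from typing import List


def epsilon_greedy_probs(q_values: List[float], epsilon: float) -> List[float]:
    # epsilon spread uniformly over all actions; the first maximiser is greedy
    n = len(q_values)
    greedy = max(range(n), key=lambda a: q_values[a])
    probs = [epsilon / n] * n
    probs[greedy] += 1.0 - epsilon
    return probs


def mixed_bootstrap(next_q: List[float], beta: float, epsilon: float) -> float:
    # (1 - beta) * max_a Q(s',a) + beta * sum_a pi(a|s') Q(s',a), pi epsilon-greedy
    pi = epsilon_greedy_probs(next_q, epsilon)
    return (1 - beta) * max(next_q) + beta * sum(p * q for p, q in zip(pi, next_q))


def implied_policy_value(next_q: List[float], beta: float, epsilon: float) -> float:
    # Expected-SARSA-style bootstrap under a (beta * epsilon)-greedy policy mu
    mu = epsilon_greedy_probs(next_q, beta * epsilon)
    return sum(p * q for p, q in zip(mu, next_q))


random.seed(0)
for _ in range(1000):
    q = [random.uniform(-5, 5) for _ in range(4)]
    beta, eps = random.random(), random.random()
    assert abs(mixed_bootstrap(q, beta, eps) - implied_policy_value(q, beta, eps)) < 1e-9
print("identity holds on 1000 random cases")
```

The same algebra goes through if exploration is instead spread only over the non-greedy actions; the implied policy is then still $(\beta\epsilon)$-greedy under that convention.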
(b) [5 points] For different values of $\beta$, how would you expect this algorithm to perform compared to Q-learning and SARSA? Include bias, variance, and maximization bias in your discussion.
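A small Monte Carlo experiment can make the maximization-bias part of (b) concrete. Assume, purely for illustration, that every action at $s'$ has true value 0 and that the current estimates equal the true values plus independent standard normal noise, with $\pi$ $\epsilon$-greedy with respect to those noisy estimates. The true bootstrap value is then 0, so the average of the mixed bootstrap term is its bias. Since the term equals $(1-\beta\epsilon)\max_a \hat Q(s',a) + \frac{\beta\epsilon}{|A|}\sum_a \hat Q(s',a)$ and the noise has mean zero, its expectation under this model is $(1-\beta\epsilon)\,\mathbb{E}[\max_a \hat Q(s',a)]$: the upward bias shrinks as $\beta$ grows, but only by the factor $(1-\beta\epsilon)$, which stays close to 1 when $\epsilon$ is small. The noise model, `estimated_bias`, and its defaults are assumptions of this sketch.

```python
import random
from typing import List


def mixed_bootstrap(next_q: List[float], beta: float, epsilon: float) -> float:
    # (1 - beta) * max + beta * expectation under epsilon-greedy
    # (epsilon spread uniformly over all actions)
    n = len(next_q)
    greedy_value = max(next_q)
    expected_sarsa = (1.0 - epsilon) * greedy_value + (epsilon / n) * sum(next_q)
    return (1.0 - beta) * greedy_value + beta * expected_sarsa


def estimated_bias(beta: float, epsilon: float, n_actions: int = 4,
                   n_samples: int = 20000, seed: int = 0) -> float:
    # Assumed model: all true action values are 0, estimates carry N(0, 1) noise.
    rng = random.Random(seed)  # same seed => same noise draws for every beta
    total = 0.0
    for _ in range(n_samples):
        noisy_q = [rng.gauss(0.0, 1.0) for _ in range(n_actions)]
        total += mixed_bootstrap(noisy_q, beta, epsilon)
    return total / n_samples  # the true value is 0, so this average is the bias


for beta in (0.0, 0.5, 1.0):
    print(f"beta={beta:.1f}  estimated bias {estimated_bias(beta, epsilon=0.1):.3f}")
```

With $\epsilon = 0.1$ and four actions, one would expect the estimate to fall from roughly 1.03 (the expected maximum of four independent standard normals is about 1.029) at $\beta = 0$ to roughly 0.93 at $\beta = 1$. Reusing the same seed for every $\beta$ makes the comparison between values of $\beta$ free of sampling noise.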
(c) [5 points] Bonus question: try this algorithm on the Taxi Problem in Question 1, and compare it to the other algorithms. Are the results consistent with your hypothesis?
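For part (c), the only change relative to a tabular Q-learning or Expected SARSA agent is the target. The sketch below is a minimal tabular agent, assuming an environment with `reset() -> state` and `step(action) -> (next_state, reward, done)`; the five-state `ChainEnv` is a hypothetical stand-in used only to make the example runnable, and the optimistic initial values are an assumption chosen so that the toy agent explores. Running it on the Taxi environment from Question 1 would mean wrapping that environment in the same interface.

```python
import random
from typing import List, Tuple


class ChainEnv:
    """Hypothetical toy MDP standing in for Taxi: states 0..4, action 0 = left,
    action 1 = right. Entering state 4 gives reward 1 and ends the episode."""

    n_states, n_actions = 5, 2

    def reset(self) -> int:
        self.s = 0
        return self.s

    def step(self, a: int) -> Tuple[int, float, bool]:
        self.s = min(self.s + 1, 4) if a == 1 else max(self.s - 1, 0)
        done = self.s == 4
        return self.s, (1.0 if done else 0.0), done


def eps_greedy_probs(q: List[float], eps: float) -> List[float]:
    # epsilon spread over all actions; the first maximiser gets the greedy mass
    n = len(q)
    probs = [eps / n] * n
    probs[max(range(n), key=lambda a: q[a])] += 1.0 - eps
    return probs


def train(env, episodes=300, alpha=0.5, gamma=0.9, beta=0.5, eps=0.1,
          q_init=1.0, seed=0, max_steps=100) -> List[List[float]]:
    rng = random.Random(seed)
    # optimistic initial values (an assumption) push the agent to try every action
    Q = [[q_init] * env.n_actions for _ in range(env.n_states)]
    for _ in range(episodes):
        s = env.reset()
        for _ in range(max_steps):
            # behaviour policy: epsilon-greedy with respect to the current Q
            a = rng.choices(range(env.n_actions), weights=eps_greedy_probs(Q[s], eps))[0]
            s_next, r, done = env.step(a)
            if done:
                target = r  # no bootstrapping from a terminal state
            else:
                pi = eps_greedy_probs(Q[s_next], eps)
                expected = sum(p * q for p, q in zip(pi, Q[s_next]))
                target = r + gamma * ((1 - beta) * max(Q[s_next]) + beta * expected)
            Q[s][a] += alpha * (target - Q[s][a])
            s = s_next
            if done:
                break
    return Q
```

Sweeping `beta` (with `beta=0.0` reproducing the Q-learning target and `beta=1.0` the Expected SARSA target) and recording, say, the return per episode is one way to set up the comparison asked for in (c). Plain SARSA is not a special case here, since it bootstraps from the sampled next action rather than from an expectation over actions.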