Question: Math: An alternative learning algorithm [10 points]
Consider a learning algorithm which attempts to learn a Q-function, but instead of using the usual Q-learning target $r + \gamma \max_{a'} Q(s', a')$, it uses as target a mixture of [expression missing from the source], where [symbol missing from the source] is a hyperparameter.
Assume that $\pi$ is an $\epsilon$-greedy policy derived from $Q$ and the episodes used for training are collected using $\pi$ only.
(a) Recall that an on-policy control algorithm estimates $q_\pi(s, a)$ for the current behaviour policy $\pi$ and for all states $s$ and actions $a$. Is this algorithm on-policy or off-policy? Justify your answer.
(b) For different values of the hyperparameter, how would you expect this algorithm to perform compared to Q-learning and SARSA? Include bias, variance, and maximization bias in your discussion.
(c) Bonus question: try this algorithm on the Taxi Problem in Question [number missing from the source] and compare it to the other algorithms. Are the results consistent with your hypothesis?
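
For reference, the two baseline targets named in part (b) can be sketched in a few lines of Python. This is a minimal tabular sketch, not the assignment's code: the function names, the list-based Q-table row, and the toy numbers are all assumptions made for illustration. The mixture target itself is not sketched, since its exact form is not given above.

```python
import random


def epsilon_greedy(q_row, epsilon, rng=random):
    """Pick an action index from one Q-table row with an epsilon-greedy rule.

    q_row is a hypothetical list of action values for a single state.
    """
    if rng.random() < epsilon:
        return rng.randrange(len(q_row))  # explore: uniform random action
    return max(range(len(q_row)), key=lambda a: q_row[a])  # exploit: argmax


def q_learning_target(reward, next_q_row, gamma, done=False):
    """Q-learning target: bootstrap from the greedy (max) next action value."""
    if done:
        return reward
    return reward + gamma * max(next_q_row)


def sarsa_target(reward, next_q_row, next_action, gamma, done=False):
    """SARSA target: bootstrap from the next action the behaviour policy chose."""
    if done:
        return reward
    return reward + gamma * next_q_row[next_action]


# Toy values (assumed): three actions available in the next state.
next_q_row = [1.0, 3.0, 2.0]
next_action = epsilon_greedy(next_q_row, epsilon=0.1)
print("Q-learning target:", q_learning_target(1.0, next_q_row, gamma=0.9))
print("SARSA target:", sarsa_target(1.0, next_q_row, next_action, gamma=0.9))
```

The only difference between the two functions is which next-state action value is bootstrapped from: the maximum over actions, or the value of the action actually sampled by the $\epsilon$-greedy behaviour policy.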
