Question: In the TD ( lambda ) algorithm, zt is computed recursively. Express zt only in terms of the states visited in the past. This

In the TD(\lambda ) algorithm, zt
is computed recursively. Express zt only in terms of the states
visited in the past. This representation of the eligibility vector will show that eligibility
vectors combine the frequency heuristic and recency heuristic to address the credit assignment problem. For the rewards received, the frequency heuristic assigns higher credit to
the frequently visited states while the recency heuristic assigns higher credit to the recently
visited states. The eligibility vector assigns higher credits to the frequently and recently
visited states.
Note that in the TD(\lambda ) algorithm, value function estimate for every state gets updated different
from the n-step TD algorithms, where only the estimate for the current state gets updated. If
a state has not been visited recently and frequently then the eligibility of that state (i.e., the
associated entry of the eligibility vector) will be close to zero. Therefore, the update via the
TD-error will take very small steps for such states.
Though \lambda -return is forward-looking while TD(\lambda ) is backward looking, they are equivalent
as you will show next for the finite horizon problem with horizon length T <\infty .

Step by Step Solution

There are 3 Steps involved in it

1 Expert Approved Answer
Step: 1 Unlock blur-text-image
Question Has Been Solved by an Expert!

Get step-by-step solutions from verified subject matter experts

Step: 2 Unlock
Step: 3 Unlock

Students Have Also Explored These Related Databases Questions!