Question:

Problem 1. (50 pt) Given a Markov stationary policy $\pi$, consider the policy evaluation problem to compute $v^\pi$. For example, we can apply the temporal difference (TD) learning algorithm given by

$$v_{t+1}(s) = v_t(s) + \alpha\,\delta_t\,\mathbb{I}\{s_t = s\},$$

where $\delta_t := r_t + \gamma v_t(s_{t+1}) - v_t(s_t)$ is known as the TD error. Alternatively, we can apply the $n$-step TD learning algorithm given by

$$v_{t+1}(s) = v_t(s) + \alpha\big(G_t^{(n)} - v_t(s)\big)\,\mathbb{I}\{s_t = s\},$$

where

$$G_t^{(n)} := r_t + \gamma r_{t+1} + \cdots + \gamma^{n-1} r_{t+n-1} + \gamma^{n} v_t(s_{t+n})$$

for $n = 1, 2, \ldots$. Note that $\delta_t = G_t^{(1)} - v_t(s_t)$. The $n$-step TD algorithms for $n < \infty$ use bootstrapping; therefore, they use a biased estimate of $v^\pi$. On the other hand, as $n \to \infty$, the $n$-step TD algorithm becomes a Monte Carlo method, which uses an unbiased estimate of $v^\pi$. However, these approaches delay the update by $n$ stages, and they update the value function estimate only for the current state. As an intermediate step toward addressing these challenges, we first introduce the $\lambda$-return algorithm given by

$$v_{t+1}(s) = v_t(s) + \alpha\big(G_t^{\lambda} - v_t(s)\big)\,\mathbb{I}\{s_t = s\},$$

where, given $\lambda \in [0, 1]$, we define

$$G_t^{\lambda} := (1 - \lambda) \sum_{n=1}^{\infty} \lambda^{n-1} G_t^{(n)},$$

which takes a weighted average of the $G_t^{(n)}$'s.
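The three forward-view targets above can be sketched in Python for a single finite episode. This is a minimal sketch under assumptions the problem does not state: the episode ends after $T$ = `len(rewards)` steps in a terminal state whose value is 0, the estimate `v` (a hypothetical dict from integer state labels to values) is held fixed while the returns are computed, and the infinite sum defining $G_t^\lambda$ is evaluated exactly by noting that every $G_t^{(n)}$ with $t + n \ge T$ equals the Monte Carlo return. The names `td0_update`, `n_step_return` and `lambda_return` are illustrative only.

```python
from typing import Dict, List


def td0_update(v: Dict[int, float], s: int, r: float, s_next: int,
               alpha: float, gamma: float) -> float:
    """One tabular TD(0) step; only v[s] changes (the indicator I{s_t = s})."""
    delta = r + gamma * v[s_next] - v[s]  # TD error delta_t
    v[s] += alpha * delta
    return delta


def n_step_return(rewards: List[float], states: List[int], v: Dict[int, float],
                  t: int, n: int, gamma: float) -> float:
    """G_t^(n) for one episode with T = len(rewards) and len(states) == T + 1.

    Assumption: states[T] is terminal with value 0, so when t + n >= T the
    bootstrap term vanishes and G_t^(n) is the Monte Carlo return.
    """
    T = len(rewards)
    g = sum(gamma ** (k - t) * rewards[k] for k in range(t, min(t + n, T)))
    if t + n < T:
        g += gamma ** n * v[states[t + n]]  # bootstrap on the fixed estimate
    return g


def lambda_return(rewards: List[float], states: List[int], v: Dict[int, float],
                  t: int, lam: float, gamma: float) -> float:
    """G_t^lambda = (1 - lam) * sum_{n>=1} lam^(n-1) * G_t^(n).

    Every G_t^(n) with n >= T - t equals the Monte Carlo return, and the
    weights of those terms sum to lam^(T-t-1), so the tail is added once.
    """
    tail = len(rewards) - t  # steps remaining until the terminal state
    g = sum((1 - lam) * lam ** (n - 1) * n_step_return(rewards, states, v, t, n, gamma)
            for n in range(1, tail))
    return g + lam ** (tail - 1) * n_step_return(rewards, states, v, t, tail, gamma)
```

With `lam=0.0` the sketch returns $G_t^{(1)}$ (the TD(0) target), and with `lam=1.0` it returns the Monte Carlo return, which are the two extremes the problem contrasts.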
(a) By the definition of $G_t^{(n)}$, we can show that $G_t^{(n)} = r_t + \gamma G_{t+1}^{(n-1)}$. Derive an analogous recursive relationship between $G_t^{\lambda}$ and $G_{t+1}^{\lambda}$.

(b) Show that the term $G_t^{\lambda} - v_t(s)$ in the $\lambda$-return update can be written as a sum of TD errors.

The TD algorithm, the Monte Carlo method, and the $\lambda$-return algorithm look forward to approximate $v^\pi$. Alternatively, we can look backward via the eligibility trace method. The TD($\lambda$)
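As a numeric sanity check for parts (a) and (b), the sketch below compares $G_t^\lambda$ computed from its definition against the standard forward-view identities $G_t^\lambda = r_t + \gamma\big[(1-\lambda)v(s_{t+1}) + \lambda G_{t+1}^\lambda\big]$ and $G_t^\lambda - v(s_t) = \sum_{k=t}^{T-1} (\gamma\lambda)^{k-t}\delta_k$. It assumes, beyond the problem statement, a finite episode of $T$ steps with terminal value 0 and a value estimate held fixed over the episode (so the time index on $v_t$ is dropped); `v[k]` is shorthand for $v(s_k)$. The helper names are hypothetical, and a small residual on random episodes supports, but does not replace, the derivations the problem asks for.

```python
import random
from typing import List


def lambda_return_direct(r: List[float], v: List[float], t: int,
                         lam: float, gamma: float) -> float:
    """G_t^lambda from its definition; v[k] = v(s_k) for k = 0..T, v[T] = 0."""
    T = len(r)

    def g_n(n: int) -> float:
        # n-step return; no bootstrap once t + n reaches the terminal step
        g = sum(gamma ** (k - t) * r[k] for k in range(t, min(t + n, T)))
        return g + (gamma ** n * v[t + n] if t + n < T else 0.0)

    tail = T - t
    total = sum((1 - lam) * lam ** (n - 1) * g_n(n) for n in range(1, tail))
    return total + lam ** (tail - 1) * g_n(tail)  # all n >= tail give the MC return


def check_identities(T: int = 6, lam: float = 0.7, gamma: float = 0.9,
                     seed: int = 0) -> float:
    """Largest violation of the (a) recursion and the (b) TD-error sum."""
    rng = random.Random(seed)
    r = [rng.uniform(-1, 1) for _ in range(T)]
    v = [rng.uniform(-1, 1) for _ in range(T)] + [0.0]  # v[T]: terminal state
    delta = [r[k] + gamma * v[k + 1] - v[k] for k in range(T)]  # TD errors
    G = [lambda_return_direct(r, v, t, lam, gamma) for t in range(T)] + [0.0]
    worst = 0.0
    for t in range(T):
        # (a): G_t^lam = r_t + gamma * ((1 - lam) * v(s_{t+1}) + lam * G_{t+1}^lam)
        rec = r[t] + gamma * ((1 - lam) * v[t + 1] + lam * G[t + 1])
        # (b): G_t^lam - v(s_t) = sum_k (gamma * lam)^(k - t) * delta_k
        td_sum = sum((gamma * lam) ** (k - t) * delta[k] for k in range(t, T))
        worst = max(worst, abs(G[t] - rec), abs(G[t] - v[t] - td_sum))
    return worst
```

Calling `check_identities()` with a few different `T`, `lam`, `gamma` and `seed` values should return residuals at floating-point rounding level if the identities hold.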
