Question: Problem 2. (30 pt) Given a Markov stationary policy $\pi$, we studied the minimization of the projected Bellman error for policy evaluation via function approximation. Alternatively, we can choose the objective function as

$$J(\theta) = \frac{1}{2} \sum_{s \in \mathcal{S}} d_\pi(s) \big( \hat{V}(s; \theta) - V^\pi(s) \big)^2,$$

where $d_\pi$ is the stationary distribution of the Markov chain induced by $\pi$, and $\hat{V}(s; \theta)$ is the approximation of $V^\pi(s)$ with the parameter $\theta$. Then, the gradient of $J$ with respect to $\theta$ is given by

$$\nabla_\theta J(\theta) = \sum_{s \in \mathcal{S}} d_\pi(s) \big( \hat{V}(s; \theta) - V^\pi(s) \big) \nabla_\theta \hat{V}(s; \theta).$$

To find $\theta^\star$ approximating $V^\pi$, we can apply the stochastic gradient method according to

$$\theta_{k+1} = \theta_k - \alpha_k \big( \hat{V}(s_k; \theta_k) - V^\pi(s_k) \big) \nabla_\theta \hat{V}(s_k; \theta_k),$$

where $s_k \in \mathcal{S}$ denotes the current state at stage $k$ and $\alpha_k > 0$ is the step size.
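For concreteness, here is a minimal Python sketch (not part of the original problem) of one such stochastic gradient step; the names `sgd_value_step`, `v_hat`, `grad_v_hat`, the sampled `target` standing in for the unknown $V^\pi(s_k)$, and the step size `alpha` are illustrative assumptions.

```python
import numpy as np

def sgd_value_step(theta, s_k, target, v_hat, grad_v_hat, alpha):
    """One stochastic gradient step on (1/2) * (V_hat(s_k; theta) - target)^2,
    where the sampled target stands in for the unknown V^pi(s_k)."""
    error = v_hat(s_k, theta) - target              # V_hat(s_k; theta_k) - target_k
    return theta - alpha * error * grad_v_hat(s_k, theta)

# Usage with a linear approximator V_hat(s; theta) = phi(s)^T theta,
# whose gradient with respect to theta is phi(s).
phi = np.eye(4)                                     # 4 states with identity (tabular) features
theta = np.zeros(4)
theta = sgd_value_step(theta, s_k=2, target=1.0,
                       v_hat=lambda s, th: phi[s] @ th,
                       grad_v_hat=lambda s, th: phi[s],
                       alpha=0.1)
```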
(a) Show that, with direct parametrization, i.e., $\hat{V}(s; \theta) = \theta_s$ for each $s \in \mathcal{S}$ (so $\theta \in \mathbb{R}^{|\mathcal{S}|}$), the update reduces to:

- the 1-step TD learning algorithm if we use $V^\pi(s_k) \approx r_k + \gamma \hat{V}(s_{k+1}; \theta_k)$;
- the Monte Carlo method if we use $V^\pi(s_k) \approx G_k$, where $G_k = \sum_{t \ge k} \gamma^{t-k} r_t$ is the sampled return from stage $k$;
- the $\lambda$-return update if we use $V^\pi(s_k) \approx G_k^\lambda$, the $\lambda$-return.

Recall the indicator function in these nonparametric updates: with direct parametrization, $\nabla_\theta \hat{V}(s_k; \theta) = \mathbb{1}\{s = s_k\}$, so only the entry of $\theta$ corresponding to the visited state is updated.
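As a minimal sketch of what the part-(a) reductions look like in code, assuming a tabular value array `V` indexed by state (the function names and arguments are illustrative):

```python
import numpy as np

def td0_tabular_update(V, s, r, s_next, alpha, gamma):
    """1-step TD under direct parametrization: target = r + gamma * V[s_next].
    The indicator gradient means only the entry of the visited state s changes."""
    V = V.copy()
    V[s] += alpha * (r + gamma * V[s_next] - V[s])
    return V

def mc_tabular_update(V, s, G, alpha):
    """Monte Carlo under direct parametrization: target = G, the sampled return from s."""
    V = V.copy()
    V[s] += alpha * (G - V[s])
    return V
```

The $\lambda$-return update has the same form as the Monte Carlo case, with the target $G$ replaced by the $\lambda$-return $G_k^\lambda$.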
(b) The direct parameterization can be viewed as linear function approximation with the feature matrix $\Phi = I$. What if we instead have a feature matrix $\Phi \in \mathbb{R}^{|\mathcal{S}| \times d}$ with $d \le |\mathcal{S}|$, where $\phi(s)^\top$ is the row of $\Phi$ associated with state $s$ and $\hat{V}(s; \theta) = \phi(s)^\top \theta$? We have $\theta \in \mathbb{R}^d$, and $\Phi$ is full column rank. Formulate the counterparts of 1-step TD learning, the Monte Carlo method, and the $\lambda$-return algorithm under linear function approximation according to the feature matrix $\Phi$.
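As a hedged illustration of the part-(b) counterparts with features $\hat{V}(s; \theta) = \phi(s)^\top \theta$, here is a sketch of the 1-step TD and Monte Carlo updates; the $\lambda$-return version again replaces the target with $G_k^\lambda$. The variable names are assumptions, not part of the original problem.

```python
import numpy as np

def td0_linear_update(theta, phi_s, r, phi_s_next, alpha, gamma):
    """1-step TD with linear features: the gradient of phi(s)^T theta is phi(s),
    so the whole parameter vector moves along the feature direction phi(s)."""
    td_error = r + gamma * (phi_s_next @ theta) - (phi_s @ theta)
    return theta + alpha * td_error * phi_s

def mc_linear_update(theta, phi_s, G, alpha):
    """Monte Carlo with linear features: target = G, the sampled return."""
    return theta + alpha * (G - phi_s @ theta) * phi_s
```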