Question: We have a policy parameterized by a scalar parameter θ. We want to estimate the gradient at θ = 5 using the regression gradient method

We have a policy parameterized by a scalar parameter θ. We want to estimate the gradient at θ = 5 using the regression gradient method with a perturbation matrix ΔΘ = [-1, -0.5, 0.5, 0.5, 0.5, 1]. We do rollouts with these perturbations and get ΔU = [-1, -1, 1, 1, -1, 1]. What is our estimate of the gradient?
Step by Step Solution
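A minimal worked sketch, assuming the regression gradient method here means the usual least-squares estimate ∇U(θ) ≈ ΔΘ⁺ ΔU, where ΔΘ⁺ is the pseudoinverse of the perturbation matrix and each entry of ΔU is the change in rollout utility for the matching perturbation. With a single scalar parameter, ΔΘ is one column, so the pseudoinverse reduces to ΔΘᵀ / (ΔΘᵀ ΔΘ) and the estimate is the slope of a least-squares line through the origin. By hand, ΔΘᵀ ΔU = 1 + 0.5 + 0.5 + 0.5 - 0.5 + 1 = 3 and ΔΘᵀ ΔΘ = 1 + 0.25 + 0.25 + 0.25 + 0.25 + 1 = 3, which gives an estimate of 3 / 3 = 1. The function name `regression_gradient` and the variable names below are illustrative, not taken from the question.

```python
# Perturbations applied to the scalar parameter around theta = 5, one per rollout.
delta_theta = [-1.0, -0.5, 0.5, 0.5, 0.5, 1.0]
# Changes in rollout utility for those perturbations, as given in the question.
delta_u = [-1.0, -1.0, 1.0, 1.0, -1.0, 1.0]


def regression_gradient(d_theta, d_u):
    """Least-squares slope through the origin for a single scalar parameter.

    Equivalent to pinv(d_theta) @ d_u when d_theta is treated as one column.
    """
    if len(d_theta) != len(d_u):
        raise ValueError("need one utility difference per perturbation")
    numerator = sum(t * u for t, u in zip(d_theta, d_u))  # dTheta^T dU
    denominator = sum(t * t for t in d_theta)             # dTheta^T dTheta
    if denominator == 0:
        raise ValueError("all perturbations are zero; gradient is undefined")
    return numerator / denominator


gradient = regression_gradient(delta_theta, delta_u)
print(gradient)  # 1.0
```

Under these assumptions the gradient estimate at θ = 5 is 1. For a policy with more than one parameter, ΔΘ would have one column per parameter and a general pseudoinverse routine would be needed in place of the scalar ratio.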
