Question: We have a policy parameterized by a scalar parameter θ. We want to estimate the gradient at θ = 5 using the regression gradient method with a perturbation matrix ΔΘ = [-1, -0.5, 0.5, 0.5, 0.5, 1]. We do rollouts with these perturbations and get ΔU = [-1, -1, 1, 1, -1, 1]. What is our estimate of the gradient?
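The question does not spell out the estimator, so the following is a minimal sketch under one assumption: that "regression gradient" means the least-squares fit of the utility changes to the perturbations, i.e. the pseudoinverse of the perturbation matrix applied to the utility differences. For a scalar parameter this reduces to sum(Δθ_i · ΔU_i) / sum(Δθ_i²). The function name `regression_gradient_scalar` is hypothetical and used only for illustration.

```python
def regression_gradient_scalar(delta_theta, delta_u):
    """Least-squares slope through the origin for a scalar parameter.

    Assumption: the gradient estimate is pinv(delta_theta) @ delta_u,
    which for one parameter equals sum(d * u) / sum(d * d).
    """
    if len(delta_theta) != len(delta_u):
        raise ValueError("perturbations and utility differences must have equal length")
    denominator = sum(d * d for d in delta_theta)
    if denominator == 0:
        raise ValueError("at least one perturbation must be nonzero")
    numerator = sum(d * u for d, u in zip(delta_theta, delta_u))
    return numerator / denominator


# Values copied from the question as written.
delta_theta = [-1, -0.5, 0.5, 0.5, 0.5, 1]
delta_u = [-1, -1, 1, 1, -1, 1]

gradient = regression_gradient_scalar(delta_theta, delta_u)
print(gradient)  # prints 1.0
```

With the numbers as given, both the numerator and the denominator sum to 3, so under this assumption the estimate is 1. The point θ = 5 does not enter the calculation, because the perturbations and utility differences are already expressed relative to it.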

Step by Step Solution

The estimate of the gradient is ...
