Question: Answer Part 2 Plz: REINFORCE: Monte-Carlo Policy-Gradient Control (episodic) for * Input: a differentiable policy parameterization (a|

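Answer Part 2 Plz: REINFORCE: Monte-Carlo Policy-Gradient Control (episodic) for $\pi_*$
Input: a differentiable policy parameterization $\pi(a \mid s, \theta)$
Algorithm parameter: step size $\alpha > 0$
Initialize policy parameter $\theta \in \mathbb{R}^{d'}$ (e.g., to $\mathbf{0}$)
Loop forever (for each episode):
    Generate an episode $S_0, A_0, R_1, \dots, S_{T-1}, A_{T-1}, R_T$, following $\pi(\cdot \mid \cdot, \theta)$
    Loop for each step of the episode $t = 0, 1, \dots, T-1$:
        $G \leftarrow \sum_{k=t+1}^{T} \gamma^{k-t-1} R_k$
        $\theta \leftarrow \theta + \alpha \gamma^{t} G \nabla \ln \pi(A_t \mid S_t, \theta)$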
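The inner loop of the pseudocode above can be sketched in plain Python. This is only an illustrative sketch, not the textbook's code: the function name `reinforce_episode_update`, the flat-list representation of $\theta$, and the `grad_log_pi` callback are all assumptions made for this example. The list `rewards` is assumed to hold $R_1, \dots, R_T$, so `rewards[k - 1]` is $R_k$.

```python
from typing import Callable, List, Sequence


def reinforce_episode_update(
    theta: Sequence[float],
    states: Sequence[object],
    actions: Sequence[object],
    rewards: Sequence[float],
    grad_log_pi: Callable[[object, object, List[float]], List[float]],
    alpha: float,
    gamma: float,
) -> List[float]:
    """Apply the per-step REINFORCE updates for one finished episode.

    Hypothetical helper: rewards[k - 1] is R_k, and grad_log_pi(s, a, theta)
    returns the gradient of ln pi(a | s, theta) as a flat list.
    """
    T = len(rewards)
    theta = list(theta)
    for t in range(T):
        # G <- sum_{k=t+1}^{T} gamma^(k-t-1) * R_k
        G = sum(gamma ** (k - t - 1) * rewards[k - 1] for k in range(t + 1, T + 1))
        # theta <- theta + alpha * gamma^t * G * grad ln pi(A_t | S_t, theta)
        grad = grad_log_pi(states[t], actions[t], theta)
        theta = [th + alpha * (gamma ** t) * G * g for th, g in zip(theta, grad)]
    return theta
```

Note that the gradient at step t is evaluated with the parameters already updated by the earlier steps of the same episode, which matches the order of operations in the pseudocode.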
Assume the agent uses REINFORCE for learning a policy while navigating in a continuous 2D square maze, with center at origin. It starts at the state $(0, 0)$. The agent's policy is
parameterized by a linear function where the final layer outputs the mean action $Ws = \mu = (\mu_x, \mu_y)$. Here, $W \in \mathbb{R}^{2 \times 2}$ is a $2 \times 2$ matrix initialized as all zeros, and $s$ is the state.
During execution, the agent then samples an action $(a_x, a_y) \sim \mathcal{N}(\mu, I)$, a 2-dimensional Gaussian distribution with mean $\mu$ and identity variance. The first trajectory is:
$s_0 = (0, 0)$, $a_0 = (0.5, -0.2)$, $s_1 = (1, -0.2)$, $a_1 = (0.2, 0.1)$, $s_2 = (1.2, -0.1)$, $\dots$, $s_5 = (3.2, 1.3)$. The trajectory ends in $s_5$ because the agent falls into a trap and receives a
negative reward of $-1$ ($R(s_4, a_4) = -1$). Otherwise, the agent receives a reward of 0 for every previous step. Assume $\gamma = 0.9$.
What is the return $G$ at state $s_0$? Please specify to the 4th decimal place.
Feedback
Based on answering correctly
$G = \sum_{t=0}^{4} \gamma^{t} R(s_t, a_t) = \gamma^{4} \cdot (-1) = -0.6561$
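The return in the feedback can be checked numerically. The sketch below assumes the rewards are stored as a list indexed by time step, so `rewards[k]` stands for $R(s_k, a_k)$; the helper name `discounted_return` is made up for this example.

```python
def discounted_return(rewards, gamma, t=0):
    """Return sum_{k >= t} gamma^(k - t) * rewards[k] (hypothetical helper)."""
    return sum(gamma ** (k - t) * r for k, r in enumerate(rewards) if k >= t)


# Rewards of the first trajectory: 0 for steps 0..3, then -1 for R(s4, a4).
rewards = [0.0, 0.0, 0.0, 0.0, -1.0]
G0 = discounted_return(rewards, gamma=0.9)
print(f"{G0:.4f}")  # prints -0.6561
```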
Assume learning rate $\alpha = 0.1$. What will be the sum of all the elements in the updated $W$ right after we loop over the first state $s_0$ (so we have just updated $W$ based on the
return from state $s_0$ of this episode and haven't updated the parameters based on $s_1$ yet)?
Please refer to the REINFORCE algorithm on page 328 of the textbook. Please write the answer to the 4th decimal place.
Hint: recall the formula of the multivariate Gaussian, pdf $= \frac{1}{\sqrt{(2\pi)^{d} |\Sigma|}} \exp\left(-\frac{1}{2} (x - \mu)^{T} \Sigma^{-1} (x - \mu)\right)$, where $d$ is the dimension.
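One way to approach Part 2 is to differentiate the log of the Gaussian density from the hint. With $\Sigma = I$ and $\mu = Ws$, $\ln \pi(a \mid s, W) = -\frac{1}{2} \lVert a - Ws \rVert^2 + \text{const}$, so $\nabla_W \ln \pi(a \mid s, W) = (a - Ws)\, s^{T}$. The sketch below applies a single REINFORCE update at $t = 0$ under that reading of the problem; the helper name `grad_log_pi` and the nested-list matrix representation are assumptions made for this example, not part of the original question.

```python
def grad_log_pi(W, s, a):
    """Gradient of ln N(a; W s, I) with respect to W, i.e. (a - W s) s^T."""
    mu = [sum(W[i][j] * s[j] for j in range(len(s))) for i in range(len(a))]
    return [[(a[i] - mu[i]) * s[j] for j in range(len(s))] for i in range(len(a))]


W = [[0.0, 0.0], [0.0, 0.0]]   # initialized as all zeros
s0, a0 = (0.0, 0.0), (0.5, -0.2)
alpha, gamma = 0.1, 0.9
G0 = -(gamma ** 4)             # return from s0, about -0.6561

grad = grad_log_pi(W, s0, a0)
# W <- W + alpha * gamma^0 * G0 * grad ln pi(a0 | s0, W)
W_new = [
    [W[i][j] + alpha * (gamma ** 0) * G0 * grad[i][j] for j in range(2)]
    for i in range(2)
]
total = sum(sum(row) for row in W_new)
print(f"{total:.4f}")  # prints 0.0000
```

Under these assumptions every entry of the gradient carries a factor of $s_0 = (0, 0)$, so the first update leaves $W$ at zero and the sum of its elements is 0.0000.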