Question:

[25pts] [Non-programming problem] When we described Markov Decision Processes, we derived a formula to calculate the state value function for a policy $\pi$. Please derive the corresponding formula for the action value function for a policy $\pi$. The following specifies the return, the definition of the state value function (Eq 3.12), and the definition of the action value function (Eq 3.13):

$$G_t \doteq \sum_{k=t+1}^{T} \gamma^{k-t-1} R_k$$

$$v_\pi(s) \doteq \mathbb{E}_\pi[G_t \mid S_t = s] = \mathbb{E}_\pi\!\left[\sum_{k=0}^{\infty} \gamma^k R_{t+k+1} \,\middle|\, S_t = s\right], \quad \text{for all } s \in \mathcal{S} \tag{3.12}$$

$$q_\pi(s, a) \doteq \mathbb{E}_\pi[G_t \mid S_t = s, A_t = a] = \mathbb{E}_\pi\!\left[\sum_{k=0}^{\infty} \gamma^k R_{t+k+1} \,\middle|\, S_t = s, A_t = a\right] \tag{3.13}$$

We derived the following formula for $v_\pi(s)$:

$$\begin{aligned}
v_\pi(s) &\doteq \mathbb{E}_\pi[G_t \mid S_t = s] \\
&= \mathbb{E}_\pi[R_{t+1} + \gamma G_{t+1} \mid S_t = s] \\
&= \sum_a \pi(a \mid s) \sum_{s'} \sum_r p(s', r \mid s, a)\bigl[r + \gamma\, \mathbb{E}_\pi[G_{t+1} \mid S_{t+1} = s']\bigr] \\
&= \sum_a \pi(a \mid s) \sum_{s', r} p(s', r \mid s, a)\bigl[r + \gamma\, v_\pi(s')\bigr], \quad \text{for all } s \in \mathcal{S}
\end{aligned} \tag{3.14}$$

Please derive a formula, similar to Eq 3.14, but for the action value function.
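One possible derivation sketch, mirroring the expansion used for Eq 3.14 and using only the definitions above: because we now condition on both $S_t = s$ and $A_t = a$, the outer sum over $\pi(a \mid s)$ disappears, and everything else proceeds exactly as before.

$$\begin{aligned}
q_\pi(s, a) &\doteq \mathbb{E}_\pi[G_t \mid S_t = s, A_t = a] \\
&= \mathbb{E}_\pi[R_{t+1} + \gamma G_{t+1} \mid S_t = s, A_t = a] \\
&= \sum_{s'} \sum_r p(s', r \mid s, a)\bigl[r + \gamma\, \mathbb{E}_\pi[G_{t+1} \mid S_{t+1} = s']\bigr] \\
&= \sum_{s', r} p(s', r \mid s, a)\bigl[r + \gamma\, v_\pi(s')\bigr] \\
&= \sum_{s', r} p(s', r \mid s, a)\Bigl[r + \gamma \sum_{a'} \pi(a' \mid s')\, q_\pi(s', a')\Bigr]
\end{aligned}$$

The fourth line is the direct analogue of Eq 3.14; expanding $v_\pi(s') = \sum_{a'} \pi(a' \mid s')\, q_\pi(s', a')$ gives the fully recursive form in the last line.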
