Problem 2 (The Value Functions) (40 pts): Follow the notation given in the lecture notes, or alternatively in Chapter 3.5 of the book by Sutton and Barto, and answer the following questions.

(a) Give an equation for $q_\pi$ in terms of the transition probability $p(s', r \mid s, a)$ and the value function $v_\pi$. (Hint: The action-value function $q_\pi(s,a)$ is the expected return of taking action $a$ in state $s$ and following policy $\pi$ thereafter. The agent may receive a random (immediate) reward $r$ and reach a random next state $s'$, whose value is given by $v_\pi(s')$.)

(b) Give an equation for $v_\pi$ in terms of $q_\pi$ and $\pi$. (Hint: Use the result of part (a) and the Bellman equation for $v_\pi$.)

(c) Derive the Bellman equation for $q_\pi$; that is, express $q_\pi(s,a)$ in terms of $q_\pi(s',a')$. Rearrange the expression so that the summations are next to each other, as in the Bellman equation for $v_\pi$. (Hint: Use the results of parts (a) and (b). Make sure to use the notation $a'$ for the action taken in the next state $s'$.)
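One way to answer the three parts, sketched in Sutton and Barto's notation under the assumptions of a finite MDP and a discount factor $\gamma$ (the problem statement does not mention $\gamma$; take $\gamma = 1$ for undiscounted episodic tasks):

(a) $$q_\pi(s,a) = \sum_{s',r} p(s',r \mid s,a)\,\bigl[r + \gamma\, v_\pi(s')\bigr]$$

(b) $$v_\pi(s) = \sum_{a} \pi(a \mid s)\, q_\pi(s,a)$$

(c) Evaluating (b) at the next state $s'$ with action $a'$, substituting it into (a), and using $\sum_{a'} \pi(a' \mid s') = 1$ to move $r$ inside the inner sum:

$$q_\pi(s,a) = \sum_{s',r} p(s',r \mid s,a) \sum_{a'} \pi(a' \mid s')\,\bigl[r + \gamma\, q_\pi(s',a')\bigr]$$

The Python sketch below checks these identities numerically on a small, randomly generated MDP. The MDP sizes, the reward set, `GAMMA = 0.9`, and helper names such as `q_from_v` and `bellman_q_rhs` are hypothetical choices made only for illustration. The script evaluates $v_\pi$ by iterative policy evaluation, builds $q_\pi$ with (a), and then confirms that (b) and (c) hold up to floating-point error.

```python
import random

# Hypothetical finite MDP used only for illustration (not from the problem):
# states 0..N_S-1, actions 0..N_A-1, rewards taken from REWARDS.
N_S, N_A = 3, 2
REWARDS = [0.0, 1.0, -1.0]
GAMMA = 0.9  # assumed discount factor


def make_mdp(seed=0):
    """Return p[s][a][(s2, r)] -> probability and pi[s][a] -> probability."""
    rng = random.Random(seed)
    p = {}
    for s in range(N_S):
        p[s] = {}
        for a in range(N_A):
            weights = {(s2, r): rng.random() for s2 in range(N_S) for r in REWARDS}
            total = sum(weights.values())
            p[s][a] = {k: w / total for k, w in weights.items()}
    pi = {}
    for s in range(N_S):
        w = [rng.random() for _ in range(N_A)]
        total = sum(w)
        pi[s] = [x / total for x in w]
    return p, pi


def q_from_v(p, v, s, a):
    """Part (a): q(s,a) = sum_{s',r} p(s',r|s,a) [r + gamma * v(s')]."""
    return sum(prob * (r + GAMMA * v[s2]) for (s2, r), prob in p[s][a].items())


def v_from_q(pi, q, s):
    """Part (b): v(s) = sum_a pi(a|s) q(s,a)."""
    return sum(pi[s][a] * q[s][a] for a in range(N_A))


def bellman_q_rhs(p, pi, q, s, a):
    """Part (c): sum_{s',r} p(s',r|s,a) sum_{a'} pi(a'|s') [r + gamma * q(s',a')]."""
    return sum(
        prob * sum(pi[s2][a2] * (r + GAMMA * q[s2][a2]) for a2 in range(N_A))
        for (s2, r), prob in p[s][a].items()
    )


def evaluate_policy(p, pi, sweeps=2000):
    """Iterative policy evaluation: repeatedly apply (a) then (b) until v converges."""
    v = [0.0] * N_S
    for _ in range(sweeps):
        q = [[q_from_v(p, v, s, a) for a in range(N_A)] for s in range(N_S)]
        v = [v_from_q(pi, q, s) for s in range(N_S)]
    return v


p, pi = make_mdp()
v = evaluate_policy(p, pi)
q = [[q_from_v(p, v, s, a) for a in range(N_A)] for s in range(N_S)]

# (b): v is recovered by averaging q over the policy.
err_b = max(abs(v[s] - v_from_q(pi, q, s)) for s in range(N_S))
# (c): q is a fixed point of the Bellman equation for q.
err_c = max(
    abs(q[s][a] - bellman_q_rhs(p, pi, q, s, a))
    for s in range(N_S)
    for a in range(N_A)
)
print(f"max error (b): {err_b:.2e}, max error (c): {err_c:.2e}")
```

Both errors should be on the order of floating-point round-off. Iterative policy evaluation is used here instead of solving the linear system directly so that the sketch needs only the standard library.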
