Question: Problem 2 (The Value Functions) (40 pts): Follow the notations given in the lecture note, or alternatively from Chapter 3.5 in the book by (Sutton
Problem 2 (The Value Functions) (40 pts): Follow the notations given in the lecture note, or alternatively from Chapter 3.5 in the book by (Sutton and Barto), answer the following questions. (a) Give an equation for q in terms of the transition probability p(s,rs,a) and the value function v. (Hint: The action value function, q(s,a), is the expected return of taking action a at state s (and follow policy thereafter). The agent may receive a random (immediate) reward r and reach a random next state s, whose value is given by v(s).) (b) Give an equation for v in terms of q and . (Hint: Use the result in part (a), and the Bellman equation for v ) (c) Derive the Bellman equation for q. That is, express q(s,a) using q(s,a). Rearrange the expression such that the summations is next to each other, like the expression in Bellman equation for v. (Hint: Use the results from part (a) and part (b). Make sure to use the notation a as the action taken in the next state s.)
Step by Step Solution
There are 3 Steps involved in it
Get step-by-step solutions from verified subject matter experts
