Question: Problem 1 (Optimal Value Functions and Policies) (20 pts): In this problem, we will practice/review the relations between optimal value functions and how to derive
Problem 1 (Optimal Value Functions and Policies) (20 pts): In this problem, we will practice/review the relations between optimal value functions and how to derive optimal policies from optimal value functions. Follow the notations given in the lecture note, or alternatively from Chapter 3 in the book by (Sutton and Barto), answer the following questions. (a) Give an equation for q in terms of the transition probability p(s,rs,a) and the optimal value function v. (Hint: Recall that we have derived the equation for q in terms of the transition probability p and v. What if now we follow the optimal policy instead of just any policy , starting from next state s ?) (b) Give an equation for v in terms of q. (Hint: Use the result in part (a), and the Bellman optimality equation.) (c) Given an equation for the optimal policy (s) in terms of the transition probability p(s,rs,a) and v. For simplicity, we can just consider the deterministic optimal policy here (that is, (s) is one action in each state s ). (Hint: Start from the Bellman optimality equation for v ) (d) Give an equation for the optimal policy (s) in terms of q. Again, consider the deterministic policy case. (Hint: Combine the results in part (a) and (c), or use the result from part (b) directly.)
Step by Step Solution
There are 3 Steps involved in it
Get step-by-step solutions from verified subject matter experts
