Question: Problem 1 (Optimal Value Functions and Policies) (20 pts)

In this problem, we will practice/review the relations between optimal value functions and how to derive optimal policies from optimal value functions. Following the notation in the lecture notes, or alternatively Chapter 3 of the book by Sutton and Barto, answer the following questions.

(a) Give an equation for $q_*$ in terms of the transition probability $p(s', r \mid s, a)$ and the optimal value function $v_*$. (Hint: Recall that we have derived the equation for $q_\pi$ in terms of the transition probability $p$ and $v_\pi$. What if we now follow the optimal policy $\pi_*$ instead of an arbitrary policy $\pi$, starting from the next state $s'$?)

(b) Give an equation for $v_*$ in terms of $q_*$. (Hint: Use the result in part (a) and the Bellman optimality equation.)

(c) Give an equation for the optimal policy $\pi_*(s)$ in terms of the transition probability $p(s', r \mid s, a)$ and $v_*$. For simplicity, consider only a deterministic optimal policy here (that is, $\pi_*(s)$ is a single action in each state $s$). (Hint: Start from the Bellman optimality equation for $v_*$.)

(d) Give an equation for the optimal policy $\pi_*(s)$ in terms of $q_*$. Again, consider the deterministic policy case. (Hint: Combine the results in parts (a) and (c), or use the result from part (b) directly.)
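
A sketch of the standard relations for a finite MDP, following the notation of Sutton and Barto (Chapter 3). It assumes a discount rate $\gamma \in [0, 1]$ (with $\gamma < 1$ for continuing tasks), which the problem statement does not name explicitly. Ties in the $\arg\max$ may be broken arbitrarily, since any maximizing action is optimal.

```latex
% (a) Take action a once, then act optimally from the next state s':
q_*(s,a) = \sum_{s',\,r} p(s', r \mid s, a)\,\bigl[r + \gamma\, v_*(s')\bigr]

% (b) Bellman optimality equation for v_*, rewritten with (a):
v_*(s) = \max_{a \in \mathcal{A}(s)} \sum_{s',\,r} p(s', r \mid s, a)\,\bigl[r + \gamma\, v_*(s')\bigr]
       = \max_{a \in \mathcal{A}(s)} q_*(s,a)

% (c) Deterministic optimal policy: greedy one-step lookahead on v_*:
\pi_*(s) = \arg\max_{a \in \mathcal{A}(s)} \sum_{s',\,r} p(s', r \mid s, a)\,\bigl[r + \gamma\, v_*(s')\bigr]

% (d) Substitute (a) into (c):
\pi_*(s) = \arg\max_{a \in \mathcal{A}(s)} q_*(s,a)
```

The practical difference between (c) and (d): acting greedily on $v_*$ needs the model $p$, whereas acting greedily on $q_*$ needs no model of the environment at all.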

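The same relations can be checked numerically. The sketch below uses a hypothetical two-state MDP (states 0 and 1, actions "stay" and "go", discount 0.9; all of these are assumptions for illustration) stored as a table of $(p, s', r)$ outcomes. Value iteration produces $v_*$. Part (a) then gives $q_*$, the max in part (b) recovers $v_*$, and the greedy argmax of parts (c)/(d) gives $\pi_*$.

```python
from typing import Dict, List, Tuple

# Hypothetical tabular encoding of p(s', r | s, a):
# P[s][a] is a list of (probability, next_state, reward) triples.
Transitions = Dict[int, Dict[str, List[Tuple[float, int, float]]]]

P: Transitions = {
    0: {
        "stay": [(1.0, 0, 0.0)],
        "go": [(0.8, 1, 1.0), (0.2, 0, 0.0)],  # "go" succeeds with probability 0.8
    },
    1: {
        "stay": [(1.0, 1, 2.0)],
        "go": [(1.0, 0, 0.0)],
    },
}
GAMMA = 0.9  # assumed discount rate


def q_from_v(P: Transitions, v: Dict[int, float], gamma: float) -> Dict[int, Dict[str, float]]:
    """Part (a): q(s, a) = sum over (s', r) of p(s', r | s, a) * [r + gamma * v(s')]."""
    return {
        s: {a: sum(p * (r + gamma * v[s_next]) for p, s_next, r in outcomes)
            for a, outcomes in actions.items()}
        for s, actions in P.items()
    }


def v_from_q(q: Dict[int, Dict[str, float]]) -> Dict[int, float]:
    """Part (b): v(s) = max over a of q(s, a)."""
    return {s: max(qs.values()) for s, qs in q.items()}


def greedy_policy(q: Dict[int, Dict[str, float]]) -> Dict[int, str]:
    """Parts (c)/(d): pi(s) = argmax over a of q(s, a); max() keeps the first maximizer on ties."""
    return {s: max(qs, key=qs.get) for s, qs in q.items()}


def value_iteration(P: Transitions, gamma: float, tol: float = 1e-10) -> Dict[int, float]:
    """Apply the Bellman optimality operator until successive estimates differ by < tol."""
    v = {s: 0.0 for s in P}
    while True:
        new_v = v_from_q(q_from_v(P, v, gamma))
        if max(abs(new_v[s] - v[s]) for s in P) < tol:
            return new_v
        v = new_v


v_star = value_iteration(P, GAMMA)
q_star = q_from_v(P, v_star, GAMMA)  # part (a)
pi_star = greedy_policy(q_star)      # part (d), equivalently (c) since q_from_v expands p
print(v_star, pi_star)
```

For this toy MDP the fixed point can also be solved by hand: staying in state 1 forever gives $v_*(1) = 2/(1 - 0.9) = 20$, and $v_*(0) = 0.8(1 + 0.9 \cdot 20) + 0.2 \cdot 0.9\, v_*(0)$, so $v_*(0) = 15.2/0.82 \approx 18.54$, with $\pi_*(0) = \text{go}$ and $\pi_*(1) = \text{stay}$.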