Problem 2 (The Value Functions) (40 pts): Following the notation in the lecture notes, or alternatively in Chapter 3.5 of the book by Sutton and Barto, answer the following questions.

(a) Give an equation for $q_\pi$ in terms of the transition probability $p(s', r \mid s, a)$ and the value function $v_\pi$. (Hint: The action-value function, $q_\pi(s, a)$, is the expected return of taking action $a$ in state $s$ and following policy $\pi$ thereafter. The agent may receive a random immediate reward $r$ and reach a random next state $s'$, whose value is given by $v_\pi(s')$.)

(b) Give an equation for $v_\pi$ in terms of $q_\pi$ and $\pi$. (Hint: Use the result from part (a) and the Bellman equation for $v_\pi$.)

(c) Derive the Bellman equation for $q_\pi$. That is, express $q_\pi(s, a)$ using $q_\pi(s', a')$. Rearrange the expression so that the summations are next to each other, as in the Bellman equation for $v_\pi$. (Hint: Use the results from parts (a) and (b). Make sure to use the notation $a'$ for the action taken in the next state $s'$.)
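The page does not include a worked answer, so the following is only a sketch of how the three parts could go. It assumes the standard Sutton and Barto conventions: a discount factor $\gamma$, a stochastic policy $\pi(a \mid s)$, and the four-argument dynamics $p(s', r \mid s, a)$. The lecture notes referred to in the question may use slightly different notation.

```latex
% (a) Condition on the next state s' and the reward r:
q_\pi(s, a) = \sum_{s', r} p(s', r \mid s, a) \bigl[ r + \gamma \, v_\pi(s') \bigr]

% (b) Average the action values over the policy:
v_\pi(s) = \sum_{a} \pi(a \mid s) \, q_\pi(s, a)

% (c) Substitute (b), evaluated at s' with action a', into (a):
q_\pi(s, a) = \sum_{s', r} p(s', r \mid s, a)
              \Bigl[ r + \gamma \sum_{a'} \pi(a' \mid s') \, q_\pi(s', a') \Bigr]

% Since \sum_{a'} \pi(a' \mid s') = 1, r can be moved inside the inner sum,
% which places the two summations next to each other:
q_\pi(s, a) = \sum_{s', r} p(s', r \mid s, a) \sum_{a'} \pi(a' \mid s')
              \bigl[ r + \gamma \, q_\pi(s', a') \bigr]
```

As a consistency check, substituting (a) into (b) should recover the Bellman equation for $v_\pi$: $v_\pi(s) = \sum_{a} \pi(a \mid s) \sum_{s', r} p(s', r \mid s, a) \bigl[ r + \gamma \, v_\pi(s') \bigr]$.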

