Question: If you can't answer all three, please answer the third question.
(b) (12 points) The MDP formulation we studied in class included a reward function R(s, a, s'), where the reward depends on the triple of current state, action, and outcome state.

i. Show how an MDP with reward function R(s, a, s') can be transformed into a different MDP with reward function R(s, a) such that optimal policies in the new MDP correspond exactly to optimal policies in the original MDP.

ii. Now do the same to convert your MDP with R(s, a) into an MDP with R(s) such that the same correspondence between optimal policies in the two MDPs holds.

iii. Prove that the value of any fixed policy varies linearly with R(s).
Step-by-Step Solution
There are 3 steps involved, one for each part of the question; a hedged sketch of each follows.
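Step 1 of 3 — part i. A sketch of the standard reduction, assuming the transition model is written T(s, a, s') and the discount is γ; this is the usual textbook construction, not necessarily the one given in class:

```latex
% Define the new reward as the expected one-step reward; states,
% actions, transitions, and discount are left unchanged.
\[
  R'(s, a) \;=\; \sum_{s'} T(s, a, s')\, R(s, a, s')
\]
% The Bellman optimality equation of the original MDP,
\[
  U(s) \;=\; \max_{a} \sum_{s'} T(s, a, s')\,
             \bigl[\, R(s, a, s') + \gamma\, U(s') \,\bigr],
\]
% distributes the sum over the bracket to become
\[
  U(s) \;=\; \max_{a} \Bigl[\, R'(s, a)
             + \gamma \sum_{s'} T(s, a, s')\, U(s') \,\Bigr],
\]
% which is exactly the Bellman equation of the new MDP.  The two MDPs
% therefore have the same value function and the same optimal policies.
```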
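Step 2 of 3 — part ii. One common trick is to add an artificial "post-decision" state for every (state, action) pair. The sketch below assumes the reward-on-state Bellman convention U(s) = R(s) + γ max_a Σ_{s'} T(s, a, s') U(s') and adjusts the discount to √γ; the bookkeeping may differ from the version covered in class:

```latex
% New state space: the original states plus one post-decision state
% per (s, a) pair.
\[
  S' \;=\; S \,\cup\, \{\, (s, a) : s \in S,\ a \in A \,\}
\]
% Dynamics: taking a in s moves deterministically to (s, a); from
% (s, a) a single dummy action moves to s' with probability T(s, a, s').
% Rewards depend on the state alone, and the new discount is
% \gamma' = \sqrt{\gamma}:
\[
  R'(s) = 0, \qquad
  R'\bigl((s, a)\bigr) = R(s, a)\,/\,\sqrt{\gamma}.
\]
% Two steps of the new Bellman equation collapse into one step of the
% original:
\[
  U'(s) \;=\; \gamma' \max_{a} \Bigl[\, R'\bigl((s, a)\bigr)
        + \gamma' \sum_{s'} T(s, a, s')\, U'(s') \,\Bigr]
        \;=\; \max_{a} \Bigl[\, R(s, a)
        + \gamma \sum_{s'} T(s, a, s')\, U'(s') \,\Bigr],
\]
% so U' agrees with the original values on S and the optimal policies
% correspond exactly.
```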

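Step 3 of 3 — part iii. Fix a policy π and write the Bellman equation in matrix form; the vector notation below is this writeup's convention, not something taken from the question:

```latex
% Stack rewards and values into |S|-vectors R and V^\pi, and let
% P_\pi be the |S| \times |S| matrix with entries T(s, \pi(s), s').
% The fixed-policy Bellman equation is linear in both V^\pi and R:
\[
  V^{\pi} \;=\; R + \gamma\, P_{\pi} V^{\pi}
  \quad\Longrightarrow\quad
  V^{\pi} \;=\; (I - \gamma P_{\pi})^{-1} R.
\]
% For \gamma < 1 the inverse exists, because the spectral radius of
% \gamma P_\pi is at most \gamma < 1.  The matrix (I - \gamma P_\pi)^{-1}
% does not depend on R, so each V^\pi(s) is a fixed linear combination
% of the entries of R: scaling R scales V^\pi by the same factor, and
% the value computed from R_1 + R_2 is the sum of the values computed
% from R_1 and R_2 separately.
```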
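A quick numerical sanity check of that linearity (not a proof; the transition matrix, reward vectors, and coefficients below are made-up test data):

```python
import numpy as np

rng = np.random.default_rng(0)
n_states, gamma = 4, 0.9

# Random row-stochastic transition matrix under some fixed policy pi.
P_pi = rng.random((n_states, n_states))
P_pi /= P_pi.sum(axis=1, keepdims=True)

def policy_value(R):
    """Solve (I - gamma * P_pi) V = R for the fixed-policy value V."""
    return np.linalg.solve(np.eye(n_states) - gamma * P_pi, R)

# Two arbitrary reward vectors and an arbitrary linear combination.
R1, R2 = rng.random(n_states), rng.random(n_states)
a, b = 2.0, -3.0

# Linearity: V(a*R1 + b*R2) should equal a*V(R1) + b*V(R2).
lhs = policy_value(a * R1 + b * R2)
rhs = a * policy_value(R1) + b * policy_value(R2)
print(np.allclose(lhs, rhs))  # True
```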