Question: if you can't answer all three, please answer the third question. (b) (12 points) The MDP formulation we studied in class included a reward function

If you can't answer all three, please answer the third question.

(b) (12 points) The MDP formulation we studied in class included a reward function R(s, a, s'), where the reward depends on the triple of current state, action, and outcome state.

i. Show how an MDP with reward function R(s, a, s') can be transformed into a different MDP with reward function R(s, a) such that optimal policies in the new MDP correspond exactly to optimal policies in the original MDP.

ii. Now, do the same to convert your MDP with R(s, a) into an MDP with R(s) such that the correspondence between optimal policies in the two MDPs is obtained.

iii. Prove that the value of any fixed policy varies linearly with R(s).
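
For the third part, which the asker prioritizes, one way to see the claim is sketched here. It assumes a finite state space, a discount factor gamma < 1, and the convention V(s) = R(s) + gamma * sum over s' of P(s' | s, pi(s)) * V(s'). Stacking this over all states gives V = R + gamma * P_pi * V, so V = (I - gamma * P_pi)^(-1) R. The matrix (I - gamma * P_pi)^(-1) depends only on the policy and the transition model, not on R, so V is a linear function of the reward vector. The Python check below illustrates this numerically on a small random chain. The names evaluate_policy and random_chain, the 4-state example, and the scalars a and b are hypothetical and not taken from the course.

```python
import random

def evaluate_policy(P, R, gamma=0.9, sweeps=500):
    """Iterative policy evaluation for a fixed policy.

    P[s][t] is the probability of moving from s to t under the policy,
    R[s] is the state reward. Assumes gamma < 1 (an assumption of this sketch).
    """
    n = len(R)
    V = [0.0] * n
    for _ in range(sweeps):
        # One Bellman backup: V(s) = R(s) + gamma * sum_t P[s][t] * V(t)
        V = [R[s] + gamma * sum(P[s][t] * V[t] for t in range(n))
             for s in range(n)]
    return V

def random_chain(n, rng):
    """Build a random row-stochastic transition matrix (hypothetical helper)."""
    rows = []
    for _ in range(n):
        weights = [rng.random() + 0.01 for _ in range(n)]
        total = sum(weights)
        rows.append([w / total for w in weights])
    return rows

rng = random.Random(0)
n = 4
P = random_chain(n, rng)
R1 = [rng.uniform(-1, 1) for _ in range(n)]
R2 = [rng.uniform(-1, 1) for _ in range(n)]
a, b = 2.5, -0.75

V1 = evaluate_policy(P, R1)
V2 = evaluate_policy(P, R2)

# Value of the policy under the combined reward a*R1 + b*R2 ...
R_mix = [a * x + b * y for x, y in zip(R1, R2)]
V_mix = evaluate_policy(P, R_mix)

# ... compared with the same combination of the separate values.
V_expected = [a * x + b * y for x, y in zip(V1, V2)]
max_gap = max(abs(u - v) for u, v in zip(V_mix, V_expected))
print("largest deviation from linearity:", max_gap)
```

The printed deviation should be at the level of floating-point rounding. This is a numerical illustration, not a proof. The proof is the matrix identity above, where invertibility of I - gamma * P_pi follows from gamma < 1 and P_pi being a stochastic matrix.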
