Question: If you can't answer all three, please answer the third question.
(b) (12 points) The MDP formulation we studied in class included a reward function R(s, a, s'), where the reward depends on the triple of current state, action, and outcome state.

i. Show how an MDP with reward function R(s, a, s') can be transformed into a different MDP with reward function R(s, a) such that optimal policies in the new MDP correspond exactly to optimal policies in the original MDP.

ii. Now do the same to convert your MDP with R(s, a) into an MDP with R(s) such that the same correspondence between optimal policies in the two MDPs holds.

iii. Prove that the value of any fixed policy varies linearly with R(s).
Step-by-Step Solution
There are 3 steps involved, one for each part of the question; a hedged sketch of each follows.
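Step 1 of 3 — part i. A sketch of the standard reduction, assuming the transition model is written T(s, a, s') and the discount is γ; this is the usual textbook construction, not necessarily the one given in class:

```latex
% Define the new reward as the expected one-step reward; states,
% actions, transitions, and discount are left unchanged.
\[
  R'(s, a) \;=\; \sum_{s'} T(s, a, s')\, R(s, a, s')
\]
% The Bellman optimality equation of the original MDP,
\[
  U(s) \;=\; \max_{a} \sum_{s'} T(s, a, s')\,
             \bigl[\, R(s, a, s') + \gamma\, U(s') \,\bigr],
\]
% distributes the sum over the bracket to become
\[
  U(s) \;=\; \max_{a} \Bigl[\, R'(s, a)
             + \gamma \sum_{s'} T(s, a, s')\, U(s') \,\Bigr],
\]
% which is exactly the Bellman equation of the new MDP.  The two MDPs
% therefore have the same value function and the same optimal policies.
```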
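Step 2 of 3 — part ii. One common trick is to add an artificial "post-decision" state for every (state, action) pair. The sketch below assumes the reward-on-state Bellman convention U(s) = R(s) + γ max_a Σ_{s'} T(s, a, s') U(s') and adjusts the discount to √γ; the bookkeeping may differ from the version covered in class:

```latex
% New state space: the original states plus one post-decision state
% per (s, a) pair.
\[
  S' \;=\; S \,\cup\, \{\, (s, a) : s \in S,\ a \in A \,\}
\]
% Dynamics: taking a in s moves deterministically to (s, a); from
% (s, a) a single dummy action moves to s' with probability T(s, a, s').
% Rewards depend on the state alone, and the new discount is
% \gamma' = \sqrt{\gamma}:
\[
  R'(s) = 0, \qquad
  R'\bigl((s, a)\bigr) = R(s, a)\,/\,\sqrt{\gamma}.
\]
% Two steps of the new Bellman equation collapse into one step of the
% original:
\[
  U'(s) \;=\; \gamma' \max_{a} \Bigl[\, R'\bigl((s, a)\bigr)
        + \gamma' \sum_{s'} T(s, a, s')\, U'(s') \,\Bigr]
        \;=\; \max_{a} \Bigl[\, R(s, a)
        + \gamma \sum_{s'} T(s, a, s')\, U'(s') \,\Bigr],
\]
% so U' agrees with the original values on S and the optimal policies
% correspond exactly.
```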

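Step 3 of 3 — part iii. Fix a policy π and write the Bellman equation in matrix form; the vector notation below is this writeup's convention, not something taken from the question:

```latex
% Stack rewards and values into |S|-vectors R and V^\pi, and let
% P_\pi be the |S| \times |S| matrix with entries T(s, \pi(s), s').
% The fixed-policy Bellman equation is linear in both V^\pi and R:
\[
  V^{\pi} \;=\; R + \gamma\, P_{\pi} V^{\pi}
  \quad\Longrightarrow\quad
  V^{\pi} \;=\; (I - \gamma P_{\pi})^{-1} R.
\]
% For \gamma < 1 the inverse exists, because the spectral radius of
% \gamma P_\pi is at most \gamma < 1.  The matrix (I - \gamma P_\pi)^{-1}
% does not depend on R, so each V^\pi(s) is a fixed linear combination
% of the entries of R: scaling R scales V^\pi by the same factor, and
% the value computed from R_1 + R_2 is the sum of the values computed
% from R_1 and R_2 separately.
```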
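A quick numerical sanity check of that linearity (not a proof; the transition matrix, reward vectors, and coefficients below are made-up test data):

```python
import numpy as np

rng = np.random.default_rng(0)
n_states, gamma = 4, 0.9

# Random row-stochastic transition matrix under some fixed policy pi.
P_pi = rng.random((n_states, n_states))
P_pi /= P_pi.sum(axis=1, keepdims=True)

def policy_value(R):
    """Solve (I - gamma * P_pi) V = R for the fixed-policy value V."""
    return np.linalg.solve(np.eye(n_states) - gamma * P_pi, R)

# Two arbitrary reward vectors and an arbitrary linear combination.
R1, R2 = rng.random(n_states), rng.random(n_states)
a, b = 2.0, -3.0

# Linearity: V(a*R1 + b*R2) should equal a*V(R1) + b*V(R2).
lhs = policy_value(a * R1 + b * R2)
rhs = a * policy_value(R1) + b * policy_value(R2)
print(np.allclose(lhs, rhs))  # True
```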