Question: MDPs can be formulated with a reward function R(s, a) that depends on the action taken,
or with a reward function R(s, a, s') that also depends on the outcome state.
a. Write the Bellman equations for these formulations.
b. Show how an MDP with reward function R(s, a, s') can be transformed into a different
MDP with reward function R(s, a), such that optimal policies in the new MDP correspond
exactly to optimal policies in the original MDP.
c. Now do the same to convert MDPs with R(s, a) into MDPs with R(s).
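
One possible solution sketch for parts a to c is written out below. It is not taken from the original page. The notation is assumed rather than given in the question: U is the utility (value) function, \gamma the discount factor, P(s' \mid s, a) the transition model, and post(s, a) and b are a hypothetical extra state and action introduced only for part c. Other constructions are possible.

```latex
% (a) Bellman equations for the two reward formulations.
\begin{align*}
U(s) &= \max_{a} \Big[ R(s,a) + \gamma \sum_{s'} P(s' \mid s,a)\, U(s') \Big] \\
U(s) &= \max_{a} \sum_{s'} P(s' \mid s,a) \Big[ R(s,a,s') + \gamma\, U(s') \Big]
\end{align*}

% (b) Keep the states, actions, P and \gamma unchanged and use the expected reward.
\[
R'(s,a) = \sum_{s'} P(s' \mid s,a)\, R(s,a,s')
\]
% Substituting R' into the first equation of (a) reproduces the second one,
% so both MDPs have the same U and therefore the same optimal policies.

% (c) For every pair (s,a) add a state post(s,a) with a single action b.
\begin{align*}
P'(\mathrm{post}(s,a) \mid s, a) &= 1, & R'(s) &= 0, \\
P'(s' \mid \mathrm{post}(s,a), b) &= P(s' \mid s,a), & R'(\mathrm{post}(s,a)) &= \gamma^{-1/2} R(s,a), \\
\gamma' &= \gamma^{1/2}.
\end{align*}
% With the state-reward form U'(x) = R'(x) + \gamma' \max \sum P' U', this gives
\[
U'(s) = \gamma' \max_{a} U'(\mathrm{post}(s,a))
      = \max_{a} \Big[ R(s,a) + \gamma \sum_{s'} P(s' \mid s,a)\, U'(s') \Big],
\]
% which is the R(s,a) Bellman equation from (a) on the original states.
```

In part c the discount is split across the two half-steps, so the factor \gamma^{-1/2} on the reward cancels the extra \gamma' picked up when passing through post(s, a); this assumes \gamma > 0.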
