Question:

1. Consider a dynamic program. In the Bellman equation we consider

$$V_t(x_t) = \max_{a_t,\dots,a_{T-1}} \; r(x_t,a_t) + \dots + r(x_{T-1},a_{T-1}) + r_T(x_T).$$

i) Write out and prove the Bellman equation for $V_t(x_t)$.

However, suppose that we instead consider

$$U_t(x_t) = \max_{a_0,\dots,a_{t-1}} \; r(x_0,a_0) + r(x_1,a_1) + \dots + r(x_{t-1},a_{t-1}),$$

where $x_0$ is fixed and it is assumed that $x_t$ is the next state after taking action $a_{t-1}$ from state $x_{t-1}$. (If no such solution from $x_0$ to $x_t$ in $t$ steps exists, then we set $U_t(x_t) = -\infty$.)

ii) Show that

$$U_t(x_t) = \max_{x_{t-1},\,a_{t-1}\,:\,f(x_{t-1},a_{t-1}) = x_t} \bigl\{ U_{t-1}(x_{t-1}) + r(x_{t-1},a_{t-1}) \bigr\}$$

and $U_0(x_0) = 0$. [This approach to solving a dynamic program is sometimes referred to as a Forward Dynamic Program, because the iterations proceed forward from the initial state $x_0$.]

iii) Argue that we cannot apply this forward dynamic programming approach to MDPs.
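To make the recursion in part ii) concrete, here is a minimal Python sketch of a forward pass on a toy problem. Everything in it is an assumption for illustration and not part of the question: the function names `forward_dp` and `brute_force`, the five-state chain, the clamped transition `f`, and the reward `r`. It assumes the next state is a deterministic function `f(x, a)` of the current state and action, and it stores `-inf` for states that cannot be reached from `x0` in exactly `t` steps.

```python
from itertools import product

NEG_INF = float("-inf")


def forward_dp(x0, states, actions, f, r, horizon):
    """Return a list U where U[t][x] is the best total reward over action
    sequences that lead from x0 to x in exactly t steps (-inf if none exist)."""
    U = [{x: NEG_INF for x in states}]
    U[0][x0] = 0  # base case: U_0(x_0) = 0
    for t in range(1, horizon + 1):
        U_t = {x: NEG_INF for x in states}
        for x_prev, a in product(states, actions):
            if U[t - 1][x_prev] == NEG_INF:
                continue  # x_prev is unreachable in t-1 steps
            x_next = f(x_prev, a)
            candidate = U[t - 1][x_prev] + r(x_prev, a)
            # Maximise over all (x_prev, a) pairs with f(x_prev, a) = x_next.
            if candidate > U_t[x_next]:
                U_t[x_next] = candidate
        U.append(U_t)
    return U


def brute_force(x0, actions, f, r, t, target):
    """Evaluate the definition of U_t(target) by enumerating every
    length-t action sequence starting from x0."""
    best = NEG_INF
    for seq in product(actions, repeat=t):
        x, total = x0, 0
        for a in seq:
            total += r(x, a)
            x = f(x, a)
        if x == target:
            best = max(best, total)
    return best


# Hypothetical toy problem: a chain of states 0..4 with moves left or right.
STATES = list(range(5))
ACTIONS = (-1, 1)


def f(x, a):
    """Deterministic transition, clamped to stay inside the chain."""
    return min(max(x + a, 0), 4)


def r(x, a):
    """Arbitrary integer reward chosen only for the example."""
    return x * a + 1


if __name__ == "__main__":
    U = forward_dp(0, STATES, ACTIONS, f, r, horizon=4)
    for t, U_t in enumerate(U):
        print(t, U_t)
```

The forward pass touches each (state, action) pair once per time step, whereas the brute-force check enumerates every action sequence; on this toy chain the two can be compared directly for small `t`. Note that the sketch only works because `f` returns a single next state for each (state, action) pair.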
