Question: Assume you have the input policy π over the states A, B, C, D, and E. Assume γ = 1. Your observed episodes (training) are shown below, one transition per line in the form s, a, s', r:

Episode 1
B, east, C, -1
C, east, D, -1
D, exit, x, +10

Episode 2
B, east, C, -1
C, east, D, -1
D, exit, x, +10

Episode 3
E, north, C, -1
C, east, D, -1
D, exit, x, +10

Episode 4
E, north, C, -1
C, east, A, -1
A, exit, x, -10

Please calculate the learned models T(s, a, s') and R(s, a, s') below.

For T(s, a, s'), please calculate: T(B, east, C), T(C, east, E), T(C, east, D)

For R(s, a, s'), please calculate: R(B, east, C), R(C, east, D), R(C, east, A), R(D, exit, A)
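One way to approach this is the usual model-based estimate: count how often each (s, a, s') outcome follows each (s, a) pair, normalize the counts to get T, and average the observed rewards to get R. The Python sketch below assumes that approach and assumes the four episodes read as two B-start runs, one E-start run ending at D, and one E-start run ending at A. The names `episodes`, `T_hat`, and `R_hat` are illustrative and not part of the question.

```python
from collections import Counter, defaultdict
from fractions import Fraction

# Observed transitions as (s, a, s_next, r); assumed reading of the four episodes.
episodes = [
    [("B", "east", "C", -1), ("C", "east", "D", -1), ("D", "exit", "x", 10)],
    [("B", "east", "C", -1), ("C", "east", "D", -1), ("D", "exit", "x", 10)],
    [("E", "north", "C", -1), ("C", "east", "D", -1), ("D", "exit", "x", 10)],
    [("E", "north", "C", -1), ("C", "east", "A", -1), ("A", "exit", "x", -10)],
]

sa_counts = Counter()            # times action a was taken in state s
sas_counts = Counter()           # times (s, a) led to s_next
reward_sums = defaultdict(int)   # total reward seen on (s, a, s_next)

for episode in episodes:
    for s, a, s_next, r in episode:
        sa_counts[(s, a)] += 1
        sas_counts[(s, a, s_next)] += 1
        reward_sums[(s, a, s_next)] += r


def T_hat(s, a, s_next):
    """Estimated transition probability; None if (s, a) was never tried."""
    n = sa_counts[(s, a)]
    if n == 0:
        return None
    return Fraction(sas_counts[(s, a, s_next)], n)


def R_hat(s, a, s_next):
    """Average observed reward; None if the transition was never seen."""
    n = sas_counts[(s, a, s_next)]
    if n == 0:
        return None
    return Fraction(reward_sums[(s, a, s_next)], n)


if __name__ == "__main__":
    for query in [("B", "east", "C"), ("C", "east", "E"), ("C", "east", "D")]:
        print("T", query, "=", T_hat(*query))
    for query in [("B", "east", "C"), ("C", "east", "D"),
                  ("C", "east", "A"), ("D", "exit", "A")]:
        print("R", query, "=", R_hat(*query))
```

Under this reading, (C, east) is tried four times and reaches D three times, so T(C, east, D) = 3/4, T(C, east, E) = 0, and T(B, east, C) = 1. Each of R(B, east, C), R(C, east, D), and R(C, east, A) averages to -1. The transition (D, exit, A) never appears in the data, so the sketch returns None for it. The only observed exit from D is (D, exit, x), with reward +10.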
