Question: Assume you have the input policy π over a grid world with states A, B, C, D, and E, and assume γ = 1. Your observed (training) episodes are shown below, one transition per line in the form s, a, s', r.

Episode 1
B, east, C, -1
C, east, D, -1
D, exit, x, +10

Episode 2
B, east, C, -1
C, east, D, -1
D, exit, x, +10

Episode 3
E, north, C, -1
C, east, D, -1
D, exit, x, +10

Episode 4
E, north, C, -1
C, east, A, -1
A, exit, x, -10

Please calculate the learned models T(s, a, s') and R(s, a, s') below.

For T(s, a, s'), please calculate: T(B, east, C), T(C, east, E), T(C, east, D)

For R(s, a, s'), please calculate: R(B, east, C), R(C, east, D), R(C, east, A), R(D, exit, A)
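One common way to approach this kind of question is count-based model learning: estimate T(s, a, s') as the fraction of times that taking action a in state s led to s', and estimate R(s, a, s') as the average reward observed on that transition. The sketch below is not the page's solution. It assumes that estimator, and it assumes the four episodes read as two B-C-D runs, one E-C-D run, and one E-C-A run. The names `episodes`, `T_hat`, and `R_hat` are illustrative only. Transitions that never appear in the data have no reward estimate, so `R_hat` returns `None` for them.

```python
from collections import defaultdict

# Observed transitions (s, a, s', r), as read from the question.
# The assignment of runs to episode numbers is an assumption; the
# estimates below depend only on the counts, not on the order.
episodes = [
    [("B", "east", "C", -1), ("C", "east", "D", -1), ("D", "exit", "x", 10)],
    [("B", "east", "C", -1), ("C", "east", "D", -1), ("D", "exit", "x", 10)],
    [("E", "north", "C", -1), ("C", "east", "D", -1), ("D", "exit", "x", 10)],
    [("E", "north", "C", -1), ("C", "east", "A", -1), ("A", "exit", "x", -10)],
]

sas_counts = defaultdict(int)     # times (s, a, s') was observed
sa_counts = defaultdict(int)      # times (s, a) was observed
reward_sums = defaultdict(float)  # total reward seen on (s, a, s')

for episode in episodes:
    for s, a, s_next, r in episode:
        sas_counts[(s, a, s_next)] += 1
        sa_counts[(s, a)] += 1
        reward_sums[(s, a, s_next)] += r


def T_hat(s, a, s_next):
    """Empirical transition probability, or None if (s, a) was never tried."""
    total = sa_counts.get((s, a), 0)
    if total == 0:
        return None
    return sas_counts.get((s, a, s_next), 0) / total


def R_hat(s, a, s_next):
    """Average observed reward, or None if the transition was never seen."""
    n = sas_counts.get((s, a, s_next), 0)
    if n == 0:
        return None
    return reward_sums[(s, a, s_next)] / n


for query in [("B", "east", "C"), ("C", "east", "E"), ("C", "east", "D")]:
    print("T", query, "=", T_hat(*query))

for query in [("B", "east", "C"), ("C", "east", "D"),
              ("C", "east", "A"), ("D", "exit", "A")]:
    print("R", query, "=", R_hat(*query))
```

Under these assumptions, (C, east) is tried four times and reaches D three times and A once, so T(C, east, D) comes out as 0.75 and T(C, east, E) as 0. The transition (D, exit, A) does not occur in any episode, so the sketch reports no estimate for R(D, exit, A).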
