Consider the car domain above (without knowing the transition function T or the reward function R) and the following experiences, each written as (state, action, next state, reward):
Episode 1:
cool, fast, warm, +2
warm, fast, overheated, -10
Episode 2:
cool, slow, cool, +1
cool, slow, cool, +1
cool, fast, cool, +2
cool, fast, cool, +2
cool, fast, warm, +2
warm, fast, overheated, -10
Episode 3:
cool, fast, warm, +2
warm, slow, cool, +1
cool, slow, cool, +1
cool, fast, cool, +2
cool, fast, warm, +2
warm, fast, overheated, -10
c. Assuming that the initial state values are all zeros, compute the updates in TD learning
for policy evaluation (passive RL) to the V function after running through episodes 1-3 in
sequence (the episodes follow the policy to be evaluated). Show steps for α = 0.5 and γ = 1.0.
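A minimal Python sketch of part c, offered as one way to check the hand computation. It assumes "overheated" is a terminal state whose value stays 0, and it reads each experience as an (s, a, s', r) tuple. Every transition applies the TD(0) update V(s) ← V(s) + α[r + γV(s') − V(s)]; the action is carried along but ignored, because passive TD only evaluates the fixed policy that generated the episodes. The names `EPISODES` and `td_evaluate` are hypothetical, not part of the original question.

```python
from collections import defaultdict

# Experiences from episodes 1-3, as (state, action, next_state, reward).
EPISODES = [
    [("cool", "fast", "warm", 2), ("warm", "fast", "overheated", -10)],
    [("cool", "slow", "cool", 1), ("cool", "slow", "cool", 1),
     ("cool", "fast", "cool", 2), ("cool", "fast", "cool", 2),
     ("cool", "fast", "warm", 2), ("warm", "fast", "overheated", -10)],
    [("cool", "fast", "warm", 2), ("warm", "slow", "cool", 1),
     ("cool", "slow", "cool", 1), ("cool", "fast", "cool", 2),
     ("cool", "fast", "warm", 2), ("warm", "fast", "overheated", -10)],
]


def td_evaluate(episodes, alpha=0.5, gamma=1.0):
    """TD(0) policy evaluation; every state value starts at 0."""
    # Assumption: "overheated" is terminal. It never appears as a source
    # state, so it is never updated and its value stays 0.
    V = defaultdict(float)
    for ep_num, episode in enumerate(episodes, start=1):
        for s, _action, s_next, r in episode:
            old = V[s]
            sample = r + gamma * V[s_next]      # one-step TD target
            V[s] = old + alpha * (sample - old)  # move V(s) toward the target
            print(f"episode {ep_num}: V({s}) = {old} -> {V[s]}")
    return dict(V)


V = td_evaluate(EPISODES)
```

Running it prints every intermediate update. The values after each episode are V(cool) = 1.0, V(warm) = -5.0 (episode 1); V(cool) = 0.5, V(warm) = -7.5 (episode 2); and V(cool) = -1.75, V(warm) = -7.25 (episode 3).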
d. Assuming that the initial Q values are all zeros, compute the updates in Q learning
(active RL) to the Q values after running through episodes 1-3 in sequence. Show steps for
α = 0.5 and γ = 1.0.
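A comparable Python sketch of part d, under the same assumptions ("overheated" is terminal, experiences are (s, a, s', r) tuples). It adds one more assumption: the only actions are slow and fast. Q-learning replaces V(s') in the target with max over a' of Q(s', a'), giving Q(s, a) ← Q(s, a) + α[r + γ max_a' Q(s', a') − Q(s, a)]. Because a terminal state has no actions, that maximum is taken as 0 there. The names `ACTIONS`, `TERMINAL`, and `q_learn` are hypothetical.

```python
from collections import defaultdict

ACTIONS = ("slow", "fast")  # assumed action set of the car domain
TERMINAL = {"overheated"}   # assumed terminal state(s)

# Experiences from episodes 1-3, as (state, action, next_state, reward).
EPISODES = [
    [("cool", "fast", "warm", 2), ("warm", "fast", "overheated", -10)],
    [("cool", "slow", "cool", 1), ("cool", "slow", "cool", 1),
     ("cool", "fast", "cool", 2), ("cool", "fast", "cool", 2),
     ("cool", "fast", "warm", 2), ("warm", "fast", "overheated", -10)],
    [("cool", "fast", "warm", 2), ("warm", "slow", "cool", 1),
     ("cool", "slow", "cool", 1), ("cool", "fast", "cool", 2),
     ("cool", "fast", "warm", 2), ("warm", "fast", "overheated", -10)],
]


def q_learn(episodes, alpha=0.5, gamma=1.0):
    """Tabular Q-learning; every Q(state, action) starts at 0."""
    Q = defaultdict(float)
    for ep_num, episode in enumerate(episodes, start=1):
        for s, a, s_next, r in episode:
            # Best action value in s' (computed before updating Q(s, a));
            # a terminal state has no actions, so it contributes 0.
            if s_next in TERMINAL:
                best_next = 0.0
            else:
                best_next = max(Q[(s_next, b)] for b in ACTIONS)
            old = Q[(s, a)]
            Q[(s, a)] = old + alpha * (r + gamma * best_next - old)
            print(f"episode {ep_num}: Q({s}, {a}) = {old} -> {Q[(s, a)]}")
    return dict(Q)


Q = q_learn(EPISODES)
```

After episode 1 the nonzero entries are Q(cool, fast) = 1.0 and Q(warm, fast) = -5.0; after episode 2, Q(cool, slow) = 1.5, Q(cool, fast) = 2.625, Q(warm, fast) = -7.5. After episode 3 the table ends at Q(cool, slow) = 2.40625, Q(cool, fast) = 3.5078125, Q(warm, slow) = 1.65625, Q(warm, fast) = -8.75.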
