Question: When the learning rate is 0.1 and the discount factor is 0.9, consider the following trajectories.

1. Let the learning rate be 0.1 and the discount factor be 0.9. Consider the following trajectories, each given as (state, action, next state, reward):
(s2, north, s3, r = -0.1);
(s3, east, s3, r = -0.1);
(s3, east, s4, r = -0.1);
(s4, north, s4, r = 1.0);
Use the Q-learning algorithm to update the agent's Q function, Q(s, a). List the Q values that change after each of the four actions. Each updated Q value should be used when computing the Q value updates for subsequent actions.
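
The four updates can be traced with a short script. This is a minimal sketch, not a posted solution: it applies the standard Q-learning rule Q(s, a) <- Q(s, a) + alpha * (r + gamma * max_a' Q(s', a') - Q(s, a)) to each transition in order. The question does not state the initial Q values or the action set, so the sketch assumes every Q(s, a) starts at 0 and that the actions are the four compass moves. The names ACTIONS, TRAJECTORY, and run_q_learning are illustrative only.

```python
from collections import defaultdict

ALPHA = 0.1  # learning rate, from the question
GAMMA = 0.9  # discount factor, from the question

# Assumption: the action set is not given in the question.
ACTIONS = ("north", "south", "east", "west")

# (state, action, next_state, reward) tuples, copied from the question
TRAJECTORY = [
    ("s2", "north", "s3", -0.1),
    ("s3", "east", "s3", -0.1),
    ("s3", "east", "s4", -0.1),
    ("s4", "north", "s4", 1.0),
]


def run_q_learning(trajectory, alpha=ALPHA, gamma=GAMMA):
    """Apply one Q-learning update per transition, in order."""
    q = defaultdict(float)  # Assumption: every Q(s, a) starts at 0
    history = []
    for s, a, s_next, r in trajectory:
        # Greedy value of the next state under the current Q table
        best_next = max(q[(s_next, b)] for b in ACTIONS)
        # Move Q(s, a) toward the bootstrapped target by a step of alpha
        q[(s, a)] += alpha * (r + gamma * best_next - q[(s, a)])
        history.append(((s, a), q[(s, a)]))
    return q, history


q_table, history = run_q_learning(TRAJECTORY)
for (s, a), value in history:
    print(f"Q({s}, {a}) = {value:.4f}")
```

Under the zero-initialization assumption, the changed values are Q(s2, north) = -0.01, then Q(s3, east) = -0.01, then Q(s3, east) = -0.019, and finally Q(s4, north) = 0.1. The third update reuses the Q(s3, east) value produced by the second, as the question requires. A different initial Q table would give different numbers.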
