Question 1. The learning rate is 0.1 and the discount factor is 0.9. Consider the following trajectories:
s north, sr;
s east, s r;
s east, s r ;
s north, s r ;
Use the Q-learning algorithm to help the agent update its Q-function. List the values that change after each of the four actions. The updated values should be used when computing the value updates of follow-up actions.
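For reference, the standard tabular Q-learning update is Q(s, a) <- Q(s, a) + alpha * (r + gamma * max over a' of Q(s', a') - Q(s, a)), with alpha = 0.1 and gamma = 0.9 as given in the question. The sketch below applies that rule to four transitions in order, so each update sees the values written by the earlier ones. The state names, rewards, action set, and all-zero initial Q-table in it are hypothetical placeholders, not the values from the question (those are not legible above), so substitute the real ones before relying on the numbers.

```python
from collections import defaultdict

ALPHA = 0.1  # learning rate, as stated in the question
GAMMA = 0.9  # discount factor, as stated in the question
ACTIONS = ("north", "south", "east", "west")  # assumed action set


def q_update(Q, s, a, s_next, r, alpha=ALPHA, gamma=GAMMA):
    """Apply one tabular Q-learning update in place and return the new Q(s, a)."""
    # Greedy value of the next state under the current table.
    best_next = max(Q[(s_next, a2)] for a2 in ACTIONS)
    Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])
    return Q[(s, a)]


# Assumption: every Q-value starts at 0.
Q = defaultdict(float)

# HYPOTHETICAL transitions (state, action, next_state, reward).
# These are placeholders, not the trajectories from the question.
trajectory = [
    ("A", "north", "B", 0.0),
    ("B", "east", "C", 10.0),
    ("A", "north", "B", 0.0),
    ("B", "east", "C", 10.0),
]

for step, (s, a, s_next, r) in enumerate(trajectory, start=1):
    new_value = q_update(Q, s, a, s_next, r)
    print(f"step {step}: Q({s}, {a}) = {new_value:.3f}")
```

With these placeholder transitions, the third update of Q(A, north) becomes nonzero only because the second step already raised Q(B, east), which is the carry-over effect the question asks you to track.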
