Question: Consider applying the Q learning algorithm to the same grid world as in Problem 1. Assume that the table of q values is initialized to
Consider applying the Q learning algorithm to the same grid world as in Problem 1. Assume that the table of q values is initialized to 0. Assume the agent begins in State S7 and then travels clockwise around the perimeter of the grid until it reaches the absorbing goal state, completing the first training episode. Assume that = 0.8 and that = 1.
(a) Determine which q(, ) values are modified as a result of this episode, and give their revised values.
(b) Assume that the agent now performs a second identical episode. Determine which q(, ) values are modified as a result of this episode, and give their revised values.
(c) Assume that the agent now performs a third identical episode. Determine which q(, ) values are modified as a result of this episode, and give their revised values.
Step by Step Solution
There are 3 Steps involved in it
Get step-by-step solutions from verified subject matter experts
