Question: Consider applying the Q-learning algorithm to the same grid world as in Problem 1. Assume that the table of q-values is initialized to 0. Assume the agent begins in state S7 and then travels clockwise around the perimeter of the grid until it reaches the absorbing goal state, completing the first training episode. Assume that = 0.8 and that = 1.

(a) Determine which q(s, a) values are modified as a result of this episode, and give their revised values.

(b) Assume that the agent now performs a second identical episode. Determine which q(s, a) values are modified as a result of this episode, and give their revised values.

(c) Assume that the agent now performs a third identical episode. Determine which q(s, a) values are modified as a result of this episode, and give their revised values.
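
The mechanics behind parts (a) to (c) can be illustrated with a minimal Python sketch of the standard tabular Q-learning update, q(s, a) <- q(s, a) + alpha * (r + gamma * max over a' of q(s', a') - q(s, a)). Everything specific in it is an assumption made for illustration: the grid from Problem 1 is not reproduced here, so the three-state path (A, B, C), the action names, and the reward of 100 on entering the goal are hypothetical, and the two parameter values are taken to be a discount factor gamma = 0.8 and a learning rate alpha = 1, which the question text does not confirm. The helper name `run_episode` is likewise invented for this example.

```python
from collections import defaultdict


def run_episode(q, path, actions, alpha, gamma):
    """Apply the tabular Q-learning update along one fixed trajectory.

    path holds (state, action, reward, next_state) tuples; next_state is
    None when the transition enters the absorbing goal state.
    Returns a dict of the q(s, a) entries whose value changed.
    """
    changed = {}
    for state, action, reward, next_state in path:
        if next_state is None:
            # The absorbing goal has no future value.
            best_next = 0.0
        else:
            best_next = max(q[(next_state, a)] for a in actions)
        old = q[(state, action)]
        new = old + alpha * (reward + gamma * best_next - old)
        if new != old:
            q[(state, action)] = new
            changed[(state, action)] = new
    return changed


# Hypothetical setup: NOT the grid from Problem 1.
actions = ["up", "right"]
path = [
    ("A", "up", 0.0, "B"),
    ("B", "right", 0.0, "C"),
    ("C", "right", 100.0, None),  # assumed reward on reaching the goal
]

q = defaultdict(float)  # every q(s, a) starts at 0
alpha, gamma = 1.0, 0.8  # assumed learning rate and discount factor

first = run_episode(q, path, actions, alpha, gamma)
second = run_episode(q, path, actions, alpha, gamma)
third = run_episode(q, path, actions, alpha, gamma)

for number, changed in enumerate([first, second, third], start=1):
    print(f"episode {number}: {changed}")
```

Under these assumptions each identical episode changes exactly one entry: the first episode updates only the transition into the goal, and each later episode pushes the discounted value one step further back along the path. The same pattern would be expected on the actual grid if its only nonzero reward is at the goal, with the real states, actions, and reward substituted into `path`.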
