Question: Q-Learning. Let's simulate the Q-learning algorithm! Assume there are states 0, 1, 2, 3 and actions ( b
Q-Learning
Let's simulate the Q-learning algorithm! Assume there are states (0, 1, 2, 3), actions ('b', 'c'), and discount factor γ = …. Furthermore, assume that all the Q values are initialized to … and that the learning rate is α = ….
Each row t in the table represents one record of experience at time t:
For each row, indicate what update the Q-learning algorithm will make based on that row's experience. Note that the next state s_{t+1} appears on the next row, so you might need to look ahead to the next part of the problem to see that next-state value. You will want to keep track of the overall Q table as these updates take place, since they span the multiple parts of this question.
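Because a row's update needs s_{t+1} from the following row, one way to carry the Q table across parts is to hold the most recent row as "pending" until the next row arrives. The sketch below assumes each row has the form (s_t, a_t, r_t), starts from an all-zero Q table, and uses α = 0.5 and γ = 0.9 purely as illustrative values; the class name `IncrementalQ` and the sample rows are hypothetical, not taken from the problem.

```python
from typing import Dict, Optional, Sequence, Tuple

Row = Tuple[int, str, float]  # assumed row layout: (state, action, reward)


class IncrementalQ:
    """Hypothetical helper that keeps one Q table across parts of the problem.

    A row's update cannot be made until the next row reveals s_{t+1},
    so the most recent row is held as pending.
    """

    def __init__(self, states: Sequence[int], actions: Sequence[str],
                 alpha: float, gamma: float, q0: float = 0.0) -> None:
        self.actions = list(actions)
        self.alpha, self.gamma = alpha, gamma
        # Assumed: every (state, action) pair starts at the same value q0.
        self.q: Dict[Tuple[int, str], float] = {
            (s, a): q0 for s in states for a in actions
        }
        self.pending: Optional[Row] = None

    def observe(self, row: Row) -> Optional[Tuple[int, str, float]]:
        """Feed one row; return (s, a, new Q(s, a)) for the update it completes."""
        done = None
        if self.pending is not None:
            s, a, r = self.pending
            s_next = row[0]  # the next state is the state on this row
            best_next = max(self.q[(s_next, b)] for b in self.actions)
            target = r + self.gamma * best_next
            self.q[(s, a)] = (1 - self.alpha) * self.q[(s, a)] + self.alpha * target
            done = (s, a, self.q[(s, a)])
        self.pending = row
        return done


if __name__ == "__main__":
    learner = IncrementalQ([0, 1, 2, 3], ["b", "c"], alpha=0.5, gamma=0.9)  # assumed values
    # Hypothetical rows from two successive parts of the problem.
    for row in [(0, "b", 0.0), (2, "c", 2.0)] + [(3, "b", 0.0)]:
        print(learner.observe(row))
```

The first call returns None because its next state is not known yet; each later row completes the update for the row before it.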
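As a reminder, the Q-learning update formula is the following:

$$Q(s, a) \leftarrow (1 - \alpha)\, Q(s, a) + \alpha \left( r + \gamma \max_{a'} Q(s', a') \right)$$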
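A single application of the standard Q-learning update, Q(s, a) ← (1 − α) Q(s, a) + α (r + γ max_{a'} Q(s', a')), can be sketched as follows. It assumes the Q table is a dict keyed by (state, action); the function name `q_update`, the values α = 0.5 and γ = 0.9, and the sample transition are hypothetical choices for illustration only.

```python
from typing import Dict, Sequence, Tuple

QTable = Dict[Tuple[int, str], float]


def q_update(q: QTable, s: int, a: str, r: float, s_next: int,
             actions: Sequence[str], alpha: float, gamma: float) -> float:
    """Update Q(s, a) in place and return its new value."""
    # Bootstrapped target: reward plus discounted best value in the next state.
    target = r + gamma * max(q[(s_next, b)] for b in actions)
    # Blend the old estimate with the target using the learning rate.
    q[(s, a)] = (1 - alpha) * q[(s, a)] + alpha * target
    return q[(s, a)]


if __name__ == "__main__":
    states, actions = [0, 1, 2, 3], ["b", "c"]
    q = {(s, a): 0.0 for s in states for a in actions}  # assumed all-zero start
    q[(2, "c")] = 10.0  # hypothetical earlier estimate, for illustration
    # Hypothetical transition: in state 0 take 'b', get reward 1, land in state 2.
    print(q_update(q, 0, "b", 1.0, 2, actions, alpha=0.5, gamma=0.9))  # → 5.0
```

Here the target is 1 + 0.9 · max(0, 10) = 10, and the new value is 0.5 · 0 + 0.5 · 10 = 5.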
You are welcome to do this problem by hand, though writing a small program to solve it may be a good idea. To help with that, here is a variable with the history of experience:
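A small program of the kind suggested above could replay the whole history in one pass and log every update. This sketch assumes rows of the form (s_t, a_t, r_t), reads s_{t+1} from the next row, and starts from an all-zero Q table; `run_q_learning`, the `history` list, and the values α = 0.5 and γ = 0.9 are hypothetical placeholders to be swapped for the problem's actual history and parameters.

```python
from typing import Dict, List, Sequence, Tuple

Row = Tuple[int, str, float]          # assumed layout: (state, action, reward)
QTable = Dict[Tuple[int, str], float]


def run_q_learning(history: List[Row], states: Sequence[int],
                   actions: Sequence[str], alpha: float, gamma: float,
                   q0: float = 0.0) -> Tuple[QTable, List[Tuple[int, str, float]]]:
    """Replay the history; return the final Q table and a log of each update."""
    q: QTable = {(s, a): q0 for s in states for a in actions}
    log: List[Tuple[int, str, float]] = []
    # Row t supplies (s, a, r); row t+1 supplies the next state s'.
    # The last row has no successor here, so it produces no update.
    for (s, a, r), (s_next, _, _) in zip(history, history[1:]):
        target = r + gamma * max(q[(s_next, b)] for b in actions)
        q[(s, a)] = (1 - alpha) * q[(s, a)] + alpha * target
        log.append((s, a, q[(s, a)]))
    return q, log


if __name__ == "__main__":
    # Hypothetical history, NOT the one given in the problem.
    history = [(0, "b", 0.0), (2, "c", 2.0), (3, "b", 0.0), (2, "c", 1.0), (0, "c", 0.0)]
    q, log = run_q_learning(history, [0, 1, 2, 3], ["b", "c"],
                            alpha=0.5, gamma=0.9)  # assumed parameter values
    for t, (s, a, value) in enumerate(log):
        print(f"t={t}: Q({s}, {a!r}) <- {value:.4f}")
```

Logging the value after every step makes it easy to check each row's answer against a hand calculation.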
