Q-Learning
Let's simulate the Q-learning algorithm! Assume there are states 0, 1, 2, 3 and actions ('b', 'c'), and discount factor γ = 0.9. Furthermore, assume that all the Q values are initialized to 0 and that the learning rate α = 0.5.
Each row t in the table represents a record of experience at time t: (s_t, a_t, r_t).
In each row t, indicate what update Q(s_t, a_t) ← q will be made by the Q-learning algorithm based on (s_t, a_t, r_t, s_{t+1}). Note that s_{t+1} is on the next row (you might need to look ahead to the next part of the problem to see that next state value). You will want to keep track of the overall table Q(s, a) as these updates take place, spanning the multiple parts of this question.
As a reminder, the Q-learning update formula is the following:
$$Q(s,a) = (1-\alpha)\,Q(s,a) + \alpha\left(r + \gamma \max_{a'} Q(s',a')\right)$$
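A single application of this formula can be sketched in Python as below. This is a minimal illustration, not reference code from the course: it assumes the Q table is a dict keyed by (state, action), reads the garbled action list as 'b' and 'c', and the function name q_update and the transition used in the example are hypothetical.

```python
# A minimal sketch of a single tabular Q-learning update.
ALPHA = 0.5   # learning rate from the problem statement
GAMMA = 0.9   # discount factor from the problem statement
STATES = [0, 1, 2, 3]
ACTIONS = ['b', 'c']  # assumed reading of the garbled action list

def q_update(q, s, a, r, s_next, alpha=ALPHA, gamma=GAMMA):
    """Set Q(s,a) = (1-alpha) Q(s,a) + alpha (r + gamma max_a' Q(s',a')) and return it."""
    best_next = max(q[(s_next, a2)] for a2 in ACTIONS)
    q[(s, a)] = (1 - alpha) * q[(s, a)] + alpha * (r + gamma * best_next)
    return q[(s, a)]

# All Q values start at 0, as the problem states.
q = {(s, a): 0.0 for s in STATES for a in ACTIONS}
# Hypothetical transition: s=0, a='b', r=1, s'=1.
print(q_update(q, 0, 'b', 1, 1))  # 0.5 * 0 + 0.5 * (1 + 0.9 * 0) = 0.5
```

Because every entry starts at 0, the first update for any pair is simply α·r; the discounted look-ahead term only starts to matter once the next state's entries have been updated.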
You are welcome to do this problem by hand, though writing a small program to solve it may be a good idea. To help with that, here is a variable with the history of experience:
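The history variable itself is not included in this copy of the problem, so the sketch below replays a hypothetical history instead. It pairs each record with the next one to obtain s_{t+1}, applies the update formula, and skips the final record because it has no successor state (an assumption about how the last row is handled). The names run_q_learning and example_history are illustrative only.

```python
# A minimal sketch: replay (s_t, a_t, r_t) records with tabular Q-learning.
ALPHA = 0.5   # learning rate from the problem statement
GAMMA = 0.9   # discount factor from the problem statement
STATES = [0, 1, 2, 3]
ACTIONS = ['b', 'c']  # assumed reading of the garbled action list

def run_q_learning(history, alpha=ALPHA, gamma=GAMMA):
    """Return the final Q table and a list of ((s, a), new_value) updates."""
    # Every Q value starts at 0.
    q = {(s, a): 0.0 for s in STATES for a in ACTIONS}
    updates = []
    # Pair row t with row t+1 to look ahead at s_{t+1}; the last row has
    # no successor, so it produces no update (assumption).
    for (s, a, r), (s_next, _, _) in zip(history, history[1:]):
        target = r + gamma * max(q[(s_next, a2)] for a2 in ACTIONS)
        q[(s, a)] = (1 - alpha) * q[(s, a)] + alpha * target
        updates.append(((s, a), q[(s, a)]))
    return q, updates

# Hypothetical history for illustration only; NOT the problem's data.
example_history = [(0, 'b', 0), (2, 'c', 1), (0, 'b', 0), (2, 'c', 0)]
q, updates = run_q_learning(example_history)
for t, ((s, a), value) in enumerate(updates):
    print(f"t={t}: Q({s}, {a!r}) <- {value}")
```

Replacing example_history with the real list of (state, action, reward) tuples should print one update per row. Using zip(history, history[1:]) performs the look-ahead to the next row without manual index arithmetic, and the Q table persists across rows, matching the instruction to track it over all parts of the question.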