Question: 4- Assuming that all Q-values are initialized to 0, what are the Q-values for the following state-action pairs?

4- Assuming that all Q-values are initialized to 0, what are the Q-values for the following state-action pairs after running [tabular]
Q-learning for the first episode? [Skip/disregard episodes 2 and 3.] Use discount factor $\gamma = 0.8$ and learning rate $\alpha = 0.6$.
Q(A, Down)
Q(B, Up)
Hint: Use the following equations and update the Q-values after each transition until the end of episode 1.
Consider your new sample estimate:
$\text{target} = R(s,a,s') + \gamma \max_{a'} \hat{Q}(s',a')$
Incorporate the new estimate into a running average:
$\hat{Q}(s,a) \leftarrow (1-\alpha)\,\hat{Q}(s,a) + \alpha\,[\text{target}]$
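The two steps in the hint (form the target, then blend it into the running average) can be sketched in Python. This is only an illustration under stated assumptions: the symbols missing from the hint are taken to be the discount factor (0.8) and the learning rate (0.6), a terminal state is assumed to have value 0, and the states `X`, `Y` and their rewards are hypothetical placeholders, not the episode from the problem, whose experience sequence is not reproduced here.

```python
from collections import defaultdict

GAMMA = 0.8  # discount factor from the question
ALPHA = 0.6  # learning rate from the question


def q_learning_update(Q, s, a, r, s_next, actions, terminal=False):
    """Apply one tabular Q-learning update for the transition (s, a, r, s_next)."""
    # Off-policy target: bootstrap from the best action in the next state.
    best_next = 0.0 if terminal else max(Q[(s_next, a2)] for a2 in actions)
    target = r + GAMMA * best_next
    # Running average of the old estimate and the new sample.
    Q[(s, a)] = (1 - ALPHA) * Q[(s, a)] + ALPHA * target
    return Q[(s, a)]


# Hypothetical two-step experience (NOT the problem's episode 1).
ACTIONS = ["Up", "Down"]
Q = defaultdict(float)  # every Q-value starts at 0

q_x_down = q_learning_update(Q, "X", "Down", 2.0, "Y", ACTIONS)
# target = 2 + 0.8 * 0 = 2, so Q(X, Down) = 0.4 * 0 + 0.6 * 2 = 1.2

q_y_up = q_learning_update(Q, "Y", "Up", 1.0, "X", ACTIONS)
# target = 1 + 0.8 * max(Q(X, Up), Q(X, Down)) = 1 + 0.8 * 1.2 = 1.96
# so Q(Y, Up) = 0.6 * 1.96, roughly 1.176
```

To answer the actual question, the same function would be called once per transition of episode 1, in order, and `Q[("A", "Down")]` and `Q[("B", "Up")]` read off at the end.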
5- Repeat part 4 using SARSA (temporal difference) with the above experience sequence (again assume that all Q-values
are initialized to 0 and use only episode 1). Use discount factor $\gamma = 0.8$ and learning rate $\alpha = 0.6$.
Hint: Use the following equations and update the Q-values after each transition until the end of episode 1.
Sample of $\hat{Q}(s,a)$: $\text{target} = R(s,a,s') + \gamma\,\hat{Q}(s',a')$
Update $\hat{Q}(s,a)$: $\hat{Q}(s,a) \leftarrow (1-\alpha)\,\hat{Q}(s,a) + \alpha\,[\text{target}]$
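The SARSA hint differs from Q-learning only in the target: it bootstraps from the action actually taken next rather than the maximizing one. A minimal Python sketch follows, under these assumptions: the symbols missing from the hint are the discount factor (0.8) and the learning rate (0.6), the Q-value after a terminal transition is 0, and the episode below is a hypothetical placeholder, not the experience sequence from the problem.

```python
from collections import defaultdict

GAMMA = 0.8  # discount factor from the question
ALPHA = 0.6  # learning rate from the question


def sarsa_update(Q, s, a, r, s_next, a_next):
    """Apply one tabular SARSA update; a_next is None if s_next is terminal."""
    # On-policy target: bootstrap from the action actually taken next.
    next_q = 0.0 if a_next is None else Q[(s_next, a_next)]
    target = r + GAMMA * next_q
    Q[(s, a)] = (1 - ALPHA) * Q[(s, a)] + ALPHA * target
    return Q[(s, a)]


# Hypothetical episode as (state, action, reward) steps; the last step
# leads to a terminal state. This is NOT the problem's episode 1.
episode = [("X", "Down", 2.0), ("Y", "Up", 1.0), ("X", "Up", 0.0)]

Q = defaultdict(float)  # every Q-value starts at 0
for i, (s, a, r) in enumerate(episode):
    if i + 1 < len(episode):
        s_next, a_next, _ = episode[i + 1]
    else:
        s_next, a_next = None, None  # terminal transition
    sarsa_update(Q, s, a, r, s_next, a_next)

# Q(X, Down): target = 2 + 0.8 * Q(Y, Up) = 2, giving 1.2
# Q(Y, Up):   target = 1 + 0.8 * Q(X, Up) = 1, giving 0.6
# Q-learning on the same data would instead use max over actions in X
# (which is Q(X, Down) = 1.2) when updating Q(Y, Up).
```

The comparison in the final comment shows why the two parts of the exercise can give different numbers for the same episode: the SARSA target depends on which action follows, not on the best available one.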