Question: Suppose we are learning Q*(s,a) for Pacman's world.

Suppose we are learning Q*(s,a) for Pacman's world.
Pacman can take the following actions: {N, S, E, W}.
Currently, Pacman's estimate is Q(s,a) such that, for all s,
Q(s,N) = 10, Q(s,S) = -10, Q(s,E) = 5, Q(s,W) = 2
Suppose Pacman's scheme for exploration is to
take a random action with probability ε = 0.2
act according to the current policy π(s) = argmax_a Q(s,a) with probability 1 - ε = 0.8
What is the probability of Pacman moving north, i.e., taking action N?
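The exploration scheme above can be sketched as a small calculation. This is a sketch under one assumption the question does not state outright: the "random action" is drawn uniformly from all four actions, so the greedy action N can also be picked during exploration. The function name epsilon_greedy_probs is hypothetical and used only for illustration.

```python
from fractions import Fraction


def epsilon_greedy_probs(q_values, epsilon):
    """Probability of each action under epsilon-greedy selection.

    Assumption: the exploratory action is uniform over ALL actions,
    including the greedy one.
    """
    actions = list(q_values)
    greedy = max(actions, key=q_values.get)  # argmax_a Q(s, a)
    explore = epsilon / len(actions)         # share each action gets from exploration
    return {
        a: explore + ((1 - epsilon) if a == greedy else 0)
        for a in actions
    }


# Values given in the question; they are the same for every state s.
q = {"N": 10, "S": -10, "E": 5, "W": 2}
probs = epsilon_greedy_probs(q, Fraction(1, 5))  # epsilon = 0.2

print(float(probs["N"]))  # 0.85
```

Under that assumption, P(N) = 0.8 + 0.2 × 1/4 = 0.85, and each of the other three actions has probability 0.05.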
Suppose Pacman updates the Q(s,a) estimate using a running average with parameter α = 0.1.
If Pacman moves south, i.e., takes action S, and receives a reward of 100, what is the new estimate of Q(s,a)?
Q(s,N)=
Q(s,S)=
Q(s,E)=
Q(s,W)=
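The running-average update can be sketched as below. The question does not give a discount factor, so γ (gamma) here is an assumption and is left as a parameter; the function name running_average_update is likewise hypothetical. The sketch uses the standard Q-learning form Q(s,a) ← (1 - α) Q(s,a) + α [r + γ max_a' Q(s',a')], and relies on the stated fact that the estimates are identical in every state, so the same table supplies the next-state maximum.

```python
from fractions import Fraction


def running_average_update(q_values, action, reward, alpha, gamma):
    """Return a new Q table after one running-average (Q-learning) update.

    Assumption: gamma is not specified in the question. Because Q is the
    same for every state, max over q_values stands in for max_a' Q(s', a').
    """
    sample = reward + gamma * max(q_values.values())
    updated = dict(q_values)  # only the taken action changes
    updated[action] = (1 - alpha) * q_values[action] + alpha * sample
    return updated


q = {"N": 10, "S": -10, "E": 5, "W": 2}
alpha = Fraction(1, 10)  # running-average parameter, 0.1

# Assumed gamma = 1 (undiscounted future value).
new_q = running_average_update(q, "S", 100, alpha, gamma=1)
print({a: float(v) for a, v in new_q.items()})
```

With γ = 1 the sample is 100 + 10 = 110, giving Q(s,S) = 0.9 × (-10) + 0.1 × 110 = 2. If the future term is ignored (γ = 0), the result is 0.9 × (-10) + 0.1 × 100 = 1. In both cases Q(s,N), Q(s,E), and Q(s,W) stay at 10, 5, and 2, since only the action actually taken is updated.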
