Question: Suppose we are learning Q*(s, a) for Pacman's world. Pacman can take the following actions {N, S
Suppose we are learning for Pacman's world.
Pacman can take the following actions
Currently, Pacman's estimate is such that for all
Suppose Pacman's scheme for exploration is to
take a random action with probability
act according to the current policy with probability
What is the probability of Pacman moving north, i.e., taking action
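The exploration scheme described above can be sketched as a small calculation. This is only an illustration under stated assumptions: the question's actual exploration probability, full action set, and current policy are not shown here, so the values used below (a four-action set, an exploration probability of 1/2, and a policy that picks north) are hypothetical placeholders, and the random action is assumed to be drawn uniformly from the action set. The function name `action_probability` is also made up for this sketch.

```python
from fractions import Fraction


def action_probability(action, actions, policy_action, epsilon):
    """Probability of taking `action` when Pacman explores with probability
    `epsilon` (assumed uniform over `actions`) and otherwise follows the
    current policy, which chooses `policy_action`."""
    epsilon = Fraction(epsilon)
    # Exploration branch: every action is equally likely (assumption).
    random_part = epsilon / len(actions)
    # Exploitation branch: only the policy's action receives this mass.
    policy_part = (1 - epsilon) if action == policy_action else Fraction(0)
    return random_part + policy_part


# Hypothetical values, NOT taken from the question.
actions = ["N", "S", "E", "W"]
epsilon = Fraction(1, 2)

p_north_if_policy_north = action_probability("N", actions, "N", epsilon)
p_north_if_policy_south = action_probability("N", actions, "S", epsilon)
print(p_north_if_policy_north)  # 5/8
print(p_north_if_policy_south)  # 1/8
```

Note that north can be chosen in two ways when it is the policy's action: through the random branch and through the policy branch. Both contributions have to be added, which is why the first result is larger than 1 minus the exploration probability.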
Suppose Pacman updates the estimate using a running average with parameter
If Pacman moves south, i.e., makes the action and receives a reward of what is the new estimate of
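The running-average update asked about here can be sketched as follows. This assumes the standard Q-learning form, in which the old estimate is blended with a new sample made of the reward plus the discounted best estimate at the next state. The question's actual averaging parameter, reward, discount, and current estimates are not shown here, so every number below is a hypothetical placeholder, and `running_average_update` is an illustrative name.

```python
def running_average_update(q_old, reward, alpha, gamma=1.0, next_q_values=()):
    """Return (1 - alpha) * q_old + alpha * sample, where
    sample = reward + gamma * max over the next state's Q estimates.
    If no next-state estimates are given, the best next value is taken as 0."""
    best_next = max(next_q_values, default=0.0)
    sample = reward + gamma * best_next
    return (1 - alpha) * q_old + alpha * sample


# Hypothetical values, NOT taken from the question:
# old estimate 0, reward 4, alpha 0.5, discount 1, all next-state estimates 0.
new_q = running_average_update(
    q_old=0.0, reward=4.0, alpha=0.5, gamma=1.0, next_q_values=[0.0, 0.0]
)
print(new_q)  # 2.0
```

With these placeholder numbers the sample equals the reward alone, so the new estimate is simply halfway between the old estimate and the reward. Substituting the question's own values into the same function gives the requested estimate.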
