Question: After observing the agent for a while, Adam realized that his assumption of $T$ being deterministic is wrong in one specific way: when the agent tries to legally move down, it occasionally ends up moving left instead, except from grid cells where moving left would take it out of bounds. Adam still assumes that all other movements are deterministic.
Suppose we have run Adam's suggested updates until convergence to get $Q^{\text{wrong}}(s, a)$ under the original, wrong assumption of a deterministic $T$.
Suppose $Q^{\text{correct}}(s, a)$ denotes the Q-values under the new, correct $T$, where the agent sometimes moves left instead of down. Note that you don't explicitly know the exact probabilities associated with this new $T$ (i.e., you don't know how often the agent moves left instead of down), but you know that it differs qualitatively in the way described above.
For which $(s, a)$ pairs will $Q^{\text{wrong}}(s, a)$ be an overestimate of $Q^{\text{correct}}(s, a)$?
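One way to build intuition for this comparison is to compute both sets of Q-values numerically on a toy problem. The sketch below does not use Adam's grid, rewards, or update rule, none of which appear in the text above. It assumes a hypothetical 3x3 grid with a terminal goal in the bottom-right corner, a reward of 10 for entering the goal, a discount of 0.9, only legal (in-bounds) actions, and an arbitrary slip probability of 0.2. It then runs plain Q-value iteration under each transition model and lists the pairs where the deterministic model overestimates.

```python
import itertools

# All constants below are illustrative assumptions, not part of the original problem.
ROWS, COLS = 3, 3
GOAL = (2, 2)        # terminal cell; entering it yields GOAL_REWARD
GOAL_REWARD = 10.0
GAMMA = 0.9
SLIP_PROB = 0.2      # the real slip probability is unknown; 0.2 is arbitrary
MOVES = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}


def in_bounds(cell):
    r, c = cell
    return 0 <= r < ROWS and 0 <= c < COLS


def step(cell, action):
    dr, dc = MOVES[action]
    return (cell[0] + dr, cell[1] + dc)


def legal_actions(cell):
    # Only moves that stay on the grid are available.
    return [a for a in MOVES if in_bounds(step(cell, a))]


def transitions(cell, action, slip_prob):
    """Return (probability, next_cell) pairs for a legal action."""
    target = step(cell, action)
    left = step(cell, "left")
    # "down" slips to "left" unless moving left would leave the grid.
    if action == "down" and slip_prob > 0 and in_bounds(left):
        return [(1 - slip_prob, target), (slip_prob, left)]
    return [(1.0, target)]


def q_iteration(slip_prob, sweeps=500):
    """Synchronous Q-value iteration under the given slip probability."""
    states = [s for s in itertools.product(range(ROWS), range(COLS)) if s != GOAL]
    q = {(s, a): 0.0 for s in states for a in legal_actions(s)}
    for _ in range(sweeps):
        new_q = {}
        for (s, a) in q:
            total = 0.0
            for prob, nxt in transitions(s, a, slip_prob):
                if nxt == GOAL:
                    total += prob * GOAL_REWARD
                else:
                    best_next = max(q[(nxt, b)] for b in legal_actions(nxt))
                    total += prob * GAMMA * best_next
            new_q[(s, a)] = total
        q = new_q
    return q


q_wrong = q_iteration(0.0)          # deterministic (wrong) model
q_correct = q_iteration(SLIP_PROB)  # model with the down-to-left slip
overestimated = sorted(k for k in q_wrong if q_wrong[k] - q_correct[k] > 1e-9)

for state, action in overestimated:
    print(state, action)
```

In this toy setup, slipping left always moves the agent away from the goal, so the deterministic model can only overestimate. The affected pairs include "down" from any cell where a slip is possible, plus any pair whose best continuation passes through such a move. Pairs that can reach the goal equally well using only slip-free moves, such as "down" from the left column, are unaffected. Which pairs are affected in Adam's actual grid depends on its layout and rewards.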
