Question: Consider the following grid world, in which an agent can explore the environment until it finds the goal (G). In this problem, you will update the estimates of the Q function based on the agent's experiences. In this environment, all actions in all squares yield a reward of zero, except actions that enter the goal square, and actions that enter the danger square (X), which result in a punishment, i.e. a negative reward. The reward r(s, a) of each action a in state s is shown in the figure below. Assume that the initial estimate Q(s, a) is zero for all state-action pairs.
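The kind of update the question asks for can be sketched as follows. This is a minimal illustration that assumes the standard tabular Q-learning rule, Q(s, a) <- Q(s, a) + alpha * (r + gamma * max_a' Q(s', a') - Q(s, a)), which the question does not name explicitly. The learning rate alpha, the discount factor gamma, the action set, the state names, and the reward of 10 are all hypothetical placeholders, since the figure and the agent's experiences are not reproduced here.

```python
from collections import defaultdict

# Assumed action set; the original figure may define a different one.
ACTIONS = ("up", "down", "left", "right")


def q_update(Q, s, a, r, s_next, alpha=0.5, gamma=0.9, terminal=False):
    """Apply one tabular Q-learning update for the experience (s, a, r, s_next).

    alpha and gamma are placeholder values, not taken from the question.
    """
    # Entering a terminal square ends the episode, so there is no future value.
    best_next = 0.0 if terminal else max(Q[(s_next, b)] for b in ACTIONS)
    Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])
    return Q[(s, a)]


# Q(s, a) starts at zero for every state-action pair, as the question states.
Q = defaultdict(float)

# Hypothetical experiences: s1 -> s2 -> G, with a placeholder goal reward of 10.
q_update(Q, "s1", "right", 0.0, "s2")
q_update(Q, "s2", "right", 10.0, "G", terminal=True)
q_update(Q, "s1", "right", 0.0, "s2")

print(Q[("s2", "right")], Q[("s1", "right")])
```

With these placeholder numbers, the first update leaves Q(s1, right) at zero because every estimate is still zero. The goal reward first raises Q(s2, right), and only the repeated visit to s1 propagates that value one step back.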
