Question 13 (30 points). In this grid world:
The agent starts at position S (top-left corner).
The goal is located at the bottom-left corner.
There is one obstacle, located at the center.
The agent can move up, down, left, or right within the grid, but cannot move into the
obstacle cell.
The objective for the agent is to navigate from the start position to the goal position while
avoiding the obstacle. The agent receives a reward for reaching the goal and a penalty
for hitting the obstacle. All other movements incur a small penalty to encourage the
agent to find the shortest path.
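
The movement and reward rules above can be sketched as a single transition function. The grid size and reward values did not survive in the question text, so everything numeric here is an assumption chosen only for illustration: a 3x3 grid, +10 for the goal, -10 for the obstacle, and -1 per ordinary move. The names `step`, `ACTIONS`, and the constants are hypothetical, and treating an off-grid move as "stay in place" is also an assumption.

```python
# All numeric values below are ASSUMED; the original question's numbers are missing.
ROWS, COLS = 3, 3            # assumed grid size
START = (0, 0)               # top-left corner (row, col)
GOAL = (2, 0)                # bottom-left corner
OBSTACLE = (1, 1)            # center cell
GOAL_REWARD = 10.0           # assumed reward for reaching the goal
OBSTACLE_PENALTY = -10.0     # assumed penalty for hitting the obstacle
STEP_PENALTY = -1.0          # assumed small penalty for any other move

# Each action maps to a (row, col) offset.
ACTIONS = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}


def step(state, action):
    """Return (next_state, reward, done) for one move from `state`."""
    d_row, d_col = ACTIONS[action]
    row, col = state[0] + d_row, state[1] + d_col
    # Assumption: a move off the grid leaves the agent where it is.
    if not (0 <= row < ROWS and 0 <= col < COLS):
        return state, STEP_PENALTY, False
    # The obstacle cell cannot be entered: the agent stays put and is penalized.
    if (row, col) == OBSTACLE:
        return state, OBSTACLE_PENALTY, False
    # Reaching the goal ends the episode.
    if (row, col) == GOAL:
        return (row, col), GOAL_REWARD, True
    return (row, col), STEP_PENALTY, False
```

Under these assumptions, the shortest path from S is two moves down, since the obstacle does not lie on the left column.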
Using Q-learning, calculate the Q-values for each state-action pair after a few
iterations, given a discount factor and a learning rate.
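
As a rough illustration of what one Q-learning iteration computes, the sketch below applies the standard tabular update, Q(s, a) <- Q(s, a) + alpha * (r + gamma * max_a' Q(s', a') - Q(s, a)), to two hand-picked transitions. The discount factor and learning rate are missing from the question, so `GAMMA = 0.9` and `ALPHA = 0.1` are assumptions, as are the reward values (+10 for the goal, -1 per move) and the (row, col) state encoding. The helper name `q_update` is hypothetical.

```python
from collections import defaultdict

ACTIONS = ["up", "down", "left", "right"]
ALPHA = 0.1   # ASSUMED learning rate (value missing from the question)
GAMMA = 0.9   # ASSUMED discount factor (value missing from the question)


def q_update(Q, state, action, reward, next_state, done):
    """Apply one tabular Q-learning update in place and return the new Q-value."""
    # A terminal next state contributes no future value.
    best_next = 0.0 if done else max(Q[(next_state, a)] for a in ACTIONS)
    td_target = reward + GAMMA * best_next
    Q[(state, action)] += ALPHA * (td_target - Q[(state, action)])
    return Q[(state, action)]


# All Q-values start at zero.
Q = defaultdict(float)

# Transition 1: from the cell above the goal, move down into the goal (assumed reward +10).
v1 = q_update(Q, (1, 0), "down", 10.0, (2, 0), done=True)

# Transition 2: from the start, move down one cell (assumed step penalty -1).
v2 = q_update(Q, (0, 0), "down", -1.0, (1, 0), done=False)
```

With these assumed numbers, the first update gives Q((1, 0), down) = 0 + 0.1 * (10 - 0) = 1.0. The second update then bootstraps from it: Q((0, 0), down) = 0 + 0.1 * (-1 + 0.9 * 1.0 - 0), which is approximately -0.01. Repeating such updates over many episodes is how value propagates backward from the goal toward the start.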
