Question: (13) In this grid world (30 points), the agent starts at position S (top-left corner)

(13) In this grid world (30 points)
The agent starts at position S (top-left corner).
The goal is located at position G (bottom-left corner).
There is one obstacle located at position x (center).
The agent can move up, down, left, or right within the grid, but cannot move into the
obstacle cell.
The objective for the agent is to navigate from the start position S to the goal position G while
avoiding the obstacle x. The agent receives a reward of +10 for reaching the goal and a penalty
of -10 for hitting the obstacle. All other movements incur a small penalty of -1 to encourage the
agent to find the shortest path.
Using Q-learning, calculate the Q-values for each state-action pair after a few iterations (4-5
iterations). Assume a discount factor (γ) of 0.9 and a learning rate (α) of 0.1.
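
One way to set up the calculation is the standard Q-learning update, Q(s,a) <- Q(s,a) + α [r + γ max_a' Q(s',a') - Q(s,a)], applied step by step. The Python sketch below is only an illustration under several assumptions that the question does not state: the grid is 3x3 with cells written as (row, column), so S = (0,0), G = (2,0) and x = (1,1); hitting the obstacle or a wall leaves the agent where it was (with rewards -10 and -1 respectively); all Q-values start at 0; and each iteration is one episode that follows the fixed path down, down. The names step, new_q_table, q_update and replay are hypothetical helpers, not part of the original problem.

```python
# Assumed layout: 3x3 grid, cells are (row, col), rows counted from the top.
ROWS, COLS = 3, 3
START, GOAL, OBSTACLE = (0, 0), (2, 0), (1, 1)
ACTIONS = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}
GAMMA, ALPHA = 0.9, 0.1  # discount factor and learning rate from the question


def step(state, action):
    """Return (next_state, reward, done) for one move."""
    dr, dc = ACTIONS[action]
    nxt = (state[0] + dr, state[1] + dc)
    if nxt == OBSTACLE:
        return state, -10, False  # assumed: bumping the obstacle leaves the agent in place
    if not (0 <= nxt[0] < ROWS and 0 <= nxt[1] < COLS):
        return state, -1, False   # assumed: bumping a wall costs an ordinary step
    if nxt == GOAL:
        return nxt, 10, True
    return nxt, -1, False


def new_q_table():
    """Q-table with every state-action value initialised to 0 (an assumption)."""
    return {(r, c): {a: 0.0 for a in ACTIONS}
            for r in range(ROWS) for c in range(COLS) if (r, c) != OBSTACLE}


def q_update(Q, s, a, r, s2, done):
    """One Q-learning update; a terminal transition has no bootstrap term."""
    target = r if done else r + GAMMA * max(Q[s2].values())
    Q[s][a] += ALPHA * (target - Q[s][a])


def replay(Q, actions):
    """Apply Q-learning updates along a fixed action sequence from START."""
    s = START
    for a in actions:
        s2, r, done = step(s, a)
        q_update(Q, s, a, r, s2, done)
        s = s2
        if done:
            break


Q = new_q_table()
for episode in range(5):
    replay(Q, ["down", "down"])
    print(episode + 1, round(Q[(0, 0)]["down"], 4), round(Q[(1, 0)]["down"], 4))
```

Under these assumptions, the first episode gives Q(S, down) = 0 + 0.1(-1 + 0.9·0 - 0) = -0.1 and Q((1,0), down) = 0 + 0.1(10 - 0) = 1.0. After five episodes the two values are roughly 0.324 and 4.095, and every other entry is still 0 because those pairs were never visited. An exploring agent (for example ε-greedy) would visit other pairs and produce a different table, including negative values for moves into the obstacle.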
