Question: Develop an AI agent trained to play a simple Grid World game using Q-Learning following an epsilon-greedy policy

Description
In this assignment, you will develop an AI agent trained to play a simple Grid World game using Q-Learning following an epsilon-greedy policy.
The environment consists of a grid where the agent can move up, down, left, or right, and the goal is to reach a specific position on the grid. Both the
starting position and the goal position will be determined based on your student ID number, which is 202162020.
After training with your code, the agent should be able to find the optimal path (the shortest path, i.e., the least number of moves) from the start to
the goal.
Specifications
Grid World
The grid will be a GRID_SIZE x GRID_SIZE square matrix. For example, GRID_SIZE could be 5, 7, etc. Your code should work for any finite integer value of GRID_SIZE.
States:
Each state represents a position in the Grid World, so the total number of states = GRID_SIZE * GRID_SIZE. The figure below (assg1-5x5grid.jpg) shows an example 5x5 Grid World. Each position in the grid can be referred to by its row and column index (i and j respectively), and each position has a corresponding state index S. The example figure shows how to convert position [i,j] to S and vice versa.
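The figure itself is not reproduced here, so the exact numbering it uses is unknown. As a minimal sketch, assuming zero-based row-major numbering (S = i * GRID_SIZE + j), the conversion in both directions could look like this. The function names pos_to_state and state_to_pos are hypothetical, chosen only for illustration.

```python
GRID_SIZE = 5  # default from the assignment; any positive integer should work


def pos_to_state(i, j, grid_size=GRID_SIZE):
    """Map a (row, column) position to a state index.

    Assumption: zero-based, row-major numbering. Check this against the figure.
    """
    return i * grid_size + j


def state_to_pos(s, grid_size=GRID_SIZE):
    """Inverse mapping: state index back to a (row, column) tuple."""
    return divmod(s, grid_size)


if __name__ == "__main__":
    print(pos_to_state(2, 3))  # 13
    print(state_to_pos(13))    # (2, 3)
```

If the figure numbers states from 1 or column by column, only these two functions need to change.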
Actions:
Allowed actions: 0 (Up), 1 (Down), 2 (Left), 3 (Right). The agent can move up, down, left, or right from each state, but is not allowed to go outside the boundary. Follow an epsilon-greedy policy during Q-learning training.
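One way to read the boundary rule is to offer the agent only the actions that keep it inside the grid, and to apply the epsilon-greedy choice to that reduced set. The sketch below takes that reading. It assumes that Up decreases the row index i, and the helper names valid_actions and epsilon_greedy are hypothetical.

```python
import random

UP, DOWN, LEFT, RIGHT = 0, 1, 2, 3  # action codes from the assignment


def valid_actions(i, j, grid_size):
    """Return the actions that keep the agent inside the grid.

    Assumption: row index i grows downward, so Up means i - 1.
    """
    actions = []
    if i > 0:
        actions.append(UP)
    if i < grid_size - 1:
        actions.append(DOWN)
    if j > 0:
        actions.append(LEFT)
    if j < grid_size - 1:
        actions.append(RIGHT)
    return actions


def epsilon_greedy(q_row, allowed, epsilon, rng=random):
    """Explore with probability epsilon, otherwise exploit.

    q_row holds the four Q-values of the current state, indexed by action code.
    """
    if rng.random() < epsilon:
        return rng.choice(allowed)
    best = max(q_row[a] for a in allowed)
    # Break ties at random so the agent does not always favour the lowest code.
    return rng.choice([a for a in allowed if q_row[a] == best])
```

Random tie-breaking matters early in training, when every Q-value is still 0 and a fixed tie-break would push the agent in one direction only.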
Rewards:
Moving into the goal state gives a reward of +100. Any other move gives a reward of 0. Moving outside the grid is not allowed.
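The reward rule can be expressed as a small transition function. This is a sketch under stated assumptions: positions are (row, column) tuples, row indices grow downward, an episode ends when the goal is reached, and an out-of-grid move is rejected with an error because the caller is expected to offer only legal actions. The name step and the constant GOAL_REWARD are hypothetical.

```python
GOAL_REWARD = 100  # reward for moving into the goal state

# Row/column offsets for actions 0 (Up), 1 (Down), 2 (Left), 3 (Right),
# assuming the row index grows downward.
MOVES = {0: (-1, 0), 1: (1, 0), 2: (0, -1), 3: (0, 1)}


def step(pos, action, goal, grid_size):
    """Apply one move and return (new_pos, reward, done)."""
    di, dj = MOVES[action]
    i, j = pos[0] + di, pos[1] + dj
    if not (0 <= i < grid_size and 0 <= j < grid_size):
        # Leaving the grid is not allowed, so this sketch refuses the move.
        raise ValueError("move leaves the grid")
    new_pos = (i, j)
    done = new_pos == goal
    return new_pos, (GOAL_REWARD if done else 0), done
```

An alternative reading of "not allowed" is to leave the agent in place with a reward of 0. Either choice gives the same optimal path, since a wasted move never shortens the route.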
Parameters (these are defined in Section 4)
STUDENTID: Enter your student ID
GRID_SIZE: Set to 5 as default, but your Q-Learning code should be able to work for any finite integer value of GRID_SIZE
EPISODES: Choose an appropriate number such that your agent can find the optimal path and that the Q-Table converges
Learning rate (α): alpha=0.1
Discount factor (γ): gamma=0.9
Exploration rate (ε): epsilon=0.2
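Putting the parameters together, a minimal training sketch might look like the following. It uses the standard tabular Q-learning update, Q(s,a) += alpha * (r + gamma * max Q(s',a') - Q(s,a)). Several things are assumptions rather than part of the assignment text: EPISODES = 2000 is an arbitrary choice, states are numbered row-major from 0, illegal moves are masked out, and the start and goal states are placeholders, because the rule that derives them from the student ID is not given here. All function names are hypothetical.

```python
import random

GRID_SIZE = 5
EPISODES = 2000  # assumed value; the assignment leaves this choice to you
ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.2
MOVES = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # 0 Up, 1 Down, 2 Left, 3 Right


def allowed(s, n):
    """Actions from state s that stay inside an n x n grid."""
    i, j = divmod(s, n)
    return [a for a, (di, dj) in enumerate(MOVES)
            if 0 <= i + di < n and 0 <= j + dj < n]


def move(s, a, n):
    """State reached from s by a legal action a."""
    i, j = divmod(s, n)
    di, dj = MOVES[a]
    return (i + di) * n + (j + dj)


def train(start, goal, n=GRID_SIZE, episodes=EPISODES, seed=0):
    """Run tabular Q-learning and return the Q-table as a list of rows."""
    rng = random.Random(seed)
    q = [[0.0] * 4 for _ in range(n * n)]
    for _ in range(episodes):
        s = start
        while s != goal:
            acts = allowed(s, n)
            if rng.random() < EPSILON:  # explore
                a = rng.choice(acts)
            else:                       # exploit, breaking ties at random
                best = max(q[s][x] for x in acts)
                a = rng.choice([x for x in acts if q[s][x] == best])
            s2 = move(s, a, n)
            r = 100 if s2 == goal else 0
            # The goal is terminal, so its future value is taken as 0.
            future = 0.0 if s2 == goal else max(q[s2][x] for x in allowed(s2, n))
            q[s][a] += ALPHA * (r + GAMMA * future - q[s][a])
            s = s2
    return q


def greedy_path(q, start, goal, n=GRID_SIZE):
    """Follow the highest-valued legal action from start; capped to avoid loops."""
    path, s = [start], start
    while s != goal and len(path) <= n * n:
        a = max(allowed(s, n), key=lambda x: q[s][x])
        s = move(s, a, n)
        path.append(s)
    return path


if __name__ == "__main__":
    # Placeholder start and goal: replace with the positions from your student ID.
    start, goal = 0, GRID_SIZE * GRID_SIZE - 1
    q_table = train(start, goal)
    print(greedy_path(q_table, start, goal))
```

To judge whether EPISODES is large enough, one option is to compare the length of the greedy path with the Manhattan distance between start and goal, and to check that the Q-table stops changing noticeably between runs of different length.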