Question: Description In this assignment, you will develop an AI agent trained to play a simple Grid World game using Q-Learning following epsi -

Description

In this assignment, you will develop an AI agent trained to play a simple Grid World game using Q-Learning following an epsilon-greedy policy.

The environment consists of a grid where the agent can move up, down, left, or right, and the goal is to reach a specific position on the grid. Both the starting position and the goal position will be determined based on your student ID number, which is 202162020.

After training with your code, the agent should be able to find the optimal path (the shortest path, i.e., the least number of moves) from the start to the goal.

Specifications

Grid World

The grid will be a GRID_SIZE x GRID_SIZE square matrix. For example, GRID_SIZE could be 5, 7, etc. Your code should be able to work for any finite integer value of GRID_SIZE.

States:

Each state represents a position in the Grid World, so the total number of states = GRID_SIZE * GRID_SIZE. The figure below (assg1-5x5grid.jpg) shows an example 5 x 5 Grid World. Each position in the grid can be referred to by its row and column index (i and j respectively). Each position in the grid has a corresponding state index S. The example figure below shows how to convert position [i, j] to S and vice versa.
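The conversion can be sketched as follows. The figure is not reproduced here, so the numbering is an assumption: zero-based indices with row-major order (S increases left to right, then top to bottom). The function names are illustrative only.

```python
GRID_SIZE = 5  # default from the assignment; any positive integer works


def pos_to_state(i: int, j: int, grid_size: int = GRID_SIZE) -> int:
    """Map row i and column j to a state index S (assumed row-major, zero-based)."""
    return i * grid_size + j


def state_to_pos(s: int, grid_size: int = GRID_SIZE) -> tuple:
    """Invert pos_to_state: recover (i, j) from the state index S."""
    return divmod(s, grid_size)
```

If the figure numbers states from 1 or column by column, only these two functions need to change; the rest of a solution can work purely with S.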

Actions:

Allowed actions: 0 (Up), 1 (Down), 2 (Left), 3 (Right). The agent can move up, down, left, or right from each state, but it is not allowed to go outside the boundary. The agent uses an epsilon-greedy policy during Q-learning training.
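One way to enforce the boundary rule is to restrict the choice to actions that stay on the grid, then apply epsilon-greedy selection over that subset. The sketch below assumes row 0 is the top row (so Up decreases i); `valid_actions` and `choose_action` are hypothetical helper names, not part of the assignment template.

```python
import random

UP, DOWN, LEFT, RIGHT = 0, 1, 2, 3
# Row/column change for each action (row index assumed to grow downward).
MOVES = {UP: (-1, 0), DOWN: (1, 0), LEFT: (0, -1), RIGHT: (0, 1)}


def valid_actions(i, j, grid_size):
    """Return the actions that keep the agent inside the grid."""
    actions = []
    for action, (di, dj) in MOVES.items():
        if 0 <= i + di < grid_size and 0 <= j + dj < grid_size:
            actions.append(action)
    return actions


def choose_action(q_row, allowed, epsilon, rng=random):
    """Epsilon-greedy choice over the allowed actions.

    q_row is the list of four Q-values for the current state.
    """
    if rng.random() < epsilon:
        return rng.choice(allowed)  # explore: uniform over allowed actions
    best = max(q_row[a] for a in allowed)
    # Exploit; break ties randomly so no direction is favoured early on.
    return rng.choice([a for a in allowed if q_row[a] == best])
```

With `epsilon = 0.2`, roughly one move in five is random and the rest follow the current Q-table.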

Rewards:

Moving into the goal state gives a reward of +100. Any other move gives a reward of 0. Moving outside the grid is not allowed.
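The reward rule can be sketched as a single transition function. This assumes zero-based, row-major state indices and treats reaching the goal as the end of an episode; neither detail is stated in the assignment text, and the name `step` is illustrative.

```python
def step(state, action, goal_state, grid_size):
    """Apply an in-bounds action and return (next_state, reward, done)."""
    # 0 Up, 1 Down, 2 Left, 3 Right, as (row change, column change).
    moves = {0: (-1, 0), 1: (1, 0), 2: (0, -1), 3: (0, 1)}
    i, j = divmod(state, grid_size)
    di, dj = moves[action]
    ni, nj = i + di, j + dj
    if not (0 <= ni < grid_size and 0 <= nj < grid_size):
        # Off-grid moves are not allowed, so they are rejected outright.
        raise ValueError("action would leave the grid")
    next_state = ni * grid_size + nj
    reached_goal = next_state == goal_state
    # +100 for entering the goal, 0 for every other move.
    return next_state, (100 if reached_goal else 0), reached_goal
```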

Parameters (these are defined in Section 4)

STUDENTID: Enter your student ID
GRID_SIZE: Set to 5 as default, but your Q-Learning code should be able to work for any finite integer value of GRID_SIZE
EPISODES: Choose an appropriate number such that your agent can find the optimal path and the Q-Table converges
Learning rate (α): alpha = 0.1
Discount factor (γ): gamma = 0.9
Exploration rate (ε): epsilon = 0.2
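Putting the parameters together, a minimal tabular Q-learning sketch might look like the following. Several things here are assumptions rather than part of the assignment: the EPISODES value, the per-episode step cap, the row-major state numbering, treating the goal as terminal, and the start/goal positions (the rule that derives them from the student ID is not given, so opposite corners are used as placeholders).

```python
import random

GRID_SIZE = 5
EPISODES = 2000          # assumed value; tune until the Q-table stops changing
ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.2
MOVES = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # 0 Up, 1 Down, 2 Left, 3 Right


def allowed(state, n):
    """Actions that keep the agent on the n x n grid."""
    i, j = divmod(state, n)
    return [a for a, (di, dj) in enumerate(MOVES)
            if 0 <= i + di < n and 0 <= j + dj < n]


def move(state, action, n):
    """State index reached by taking an allowed action."""
    i, j = divmod(state, n)
    di, dj = MOVES[action]
    return (i + di) * n + (j + dj)


def train(start, goal, n=GRID_SIZE, episodes=EPISODES, seed=0):
    """Tabular Q-learning with an epsilon-greedy behaviour policy."""
    rng = random.Random(seed)
    q = [[0.0] * 4 for _ in range(n * n)]
    for _ in range(episodes):
        s = start
        for _ in range(10 * n * n):  # assumed step cap per episode
            if s == goal:
                break
            acts = allowed(s, n)
            if rng.random() < EPSILON:
                a = rng.choice(acts)  # explore
            else:
                best = max(q[s][x] for x in acts)
                a = rng.choice([x for x in acts if q[s][x] == best])
            s2 = move(s, a, n)
            r = 100 if s2 == goal else 0
            # The goal is treated as terminal, so it has no future value.
            future = 0.0 if s2 == goal else max(q[s2][x] for x in allowed(s2, n))
            q[s][a] += ALPHA * (r + GAMMA * future - q[s][a])
            s = s2
    return q


def greedy_path(q, start, goal, n=GRID_SIZE):
    """Follow the highest-valued allowed action from start to goal."""
    path, s = [start], start
    while s != goal and len(path) <= n * n:
        a = max(allowed(s, n), key=lambda x: q[s][x])
        s = move(s, a, n)
        path.append(s)
    return path


if __name__ == "__main__":
    # Placeholder start/goal: opposite corners, for illustration only.
    q_table = train(start=0, goal=GRID_SIZE * GRID_SIZE - 1)
    print(greedy_path(q_table, 0, GRID_SIZE * GRID_SIZE - 1))
```

A simple convergence check is to compare the Q-table between batches of episodes and stop increasing EPISODES once the largest change falls below a small tolerance; the length of the greedy path minus one should then equal the Manhattan distance between start and goal.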
