Question:

# Global Parameters (Do not change these parameter names)
STUDENTID =  # Enter your student ID (You may change this to try different start and goal positions)
GRID_SIZE = 5  # ACTIONS = 4  # DO NOT CHANGE
EPISODES =  # CHANGE to an appropriate number to ensure agent learns to find the optimal path and that Q table converges
# Do not change number of episodes parameter/variable anywhere else in the code
ALPHA = 0.1  # DO NOT CHANGE
EPSILON = 0.2  # DO NOT CHANGE
GAMMA = 0.9  # DO NOT CHANGE

# TASK 1 - Complete the function to get next state based on given action
def get_next_state(current_state_pos, action, grid_size=5):  # DO NOT CHANGE THIS LINE
    row, column = current_state_pos  # DO NOT CHANGE THIS LINE
    if action == 0 and row > 0:  # Move up
        # [Task 1.1] update row and/or column as needed
        # YOUR CODE HERE
    elif action == 1 and row < grid_size - 1:  # Move down
        # [Task 1.2] update row and/or column as needed
        # YOUR CODE HERE
    elif action == 2 and column > 0:  # Move left
        # [Task 1.3] update row and/or column as needed
    elif action == 3 and column < grid_size - 1:  # Move right
        # [Task 1.4] update row and/or column as needed
        # YOUR CODE HERE
    return row, column  # DO NOT CHANGE THIS LINE

# TASK 2.1
# Complete the get_action function (in Task 2.1)
# This function will be called from the q_learning(...) function - see below
# Inputs:
#   q_table, epsilon, current_state_index
# Outputs:
#   action: based on epsilon-greedy decision making policy, should be either 0, 1, 2, 3
#
def get_action(q_table, epsilon, current_state_index):
    # [Task 2.1] Choose an action using epsilon-greedy policy
    # YOUR CODE HERE
    return action

# TASK 2.3
# Complete the update_q_table function (in Task 2.3)
# This function will be called from the q_learning(...) function
# Inputs:
#   q_table, r_table, current_state_index, action, next_state_index, alpha=0.1, gamma=0.9
# Outputs:
#   q_table: with updated Q values
def update_q_table(q_table, r_table, current_state_index, action, next_state_index, alpha=0.1, gamma=0.9):
    # [Task 2.3] Update the q_table using the Q learning equations taught in class
    # YOUR CODE HERE
    return q_table

# TASKS 2.2 and 2.4: Q-learning algorithm (following epsilon-greedy policy)
# Inputs:
#   q_table, r_table: initialized by calling the initialize_q_r_tables function inside the main function
#   start_pos, goal_pos: given by the get_random_start_goal function based on student_id and grid_size
#   num_episodes: taken from the global constant EPISODES (you need to determine the episodes needed to train the agent to find the optimal path)
#   grid_size: To try different grid sizes, change the GRID_SIZE global constant
#   alpha, gamma, epsilon: DO NOT CHANGE
# Outputs:
#   q_table: the final q_table after training
def q_learning(start_pos, goal_pos, q_table=q_table_, r_table=r_table_, num_episodes=EPISODES, alpha=0.1, gamma=0.9, epsilon=0.2, grid_size=5):
    for episode in range(num_episodes):
        # Initialize the state index corresponding to the starting position
        current_state_index = (start_pos[0]) * grid_size + (start_pos[1])
        current_state_pos = start_pos  # current_state_pos has current row, column position of the agent
        done = False
        while not done:
            # [Task 2.1] COMPLETE THE CODE IN get_action(...) FUNCTION ABOVE
            action = get_action(q_table, epsilon, current_state_index)
            # [Task 2.2] Get next state based on the chosen action
            # YOUR CODE HERE
            next_state_pos =  # Complete this line of code, DO NOT CHANGE VARIABLE NAMES
            next_state_index =  # Complete this line of code, DO NOT CHANGE VARIABLE NAMES
            # [Task 2.3] COMPLETE THE CODE IN update_q_table(...) FUNCTION ABOVE
            q_table = update_q_table(q_table, r_table, current_state_index, action, next_state_index, alpha, gamma)
            # Update the 'state' to the next state index
            current_state_pos = next_state_pos
            current_s

helper methods are attached
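
The attached helper methods are not shown here, so the following is only a minimal sketch of how Tasks 1, 2.1 and 2.3 could be filled in, not the course's reference solution. It assumes both tables are lists of lists indexed as `table[state_index][action]`, that `r_table` holds the immediate reward for taking an action in a state, and that an episode ends when the agent reaches the goal. The driver `demo_train`, the helper `greedy_path`, and the reward values (+100 for stepping onto the goal, -1 otherwise) are hypothetical stand-ins for the missing helpers.

```python
import random


def get_next_state(current_state_pos, action, grid_size=5):
    """Task 1 sketch: 0 = up, 1 = down, 2 = left, 3 = right; moves into a wall are no-ops."""
    row, column = current_state_pos
    if action == 0 and row > 0:
        row -= 1
    elif action == 1 and row < grid_size - 1:
        row += 1
    elif action == 2 and column > 0:
        column -= 1
    elif action == 3 and column < grid_size - 1:
        column += 1
    return row, column


def get_action(q_table, epsilon, current_state_index):
    """Task 2.1 sketch: explore with probability epsilon, otherwise pick the greedy action."""
    values = q_table[current_state_index]
    if random.random() < epsilon:
        return random.randrange(len(values))
    return max(range(len(values)), key=lambda a: values[a])


def update_q_table(q_table, r_table, current_state_index, action,
                   next_state_index, alpha=0.1, gamma=0.9):
    """Task 2.3 sketch: Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))."""
    reward = r_table[current_state_index][action]  # assumed r_table layout
    td_target = reward + gamma * max(q_table[next_state_index])
    q_table[current_state_index][action] += alpha * (
        td_target - q_table[current_state_index][action])
    return q_table


def demo_train(start_pos, goal_pos, num_episodes=1000, grid_size=5,
               alpha=0.1, gamma=0.9, epsilon=0.2, seed=0):
    """Hypothetical driver standing in for q_learning(...) and the attached helpers."""
    random.seed(seed)
    start_pos, goal_pos = tuple(start_pos), tuple(goal_pos)
    n_states = grid_size * grid_size
    q_table = [[0.0] * 4 for _ in range(n_states)]
    # Assumed rewards: +100 for a move that lands on the goal, -1 for any other move.
    r_table = [[100.0 if get_next_state(divmod(s, grid_size), a, grid_size) == goal_pos
                else -1.0 for a in range(4)] for s in range(n_states)]
    for _ in range(num_episodes):
        pos = start_pos
        while pos != goal_pos:  # assumed termination rule
            state = pos[0] * grid_size + pos[1]
            action = get_action(q_table, epsilon, state)
            next_pos = get_next_state(pos, action, grid_size)
            next_state = next_pos[0] * grid_size + next_pos[1]
            update_q_table(q_table, r_table, state, action, next_state, alpha, gamma)
            pos = next_pos
    return q_table


def greedy_path(q_table, start_pos, goal_pos, grid_size=5, max_steps=100):
    """Follow the greedy policy (epsilon = 0) from start; stop at the goal or after max_steps."""
    pos, path = tuple(start_pos), [tuple(start_pos)]
    while pos != tuple(goal_pos) and len(path) <= max_steps:
        action = get_action(q_table, 0.0, pos[0] * grid_size + pos[1])
        pos = get_next_state(pos, action, grid_size)
        path.append(pos)
    return path
```

Calling `greedy_path(demo_train((0, 0), (4, 4)), (0, 0), (4, 4))` is one way to judge whether a chosen number of episodes is enough: if the returned path is longer than the Manhattan distance between start and goal, or never reaches the goal, the table has probably not converged yet.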
