Question:

Task 1: Complete the get_next_state(current_state_pos, action, grid_size) function to return the next state's grid position (row, column) based on the given current_state_pos and action.
Complete Tasks 1.1-1.4 to update the row and/or column value as needed
Task 2: Complete the q_learning(...) function by implementing the Q-learning algorithm (following the epsilon-greedy policy). It should return the final Q-table as q_table.
To help you, partial code has been given. Complete the code for Tasks 2.1-2.4. Note: do not change the function header; your solution must not require any additional inputs.
Note: Do not change any variable names, function names, or function input/output variable names in the pre-written code.
# Global Parameters (Do not change these parameter names)
STUDENTID =  # CHANGE to your student ID (used to generate random start and goal positions)
GRID_SIZE = 5  # CHANGE to try different grid sizes
ACTIONS = 4  # DO NOT CHANGE
EPISODES =  # CHANGE to an appropriate number to ensure the agent learns the optimal path and the Q-table converges
# Do not change the number-of-episodes parameter/variable anywhere else in the code
ALPHA = 0.1  # DO NOT CHANGE
EPSILON = 0.2  # DO NOT CHANGE
GAMMA = 0.9  # DO NOT CHANGE
# TASK 1 - Complete the function to get the next state based on the given action
def get_next_state(current_state_pos, action, grid_size=5):  # DO NOT CHANGE THIS LINE
    row, column = current_state_pos  # DO NOT CHANGE THIS LINE
    if action == 0 and row > 0:  # Move up
        # [Task 1.1] update row and/or column as needed
        # YOUR CODE HERE
    elif action == 1 and row < grid_size - 1:  # Move down
        # [Task 1.2] update row and/or column as needed
        # YOUR CODE HERE
    elif action == 2 and column > 0:  # Move left
        # [Task 1.3] update row and/or column as needed
        # YOUR CODE HERE
    elif action == 3 and column < grid_size - 1:  # Move right
        # [Task 1.4] update row and/or column as needed
        # YOUR CODE HERE
    return row, column  # DO NOT CHANGE THIS LINE
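A minimal sketch of one way Tasks 1.1-1.4 could be filled in, assuming the action-to-direction mapping given by the comments in the skeleton (0 = up, 1 = down, 2 = left, 3 = right); a move that would leave the grid leaves the position unchanged:

def get_next_state(current_state_pos, action, grid_size=5):
    row, column = current_state_pos
    if action == 0 and row > 0:                    # Move up
        row -= 1                                   # [Task 1.1]
    elif action == 1 and row < grid_size - 1:      # Move down
        row += 1                                   # [Task 1.2]
    elif action == 2 and column > 0:               # Move left
        column -= 1                                # [Task 1.3]
    elif action == 3 and column < grid_size - 1:   # Move right
        column += 1                                # [Task 1.4]
    return row, column                             # unchanged if the move is off-grid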
# TASK 2.1
# Complete the get_action function (Task 2.1)
# This function will be called from the q_learning(...) function - see below
# Inputs:
#   q_table, epsilon, current_state_index
# Outputs:
#   action: based on the epsilon-greedy decision-making policy; should be 0, 1, 2, or 3
def get_action(q_table, epsilon, current_state_index):
    # [Task 2.1] Choose an action using the epsilon-greedy policy
    # YOUR CODE HERE
    return action
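A minimal sketch of an epsilon-greedy choice for Task 2.1, assuming q_table is a NumPy array of shape (grid_size * grid_size, ACTIONS): explore with probability epsilon, otherwise pick the highest-valued action.

import numpy as np

def get_action(q_table, epsilon, current_state_index):
    if np.random.rand() < epsilon:
        action = np.random.randint(0, 4)                       # explore: random action 0-3
    else:
        action = int(np.argmax(q_table[current_state_index]))  # exploit: best known action
    return action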
# TASK 2.3
# Complete the update_q_table function (Task 2.3)
# This function will be called from the q_learning(...) function
# Inputs:
#   q_table, r_table, current_state_index, action, next_state_index, alpha=0.1, gamma=0.9
# Outputs:
#   q_table: with updated Q-values
def update_q_table(q_table, r_table, current_state_index, action, next_state_index, alpha=0.1, gamma=0.9):
    # [Task 2.3] Update the q_table using the Q-learning equation taught in class
    # YOUR CODE HERE
    return q_table
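A minimal sketch of the standard Q-learning update for Task 2.3, Q(s, a) <- Q(s, a) + alpha * (r + gamma * max_a' Q(s', a') - Q(s, a)). It assumes q_table is a NumPy array of shape (num_states, ACTIONS) and that r_table holds the reward for entering each state; if your r_table is indexed by (state, action) instead, adjust the reward lookup accordingly.

import numpy as np

def update_q_table(q_table, r_table, current_state_index, action, next_state_index, alpha=0.1, gamma=0.9):
    reward = r_table[next_state_index]                 # ASSUMPTION: reward indexed by the state entered
    td_target = reward + gamma * np.max(q_table[next_state_index])
    td_error = td_target - q_table[current_state_index, action]
    q_table[current_state_index, action] += alpha * td_error
    return q_table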
# TASKS 2.2 and 2.4: Q-learning algorithm (following the epsilon-greedy policy)
# Inputs:
#   q_table, r_table: initialized by calling the initialize_q_r_tables function inside the main function
#   start_pos, goal_pos: given by the get_random_start_goal function based on student_id and grid_size
#   num_episodes: taken from the global constant EPISODES (you need to determine how many episodes are needed to train the agent to find the optimal path)
#   grid_size: to try different grid sizes, change the GRID_SIZE global constant
#   alpha, gamma, epsilon: DO NOT CHANGE
# Outputs:
#   q_table: the final q_table after training
def q_learning(start_pos, goal_pos, q_table=q_table_g, r_table=r_table_g, num_episodes=EPISODES, alpha=0.1, gamma=0.9, epsilon=0.2, grid_size=5):
    for episode in range(num_episodes):
        # Initialize the state index corresponding to the starting position
        current_state_index = start_pos[0] * grid_size + start_pos[1]
        current_state_pos = start_pos  # current row, column position of the agent
        done = False
        while not done:
            # [Task 2.1] COMPLETE THE CODE IN THE get_action(...) FUNCTION ABOVE
            action = get_action(q_table, epsilon, current_state_index)
            # [Task 2.2] Get the next state based on the chosen action
            # YOUR CODE HERE
            next_state_pos =   # Complete this line of code; DO NOT CHANGE VARIABLE NAMES
            next_state_index =   # Complete this line of code; DO NOT CHANGE VARIABLE NAMES
            # [Task 2.3] COMPLETE THE CODE IN THE update_q_table(...) FUNCTION ABOVE
            q_table = update_q_table(q_table, r_table, current_state_index, action, next_state_index, alpha, gamma)
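The excerpt ends at the Task 2.3 call, so Task 2.4 is not shown. A sketch of how Tasks 2.2 and 2.4 might read inside the while loop, assuming row-major state indexing (row * grid_size + column, matching the initialization above) and that Task 2.4 advances the agent and ends the episode at the goal:

# Inside the while loop (sketch, not the official solution):
next_state_pos = get_next_state(current_state_pos, action, grid_size)   # [Task 2.2]
next_state_index = next_state_pos[0] * grid_size + next_state_pos[1]    # row-major index

# ... Q-table update via update_q_table(...) as in the skeleton ...

current_state_pos = next_state_pos                                      # [Task 2.4, assumed]
current_state_index = next_state_index
if tuple(current_state_pos) == tuple(goal_pos):                         # ASSUMPTION: episode ends at the goal
    done = True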
