Question: Task 2.2, 2.4 (4 Points + 2 Points)) Test overall q_learning function implementation (0/6)
Test Failed: get_next_state() missing 1 required positional argument: 'grid_size'

```
# TASKS 2.2 and 2.4: Q-learning algorithm (following an epsilon-greedy policy)
# Inputs:
#   q_table, r_table: initialized by calling the initialize_q_r_tables function inside the main function
#   start_pos, goal_pos: given by the get_random_start_goal function based on student_id and grid_size
#   num_episodes: taken from the global constant EPISODES (you need to determine the episodes needed to train the agent)
#   grid_size: to try different grid sizes, change the GRID_SIZE global constant
#   alpha, gamma, epsilon: DO NOT CHANGE
# Outputs:
#   q_table: the final q_table after training
def q_learning(start_pos, goal_pos, q_table=q_table_g, r_table=r_table_g, num_episodes=EPISODES,
               alpha=0.1, gamma=0.9, epsilon=0.1, grid_size=GRID_SIZE):
    # NOTE: the tail of this signature was cut off in the post; the epsilon and
    # grid_size defaults above are assumed from the surrounding comments.
    for episode in range(num_episodes):
        # Initialize the state index corresponding to the starting position
        current_state_index = start_pos[0] * grid_size + start_pos[1]  # row-major index (an integer)
        current_state_pos = start_pos  # current (row, column) position of the agent
        done = False
        while not done:
            # [Task 2.1] Get an action using the epsilon-greedy policy
            action = get_action(q_table, epsilon, current_state_index)
            # [Task 2.2] Get the next state based on the chosen action
            next_state_pos = get_next_state(current_state_pos, action, grid_size)  # pass grid_size
            next_state_index = next_state_pos[0] * grid_size + next_state_pos[1]
            # [Task 2.3] Update the Q-table using the Q-learning formula
            q_table = update_q_table(q_table, r_table, current_state_index, action, next_state_index, alpha, gamma)
            # Advance to the next state
            current_state_pos = next_state_pos
            current_state_index = next_state_index
            # [Task 2.4] End the episode when the goal is reached
            if current_state_pos == goal_pos:
                done = True
    q_table_g = q_table  # DO NOT CHANGE THIS LINE
    return q_table  # the Outputs comment above says the trained q_table is returned
```
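
For reference, q_learning assumes a get_action helper called as get_action(q_table, epsilon, current_state_index); Task 2.1 passes (6/6), so this part already works, but a minimal epsilon-greedy sketch consistent with that call looks like the following (illustrative only, assuming q_table is a NumPy array with rows as states and columns as actions; not the graded implementation):

```
import numpy as np

# Epsilon-greedy sketch (illustrative): explore with probability epsilon,
# otherwise exploit the best-known action for the current state.
def get_action(q_table, epsilon, state_index):
    if np.random.rand() < epsilon:
        return np.random.randint(q_table.shape[1])  # random action (explore)
    return int(np.argmax(q_table[state_index]))     # greedy action (exploit)
```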

Task 1 (8 Points)) Test get_next_state() function implementation (0/8)
Test Failed: unsupported operand type(s) for divmod(): 'tuple' and 'int'
Task 2.1 (6 Points)) Test get_action implementation (6/6)
Task 2.2, 2.4 (4 Points + 2 Points)) Test overall q_learning function implementation (0/6)
Test Failed: get_next_state() missing 1 required positional argument: 'grid_size'
Task 2.3 (5 Points)) Test update_q_table function (0/5)
Test Failed: unsupported operand type(s) for divmod(): 'tuple' and 'int'
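
Both failing messages point at get_next_state: the grader calls it with three arguments, and the divmod error means a (row, col) tuple is reaching a divmod(state, grid_size) call that expects an integer state index. A minimal three-argument sketch that works directly on the (row, col) tuple, assuming the action encoding 0=up, 1=down, 2=left, 3=right (the assignment's actual encoding may differ):

```
# Sketch only: the 0=up/1=down/2=left/3=right encoding is an assumption.
def get_next_state(current_state_pos, action, grid_size):
    row, col = current_state_pos          # unpack the tuple; no divmod needed
    if action == 0:                       # up
        row = max(row - 1, 0)
    elif action == 1:                     # down
        row = min(row + 1, grid_size - 1)
    elif action == 2:                     # left
        col = max(col - 1, 0)
    elif action == 3:                     # right
        col = min(col + 1, grid_size - 1)
    return (row, col)
```

If the grader instead passes an integer state index, `row, col = divmod(state_index, grid_size)` recovers the position; the current error suggests a tuple is being fed to exactly such a call.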
```
# TASK 2.3
# Complete the update_q_table function (in Task 2.3)
# This function will be called from the q_learning(...) function
# Inputs:
#   q_table, r_table, current_state_index, action, next_state_index, alpha=0.1, gamma=0.9
# Outputs:
#   q_table: with updated Q values
import numpy as np  # needed for np.argmax

def update_q_table(q_table, r_table, current_state_index, action, next_state_index, alpha=0.1, gamma=0.9):
    # Best (greedy) action available from the next state
    best_next_action = np.argmax(q_table[next_state_index])
    # TD target: immediate reward plus the discounted value of the best next action
    td_target = r_table[current_state_index][action] + gamma * q_table[next_state_index][best_next_action]
    # TD error, then the standard Q-learning update
    td_error = td_target - q_table[current_state_index][action]
    q_table[current_state_index][action] += alpha * td_error
    return q_table
```
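
A quick hand check of the update rule, calling update_q_table as defined in the snippet above (the table sizes and reward value here are made up for illustration, not taken from the graded tests):

```
import numpy as np

q = np.zeros((2, 4))   # toy table: 2 states, 4 actions
r = np.zeros((2, 4))
r[0][1] = 10.0         # illustrative reward for action 1 in state 0

q = update_q_table(q, r, current_state_index=0, action=1, next_state_index=1)
print(q[0][1])  # 0 + 0.1 * (10.0 + 0.9 * 0.0 - 0.0) = 1.0
```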