Question:

1. **Task 1:** Complete the `get_next_state(current_state_pos, action, grid_size)` function to return the next state's grid position (`row, column`) based on the given `current_state_pos` and `action`.
    * Complete Tasks 1.1-1.4 to update the `row` and/or `column` value as needed
2. **Task 2:** Complete the `q_learning(...)` function by implementing the Q-learning algorithm (following the epsilon-greedy policy). It should return the final Q-table as `q_table`.
    * To help you, partial code has been given
    * Complete the code for Tasks 2.1-2.4
    * Note: do not change the function header; your solution should be such that the function must not use any global variable anywhere else in the code

```python
STUDENTID =   # Enter your student ID (you may change this to try different start and goal positions)
GRID_SIZE = 5
ACTIONS = 4
EPISODES =    # Set the number of episodes needed to train the agent
ALPHA = 0.1
EPSILON = 0.2
GAMMA = 0.9
```
```python
# TASK 1 - Complete the function to get the next state based on the given action
def get_next_state(current_state_pos, action, grid_size=5):  # DO NOT CHANGE THIS LINE
    row, column = current_state_pos
    if action == 0 and row > 0:  # Move up
        # [Task 1.1] update row and/or column as needed
        # YOUR CODE HERE
    elif action == 1 and row < grid_size - 1:  # Move down
        # [Task 1.2] update row and/or column as needed
        # YOUR CODE HERE
    elif action == 2 and column > 0:  # Move left
        # [Task 1.3] update row and/or column as needed
        # YOUR CODE HERE
    elif action == 3 and column < grid_size - 1:  # Move right
        # [Task 1.4] update row and/or column as needed
        # YOUR CODE HERE
    return row, column  # DO NOT CHANGE THIS LINE
```
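
For reference, a sketch of how Tasks 1.1-1.4 could be filled in, following the movement convention already given in the skeleton's comments (0 = up, 1 = down, 2 = left, 3 = right, with row 0 at the top):

```python
# Sketch: the boundary checks are already in the skeleton's conditions,
# so each branch only needs to move one step in the indicated direction.
def get_next_state(current_state_pos, action, grid_size=5):
    row, column = current_state_pos
    if action == 0 and row > 0:                    # Move up
        row -= 1                                   # [Task 1.1]
    elif action == 1 and row < grid_size - 1:      # Move down
        row += 1                                   # [Task 1.2]
    elif action == 2 and column > 0:               # Move left
        column -= 1                                # [Task 1.3]
    elif action == 3 and column < grid_size - 1:   # Move right
        column += 1                                # [Task 1.4]
    return row, column  # an out-of-bounds action leaves the position unchanged
```

With this completion, `get_next_state((0, 0), 1)` returns `(1, 0)`, while `get_next_state((0, 0), 0)` returns `(0, 0)` unchanged because the `row > 0` guard blocks moves off the grid.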
```python
# TASK 2.1
# Complete the get_action function (in Task 2.1)
# This function will be called from the q_learning(...) function - see below
# Inputs:
#   q_table, epsilon, current_state_index
# Outputs:
#   action: based on epsilon-greedy decision making policy, should be either 0, 1, 2, or 3
def get_action(q_table, epsilon, current_state_index):
    # [Task 2.1] Choose an action using epsilon-greedy policy
    return action
```
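
A sketch of the epsilon-greedy choice for Task 2.1, assuming `q_table` is a NumPy array with one row per state and one column per action (the table's initialization code is not shown, so that shape is an assumption):

```python
import numpy as np

def get_action(q_table, epsilon, current_state_index):
    # [Task 2.1] Epsilon-greedy: explore with probability epsilon,
    # otherwise pick the currently best-valued action for this state.
    if np.random.rand() < epsilon:
        action = np.random.randint(0, 4)  # uniform random action from {0, 1, 2, 3}
    else:
        action = int(np.argmax(q_table[current_state_index]))  # greedy action
    return action
```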
```python
# TASK 2.3
# Complete the update_q_table function (in Task 2.3)
# This function will be called from the q_learning(...) function
# Inputs:
#   q_table, r_table, current_state_index, action, next_state_index, alpha=0.1, gamma=0.9
# Outputs:
#   q_table: with updated Q values
def update_q_table(q_table, r_table, current_state_index, action,
                   next_state_index, alpha=0.1, gamma=0.9):
    # [Task 2.3] Update the q_table using the Q-learning equations taught in class
    # YOUR CODE HERE
    return q_table
```
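
The standard tabular Q-learning update is Q(s, a) <- Q(s, a) + alpha * (r + gamma * max_a' Q(s', a') - Q(s, a)). A sketch of Task 2.3 under the assumptions that `q_table` is a NumPy array indexed as `q_table[state_index, action]` and that `r_table` holds the reward for taking `action` in the current state; the tables' actual layout is not shown in the source, so both indexings are assumptions:

```python
import numpy as np

def update_q_table(q_table, r_table, current_state_index, action,
                   next_state_index, alpha=0.1, gamma=0.9):
    # [Task 2.3] Tabular Q-learning update:
    #   Q(s, a) <- Q(s, a) + alpha * (r + gamma * max_a' Q(s', a') - Q(s, a))
    reward = r_table[current_state_index, action]  # assumed r_table layout
    best_next = np.max(q_table[next_state_index])  # max over a' of Q(s', a')
    q_table[current_state_index, action] += alpha * (
        reward + gamma * best_next - q_table[current_state_index, action]
    )
    return q_table
```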
```python
# TASKS 2.2 and 2.4: Q-learning algorithm (following epsilon-greedy policy)
# Inputs:
#   q_table, r_table: initialized by calling the initialize_q_r_tables function inside the main function
#   start_pos, goal_pos: given by the get_random_start_goal function based on student_id and grid_size
#   num_episodes: taken from the global constant EPISODES (you need to determine the episodes
#                 needed to train the agent to find the optimal path)
#   grid_size: To try different grid sizes, change the GRID_SIZE global constant
#   alpha, gamma, epsilon: DO NOT CHANGE
# Outputs:
#   q_table: the final q_table after training
def q_learning(start_pos, goal_pos, q_table=q_table_g, r_table=r_table_g,
               num_episodes=EPISODES, alpha=0.1, gamma=0.9, epsilon=0.2, grid_size=5):
    for episode in range(num_episodes):
        # Initialize the state index corresponding to the starting position
        current_state_index = start_pos[0] * grid_size + start_pos[1]
        current_state_pos = start_pos  # current_state_pos has the current row, column position of the agent
        done = False
        while not done:
            # [Task 2.1]
```
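
The given skeleton breaks off inside the `while` loop. Below is one way the remaining body could wire Tasks 2.1-2.4 together; it is a sketch, not the assignment's expected solution. The termination test (the episode ends when the agent reaches `goal_pos`) is an assumption, and the `q_table_g`/`r_table_g` defaults from the skeleton's header are dropped here only because their initialization code is not shown.

```python
def q_learning(start_pos, goal_pos, q_table, r_table, num_episodes=1000,
               alpha=0.1, gamma=0.9, epsilon=0.2, grid_size=5):
    # num_episodes=1000 is a hypothetical default; the skeleton uses the global EPISODES
    for episode in range(num_episodes):
        current_state_index = start_pos[0] * grid_size + start_pos[1]
        current_state_pos = start_pos
        done = False
        while not done:
            # [Task 2.1] choose an action with the epsilon-greedy policy
            action = get_action(q_table, epsilon, current_state_index)
            # [Task 2.2] take the action and compute the next state's index
            next_state_pos = get_next_state(current_state_pos, action, grid_size)
            next_state_index = next_state_pos[0] * grid_size + next_state_pos[1]
            # [Task 2.3] update the Q-table from the observed transition
            q_table = update_q_table(q_table, r_table, current_state_index,
                                     action, next_state_index, alpha, gamma)
            # [Task 2.4] advance the agent; end the episode at the goal
            current_state_pos = next_state_pos
            current_state_index = next_state_index
            done = tuple(current_state_pos) == tuple(goal_pos)  # assumed termination condition
    return q_table
```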
