Question:

**Task 1:** Complete the `get_next_state(current_state_pos, action, grid_size)` function to return the next state's grid position (row, column) based on the given `current_state_pos` and `action`. Complete the tasks below to update the `row` and/or `column` value as needed.

**Task …:** Complete the `q_learning` function by implementing the Q-learning algorithm following the epsilon-greedy policy. It should return the final Q-table as `q_table`.

To help you, partial code has been given below. Complete the code for the tasks. Note: do not change the function headers; your solution should be such that the function must not … variable anywhere else in the code.

```python
STUDENTID = ...  # Enter your student ID
GRID_SIZE = ...  # You may change this to try different start and goal positions
ACTIONS = ...
EPISODES = ...
ALPHA = ...
EPSILON = ...
GAMMA = ...
```
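The constant values are not shown in the fragment above, so here is one plausible, purely illustrative set (a 5x5 grid, four actions, and common tabular Q-learning hyperparameters; every value below is an assumption):

```python
# Hypothetical values -- the assignment's actual constants are not shown.
STUDENTID = 12345678  # placeholder student ID
GRID_SIZE = 5         # 5x5 grid world
ACTIONS = 4           # up, down, left, right
EPISODES = 1000       # training episodes
ALPHA = 0.1           # learning rate
EPSILON = 0.1         # exploration probability
GAMMA = 0.9           # discount factor
```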
```python
# TASK 1: Complete the function to get the next state based on the given action
# NOTE: the action encoding below is assumed (0 = up, 1 = down, 2 = left, 3 = right)
def get_next_state(current_state_pos, action, grid_size):  # DO NOT CHANGE THIS LINE
    row, column = current_state_pos
    if action == 0 and row > 0:  # Move up
        # Task: update row and/or column as needed
        # YOUR CODE HERE
    elif action == 1 and row < grid_size - 1:  # Move down
        # Task: update row and/or column as needed
        # YOUR CODE HERE
    elif action == 2 and column > 0:  # Move left
        # Task: update row and/or column as needed
        # YOUR CODE HERE
    elif action == 3 and column < grid_size - 1:  # Move right
        # Task: update row and/or column as needed
        # YOUR CODE HERE
    return row, column  # DO NOT CHANGE THIS LINE
```
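For reference, a minimal completion of Task 1 might look like the sketch below. It assumes the 0-3 action encoding noted above and that row 0 is the top of the grid; neither assumption is confirmed by the fragment, so treat this as illustrative rather than the official solution.

```python
# Sketch of a completed get_next_state (assumed action encoding, row 0 = top).
def get_next_state(current_state_pos, action, grid_size):
    row, column = current_state_pos
    if action == 0 and row > 0:                   # Move up
        row -= 1
    elif action == 1 and row < grid_size - 1:     # Move down
        row += 1
    elif action == 2 and column > 0:              # Move left
        column -= 1
    elif action == 3 and column < grid_size - 1:  # Move right
        column += 1
    # Actions that would step off the grid leave the position unchanged.
    return row, column
```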
```python
# TASK …
# Complete the get_action function in Task …
# This function will be called from the q_learning function (see below)
# Inputs:
#   q_table, epsilon, current_state_index
# Outputs:
#   action: based on the epsilon-greedy decision-making policy, should be either
#           a random exploratory action or the greedy (highest-Q) action
def get_action(q_table, epsilon, current_state_index):
    # Task: Choose an action using the epsilon-greedy policy
    # YOUR CODE HERE
    return action
```
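Epsilon-greedy means: with probability epsilon take a uniformly random action (explore), otherwise take the action with the highest Q value for the current state (exploit). A minimal sketch, assuming `q_table` is a NumPy array with one row per state and one column per action (the array layout is an assumption):

```python
import numpy as np

def get_action(q_table, epsilon, current_state_index):
    # Explore: with probability epsilon, pick a uniformly random action.
    if np.random.rand() < epsilon:
        return np.random.randint(q_table.shape[1])
    # Exploit: otherwise pick the highest-valued action for this state.
    return int(np.argmax(q_table[current_state_index]))
```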
```python
# TASK …
# Complete the update_q_table function in Task …
# This function will be called from the q_learning function
# Inputs:
#   q_table, r_table, current_state_index, action, next_state_index, alpha, gamma
# Outputs:
#   q_table: with updated Q values
def update_q_table(q_table, r_table, current_state_index, action, next_state_index, alpha, gamma):
    # Task: Update the q_table using the Q-learning equations taught in class
    # YOUR CODE HERE
    return q_table
```
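The "Q-learning equation taught in class" is presumably the standard tabular update

$$Q(s, a) \leftarrow Q(s, a) + \alpha \left[ r + \gamma \max_{a'} Q(s', a') - Q(s, a) \right]$$

A sketch of that update, assuming `q_table` and `r_table` are NumPy arrays indexed by `[state_index, action]` (the indexing scheme is an assumption):

```python
import numpy as np

def update_q_table(q_table, r_table, current_state_index, action, next_state_index, alpha, gamma):
    # Immediate reward for taking `action` from the current state (indexing assumed).
    reward = r_table[current_state_index, action]
    # TD target: reward plus the discounted value of the best next action.
    td_target = reward + gamma * np.max(q_table[next_state_index])
    # Nudge the current estimate a fraction alpha toward the target.
    q_table[current_state_index, action] += alpha * (td_target - q_table[current_state_index, action])
    return q_table
```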
```python
# TASKS … and …: Q-learning algorithm following the epsilon-greedy policy
# Inputs:
#   q_table, r_table: initialized by calling the initialize_q_r_tables function inside the main function
#   start_pos, goal_pos: given by the get_random_start_goal function based on student_id and grid_size
#   num_episodes: taken from the global constant EPISODES (you need to determine the episodes needed
#                 to train the agent to find the optimal path)
#   grid_size: to try different grid sizes, change the GRID_SIZE global constant
#   alpha, gamma, epsilon: DO NOT CHANGE
# Outputs:
#   q_table: the final q_table after training
def q_learning(start_pos, goal_pos, q_table=q_table_g, r_table=r_table_g, num_episodes=EPISODES,
               alpha=ALPHA, gamma=GAMMA, epsilon=EPSILON, grid_size=GRID_SIZE):
    for episode in range(num_episodes):
        # Initialize the state index corresponding to the starting position
        # (row-major flattening: index = row * grid_size + column)
        current_state_index = start_pos[0] * grid_size + start_pos[1]
        current_state_pos = start_pos  # current_state_pos has the current (row, column) position of the agent
        done = False
        while not done:
            # Task: …
```
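The fragment cuts off inside the `while` loop. Purely as an illustration of how the three helper functions above typically fit together, here is a hypothetical continuation of the episode loop; the termination test against `goal_pos` and the index arithmetic are assumptions, not the assignment's code:

```python
        # Hypothetical loop body (sketch, not the original assignment code).
        while not done:
            # Pick an action for the current state with the epsilon-greedy policy.
            action = get_action(q_table, epsilon, current_state_index)
            # Apply the action and flatten the new (row, column) to a state index.
            next_state_pos = get_next_state(current_state_pos, action, grid_size)
            next_state_index = next_state_pos[0] * grid_size + next_state_pos[1]
            # Fold the observed transition into the Q table.
            q_table = update_q_table(q_table, r_table, current_state_index,
                                     action, next_state_index, alpha, gamma)
            # Advance; an episode ends when the agent reaches the goal.
            current_state_pos, current_state_index = next_state_pos, next_state_index
            done = tuple(current_state_pos) == tuple(goal_pos)
    return q_table
```

After training, the greedy path can be read off the returned table by repeatedly taking `np.argmax(q_table[state_index])` from the start state.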
