Question:

Task 1: Complete the get_next_state(current_state_pos, action, grid_size) function to return the next state's grid position (row, column) based on the given current_state_pos and action.
Complete Tasks 1.1-1.4 to update the row and/or column value as needed
Task 2: Complete the q_learning(...) function by implementing the Q-learning algorithm (following the epsilon-greedy policy). It should return the final Q-table as q_table.
To help you, partial code has been given. Complete the code for Tasks 2.1-2.4. Note: do not change the function header; your solution must not require any additional inputs.
Note: Do not change any variable names, function names, or function input/output variable names in the pre-written code.
# Global Parameters (Do not change these parameter names)
STUDENTID =  # CHANGE to your student ID (used to generate random start and goal positions)
GRID_SIZE = 5  # CHANGE to try different grid sizes
ACTIONS = 4  # DO NOT CHANGE
EPISODES =  # CHANGE to an appropriate number to ensure the agent learns the optimal path and the Q-table converges
# Do not change the number-of-episodes parameter/variable anywhere else in the code
ALPHA = 0.1  # DO NOT CHANGE
EPSILON = 0.2  # DO NOT CHANGE
GAMMA = 0.9  # DO NOT CHANGE
# TASK 1 - Complete the function to get the next state based on the given action
def get_next_state(current_state_pos, action, grid_size=5):  # DO NOT CHANGE THIS LINE
    row, column = current_state_pos  # DO NOT CHANGE THIS LINE
    if action == 0 and row > 0:  # Move up
        # [Task 1.1] update row and/or column as needed
        # YOUR CODE HERE
    elif action == 1 and row < grid_size - 1:  # Move down
        # [Task 1.2] update row and/or column as needed
        # YOUR CODE HERE
    elif action == 2 and column > 0:  # Move left
        # [Task 1.3] update row and/or column as needed
        # YOUR CODE HERE
    elif action == 3 and column < grid_size - 1:  # Move right
        # [Task 1.4] update row and/or column as needed
        # YOUR CODE HERE
    return row, column  # DO NOT CHANGE THIS LINE
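A minimal sketch of one way Tasks 1.1-1.4 could be filled in, assuming the action-to-direction mapping given by the comments in the skeleton (0 = up, 1 = down, 2 = left, 3 = right); a move that would leave the grid leaves the position unchanged:

def get_next_state(current_state_pos, action, grid_size=5):
    row, column = current_state_pos
    if action == 0 and row > 0:                    # Move up
        row -= 1                                   # [Task 1.1]
    elif action == 1 and row < grid_size - 1:      # Move down
        row += 1                                   # [Task 1.2]
    elif action == 2 and column > 0:               # Move left
        column -= 1                                # [Task 1.3]
    elif action == 3 and column < grid_size - 1:   # Move right
        column += 1                                # [Task 1.4]
    return row, column                             # unchanged if the move is off-grid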
# TASK 2.1
# Complete the get_action function (Task 2.1)
# This function will be called from the q_learning(...) function - see below
# Inputs:
#   q_table, epsilon, current_state_index
# Outputs:
#   action: based on the epsilon-greedy decision-making policy; should be 0, 1, 2, or 3
def get_action(q_table, epsilon, current_state_index):
    # [Task 2.1] Choose an action using the epsilon-greedy policy
    # YOUR CODE HERE
    return action
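A minimal sketch of an epsilon-greedy choice for Task 2.1, assuming q_table is a NumPy array of shape (grid_size * grid_size, ACTIONS): explore with probability epsilon, otherwise pick the highest-valued action.

import numpy as np

def get_action(q_table, epsilon, current_state_index):
    if np.random.rand() < epsilon:
        action = np.random.randint(0, 4)                       # explore: random action 0-3
    else:
        action = int(np.argmax(q_table[current_state_index]))  # exploit: best known action
    return action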
# TASK 2.3
# Complete the update_q_table function (Task 2.3)
# This function will be called from the q_learning(...) function
# Inputs:
#   q_table, r_table, current_state_index, action, next_state_index, alpha=0.1, gamma=0.9
# Outputs:
#   q_table: with updated Q-values
def update_q_table(q_table, r_table, current_state_index, action, next_state_index, alpha=0.1, gamma=0.9):
    # [Task 2.3] Update the q_table using the Q-learning equation taught in class
    # YOUR CODE HERE
    return q_table
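A minimal sketch of the standard Q-learning update for Task 2.3, Q(s, a) <- Q(s, a) + alpha * (r + gamma * max_a' Q(s', a') - Q(s, a)). It assumes q_table is a NumPy array of shape (num_states, ACTIONS) and that r_table holds the reward for entering each state; if your r_table is indexed by (state, action) instead, adjust the reward lookup accordingly.

import numpy as np

def update_q_table(q_table, r_table, current_state_index, action, next_state_index, alpha=0.1, gamma=0.9):
    reward = r_table[next_state_index]                 # ASSUMPTION: reward indexed by the state entered
    td_target = reward + gamma * np.max(q_table[next_state_index])
    td_error = td_target - q_table[current_state_index, action]
    q_table[current_state_index, action] += alpha * td_error
    return q_table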
# TASKS 2.2 and 2.4: Q-learning algorithm (following the epsilon-greedy policy)
# Inputs:
#   q_table, r_table: initialized by calling the initialize_q_r_tables function inside the main function
#   start_pos, goal_pos: given by the get_random_start_goal function based on student_id and grid_size
#   num_episodes: taken from the global constant EPISODES (you need to determine how many episodes are needed to train the agent to find the optimal path)
#   grid_size: to try different grid sizes, change the GRID_SIZE global constant
#   alpha, gamma, epsilon: DO NOT CHANGE
# Outputs:
#   q_table: the final q_table after training
def q_learning(start_pos, goal_pos, q_table=q_table_g, r_table=r_table_g, num_episodes=EPISODES, alpha=0.1, gamma=0.9, epsilon=0.2, grid_size=5):
    for episode in range(num_episodes):
        # Initialize the state index corresponding to the starting position
        current_state_index = start_pos[0] * grid_size + start_pos[1]
        current_state_pos = start_pos  # current row, column position of the agent
        done = False
        while not done:
            # [Task 2.1] COMPLETE THE CODE IN THE get_action(...) FUNCTION ABOVE
            action = get_action(q_table, epsilon, current_state_index)
            # [Task 2.2] Get the next state based on the chosen action
            # YOUR CODE HERE
            next_state_pos =   # Complete this line of code; DO NOT CHANGE VARIABLE NAMES
            next_state_index =   # Complete this line of code; DO NOT CHANGE VARIABLE NAMES
            # [Task 2.3] COMPLETE THE CODE IN THE update_q_table(...) FUNCTION ABOVE
            q_table = update_q_table(q_table, r_table, current_state_index, action, next_state_index, alpha, gamma)
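The excerpt ends at the Task 2.3 call, so Task 2.4 is not shown. A sketch of how Tasks 2.2 and 2.4 might read inside the while loop, assuming row-major state indexing (row * grid_size + column, matching the initialization above) and that Task 2.4 advances the agent and ends the episode at the goal:

# Inside the while loop (sketch, not the official solution):
next_state_pos = get_next_state(current_state_pos, action, grid_size)   # [Task 2.2]
next_state_index = next_state_pos[0] * grid_size + next_state_pos[1]    # row-major index

# ... Q-table update via update_q_table(...) as in the skeleton ...

current_state_pos = next_state_pos                                      # [Task 2.4, assumed]
current_state_index = next_state_index
if tuple(current_state_pos) == tuple(goal_pos):                         # ASSUMPTION: episode ends at the goal
    done = True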
