Question: Task 2.2, 2.4 (4 Points + 2 Points)) Test overall q_learning function implementation (0/6)
Test Failed: get_next_state() missing 1 required positional argument: 'grid_size'

```
# TASKS 2.2 and 2.4: Q-learning algorithm (following an epsilon-greedy policy)
# Inputs:
#   q_table, r_table: initialized by calling the initialize_q_r_tables function inside the main function
#   start_pos, goal_pos: given by the get_random_start_goal function based on student_id and grid_size
#   num_episodes: taken from the global constant EPISODES (you need to determine the episodes needed to train the agent)
#   grid_size: to try different grid sizes, change the GRID_SIZE global constant
#   alpha, gamma, epsilon: DO NOT CHANGE
# Outputs:
#   q_table: the final q_table after training
def q_learning(start_pos, goal_pos, q_table=q_table_g, r_table=r_table_g, num_episodes=EPISODES,
               alpha=0.1, gamma=0.9, epsilon=0.1, grid_size=GRID_SIZE):
    # NOTE: the tail of this signature was cut off in the post; the epsilon and
    # grid_size defaults above are assumed from the surrounding comments.
    for episode in range(num_episodes):
        # Initialize the state index corresponding to the starting position
        current_state_index = start_pos[0] * grid_size + start_pos[1]  # row-major index (an integer)
        current_state_pos = start_pos  # current (row, column) position of the agent
        done = False
        while not done:
            # [Task 2.1] Get an action using the epsilon-greedy policy
            action = get_action(q_table, epsilon, current_state_index)
            # [Task 2.2] Get the next state based on the chosen action
            next_state_pos = get_next_state(current_state_pos, action, grid_size)  # pass grid_size
            next_state_index = next_state_pos[0] * grid_size + next_state_pos[1]
            # [Task 2.3] Update the Q-table using the Q-learning formula
            q_table = update_q_table(q_table, r_table, current_state_index, action, next_state_index, alpha, gamma)
            # Advance to the next state
            current_state_pos = next_state_pos
            current_state_index = next_state_index
            # [Task 2.4] End the episode when the goal is reached
            if current_state_pos == goal_pos:
                done = True
    q_table_g = q_table  # DO NOT CHANGE THIS LINE
    return q_table  # the Outputs comment above says the trained q_table is returned
```
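
For reference, q_learning assumes a get_action helper called as get_action(q_table, epsilon, current_state_index); Task 2.1 passes (6/6), so this part already works, but a minimal epsilon-greedy sketch consistent with that call looks like the following (illustrative only, assuming q_table is a NumPy array with rows as states and columns as actions; not the graded implementation):

```
import numpy as np

# Epsilon-greedy sketch (illustrative): explore with probability epsilon,
# otherwise exploit the best-known action for the current state.
def get_action(q_table, epsilon, state_index):
    if np.random.rand() < epsilon:
        return np.random.randint(q_table.shape[1])  # random action (explore)
    return int(np.argmax(q_table[state_index]))     # greedy action (exploit)
```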

Task 1 (8 Points)) Test get_next_state() function implementation (0/8)
Test Failed: unsupported operand type(s) for divmod(): 'tuple' and 'int'
Task 2.1 (6 Points)) Test get_action implementation (6/6)
Task 2.2, 2.4 (4 Points + 2 Points)) Test overall q_learning function implementation (0/6)
Test Failed: get_next_state() missing 1 required positional argument: 'grid_size'
Task 2.3 (5 Points)) Test update_q_table function (0/5)
Test Failed: unsupported operand type(s) for divmod(): 'tuple' and 'int'
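
Both failing messages point at get_next_state: the grader calls it with three arguments, and the divmod error means a (row, col) tuple is reaching a divmod(state, grid_size) call that expects an integer state index. A minimal three-argument sketch that works directly on the (row, col) tuple, assuming the action encoding 0=up, 1=down, 2=left, 3=right (the assignment's actual encoding may differ):

```
# Sketch only: the 0=up/1=down/2=left/3=right encoding is an assumption.
def get_next_state(current_state_pos, action, grid_size):
    row, col = current_state_pos          # unpack the tuple; no divmod needed
    if action == 0:                       # up
        row = max(row - 1, 0)
    elif action == 1:                     # down
        row = min(row + 1, grid_size - 1)
    elif action == 2:                     # left
        col = max(col - 1, 0)
    elif action == 3:                     # right
        col = min(col + 1, grid_size - 1)
    return (row, col)
```

If the grader instead passes an integer state index, `row, col = divmod(state_index, grid_size)` recovers the position; the current error suggests a tuple is being fed to exactly such a call.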
```
# TASK 2.3
# Complete the update_q_table function (in Task 2.3)
# This function will be called from the q_learning(...) function
# Inputs:
#   q_table, r_table, current_state_index, action, next_state_index, alpha=0.1, gamma=0.9
# Outputs:
#   q_table: with updated Q values
import numpy as np  # needed for np.argmax

def update_q_table(q_table, r_table, current_state_index, action, next_state_index, alpha=0.1, gamma=0.9):
    # Best (greedy) action available from the next state
    best_next_action = np.argmax(q_table[next_state_index])
    # TD target: immediate reward plus the discounted value of the best next action
    td_target = r_table[current_state_index][action] + gamma * q_table[next_state_index][best_next_action]
    # TD error, then the standard Q-learning update
    td_error = td_target - q_table[current_state_index][action]
    q_table[current_state_index][action] += alpha * td_error
    return q_table
```
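
A quick hand check of the update rule, calling update_q_table as defined in the snippet above (the table sizes and reward value here are made up for illustration, not taken from the graded tests):

```
import numpy as np

q = np.zeros((2, 4))   # toy table: 2 states, 4 actions
r = np.zeros((2, 4))
r[0][1] = 10.0         # illustrative reward for action 1 in state 0

q = update_q_table(q, r, current_state_index=0, action=1, next_state_index=1)
print(q[0][1])  # 0 + 0.1 * (10.0 + 0.9 * 0.0 - 0.0) = 1.0
```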