Question: Complete the Q-learning grid-world assignment skeleton below; the parts to fill in are marked with "# YOUR CODE HERE" in Task 1 and Tasks 2.1 through 2.4.

# Global Parameters (Do not change these parameter names)
STUDENTID =      # Enter your student ID (You may change this to try different start and goal positions)
GRID_SIZE = 5
ACTIONS = 4      # DO NOT CHANGE
EPISODES =       # CHANGE to an appropriate number to ensure agent learns to find the optimal path and that Q table converges
                 # Do not change number of episodes parameter/variable anywhere else in the code
ALPHA = 0.1      # DO NOT CHANGE
EPSILON = 0.2    # DO NOT CHANGE
GAMMA = 0.9      # DO NOT CHANGE


# TASK 1 - Complete the function to get next state based on given action
def get_next_state(current_state_pos, action, grid_size=5):    # DO NOT CHANGE THIS LINE
    row, column = current_state_pos                            # DO NOT CHANGE THIS LINE
    if action == 0 and row > 0:                     # Move up
        # [Task 1.1] update row and/or column as needed
        # YOUR CODE HERE
    elif action == 1 and row < grid_size - 1:       # Move down
        # [Task 1.2] update row and/or column as needed
        # YOUR CODE HERE
    elif action == 2 and column > 0:                # Move left
        # [Task 1.3] update row and/or column as needed
    elif action == 3 and column < grid_size - 1:    # Move right
        # [Task 1.4] update row and/or column as needed
        # YOUR CODE HERE
    return row, column                              # DO NOT CHANGE THIS LINE


# TASK 2.1
# Complete the get_action function (in Task 2.1)
# This function will be called from the q_learning(...) function - see below
# Inputs:
#   q_table, epsilon, current_state_index
# Outputs:
#   action: based on epsilon-greedy decision making policy, should be either 0, 1, 2, or 3
#
def get_action(q_table, epsilon, current_state_index):
    # [Task 2.1] Choose an action using epsilon-greedy policy
    # YOUR CODE HERE
    return action


# TASK 2.3
# Complete the update_q_table function (in Task 2.3)
# This function will be called from the q_learning(...) function
# Inputs:
#   q_table, r_table, current_state_index, action, next_state_index, alpha=0.1, gamma=0.9
# Outputs:
#   q_table: with updated Q values
def update_q_table(q_table, r_table, current_state_index, action, next_state_index, alpha=0.1, gamma=0.9):
    # [Task 2.3] Update the q_table using the Q learning equations taught in class
    # YOUR CODE HERE
    return q_table


# TASKS 2.2 and 2.4: Q-learning algorithm (following epsilon-greedy policy)
# Inputs:
#   q_table, r_table: initialized by calling the initialize_q_r_tables function inside the main function
#   start_pos, goal_pos: given by the get_random_start_goal function based on student_id and grid_size
#   num_episodes: taken from the global constant EPISODES (you need to determine the episodes needed to train the agent to find the optimal path)
#   grid_size: To try different grid sizes, change the GRID_SIZE global constant
#   alpha, gamma, epsilon: DO NOT CHANGE
# Outputs:
#   q_table: the final q_table after training
def q_learning(start_pos, goal_pos, q_table=q_table_g, r_table=r_table_g, num_episodes=EPISODES,
               alpha=0.1, gamma=0.9, epsilon=0.2, grid_size=5):
    for episode in range(num_episodes):
        # Initialize the state index corresponding to the starting position
        current_state_index = (start_pos[0]) * grid_size + (start_pos[1])
        current_state_pos = start_pos    # current_state_pos has current row, column position of the agent
        done = False
        while not done:
            # [Task 2.1] COMPLETE THE CODE IN get_action(...) FUNCTION ABOVE
            action = get_action(q_table, epsilon, current_state_index)

            # [Task 2.2] Get next state based on the chosen action
            # YOUR CODE HERE
            next_state_pos =     # Complete this line of code, DO NOT CHANGE VARIABLE NAMES
            next_state_index =   # Complete this line of code, DO NOT CHANGE VARIABLE NAMES

            # [Task 2.3] COMPLETE THE CODE IN update_q_table(...) FUNCTION ABOVE
            q_table = update_q_table(q_table, r_table, current_state_index, action, next_state_index, alpha, gamma)

            # Update the 'state' to the next state index
            current_state_pos = next_state_pos
            current_s
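For Task 1, the skeleton already performs the boundary checks, so only the row/column updates inside each branch are missing. One way the four branches could be completed is sketched below (a sketch, not necessarily the exact expected answer):

def get_next_state(current_state_pos, action, grid_size=5):
    row, column = current_state_pos
    if action == 0 and row > 0:                     # Move up
        row -= 1                                    # [Task 1.1]
    elif action == 1 and row < grid_size - 1:       # Move down
        row += 1                                    # [Task 1.2]
    elif action == 2 and column > 0:                # Move left
        column -= 1                                 # [Task 1.3]
    elif action == 3 and column < grid_size - 1:    # Move right
        column += 1                                 # [Task 1.4]
    return row, column                              # position is unchanged if the move would leave the grid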
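For Tasks 2.1 and 2.3, a minimal sketch of an epsilon-greedy action choice and the standard Q-learning update is shown below. It assumes q_table and r_table are NumPy arrays with one row per grid cell and one column per action (shape (grid_size * grid_size, ACTIONS)), and that r_table[s, a] holds the reward for taking action a in state s. The attached helpers that actually build these tables are not shown, so these layouts, and the use of numpy, are assumptions rather than the required implementation.

import numpy as np

def get_action(q_table, epsilon, current_state_index):
    # [Task 2.1] epsilon-greedy: explore with probability epsilon, otherwise exploit
    if np.random.rand() < epsilon:
        action = np.random.randint(0, 4)                        # random action among 0..3 (ACTIONS = 4)
    else:
        action = int(np.argmax(q_table[current_state_index]))   # greedy action for the current state
    return action

def update_q_table(q_table, r_table, current_state_index, action, next_state_index, alpha=0.1, gamma=0.9):
    # [Task 2.3] Q-learning update:
    #   Q(s, a) <- Q(s, a) + alpha * (r + gamma * max_a' Q(s', a') - Q(s, a))
    reward = r_table[current_state_index, action]     # assumes r_table is indexed by (state, action)
    best_next = np.max(q_table[next_state_index])     # value of the best action in the next state
    q_table[current_state_index, action] += alpha * (reward + gamma * best_next
                                                      - q_table[current_state_index, action])
    return q_table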
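For Tasks 2.2 and 2.4, the placeholder lines inside the while-loop could be filled roughly as follows. The posted loop is cut off after "current_state_pos = next_state_pos", so the end-of-episode handling shown here (advance the state index and stop at the goal) is an assumption about how the loop is meant to terminate.

    # [Task 2.2] inside the while-loop, after choosing the action:
    next_state_pos = get_next_state(current_state_pos, action, grid_size)
    next_state_index = next_state_pos[0] * grid_size + next_state_pos[1]

    # [Task 2.4] after the Q update and after current_state_pos = next_state_pos:
    current_state_index = next_state_index
    if tuple(current_state_pos) == tuple(goal_pos):
        done = True

Because epsilon stays fixed at 0.2 throughout training, EPISODES has to be set large enough that the Q-table stops changing noticeably between episodes; a common sanity check is to follow the purely greedy policy from start_pos after training and confirm it reaches goal_pos along a shortest path.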
Helper methods are attached.
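The attached helpers (initialize_q_r_tables and get_random_start_goal, referenced in the comments above) are not reproduced in the question. To run the skeleton stand-alone, hypothetical stand-ins along the following lines could be used; the real helpers may use a different signature and reward layout, so treat everything here as an assumption.

import numpy as np

def initialize_q_r_tables(goal_pos, grid_size=5, actions=4):
    # Hypothetical stand-in: zero-initialized Q-table, and a reward table where
    # r_table[s, a] is the reward for taking action a from state s
    # (+100 when the move lands on the goal, -1 step cost otherwise).
    n_states = grid_size * grid_size
    q_table = np.zeros((n_states, actions))
    r_table = np.full((n_states, actions), -1.0)
    for state in range(n_states):
        pos = (state // grid_size, state % grid_size)
        for action in range(actions):
            if get_next_state(pos, action, grid_size) == tuple(goal_pos):
                r_table[state, action] = 100.0
    return q_table, r_table

def get_random_start_goal(student_id, grid_size=5):
    # Hypothetical stand-in: derive reproducible start/goal cells from the student ID.
    rng = np.random.default_rng(int(student_id))
    start = (int(rng.integers(grid_size)), int(rng.integers(grid_size)))
    goal = (int(rng.integers(grid_size)), int(rng.integers(grid_size)))
    while goal == start:
        goal = (int(rng.integers(grid_size)), int(rng.integers(grid_size)))
    return start, goal

# Possible wiring inside the main function, matching the default arguments of q_learning:
#   start_pos, goal_pos = get_random_start_goal(STUDENTID, GRID_SIZE)
#   q_table_g, r_table_g = initialize_q_r_tables(goal_pos, GRID_SIZE, ACTIONS)
#   q_table = q_learning(start_pos, goal_pos)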