Question: Project due Sep 4 , 2 0 2 4 0 7 : 5 9 EDT In this section, you will implement the Q - learning

Project due Sep 4,202407:59 EDT
In this section, you will implement the Q-learning algorithm, which is a model-free algorithm used to learn an optimal Q-function. In the tabular setting, the algorithm maintains the Q-value for all possible state-action pairs. Starting from a random Q-function, the agent continuously collects experiences and updates its Q-function.
From now on, we will refer to as an action" although it really is an action with an object.
Q-learning Algorithm
The agent plays an action at state , getting a reward and observing the next state .
Update the single Q-value corresponding to each such transition:
Tip: We recommend you implement all functions from this tab and the next one before submitting your code online. Make sure you achieve reasonable performance on the Home World game
Single step update
1 point possible (graded)
Write a function tabular_q_learning that updates the single Q-value, given the transition date .
Reminder: You should implement this function locally first. You can read through the next tab to understand the context in which this function is called
Available Functions: You have access to the NumPy python library as np. You should also use constants ALPHA and GAMMA in your code

Step by Step Solution

There are 3 Steps involved in it

1 Expert Approved Answer
Step: 1 Unlock blur-text-image
Question Has Been Solved by an Expert!

Get step-by-step solutions from verified subject matter experts

Step: 2 Unlock
Step: 3 Unlock

Students Have Also Explored These Related Programming Questions!