Question: Project due Sep 4 , 2 0 2 4 0 7 : 5 9 EDT In this section, you will implement the Q - learning
Project due Sep : EDT
In this section, you will implement the Qlearning algorithm, which is a modelfree algorithm used to learn an optimal Qfunction. In the tabular setting, the algorithm maintains the Qvalue for all possible stateaction pairs. Starting from a random Qfunction, the agent continuously collects experiences and updates its Qfunction.
From now on we will refer to as an action" although it really is an action with an object.
Qlearning Algorithm
The agent plays an action at state getting a reward and observing the next state
Update the single Qvalue corresponding to each such transition:
Tip: We recommend you implement all functions from this tab and the next one before submitting your code online. Make sure you achieve reasonable performance on the Home World game
Single step update
point possible graded
Write a function tabularqlearning that updates the single Qvalue, given the transition date
Reminder: You should implement this function locally first. You can read through the next tab to understand the context in which this function is called
Available Functions: You have access to the NumPy python library as np You should also use constants ALPHA and GAMMA in your code
Step by Step Solution
There are 3 Steps involved in it
1 Expert Approved Answer
Step: 1 Unlock
Question Has Been Solved by an Expert!
Get step-by-step solutions from verified subject matter experts
Step: 2 Unlock
Step: 3 Unlock
