In the context of our Q-Learning algorithm, select all which are true:
we calculate a quality score for each (environment, action) pair
we use a high value for gamma, the discount, to place more emphasis on future feedback; a lower value places more emphasis on immediate feedback
absent some limit or threshold, our Q-Learning algorithm will run forever
our quality score is the delta (difference) between immediate and future feedback
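
For reference while weighing the options, here is a minimal sketch of standard tabular Q-learning. The course's own algorithm is not shown in the question, so this is only an illustration: the corridor environment, the function name `q_learning`, and every parameter default are assumptions. It shows where the per-pair quality scores live, how gamma weights future feedback in the update, and why the loop needs an explicit limit.

```python
import random


def q_learning(n_states=5, gamma=0.9, alpha=0.5, epsilon=0.2,
               max_episodes=200, max_steps=50, seed=0):
    """Tabular Q-learning on a hypothetical 1-D corridor (illustrative only).

    States are 0..n_states-1; actions are 0 (left) and 1 (right).
    Reaching the last state yields reward 1 and ends the episode.
    """
    rng = random.Random(seed)
    actions = (0, 1)
    # One quality score per (state, action) pair, all starting at zero.
    q = {(s, a): 0.0 for s in range(n_states) for a in actions}

    # The episode and step limits are the stopping rule; the update
    # itself has no built-in termination condition.
    for _ in range(max_episodes):
        state = 0
        for _ in range(max_steps):
            # Epsilon-greedy choice: explore sometimes, otherwise pick a
            # best-scoring action (ties broken at random).
            if rng.random() < epsilon:
                action = rng.choice(actions)
            else:
                best = max(q[(state, a)] for a in actions)
                action = rng.choice(
                    [a for a in actions if q[(state, a)] == best])

            next_state = max(0, state - 1) if action == 0 else state + 1
            done = next_state == n_states - 1
            reward = 1.0 if done else 0.0

            # Target = immediate reward + discounted best future score.
            future = 0.0 if done else max(q[(next_state, a)] for a in actions)
            target = reward + gamma * future
            # Move the current score part of the way toward the target.
            q[(state, action)] += alpha * (target - q[(state, action)])

            state = next_state
            if done:
                break
    return q


if __name__ == "__main__":
    for gamma in (0.9, 0.5):
        table = q_learning(gamma=gamma)
        print(gamma, round(table[(0, 1)], 3))
```

In this sketch the target adds the immediate reward to the discounted future estimate, and the stored score is nudged toward that sum. Running it with a smaller gamma leaves the score for moving right from the start state lower, because the distant reward is discounted more heavily.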
