In the context of our Q-Learning algorithm, select all which are true:
1: We calculate a quality score for each (environment, action) pair.
2: We use a high value for gamma, the discount, to place more emphasis on future feedback; a lower value places more emphasis on immediate feedback.
3: Absent some limit or threshold, our Q-Learning algorithm will run forever.
4: Our quality score is the delta (difference) between immediate and future feedback.
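For reference when weighing the statements, here is a minimal sketch of tabular Q-learning. The question does not show the course's own algorithm, so this assumes the standard textbook update rule; the chain environment, the function names `q_update` and `train`, and all parameter values are hypothetical and chosen only for illustration.

```python
import random


def q_update(q, state, action, reward, next_state, alpha, gamma):
    """Apply one tabular Q-learning update in place and return the new value."""
    best_next = max(q[next_state])        # best estimated value of the next state
    target = reward + gamma * best_next   # immediate reward plus discounted future value
    q[state][action] += alpha * (target - q[state][action])
    return q[state][action]


def train(n_states=5, n_actions=2, episodes=200, max_steps=50,
          alpha=0.5, gamma=0.9, seed=0):
    """Train on a hypothetical chain: action 1 moves right, action 0 moves left.

    Reaching the last state pays reward 1 and ends the episode.
    """
    rng = random.Random(seed)
    q = [[0.0] * n_actions for _ in range(n_states)]  # one score per (state, action) pair
    for _ in range(episodes):             # explicit limit on the number of episodes
        state = 0
        for _ in range(max_steps):        # explicit limit on steps per episode
            # Assumption: a uniformly random behavior policy. Q-learning is
            # off-policy, so it can still learn greedy values from random moves.
            action = rng.randrange(n_actions)
            if action == 1:
                next_state = state + 1
            else:
                next_state = max(0, state - 1)
            done = next_state == n_states - 1
            reward = 1.0 if done else 0.0
            q_update(q, state, action, reward, next_state, alpha, gamma)
            state = next_state
            if done:
                break
    return q


if __name__ == "__main__":
    for state, row in enumerate(train()):
        print(state, [round(value, 3) for value in row])
```

In this sketch the table holds one score per (state, action) pair, `gamma` scales how much the estimated future value contributes to the update target, and the loops stop only because `episodes` and `max_steps` bound them.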
