Question:

Q1 Q-Learning Properties (2 Points)

In general, for Q-Learning to converge to the optimal Q-values...

Choice 1 of 4: It is necessary that every state-action pair is visited infinitely often.
Choice 2 of 4: It is necessary that the learning rate (weight given to new samples) is decreased to $0$ over time.
Choice 3 of 4: It is necessary that the discount is less than $0.5$.
Choice 4 of 4: It is necessary that actions get chosen according to $\arg\max_a Q(s,a)$.

Q2 Exploration and Exploitation (5 Points)

For each of the following action-selection methods, indicate which option describes it best.

Q2.1 (1 Point)

Method A: With probability $p$, select $\arg\max_a Q(s,a)$. With probability $1-p$, select a random action. $p = 0.99$
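To make the setup concrete, here is a minimal Python sketch of Method A's action selection, together with one tabular Q-learning update in which the learning rate is the weight given to the new sample. It is an illustration under stated assumptions, not part of the original question: the function names, the list-of-lists Q-table, and the reading of "a random action" as uniform over all actions (including the greedy one) are all hypothetical choices.

```python
import random


def select_action_method_a(q_values, p=0.99, rng=random):
    """Pick argmax_a Q(s, a) with probability p, else a uniformly random action.

    q_values is the list of Q(s, a) for the current state s (assumed layout).
    """
    if rng.random() < p:
        # Greedy choice: index of the largest Q-value; ties go to the lowest index.
        return max(range(len(q_values)), key=lambda a: q_values[a])
    # Random choice over all actions (assumption: uniform, may equal the greedy one).
    return rng.randrange(len(q_values))


def q_learning_update(q, s, a, reward, s_next, alpha, gamma):
    """Apply one tabular Q-learning update and return the new Q(s, a).

    alpha is the learning rate (weight given to the new sample),
    gamma is the discount.
    """
    sample = reward + gamma * max(q[s_next])
    q[s][a] = (1 - alpha) * q[s][a] + alpha * sample
    return q[s][a]
```

With `p = 0.99` as in Method A, the greedy branch is taken on roughly 99% of calls, so the random branch runs only about 1% of the time.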
