Question:

6. The model-based reinforcement learner allows for a different form of optimism in the face of uncertainty. The algorithm can be started with each state having a transition to a “nirvana” state, which has a very high Q-value (but which will never be reached in practice, and so its estimated probability will shrink to zero).

(a) Does this perform differently than initializing all Q-values to a high value? Does it work better, worse, or the same?

(b) How high does the Q-value for the nirvana state need to be to work most effectively?

Suggest a reason why one value might be good, and test it.

(c) Could this method be used for the other RL algorithms? Explain how or why not.
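
As a starting point for the test requested in (b), the nirvana initialization might be set up as in the minimal sketch below. This is one reading of the exercise, not the textbook's code. It assumes a count-based transition model, a single pseudo-count per state-action pair for the nirvana transition, and a fixed value for the nirvana state. The names `NirvanaModel`, `NIRVANA`, and `nirvana_value` are hypothetical.

```python
from collections import defaultdict

NIRVANA = "nirvana"  # hypothetical label for the fictitious high-value state


class NirvanaModel:
    """Count-based model in which every (state, action) pair starts with
    one pseudo-observed transition to the nirvana state (an assumption)."""

    def __init__(self, states, actions, nirvana_value, gamma=0.9):
        self.states = list(states)
        self.actions = list(actions)
        self.nirvana_value = nirvana_value  # assumed fixed, never updated
        self.gamma = gamma
        # counts[(s, a)][s2] = number of times s2 followed (s, a)
        self.counts = {
            (s, a): defaultdict(int, {NIRVANA: 1})
            for s in self.states
            for a in self.actions
        }
        self.reward_sum = {(s, a): 0.0 for s in self.states for a in self.actions}
        self.q = {(s, a): 0.0 for s in self.states for a in self.actions}

    def value(self, s):
        """Value of a state: fixed for nirvana, max over actions otherwise."""
        if s == NIRVANA:
            return self.nirvana_value
        return max(self.q[(s, a)] for a in self.actions)

    def observe(self, s, a, r, s2):
        """Record one real experience (s, a, r, s2)."""
        self.counts[(s, a)][s2] += 1
        self.reward_sum[(s, a)] += r

    def nirvana_probability(self, s, a):
        """Estimated probability of reaching nirvana from (s, a)."""
        c = self.counts[(s, a)]
        return c[NIRVANA] / sum(c.values())

    def update_q(self, s, a):
        """One asynchronous value-iteration backup for (s, a)."""
        c = self.counts[(s, a)]
        total = sum(c.values())
        real = total - c[NIRVANA]  # real (non-pseudo) observations
        avg_r = self.reward_sum[(s, a)] / real if real else 0.0
        expected_next = sum(n / total * self.value(s2) for s2, n in c.items())
        self.q[(s, a)] = avg_r + self.gamma * expected_next


# Usage: the nirvana probability shrinks as real transitions accumulate.
model = NirvanaModel(["s0", "s1"], ["a"], nirvana_value=100.0)
p_before = model.nirvana_probability("s0", "a")
for _ in range(3):
    model.observe("s0", "a", 1.0, "s1")
p_after = model.nirvana_probability("s0", "a")
model.update_q("s0", "a")
```

Varying `nirvana_value` and comparing the result against a learner whose Q-values are simply initialized high would be one way to approach parts (a) and (b). The choice of a single pseudo-count is itself a parameter worth varying.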
