Question: A researcher is training a Deep Q-Network (DQN) on a deterministic environment. During training, they observe that the agent is repeatedly taking suboptimal actions even after many episodes. Upon inspecting the training process, they find that the Q-value updates are incorrect because the model is overly confident about its predictions, even when they are wrong.
What could be the likely cause of this issue?
Question options:
The discount factor is set too high, making the agent overly sensitive to future rewards.
The agent is not balancing exploration and exploitation properly, which causes it to overfit to suboptimal actions.
The model is not using the temporal difference (TD) error correctly.
The replay memory is not storing enough transitions, leading to poor learning diversity.
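For context on the mechanics the options refer to, here is a minimal sketch of the TD update and an epsilon-greedy action rule. It uses a tabular Q-table rather than a neural network (an assumption for brevity; `n_states`, `alpha`, and the other hyperparameters are illustrative, not from the question). Note how the `max` in the TD target always trusts the current highest Q-value estimate, and how a small `epsilon` limits exploration:

```python
import numpy as np

rng = np.random.default_rng(0)

n_states, n_actions = 4, 2
Q = np.zeros((n_states, n_actions))   # Q-value estimates, one per (state, action)
gamma, alpha, epsilon = 0.9, 0.5, 0.1

def epsilon_greedy(state):
    # With probability epsilon, explore a random action; otherwise
    # exploit the current (possibly overconfident) Q estimates.
    if rng.random() < epsilon:
        return int(rng.integers(n_actions))
    return int(np.argmax(Q[state]))

def td_update(s, a, r, s_next):
    # TD target bootstraps from the max over next-state Q-values;
    # this max operator is a known source of overestimation in DQN.
    target = r + gamma * np.max(Q[s_next])
    td_error = target - Q[s, a]
    Q[s, a] += alpha * td_error
    return td_error
```

If the TD error is computed or applied incorrectly, the update no longer pulls `Q[s, a]` toward the observed return, so confidently wrong estimates are never corrected; if `epsilon` is effectively zero, the agent keeps exploiting those wrong estimates instead of discovering better actions.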
