Question:
A researcher is training a Deep Q-Network (DQN) on a deterministic environment. During training, they observe that the agent is repeatedly taking suboptimal actions even after many episodes. Upon inspecting the training process, they find that the Q-value updates are incorrect because the model is overly confident about its predictions, even when they are wrong.
What could be the likely cause of this issue?
Options:
The discount factor is set too high, making the agent overly sensitive to future rewards.
The agent is not balancing exploration and exploitation properly, which causes it to overfit to suboptimal actions.
The model is not using the temporal difference (TD) error correctly.
The replay memory is not storing enough transitions, leading to poor learning diversity.
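Each option points at a concrete knob in the DQN training loop: the discount factor, the exploration schedule, the TD update, and the replay memory. As a reference for where those knobs live, below is a minimal sketch of epsilon-greedy action selection and a single TD update against a target network, written in a PyTorch style. The names (QNet, q_net, target_net, replay, epsilon) and the hyperparameter values are illustrative assumptions, not taken from the question.

```python
# Minimal DQN sketch (illustrative names and values, not from the question).
import random
from collections import deque

import torch
import torch.nn as nn

class QNet(nn.Module):
    """Small MLP mapping an observation to one Q-value per action."""
    def __init__(self, obs_dim: int, n_actions: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, 64), nn.ReLU(),
            nn.Linear(64, n_actions),
        )

    def forward(self, x):
        return self.net(x)

obs_dim, n_actions = 4, 2
q_net = QNet(obs_dim, n_actions)
target_net = QNet(obs_dim, n_actions)
target_net.load_state_dict(q_net.state_dict())
optimizer = torch.optim.Adam(q_net.parameters(), lr=1e-3)

gamma = 0.99                    # discount factor
epsilon = 0.1                   # exploration rate for epsilon-greedy
replay = deque(maxlen=10_000)   # replay memory of (s, a, r, s', done) tuples

def select_action(state: torch.Tensor) -> int:
    """Epsilon-greedy: explore with probability epsilon, otherwise exploit argmax Q."""
    if random.random() < epsilon:
        return random.randrange(n_actions)
    with torch.no_grad():
        return int(q_net(state.unsqueeze(0)).argmax(dim=1).item())

def update(batch_size: int = 32) -> None:
    """One TD update on a minibatch sampled from the replay memory."""
    if len(replay) < batch_size:
        return
    states, actions, rewards, next_states, dones = zip(*random.sample(replay, batch_size))
    states = torch.stack(states)
    next_states = torch.stack(next_states)
    actions = torch.tensor(actions, dtype=torch.int64)
    rewards = torch.tensor(rewards, dtype=torch.float32)
    dones = torch.tensor(dones, dtype=torch.float32)

    # Q(s, a) for the actions actually taken.
    q_sa = q_net(states).gather(1, actions.unsqueeze(1)).squeeze(1)
    # TD target: r + gamma * max_a' Q_target(s', a'), zeroed at terminal states.
    with torch.no_grad():
        td_target = rewards + gamma * target_net(next_states).max(dim=1).values * (1.0 - dones)
    loss = nn.functional.mse_loss(q_sa, td_target)

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

In a full training loop, select_action would drive the environment, each transition (state, action, reward, next_state, done) would be appended to replay, and update() would run every step or every few steps; epsilon is typically annealed over time so the agent does not lock onto early, overconfident action choices.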
