Question: Markov Decision Process (MDP) and Deep Q-Learning

1. For the MDP example on our RL lecture slides (slides 13-16), recompute the values of the states at the second iteration, i.e., V_2(s), using a new transition function: with a given probability the east action reaches the east cell, and the remaining probability is split equally among the other cells; the same holds for the other actions. All other settings are unchanged.

2. In deep Q-learning, training may be unstable. Explain what causes this instability. How can training be made more stable?
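The first part is a value-iteration exercise: starting from V_0(s) = 0, each sweep applies V_{k+1}(s) = max_a Σ_{s'} T(s, a, s') [R + γ V_k(s')], so V_2 comes from two sweeps. The lecture's grid, rewards and discount are not reproduced here, so the sketch below assumes a hypothetical 4x3 grid world; the wall position, terminal rewards (+1/-1), γ = 0.9, living reward 0 and intended-move probability 0.8 are placeholders, not the slide values. It reads "the rest split equally with other cells" as the remaining probability being divided evenly among the other three compass directions, with a move into a wall or off the grid leaving the agent in place. Terminal cells are modelled with a single exit action that pays their reward, which is one common convention but also an assumption.

```python
from typing import Dict, Set, Tuple

State = Tuple[int, int]
# Compass moves as (row delta, column delta); row 0 is the top row.
ACTIONS = {"N": (-1, 0), "S": (1, 0), "E": (0, 1), "W": (0, -1)}


def transition(s: State, action: str, rows: int, cols: int,
               walls: Set[State], p_intended: float) -> Dict[State, float]:
    """Return {next_state: probability} for taking `action` in `s`.

    Assumption: the intended move succeeds with p_intended and the remaining
    probability is split equally among the other three directions. Bumping
    into a wall or the grid edge leaves the agent where it is.
    """
    p_other = (1.0 - p_intended) / (len(ACTIONS) - 1)
    probs: Dict[State, float] = {}
    for a, (dr, dc) in ACTIONS.items():
        p = p_intended if a == action else p_other
        nxt = (s[0] + dr, s[1] + dc)
        if not (0 <= nxt[0] < rows and 0 <= nxt[1] < cols) or nxt in walls:
            nxt = s
        probs[nxt] = probs.get(nxt, 0.0) + p
    return probs


def bellman_backup(V, rows, cols, walls, terminals, p_intended, gamma, living_reward):
    """One synchronous value-iteration sweep: V_k -> V_{k+1}."""
    new_V: Dict[State, float] = {}
    for r in range(rows):
        for c in range(cols):
            s = (r, c)
            if s in walls:
                continue
            if s in terminals:
                # Terminal cells have a single exit action paying their reward.
                new_V[s] = terminals[s]
                continue
            new_V[s] = max(
                sum(p * (living_reward + gamma * V.get(s2, 0.0))
                    for s2, p in transition(s, a, rows, cols, walls, p_intended).items())
                for a in ACTIONS
            )
    return new_V


# Hypothetical placeholder settings (NOT the values from the lecture slides).
ROWS, COLS = 3, 4
WALLS = {(1, 1)}
TERMINALS = {(0, 3): 1.0, (1, 3): -1.0}
P_INTENDED, GAMMA, LIVING_REWARD = 0.8, 0.9, 0.0

V0: Dict[State, float] = {}  # V_0(s) = 0 for every state
V1 = bellman_backup(V0, ROWS, COLS, WALLS, TERMINALS, P_INTENDED, GAMMA, LIVING_REWARD)
V2 = bellman_backup(V1, ROWS, COLS, WALLS, TERMINALS, P_INTENDED, GAMMA, LIVING_REWARD)
for r in range(ROWS):
    print(["  wall" if (r, c) in WALLS else f"{V2[(r, c)]:6.3f}" for c in range(COLS)])
```

Substituting the slide's actual grid, rewards, discount and probability should give V_2(s) directly. With these placeholders, the cell just left of the +1 terminal gets V_2 = 0.8 × 0.9 × 1 = 0.72 (best action: east), and the cell just left of the -1 terminal gets V_2 = -(0.2 / 3) × 0.9 = -0.06, because any action other than east still slips east with probability 0.2 / 3.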
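The second part of the question concerns instability in deep Q-learning. The commonly cited causes are: consecutive transitions are strongly correlated, so minibatches are far from i.i.d.; the regression target r + γ max_a' Q(s', a') is computed with the same network that is being updated, so the target moves with every gradient step (bootstrapping combined with function approximation and off-policy data); and the max operator tends to overestimate values. Standard remedies include an experience replay buffer, a separate target network that is refreshed only periodically (or by slow Polyak averaging), Huber loss or gradient clipping, and Double DQN against overestimation. The sketch below is a toy illustration of replay, a target network and TD-error clipping, not a full DQN: a linear Q-function stands in for the neural network, and the two-state environment, hyperparameters and helper names (`ReplayBuffer`, `LinearQ`, `env_step`, `train`) are all hypothetical.

```python
import random
from collections import deque
from typing import Deque, List, Optional, Tuple

# (state features, action, reward, next-state features, done)
Transition = Tuple[List[float], int, float, List[float], bool]


class ReplayBuffer:
    """Fixed-capacity FIFO store; uniform sampling breaks up the temporal
    correlation between consecutive transitions."""

    def __init__(self, capacity: int, seed: int = 0) -> None:
        self.buffer: Deque[Transition] = deque(maxlen=capacity)
        self.rng = random.Random(seed)

    def push(self, t: Transition) -> None:
        self.buffer.append(t)

    def sample(self, batch_size: int) -> List[Transition]:
        return self.rng.sample(list(self.buffer), batch_size)


class LinearQ:
    """Hypothetical stand-in for a Q-network: Q(s, a) = w[a] . features(s)."""

    def __init__(self, n_features: int, n_actions: int) -> None:
        self.w = [[0.0] * n_features for _ in range(n_actions)]

    def q_values(self, s: List[float]) -> List[float]:
        return [sum(wi * si for wi, si in zip(row, s)) for row in self.w]

    def copy_from(self, other: "LinearQ") -> None:
        self.w = [row[:] for row in other.w]


def sgd_step(online: LinearQ, target: LinearQ, batch: List[Transition],
             gamma: float, lr: float, clip: float = 1.0) -> None:
    for s, a, r, s_next, done in batch:
        # The bootstrapped target uses the frozen target network, not `online`.
        y = r if done else r + gamma * max(target.q_values(s_next))
        error = online.q_values(s)[a] - y
        # Clipping the TD error gives the gradient of a Huber loss (delta = clip).
        error = max(-clip, min(clip, error))
        online.w[a] = [wi - lr * error * si for wi, si in zip(online.w[a], s)]


# Hypothetical two-state chain: from state 0, action 0 moves to state 1;
# from state 1, action 0 ends the episode with reward 1; all else ends with 0.
FEATURES = {0: [1.0, 0.0], 1: [0.0, 1.0], None: [0.0, 0.0]}


def env_step(state: int, action: int) -> Tuple[Optional[int], float, bool]:
    if state == 0 and action == 0:
        return 1, 0.0, False
    if state == 1 and action == 0:
        return None, 1.0, True
    return None, 0.0, True


def train(gamma=0.9, lr=0.1, batch_size=32, updates=500, target_sync=50, seed=0):
    rng = random.Random(seed)
    buffer = ReplayBuffer(capacity=1000, seed=seed)
    # For brevity, experience is collected up front with a uniformly random policy.
    for _ in range(500):
        state, done = 0, False
        while not done:
            action = rng.randrange(2)
            nxt, r, done = env_step(state, action)
            buffer.push((FEATURES[state], action, r, FEATURES[nxt], done))
            state = nxt
    online, target = LinearQ(2, 2), LinearQ(2, 2)
    for step in range(1, updates + 1):
        sgd_step(online, target, buffer.sample(batch_size), gamma, lr)
        if step % target_sync == 0:
            target.copy_from(online)  # hard update; Polyak averaging is an alternative
    return online


q = train()
print(q.q_values(FEATURES[0]), q.q_values(FEATURES[1]))
```

With one-hot features the linear model is effectively a table, so the result can be checked by hand: Q(state 1, action 0) should approach 1, and Q(state 0, action 0) should approach γ × 1 = 0.9 once the target network has been synced. In an actual DQN, experience collection and gradient updates are interleaved, and the replay buffer keeps receiving fresh transitions from an ε-greedy policy.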
