Question: Markov Decision Process (MDP) and Deep Q Learning

1. For the MDP example on our RL lecture slides 1316, recompute the values of the states at the second iteration, i.e., V2(s), with a new transition function: the east action reaches the east cell with probability 60%, and the remaining 40% is split equally among the other cells; the same holds for the other actions. All other settings are the same.

2. In deep Q learning, training may not be stable. Explain what causes this instability. How can training be made more stable?
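The recomputation asked for in part 1 can be sketched in Python as follows. The lecture slides are not reproduced here, so the grid layout, wall, terminal rewards, discount factor, and living reward below are all hypothetical placeholders to be replaced with the slide values. The sketch also assumes that "split equally with other cells" means each of the three non-intended directions gets 40%/3 (about 13.3%), that bumping into a wall or the grid edge leaves the agent in place, and that a terminal cell simply pays out its exit reward.

```python
# Sketch of value iteration with a 60% / 40%-split transition model.
# ASSUMPTION: everything in this configuration block is hypothetical and
# NOT taken from the lecture slides; substitute the slide values.
ROWS, COLS = 3, 4
WALLS = {(1, 1)}                            # hypothetical blocked cell
TERMINALS = {(0, 3): 1.0, (1, 3): -1.0}     # hypothetical exit rewards
GAMMA = 0.9                                 # hypothetical discount factor
LIVING_REWARD = 0.0                         # hypothetical per-step reward
P_INTENDED = 0.6                            # from the question: 60% intended move

ACTIONS = {"N": (-1, 0), "S": (1, 0), "E": (0, 1), "W": (0, -1)}


def move(state, direction):
    """Deterministic move; stay in place on hitting an edge or a wall (assumed)."""
    r, c = state
    dr, dc = ACTIONS[direction]
    nxt = (r + dr, c + dc)
    in_grid = 0 <= nxt[0] < ROWS and 0 <= nxt[1] < COLS
    if not in_grid or nxt in WALLS:
        return state
    return nxt


def transitions(state, action):
    """Return (probability, next_state) pairs for one action.

    The intended direction gets P_INTENDED; the remaining mass is split
    equally over the other three directions (assumed interpretation).
    """
    p_other = (1.0 - P_INTENDED) / (len(ACTIONS) - 1)
    return [
        (P_INTENDED if d == action else p_other, move(state, d))
        for d in ACTIONS
    ]


def value_iteration(num_iters):
    """Run num_iters synchronous Bellman backups starting from V0 = 0."""
    states = [
        (r, c) for r in range(ROWS) for c in range(COLS) if (r, c) not in WALLS
    ]
    values = {s: 0.0 for s in states}
    for _ in range(num_iters):
        new_values = {}
        for s in states:
            if s in TERMINALS:
                # Terminal cell: the only outcome is collecting the exit reward.
                new_values[s] = TERMINALS[s]
            else:
                new_values[s] = max(
                    sum(
                        p * (LIVING_REWARD + GAMMA * values[s2])
                        for p, s2 in transitions(s, a)
                    )
                    for a in ACTIONS
                )
        values = new_values
    return values


if __name__ == "__main__":
    v2 = value_iteration(2)
    for r in range(ROWS):
        print(["wall" if (r, c) in WALLS else f"{v2[(r, c)]:.2f}" for c in range(COLS)])
```

Under these placeholder settings, the cell just west of the +1 exit gets V2 = 0.6 × 0.9 × 1 = 0.54, because only the intended east move reaches a cell with nonzero V1. The same backup applies to the actual slide example once its grid, rewards, and discount are substituted.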
