Question: Consider an unknown Markov Decision Process (MDP) with 3 states (A, B, C) and 2 actions (turnLeft, turnRight), where the agent makes decisions according to some policy $\pi$. You are given a dataset consisting of samples $(s, a, s', r)$, each representing taking action $a$ in state $s$, resulting in a transition to state $s'$ and a reward $r$. (Hint: here we consider a dynamic system $p(s', r \mid s, a)$, which means the reward at each step is also stochastic.) You may use a discount factor of $\gamma = 1$. The update function of Q-learning is:

$$Q(s_t, a_t) \leftarrow (1-\alpha)\,Q(s_t, a_t) + \alpha\,\bigl(r_t + \gamma \max_{a} Q(s_{t+1}, a)\bigr) \quad \text{(1)}$$

Assume all Q-values are initialized to 0 and use a learning rate of $\alpha = \tfrac{1}{2}$.

1. Run Q-learning with the data in the table and compute the values of $Q(A, \text{turnRight})$ and $Q(B, \text{turnRight})$. (Hint: you may compute $Q_1(A, \text{turnRight})$, $Q_1(C, \text{turnLeft})$, $Q_1(B, \text{turnRight})$, and $Q_2(A, \text{turnRight})$ in turn with the update function in Eq. (1).)

Please answer the question step by step and explain the answer in detail, because I have no experience in machine learning.

Solution: This question was borrowed from UC Berkeley's CS188. Applying Eq. (1) to the samples in order, with every $Q_0$ value equal to 0 and $\alpha = \tfrac{1}{2}$:

$$Q_1(A, \text{turnRight}) = \tfrac{1}{2}\,Q_0(A, \text{turnRight}) + \tfrac{1}{2}\bigl(2 + \max_{a} Q_0(B, a)\bigr) = \tfrac{1}{2}(2 + 0) = 1$$
$$Q_1(C, \text{turnLeft}) = \tfrac{1}{2}(2 + 0) = 1$$
$$Q_1(B, \text{turnRight}) = \tfrac{1}{2}\bigl(-2 + \max_{a} Q_1(C, a)\bigr) = \tfrac{1}{2}(-2 + 1) = -\tfrac{1}{2}$$
$$Q_2(A, \text{turnRight}) = \tfrac{1}{2}\cdot 1 + \tfrac{1}{2}\bigl(4 + \max_{a} Q_1(B, a)\bigr) = \tfrac{1}{2} + \tfrac{1}{2}(4 + 0) = \tfrac{5}{2}$$

Note that in the last update $\max_{a} Q_1(B, a) = \max\{0, -\tfrac{1}{2}\} = 0$, because $Q_1(B, \text{turnLeft})$ is still at its initial value 0. So the final answers are $Q(A, \text{turnRight}) = \tfrac{5}{2}$ and $Q(B, \text{turnRight}) = -\tfrac{1}{2}$.
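The hand computation above can be checked mechanically. The sketch below is a minimal tabular Q-learning pass in Python; note that the sample table itself is not shown in the question as posted, so the four transitions listed in `samples` are an assumption, reconstructed so that the updates reproduce the numbers in the worked solution.

```python
from collections import defaultdict

def q_update(Q, s, a, s_next, r, alpha=0.5, gamma=1.0):
    """One tabular Q-learning update, Eq. (1):
    Q(s,a) <- (1-alpha)*Q(s,a) + alpha*(r + gamma * max_a' Q(s',a'))."""
    best_next = max(Q[(s_next, b)] for b in ("turnLeft", "turnRight"))
    Q[(s, a)] = (1 - alpha) * Q[(s, a)] + alpha * (r + gamma * best_next)
    return Q[(s, a)]

# All Q-values are initialized to 0.
Q = defaultdict(float)

# Hypothetical sample table (s, a, s', r) -- inferred from the solution,
# since the original table is missing from the question as posted.
samples = [
    ("A", "turnRight", "B",  2),
    ("C", "turnLeft",  "B",  2),
    ("B", "turnRight", "C", -2),
    ("A", "turnRight", "B",  4),
]

for s, a, s_next, r in samples:
    print(f"Q({s}, {a}) = {q_update(Q, s, a, s_next, r)}")
# Q(A, turnRight) = 1.0
# Q(C, turnLeft) = 1.0
# Q(B, turnRight) = -0.5
# Q(A, turnRight) = 2.5
```

Each update blends the old Q-value with the new sample's one-step target; that is why the second visit to (A, turnRight) lands on 5/2 rather than 4, and why the bootstrap term uses 0 (the untouched turnLeft entry of B) rather than -1/2.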
