Question:

1. What values do we have for $Q(s_1, a_1)$ and $Q(s_2, a_1)$ now, after these three steps of updates? Write down how you obtained them.
2. Suppose that from here on we use the $\epsilon$-greedy strategy with $\epsilon = 0.3$. This means that with probability $\epsilon$ we take an arbitrary action (each of the two actions is equally likely to be chosen in this case), and with probability $1 - \epsilon$ we choose the best action according to the current Q-values. Now that we are in $s_2$ after Step 3, what is the probability of seeing the transition $(s_2, a_1, s_1)$ in the next step? That is, calculate the probability of the event that, under the $\epsilon$-greedy policy, we choose action $a_1$ in the current state $s_2$ and, after applying this action, the MDP puts us in $s_1$ as the next state.
3. If, instead of the $\epsilon$-greedy policy, we use the greedy policy, which always takes the action that maximizes the Q-values in each step, what is the probability of seeing $(s_2, a_1, s_1)$ in the next step?
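
Part 1 depends on the earlier three steps (states visited, rewards, learning rate, discount factor), which are not included in this excerpt. As a minimal sketch, assuming the updates are the standard tabular Q-learning rule $Q(s,a) \leftarrow Q(s,a) + \alpha\,[\,r + \gamma \max_{a'} Q(s',a') - Q(s,a)\,]$, the computation can be traced as below. The function name `q_update`, the zero initialization, and the rewards, $\alpha$, $\gamma$, and transitions in `steps` are all hypothetical placeholders, not the problem's actual data.

```python
from collections import defaultdict


def q_update(Q, s, a, r, s_next, actions, alpha, gamma):
    """Apply one tabular Q-learning update to Q in place; return the new Q[(s, a)].

    Assumes the standard off-policy target r + gamma * max_a' Q(s', a').
    """
    best_next = max(Q[(s_next, b)] for b in actions)
    td_target = r + gamma * best_next
    Q[(s, a)] += alpha * (td_target - Q[(s, a)])
    return Q[(s, a)]


# Hypothetical setup: every Q-value starts at 0; the transitions, rewards,
# alpha and gamma below are placeholders, NOT the values from the problem.
actions = ["a1", "a2"]
Q = defaultdict(float)
steps = [
    ("s1", "a1", 1.0, "s2"),  # (state, action, reward, next state)
    ("s2", "a1", 0.0, "s1"),
    ("s1", "a1", 1.0, "s2"),
]
for s, a, r, s_next in steps:
    q_update(Q, s, a, r, s_next, actions, alpha=0.5, gamma=0.9)

# With these placeholders: approximately 0.85125 and 0.225
print(Q[("s1", "a1")], Q[("s2", "a1")])
```

Each update reads the most recent Q-values, so the order of the steps matters: here the second step uses the value of $Q(s_1, a_1)$ already changed by the first step, and the third step uses the value of $Q(s_2, a_1)$ changed by the second.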

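For parts 2 and 3, the probability factors into the policy's chance of picking $a_1$ in $s_2$ times the MDP's chance of then moving to $s_1$: $P = P(a_1 \mid s_2)\,P(s_1 \mid s_2, a_1)$. With two actions, the $\epsilon$-greedy policy gives $P(a_1 \mid s_2) = \epsilon/2 + (1-\epsilon)$ if $a_1$ is the greedy action in $s_2$ and $\epsilon/2$ otherwise; the greedy policy is the special case $\epsilon = 0$. The sketch below assumes ties between equal Q-values are broken uniformly at random. The function names are illustrative, and the Q-values and $P(s_1 \mid s_2, a_1) = 0.8$ are hypothetical placeholders, since the MDP's transition probabilities are not part of this excerpt.

```python
def action_prob(q_values, action, epsilon):
    """Probability that an epsilon-greedy policy picks `action`.

    q_values maps each action to its current Q-value in the given state.
    With probability epsilon an action is drawn uniformly at random; with
    probability 1 - epsilon a greedy action is taken (ties are ASSUMED to be
    broken uniformly at random among the maximizers).
    """
    n = len(q_values)
    best = max(q_values.values())
    greedy = [a for a, q in q_values.items() if q == best]
    p_greedy = 1.0 / len(greedy) if action in greedy else 0.0
    return epsilon / n + (1.0 - epsilon) * p_greedy


def transition_prob(q_values, action, p_next, epsilon):
    """P(choose `action`, then land in the target next state).

    p_next is the MDP's P(s' | s, action), taken from the MDP definition.
    """
    return action_prob(q_values, action, epsilon) * p_next


# Hypothetical values: Q(s2, a1) = 0.225, Q(s2, a2) = 0.0 and
# P(s1 | s2, a1) = 0.8 are placeholders, not the problem's numbers.
q_s2 = {"a1": 0.225, "a2": 0.0}
p_eps = transition_prob(q_s2, "a1", p_next=0.8, epsilon=0.3)     # part 2
p_greedy = transition_prob(q_s2, "a1", p_next=0.8, epsilon=0.0)  # part 3
print(p_eps, p_greedy)
```

With these placeholders $a_1$ is greedy in $s_2$, so the $\epsilon$-greedy answer is $(0.15 + 0.7) \times 0.8 = 0.68$ and the greedy answer is $1 \times 0.8 = 0.8$. If $a_1$ were not the greedy action, the greedy policy would give probability 0, while $\epsilon$-greedy would still give $0.15 \times P(s_1 \mid s_2, a_1)$.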