Question: Suppose we run value iteration in an MDP with only non-negative rewards (that is, R(s, a, s') ≥ 0 for any (s, a, s')). Let the values on the kth iteration be Vk(s) and the optimal values be V*(s). Initially, the values are 0 (that is, V0(s) = 0 for any s).

a. Mark all of the options that are guaranteed to be true. 

(i) For any s, a, s', V1(s) = R(s, a, s')

(ii) For any s, a, s', V1(s) ≤ R(s, a, s')

(iii) For any s, a, s', V1(s) ≥ R(s, a, s')

(iv) None of the above are guaranteed to be true.

b. Mark all of the options that are guaranteed to be true. 

(i) For any k, s, Vk(s) = V*(s)

(ii) For any k, s, Vk(s) ≤ V*(s)

(iii) For any k, s, Vk(s) ≥ V*(s)

(iv) None of the above are guaranteed to be true.
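To build intuition for part b, the setup above can be simulated directly. The sketch below runs value iteration on a small made-up MDP (states, actions, transitions, and rewards are our own illustrative assumptions, not part of the question) with non-negative rewards and V0(s) = 0, and checks that the iterates never decrease as they approach the optimal values.

```python
# Toy value iteration on a hypothetical two-state MDP with non-negative
# rewards and V0(s) = 0, illustrating the monotone behavior asked about
# in part b. All numbers here are made up for demonstration.

GAMMA = 0.9

# transitions[s][a] = list of (next_state, probability, reward)
transitions = {
    "A": {"stay": [("A", 1.0, 1.0)],
          "go":   [("B", 0.8, 2.0), ("A", 0.2, 0.0)]},
    "B": {"stay": [("B", 1.0, 0.5)]},
}

def value_iteration_step(V):
    """One Bellman backup: V_{k+1}(s) = max_a sum_{s'} P(s'|s,a) [R + gamma * V_k(s')]."""
    return {
        s: max(
            sum(p * (r + GAMMA * V[s2]) for s2, p, r in outcomes)
            for outcomes in acts.values()
        )
        for s, acts in transitions.items()
    }

V = {s: 0.0 for s in transitions}   # V0(s) = 0 for every s
history = [V]
for _ in range(100):
    V = value_iteration_step(V)
    history.append(V)

# With non-negative rewards and V0 = 0, each iterate is >= the previous one,
# so the sequence climbs toward the optimal values from below.
for prev, curr in zip(history, history[1:]):
    assert all(curr[s] >= prev[s] - 1e-12 for s in curr)
```

Because every backup only adds non-negative expected reward on top of a non-negative estimate, the sequence V0, V1, V2, ... is non-decreasing in this example and stays below its limit, which matches option (ii) of part b.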

Step by Step Solution

(iv) only. Using the Bellman equation and setting V0(s) = 0, now consider an MDP where ...
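One way to see why none of (i)–(iii) in part a is guaranteed (a sketch of our own, on a hypothetical one-state MDP, not the expert's full answer): in a stochastic MDP, V1(s) is an expected immediate reward, so it can fall strictly between the rewards of the individual outcomes of a single action.

```python
# Hypothetical one-state MDP (our own illustration): one action from state s
# reaches two next states with different rewards, so V1(s) is an average
# that is neither equal to, nor <=, nor >= every individual R(s, a, s').

outcomes = [("s1", 0.5, 0.0), ("s2", 0.5, 10.0)]  # (next_state, prob, reward)

# With V0 = 0, the first Bellman backup is just the expected immediate reward:
V1 = sum(p * r for _, p, r in outcomes)  # 0.5*0.0 + 0.5*10.0 = 5.0

assert V1 > 0.0    # V1(s) > R(s, a, s1), so (ii) "<=" is not guaranteed
assert V1 < 10.0   # V1(s) < R(s, a, s2), so (iii) ">=" is not guaranteed
```

Since V1(s) = 5 differs from both individual rewards, (i) also fails, leaving (iv) as the only option guaranteed to be true.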
