Question:

Suppose we run value iteration in an MDP with only non-negative rewards (that is, R(s, a, s') ≥ 0 for any (s, a, s')). Let the values on the kth iteration be V_k(s) and the optimal values be V*(s). Initially, the values are 0 (that is, V_0(s) = 0 for any s).

a. Mark all of the options that are guaranteed to be true. 

(i) For any s, a, s', V_1(s) = R(s, a, s')

(ii) For any s, a, s', V_1(s) ≤ R(s, a, s')

(iii) For any s, a, s', V_1(s) ≥ R(s, a, s')

(iv) None of the above are guaranteed to be true. 

b. Mark all of the options that are guaranteed to be true. 

(i) For any k, s, V_k(s) = V*(s)

(ii) For any k, s, V_k(s) ≤ V*(s)

(iii) For any k, s, V_k(s) ≥ V*(s)

(iv) None of the above are guaranteed to be true.
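
The setup in the question can be explored numerically. The following is a minimal sketch, not part of the original problem: the two-state MDP, its transition table `transitions`, and the discount factor `GAMMA` are all made-up assumptions chosen only so that every reward is non-negative and the values start at zero.

```python
# Hypothetical 2-state MDP (an assumption for illustration, not from the question).
# Every reward is non-negative, matching the condition R(s, a, s') >= 0.
# transitions[s][a] is a list of (probability, next_state, reward) triples.
transitions = {
    0: {"stay": [(1.0, 0, 0.0)], "go": [(0.5, 1, 2.0), (0.5, 0, 0.0)]},
    1: {"stay": [(1.0, 1, 1.0)], "go": [(1.0, 0, 4.0)]},
}
GAMMA = 0.9  # assumed discount factor; the question does not specify one


def bellman_backup(values):
    """One synchronous value-iteration sweep: compute V_{k+1} from V_k."""
    return {
        s: max(
            # Expected one-step reward plus discounted value of the successor.
            sum(p * (r + GAMMA * values[s2]) for p, s2, r in outcomes)
            for outcomes in actions.values()
        )
        for s, actions in transitions.items()
    }


def value_iteration(num_sweeps):
    """Return [V_0, V_1, ..., V_num_sweeps], starting from V_0(s) = 0."""
    history = [{s: 0.0 for s in transitions}]
    for _ in range(num_sweeps):
        history.append(bellman_backup(history[-1]))
    return history


history = value_iteration(200)
for k in (0, 1, 2, 200):
    print(k, history[k])
```

Comparing `history[1]` with the individual rewards in `transitions`, and each `history[k]` with the final entry (used here only as a numerical stand-in for V*), is one way to sanity-check the options in parts (a) and (b). A single example can refute a claimed guarantee but cannot prove one.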
