Question: In a Markov decision problem, another criterion often used, different from the expected average return per unit time, is that of the expected discounted return. Under this criterion we choose a number α, 0 < α < 1, and try to choose a policy so as to maximize E[∑_{n=0}^∞ αⁿ R(X_n, a_n)]. (That is, rewards at time n are discounted at rate αⁿ.) Suppose that the initial state is chosen according to the probabilities b_i; that is,

P(X_0 = i) = b_i,  i = 1, ..., n
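The discounted criterion can be illustrated numerically. The sketch below (all numbers hypothetical, and the policy held fixed so the chain is an ordinary Markov chain) uses the standard fact that the vector of expected discounted returns v satisfies v = r + αPv, where r_i = R(i, a_i) is the one-stage reward in state i and P is the transition matrix; since α < 1, the system has a unique solution, and weighting by the initial distribution b gives the overall expected discounted return.

```python
import numpy as np

# Hypothetical 2-state chain under a fixed policy (all numbers illustrative).
# P[i][j] = transition probability from state i to state j,
# r[i]    = one-stage reward R(i, a_i) for the action taken in state i.
P = np.array([[0.9, 0.1],
              [0.4, 0.6]])
r = np.array([1.0, 2.0])
alpha = 0.9               # discount factor, 0 < alpha < 1
b = np.array([0.5, 0.5])  # initial distribution, P(X_0 = i) = b_i

# v solves v = r + alpha * P v, i.e. (I - alpha*P) v = r;
# the matrix I - alpha*P is invertible because alpha < 1.
v = np.linalg.solve(np.eye(2) - alpha * P, r)

# Overall expected discounted return when X_0 is drawn from b.
total = b @ v
print(total)  # roughly 12.545 for these illustrative numbers
```

Note that each v_i is bounded by max_j r_j / (1 − α), consistent with the geometric series ∑ αⁿ.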
