77. In a Markov decision problem, another criterion often used, different from the expected average return per unit time, is that of the expected discounted return. In this criterion we choose a number $\alpha$, $0 < \alpha < 1$, and try to choose a policy so as to maximize

$$E\left[\sum_{n=0}^{\infty} \alpha^{n} R(X_n, a_n)\right]$$

(that is, rewards at time $n$ are discounted at rate $\alpha^{n}$). Suppose that the initial state is chosen according to the probabilities $b_i$. That is,

$$P\{X_0 = i\} = b_i, \qquad i = 1, \ldots, n$$

For a given policy $\beta$, let $y_{ja}$ denote the expected discounted time that the process is in state $j$ and action $a$ is chosen. That is,

$$y_{ja} = E_{\beta}\left[\sum_{n=0}^{\infty} \alpha^{n} I_{\{X_n = j,\, a_n = a\}}\right]$$

where for any event $A$ the indicator variable $I_A$ is defined by

$$I_A = \begin{cases} 1, & \text{if } A \text{ occurs} \\ 0, & \text{otherwise} \end{cases}$$
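The expected discounted occupation times defined above can be estimated by simulation. A minimal Monte Carlo sketch on a made-up 2-state, 2-action example (all transition probabilities, the randomized policy, and the discount factor $\alpha = 0.9$ below are hypothetical illustration values, not from the text):

```python
import random

alpha = 0.9                      # discount factor, 0 < alpha < 1
b = [0.5, 0.5]                   # initial-state probabilities b_i
# P[i][a][j] = probability of moving from state i to state j under action a
P = [[[0.7, 0.3], [0.2, 0.8]],
     [[0.4, 0.6], [0.9, 0.1]]]
beta = [[0.6, 0.4], [0.3, 0.7]]  # beta[i][a]: prob. of choosing action a in state i

def estimate_y(n_runs=2000, horizon=100, seed=0):
    """Monte Carlo estimate of y_{ja} = E_beta[sum_n alpha^n I{X_n=j, a_n=a}]."""
    rng = random.Random(seed)
    y = [[0.0, 0.0], [0.0, 0.0]]
    for _ in range(n_runs):
        x = rng.choices([0, 1], weights=b)[0]    # draw X_0 from b
        disc = 1.0                               # current discount alpha^n
        for _ in range(horizon):
            a = rng.choices([0, 1], weights=beta[x])[0]
            y[x][a] += disc
            x = rng.choices([0, 1], weights=P[x][a])[0]
            disc *= alpha
    return [[y[j][a] / n_runs for a in range(2)] for j in range(2)]

y = estimate_y()
total = sum(sum(row) for row in y)
# Exactly one (state, action) pair occurs at each step, so the total is the
# truncated geometric series sum_{n=0}^{horizon-1} alpha^n in every run.
```

Because the indicator variables partition each time step, `total` equals the truncated geometric sum exactly, regardless of the random seed; this foreshadows the first identity of part (b).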

(a) Show that

$$\sum_{a} y_{ja} = E\left[\sum_{n=0}^{\infty} \alpha^{n} I_{\{X_n = j\}}\right]$$

or, in other words, $\sum_{a} y_{ja}$ is the expected discounted time in state $j$ under $\beta$.

(b) Show that

$$\sum_{j} \sum_{a} y_{ja} = \frac{1}{1-\alpha}, \qquad \sum_{a} y_{ja} = b_j + \alpha \sum_{i} \sum_{a} y_{ia} P_{ij}(a) \tag{4.38}$$

Hint: For the second equation, use the identity

$$I_{\{X_{n+1} = j\}} = \sum_{i} \sum_{a} I_{\{X_n = i,\, a_n = a\}}\, I_{\{X_{n+1} = j\}}$$

Take expectations of the preceding to obtain

$$E\left[I_{\{X_{n+1} = j\}}\right] = \sum_{i} \sum_{a} E\left[I_{\{X_n = i,\, a_n = a\}}\right] P_{ij}(a)$$
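Equations (4.38) can be checked numerically for a fixed stationary policy: under such a policy $y_{ja} = \beta_j(a)\, v_j$, where $v_j = \sum_a y_{ja}$ solves the linear system $v_j = b_j + \alpha \sum_i v_i P^{\beta}_{ij}$. A sketch on a made-up 2-state, 2-action example (all numbers hypothetical; numpy assumed available):

```python
import numpy as np

alpha = 0.9
b = np.array([0.5, 0.5])
P = np.array([[[0.7, 0.3], [0.2, 0.8]],     # P[i, a, j]
              [[0.4, 0.6], [0.9, 0.1]]])
beta = np.array([[0.6, 0.4], [0.3, 0.7]])    # beta[i, a]

# One-step transition matrix under the stationary policy beta:
# Pbeta[i, j] = sum_a beta[i, a] * P[i, a, j]
Pbeta = np.einsum('ia,iaj->ij', beta, P)

# v = b + alpha * Pbeta^T v  <=>  (I - alpha * Pbeta^T) v = b
v = np.linalg.solve(np.eye(2) - alpha * Pbeta.T, b)
y = beta * v[:, None]                        # y[j, a] = beta_j(a) * v_j

# First identity of (4.38): total discounted time is 1/(1 - alpha).
print(y.sum(), 1 / (1 - alpha))
```

Summing the balance equation over $j$ recovers the first identity, since $\sum_j P^{\beta}_{ij} = 1$ and $\sum_j b_j = 1$; the code's `y.sum()` reflects this.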

(c) Let $\{y_{ja}\}$ be a set of numbers satisfying

$$\sum_{j} \sum_{a} y_{ja} = \frac{1}{1-\alpha}, \qquad \sum_{a} y_{ja} = b_j + \alpha \sum_{i} \sum_{a} y_{ia} P_{ij}(a)$$

Argue that $y_{ja}$ can be interpreted as the expected discounted time that the process is in state $j$ and action $a$ is chosen when the initial state is chosen according to the probabilities $b_j$ and the policy $\beta$, given by

$$\beta_j(a) = \frac{y_{ja}}{\sum_{a} y_{ja}},$$

is employed.

Hint: Derive a set of equations for the expected discounted times when policy $\beta$ is used and show that they are equivalent to Eq. (4.38).

(d) Argue that an optimal policy with respect to the expected discounted return criterion can be obtained by first solving the linear program

$$\begin{aligned} \text{maximize} \quad & \sum_{j} \sum_{a} y_{ja} R(j, a) \\ \text{subject to} \quad & \sum_{j} \sum_{a} y_{ja} = \frac{1}{1-\alpha}, \\ & \sum_{a} y_{ja} = b_j + \alpha \sum_{i} \sum_{a} y_{ia} P_{ij}(a), \\ & y_{ja} \geq 0, \quad \text{all } j, a \end{aligned}$$

and then defining the policy $\beta^{*}$ by

$$\beta^{*}_j(a) = \frac{y^{*}_{ja}}{\sum_{a} y^{*}_{ja}}$$

where the $y^{*}_{ja}$ are the solutions of the linear program.
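The linear program in part (d) can be set up directly with an off-the-shelf solver. A minimal sketch, assuming `scipy` is available and using made-up transition probabilities and rewards on a 2-state, 2-action example (the constraint $\sum_j \sum_a y_{ja} = 1/(1-\alpha)$ is implied by the balance equations, so only those are passed to the solver):

```python
import numpy as np
from scipy.optimize import linprog

alpha, nS, nA = 0.9, 2, 2
b = np.array([0.5, 0.5])
P = np.array([[[0.7, 0.3], [0.2, 0.8]],     # P[i, a, j]
              [[0.4, 0.6], [0.9, 0.1]]])
R = np.array([[1.0, 0.0], [0.0, 2.0]])      # R[j, a]: reward in state j, action a

# Variables y_{ja}, flattened as y[j*nA + a].  linprog minimizes, so negate
# the objective sum_{j,a} y_{ja} R(j, a).
c = -R.reshape(-1)

# Balance constraints: sum_a y_{ja} - alpha * sum_{i,a} y_{ia} P_{ij}(a) = b_j
A_eq = np.zeros((nS, nS * nA))
for j in range(nS):
    for i in range(nS):
        for a in range(nA):
            A_eq[j, i * nA + a] = (i == j) - alpha * P[i, a, j]

res = linprog(c, A_eq=A_eq, b_eq=b, bounds=[(0, None)] * (nS * nA))
y = res.x.reshape(nS, nA)
# beta*_j(a) = y*_{ja} / sum_a y*_{ja}; the denominators are positive
# because sum_a y_{ja} >= b_j > 0 in this example.
beta_star = y / y.sum(axis=1, keepdims=True)
```

At a basic optimal solution the LP places all mass in each state on one action, so $\beta^{*}$ is typically deterministic, which matches the known result that discounted MDPs admit optimal stationary deterministic policies.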
