Question:

(40) 1. Consider the following Markov decision process problem in which $S = \{s_1, s_2, s_3\}$ and:

$A_{s_1} = \{a_{11}, a_{12}\}$, $r(s_1, a_{11}) = 0$, $r(s_1, a_{12}) = 2$, $p(s_2 \mid s_1, a_{11}) = 1$, and $p(s_1 \mid s_1, a_{12}) = 1$.

$A_{s_2} = \{a_{21}, a_{22}, a_{23}\}$, $r(s_2, a_{21}) = 1$, $r(s_2, a_{22}) = 1$, $r(s_2, a_{23}) = 3$, $p(s_3 \mid s_2, a_{21}) = 1$, $p(s_1 \mid s_2, a_{22}) = 1$, and $p(s_2 \mid s_2, a_{23}) = 1$.

$A_{s_3} = \{a_{31}, a_{32}\}$, $r(s_3, a_{31}) = 2$, $r(s_3, a_{32}) = 4$, $p(s_j \mid s_3, a_{31}) = 1$ (the successor state $s_j$ is illegible in the source), and $p(s_3 \mid s_3, a_{32}) = 1$.

(20) (a) Classify this Markov decision process problem. Please mention everything that applies.

(20) (b) Perform the appropriate policy iteration to compute the long-run average reward optimal policy.
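
A minimal sketch of part (b), assuming the multichain form of policy iteration is the appropriate one: the stationary policy (a12, a23, a32) makes every state absorbing, so some policies have more than one recurrent class. Each pass evaluates the gain g and a relative value h of the current policy, first tries to raise the gain of the successor state, and only when that fails improves r + h among the gain-maximising actions. The dictionary MDP, the function names, and the starting policy are illustrative choices, not part of the problem. Two successors are hard to read in the problem statement; the code assumes that a22 leads to s1 and that a31 leads to s2.

```python
from fractions import Fraction

# Deterministic MDP: state -> {action: (reward, next_state)}.
# ASSUMPTION: the successors of a22 (s1) and a31 (s2) are guesses, because
# the problem text is garbled at those two places.
MDP = {
    "s1": {"a11": (0, "s2"), "a12": (2, "s1")},
    "s2": {"a21": (1, "s3"), "a22": (1, "s1"), "a23": (3, "s2")},
    "s3": {"a31": (2, "s2"), "a32": (4, "s3")},
}


def evaluate(policy):
    """Gain g and relative value h of a deterministic stationary policy.

    h is normalised to 0 at the smallest-named state of each cycle.
    """
    g, h = {}, {}
    for start in MDP:
        # Follow the policy until we revisit a state or reach a solved one.
        path, s = [], start
        while s not in path and s not in g:
            path.append(s)
            s = MDP[s][policy[s]][1]
        if s in path:  # a new cycle (recurrent class) was closed
            i = path.index(s)
            cycle = path[i:]
            total = sum(MDP[c][policy[c]][0] for c in cycle)
            k = cycle.index(min(cycle))
            cycle = cycle[k:] + cycle[:k]  # reference state first
            g[cycle[0]] = Fraction(total, len(cycle))
            h[cycle[0]] = Fraction(0)
            todo = path[:i] + cycle[1:]
        else:  # the whole path is transient and leads into solved states
            todo = path
        # Back-substitute h(s) = r(s) - g(s) + h(next(s)).
        for t in reversed(todo):
            r, nxt = MDP[t][policy[t]]
            g[t] = g[nxt]
            h[t] = r - g[t] + h[nxt]
    return g, h


def improve(policy, g, h):
    """One improvement step: gain test first, then the r + h test."""
    new = dict(policy)
    # Step (a): switch only where the current action does not maximise
    # the gain of the successor state.
    for s, acts in MDP.items():
        best = max(g[n] for _, n in acts.values())
        if g[acts[policy[s]][1]] < best:
            new[s] = next(a for a, (_, n) in acts.items() if g[n] == best)
    if new != policy:
        return new
    # Step (b): among gain-maximising actions, maximise r + h(next),
    # keeping the current action when it already attains the maximum.
    for s, acts in MDP.items():
        best = max(g[n] for _, n in acts.values())
        ties = {a: r + h[n] for a, (r, n) in acts.items() if g[n] == best}
        if ties[policy[s]] < max(ties.values()):
            new[s] = max(ties, key=ties.get)
    return new


def policy_iteration(policy):
    """Alternate evaluation and improvement until the policy repeats."""
    while True:
        g, h = evaluate(policy)
        new = improve(policy, g, h)
        if new == policy:
            return policy, g, h
        policy = new


# Start from the policy under which every state is absorbing.
policy, g, h = policy_iteration({"s1": "a12", "s2": "a23", "s3": "a32"})
print(policy)  # {'s1': 'a11', 's2': 'a21', 's3': 'a32'}
print(g["s1"], g["s2"], g["s3"])  # 4 4 4
print(h["s1"], h["s2"], h["s3"])  # -7 -3 0
```

Under these assumptions the iteration stops after one improvement at (a11, a21, a32) with gain 4 in every state. The optimal gain of 4 does not depend on the two assumed successors: no one-step reward exceeds 4, and the clearly stated transitions s1 to s2 (a11), s2 to s3 (a21), and s3 to s3 (a32) already reach it from every state. The intermediate policies may differ under another reading of the garbled transitions.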
