
Question: Consider a two-state Markov decision process (MDP) with states s1 and s2. In state s1, the decision maker chooses either action a1 or action a2; in state s2, only action a3 is available. The immediate rewards and transition probabilities are as follows.
r(s1, a1) = 4, r(s1, a2) = 10, r(s2, a3) = 2; p(s1 | s1, a1) = p(s2 | s1, a1) = 0.5, p(s2 | s1, a2) = 1, p(s1 | s2, a3) = 0.2, p(s2 | s2, a3) = 0.8.
(a) Solve the three-period problem with terminal rewards r4(s1) = r4(s2) = 0 to maximize the expected total reward, and find the optimal decision rule in each period.
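Part (a) is a backward-induction (dynamic programming) computation, which can be checked numerically. A minimal Python sketch using the rewards and transition probabilities stated above (the variable names and data-structure layout are this snippet's own, not part of the problem):

```python
# Backward induction for the 3-period problem with terminal reward 0.
# States s1, s2; actions a1, a2 available in s1 and a3 in s2,
# with rewards and transition probabilities as given in the problem.

r = {("s1", "a1"): 4.0, ("s1", "a2"): 10.0, ("s2", "a3"): 2.0}
p = {  # p[(s_next, s, a)] = probability of moving to s_next from s under a
    ("s1", "s1", "a1"): 0.5, ("s2", "s1", "a1"): 0.5,
    ("s2", "s1", "a2"): 1.0,
    ("s1", "s2", "a3"): 0.2, ("s2", "s2", "a3"): 0.8,
}
actions = {"s1": ["a1", "a2"], "s2": ["a3"]}
states = ["s1", "s2"]

v = {s: 0.0 for s in states}      # v4(s1) = v4(s2) = 0 (terminal reward)
policy = {}
for t in (3, 2, 1):               # decision epochs, computed backwards
    q = {s: {a: r[(s, a)] + sum(p.get((sp, s, a), 0.0) * v[sp]
                                for sp in states)
             for a in actions[s]} for s in states}
    v = {s: max(q[s].values()) for s in states}
    policy[t] = {s: max(q[s], key=q[s].get) for s in states}
    print(t, v, policy[t])
```

Running this gives v3(s1) = 10, v2(s1) = 12, v1(s1) = 15.6 (and v1(s2) = 8.88), with a2 chosen in s1 at every decision epoch, so the optimal decision rule in each period is δ(s1) = a2, δ(s2) = a3.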
(b) Consider the infinite-horizon discounted MDP with discount factor \lambda = 0.5. Calculate the expected total discounted reward of the stationary policy \delta^\infty with \delta(s1) = a1 and \delta(s2) = a3. Then use the optimality equations to check whether it is the optimal policy.
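For part (b), the value of the stationary policy solves the linear system v = r_δ + λ P_δ v, i.e. (I − λ P_δ) v = r_δ. A small numpy sketch (the s1-then-s2 ordering of rows and columns is this snippet's own convention):

```python
import numpy as np

# Policy evaluation for the stationary policy δ(s1) = a1, δ(s2) = a3
# with discount factor λ = 0.5, followed by a one-step
# optimality-equation check at state s1.

lam = 0.5
P = np.array([[0.5, 0.5],    # transitions from s1 under a1
              [0.2, 0.8]])   # transitions from s2 under a3
r_delta = np.array([4.0, 2.0])

# v_δ solves (I - λ P_δ) v = r_δ
v = np.linalg.solve(np.eye(2) - lam * P, r_delta)
print(v)   # ≈ [6.8235, 4.4706], i.e. [116/17, 76/17]

# Optimality equation at s1: compare the two actions under v_δ
q_a1 = 4 + lam * (0.5 * v[0] + 0.5 * v[1])   # equals v_δ(s1)
q_a2 = 10 + lam * (1.0 * v[1])
print(q_a1, q_a2)   # ≈ 6.8235 vs ≈ 12.2353
```

Since Q(s1, a2) exceeds v_δ(s1), the value of δ^∞ does not satisfy the optimality equations, so δ^∞ is not optimal: action a2 is strictly better in s1.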
