Question: Q9 Policy Iteration: Cycle (14 Points)

Consider the following transition diagram, transition function and reward function for an MDP.

Discount factor, \(\gamma = 0.5\).

Suppose we are doing policy evaluation, following the policy given by the left-hand side table below. Our current estimates (at the end of some iteration of policy evaluation) of the value of states when following the current policy are given in the right-hand side table.
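The tables themselves are not reproduced in the text here, so the following is only a minimal sketch of what one policy evaluation step computes, namely \(V_{k+1}^{\pi}(s) = \sum_{s'} T(s, \pi(s), s')\,[R(s, \pi(s), s') + \gamma V_k^{\pi}(s')]\). Every state name, transition, reward, policy entry and value estimate below is a hypothetical placeholder, not data from this question; only the discount factor 0.5 comes from the problem statement.

```python
GAMMA = 0.5  # discount factor given in the question

# HYPOTHETICAL transition/reward table (not the one from the question).
# TRANSITIONS[(state, action)] -> list of (next_state, probability, reward).
TRANSITIONS = {
    ("A", "clockwise"): [("B", 1.0, 0.0)],
    ("B", "clockwise"): [("C", 1.0, 2.0)],
    ("C", "counterclockwise"): [("B", 0.5, 1.0), ("A", 0.5, -1.0)],
}

# HYPOTHETICAL current policy (stand-in for the left-hand table).
POLICY = {"A": "clockwise", "B": "clockwise", "C": "counterclockwise"}

# HYPOTHETICAL current estimates V_k (stand-in for the right-hand table).
V_K = {"A": 1.0, "B": 2.0, "C": -1.0}


def evaluate_once(policy, values, transitions, gamma):
    """Return V_{k+1} after one synchronous policy evaluation sweep."""
    new_values = {}
    for state, action in policy.items():
        # Expected one-step reward plus discounted value of the successor,
        # using only the action the fixed policy prescribes for this state.
        new_values[state] = sum(
            prob * (reward + gamma * values[next_state])
            for next_state, prob, reward in transitions[(state, action)]
        )
    return new_values


v_next = evaluate_once(POLICY, V_K, TRANSITIONS, GAMMA)
```

Substituting the actual tables for the placeholder dictionaries and reading off `v_next["A"]` is the same calculation you would do by hand on scratch paper. The sweep is synchronous: every new value is computed from the old estimates `V_K`, not from values updated earlier in the same sweep.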

We recommend you work out the solutions to the following questions on a sheet of scratch paper, and then enter your results into the answer boxes.

Part 1: What is \(V_{k+1}^{\pi}(A)\)?

Suppose that policy evaluation converges to the following value function, \(V_{\infty}^{\pi}\).

Now let's execute policy improvement.

Part 2: What is \(Q_{\infty}^{\pi}(A, \text{clockwise})\)?

Part 3: What is \(Q_{\infty}^{\pi}(A, \text{counterclockwise})\)?

Part 4: What is the updated action for state A?

Clockwise

Counterclockwise
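As a rough sketch of the policy improvement step behind Parts 2 to 4: compute \(Q_{\infty}^{\pi}(s, a) = \sum_{s'} T(s, a, s')\,[R(s, a, s') + \gamma V_{\infty}^{\pi}(s')]\) for each action available in the state, then pick the action with the larger Q-value. The transitions, rewards and converged values below are hypothetical placeholders, since the question's tables are not available in the text; only \(\gamma = 0.5\) and the two action names are taken from the question.

```python
GAMMA = 0.5  # discount factor given in the question
ACTIONS = ["clockwise", "counterclockwise"]

# HYPOTHETICAL transitions out of state A (not the question's table).
# TRANSITIONS[(state, action)] -> list of (next_state, probability, reward).
TRANSITIONS = {
    ("A", "clockwise"): [("B", 1.0, 0.0)],
    ("A", "counterclockwise"): [("C", 1.0, -2.0)],
}

# HYPOTHETICAL converged values V_infinity under the current policy.
V_INF = {"A": 0.5, "B": 1.0, "C": 4.0}


def q_value(state, action, values, transitions, gamma):
    """Q(s, a): expected reward plus discounted value of the next state."""
    return sum(
        prob * (reward + gamma * values[next_state])
        for next_state, prob, reward in transitions[(state, action)]
    )


def improved_action(state, actions, values, transitions, gamma):
    """Policy improvement for one state: the action with the highest Q-value."""
    return max(
        actions,
        key=lambda action: q_value(state, action, values, transitions, gamma),
    )


q_cw = q_value("A", "clockwise", V_INF, TRANSITIONS, GAMMA)
q_ccw = q_value("A", "counterclockwise", V_INF, TRANSITIONS, GAMMA)
new_action = improved_action("A", ACTIONS, V_INF, TRANSITIONS, GAMMA)
```

With these placeholder numbers `q_cw` is 0.5 and `q_ccw` is 0.0, so the sketch selects `clockwise`; the real answer depends on the question's own tables. On an exact tie, `max` returns the first action in `ACTIONS`, which is an arbitrary tie-breaking choice made for this sketch.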
