Question: Consider the simple MDP shown below. Starting from state $s_1$, the agent can move to the right ($a_0$) or left ($a_1$) from any state $s_i$. Actions are deterministic (e.g. choosing $a_1$ at state $s_2$ results in a transition to state $s_1$). Taking any action from the goal state $G$ earns a reward of $r = +1$ and the agent stays in state $G$. Otherwise, each move has zero reward ($r = 0$). Assume a discount factor $\gamma < 1$.

[Figure: diagram of the MDP. Transitions are labeled with their action and reward: $r = 0$ for every move, except $r = 1$ at the goal state $G$.]

(a) What is the optimal action at any state $s_i \neq G$? Find the optimal value function for all states $s_i$ and the goal state $G$. [5 pts]

(b) Does the optimal policy depend on the value of the discount factor $\gamma$? Explain your answer. [5 pts]

(c) Consider adding a constant $c$ to all rewards. Find the new optimal value function for all states $s_i$ and the goal state $G$. Does adding a constant reward $c$ change the optimal policy? Explain your answer. [5 pts]

(d) After adding a constant $c$ to all rewards, now consider scaling all the rewards by a constant $a$ (i.e. $r_{new} = a(c + r_{old})$). Find the new optimal value function for all states $s_i$ and the goal state $G$. Does that change the optimal policy? Explain your answer. If yes, give an example of $a$ and $c$ that changes the optimal policy. [5 pts]
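
The MDP in the question can be explored numerically with a small value-iteration sketch, which may help when checking hand-derived answers to parts (a) through (d). The figure is not available here, so several details are assumptions rather than part of the original problem: the chain has `n` non-goal states $s_1, \dots, s_n$ with $G$ immediately to the right of $s_n$, and moving left from $s_1$ leaves the agent in $s_1$. The function name `value_iteration` and its parameters are hypothetical; `a` and `c` apply the transformation $r_{new} = a(c + r_{old})$ from part (d), and the defaults `a=1`, `c=0` give the original rewards.

```python
def value_iteration(n=4, gamma=0.9, a=1.0, c=0.0, tol=1e-12):
    """Value iteration on an assumed chain s_1..s_n, G with rewards a * (c + r_old).

    Returns (values, policy): values has n + 1 entries (the last is G),
    policy has n entries (0 = right / a_0, 1 = left / a_1) for s_1..s_n.
    """
    goal = n  # indices 0..n-1 stand for s_1..s_n; index n is G (assumed layout)

    def step(s, action):
        # Deterministic transitions: action 0 moves right, action 1 moves left.
        if s == goal:
            return goal  # any action taken in G keeps the agent in G
        if action == 0:
            return s + 1
        return max(s - 1, 0)  # assumption: moving left from s_1 stays in s_1

    def reward(s):
        # Original reward: +1 for any action taken in G, 0 for every other move.
        r_old = 1.0 if s == goal else 0.0
        return a * (c + r_old)

    values = [0.0] * (n + 1)
    while True:
        # Bellman optimality backup for every state.
        new_values = [
            max(reward(s) + gamma * values[step(s, act)] for act in (0, 1))
            for s in range(n + 1)
        ]
        delta = max(abs(x - y) for x, y in zip(new_values, values))
        values = new_values
        if delta < tol:
            break

    # The reward depends only on the current state, so the greedy action is
    # the one leading to the higher-valued successor (ties resolve to action 0).
    policy = [
        max((0, 1), key=lambda act: values[step(s, act)])
        for s in range(n)
    ]
    return values, policy


if __name__ == "__main__":
    for a, c in [(1.0, 0.0), (1.0, 2.0), (-1.0, 0.0)]:
        values, policy = value_iteration(n=4, gamma=0.9, a=a, c=c)
        print(f"a={a}, c={c}: values={values}, policy={policy}")
```

Comparing the printed policies for different `a` and `c` gives a quick check on parts (c) and (d). The sign of `a` is the natural thing to vary. When two actions are equally good, the sketch reports action 0, so a reported action is only one of possibly several optimal choices.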

