

Question 1

(a) Consider a simple game where your character is a sailor carrying passengers across a river that separates two towns, A and B. Each day you can decide to stay in the town where you are or cross the river once, carrying a number of passengers of your choice, between one and three. Each passenger pays a 50-point fare before boarding. Every time you attempt to cross the river with n passengers, there is a probability n/10 of the boat sinking, which ends the game. Each day 10 points are deducted to cover your living costs, whether you cross the river or not. Describe how the game can be modelled as a Markov Decision Process (MDP) and, in particular, determine the values of the elements of the tuple used to formally define an MDP.
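The rules of the game in part (a) can be written down as a small transition and reward model. The following is a minimal sketch, not a model answer: the terminal state name `SUNK`, the action names `stay`, `cross1`, `cross2`, `cross3`, and the helper names are all hypothetical. It also assumes that fares count even when the boat sinks (they are paid before boarding) and that the living cost is deducted on the day of a sinking as well.

```python
import random

FARE = 50          # points paid by each passenger before boarding
LIVING_COST = 10   # points deducted every day, crossing or not

# Assumed state and action names; SUNK is a hypothetical terminal state.
STATES = ("A", "B", "SUNK")
ACTIONS = ("stay", "cross1", "cross2", "cross3")


def other(town):
    """Return the town on the opposite bank."""
    return "B" if town == "A" else "A"


def passengers(action):
    """Number of passengers carried by an action (0 for 'stay')."""
    return 0 if action == "stay" else int(action[-1])


def transitions(state, action):
    """Return {next_state: probability} for a non-terminal state."""
    n = passengers(action)
    if n == 0:
        return {state: 1.0}
    p_sink = n / 10  # probability n/10 of sinking with n passengers
    return {"SUNK": p_sink, other(state): 1.0 - p_sink}


def reward(state, action):
    """Fares collected minus the daily living cost (assumptions in the lead-in)."""
    return FARE * passengers(action) - LIVING_COST


def step(state, action, rng=random):
    """Sample one day of play: returns (next_state, reward)."""
    probs = transitions(state, action)
    next_state = rng.choices(list(probs), weights=list(probs.values()))[0]
    return next_state, reward(state, action)
```

Under these assumptions, staying costs 10 points a day, while crossing with three passengers earns 140 points but ends the game with probability 0.3.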

(b) Explain the following equation in the context of reinforcement learning:

$$V(s) = \max_{a \in A(s)} \left[ R(s, a) + \gamma V(s') \right]$$
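Read as a Bellman optimality backup, the equation in part (b) takes, for one state, the maximum over the available actions of the immediate reward plus the discounted value of the successor state. A minimal sketch of that single step follows, assuming deterministic transitions; the function name, the toy states `s`, `x`, `y`, and all numbers are hypothetical.

```python
def bellman_backup(state, moves, values, gamma):
    """One Bellman optimality backup for a deterministic MDP.

    moves[state] maps each available action to (immediate_reward, next_state).
    values holds the current value estimates of the successor states.
    """
    return max(r + gamma * values[s2] for r, s2 in moves[state].values())


# Hypothetical toy data: two actions are available in state "s".
MOVES = {"s": {"left": (1.0, "x"), "right": (5.0, "y")}}
VALUES = {"x": 10.0, "y": 0.0}

backed_up = bellman_backup("s", MOVES, VALUES, gamma=0.9)
```

Here `left` has the smaller immediate reward but leads to the more valuable successor, so it determines the backed-up value.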

(c) Consider a reinforcement learning problem modelled as an MDP with deterministic transitions and actions. The states are $S = \{A, B, C, D, E, G_1, G_2, G_3\}$, while the actions are trivially $A = \{\text{toA}, \text{toB}, \text{toC}, \text{toD}, \text{toE}, \text{toG}_1, \text{toG}_2, \text{toG}_3\}$. The possible transitions and the corresponding rewards (if any) are indicated in the state transition diagram shown in Figure 1.1 below. Assuming a discount factor $\gamma = 0.6$, calculate the discounted cumulative value of each state, also providing a brief explanation of the procedure followed.

Figure 1.1 (state transition diagram; image not available)
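One common procedure for part (c) is value iteration: start every state at zero and repeatedly apply the Bellman backup until the values stop changing. The sketch below uses the state and action names from the question with a discount factor of 0.6, but the edges and rewards in `GRAPH` are invented for illustration, because the diagram in Figure 1.1 is not available here. The results therefore do not answer the original question.

```python
GAMMA = 0.6  # discount factor given in the question

# HYPOTHETICAL transitions: action -> (reward, next_state).
# These edges and rewards are made up; they are not taken from Figure 1.1.
GRAPH = {
    "A": {"toB": (0.0, "B"), "toC": (0.0, "C")},
    "B": {"toD": (0.0, "D"), "toG1": (10.0, "G1")},
    "C": {"toE": (0.0, "E")},
    "D": {"toG2": (50.0, "G2")},
    "E": {"toG3": (100.0, "G3")},
    "G1": {},  # goal states are terminal: no outgoing actions
    "G2": {},
    "G3": {},
}


def value_iteration(graph, gamma, tol=1e-9):
    """Sweep over all states, applying the Bellman backup until convergence."""
    values = {s: 0.0 for s in graph}
    while True:
        delta = 0.0
        for s, moves in graph.items():
            if not moves:
                continue  # terminal states keep value 0
            best = max(r + gamma * values[s2] for r, s2 in moves.values())
            delta = max(delta, abs(best - values[s]))
            values[s] = best
        if delta < tol:
            return values


def greedy_policy(graph, values, gamma):
    """Pick, in each non-terminal state, the action with the best backed-up value."""
    return {
        s: max(moves, key=lambda a: moves[a][0] + gamma * values[moves[a][1]])
        for s, moves in graph.items()
        if moves
    }


values = value_iteration(GRAPH, GAMMA)
policy = greedy_policy(GRAPH, values, GAMMA)
```

For this made-up graph the values work out to V(E) = 100, V(D) = 50, V(C) = 60, V(B) = 30 and V(A) = 36 (up to floating-point rounding), with the goal states at 0. The same backward reasoning applies by hand: value the states next to the goals first, then propagate the discounted values towards A.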

(d) Discuss the role of the discount factor in the context of reinforcement learning. In particular, consider your answer to part (c) of this question and discuss how the optimal policy from a given state, for example D, changes depending on the choice of the discount factor.
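The effect asked about in part (d) can be seen by comparing the discounted return of a short route with a small reward against a longer route with a larger, delayed reward. The sketch below is purely illustrative: the two routes out of D and their reward sequences are hypothetical and are not read from Figure 1.1.

```python
def discounted_return(rewards, gamma):
    """Sum of gamma**t * r_t over a sequence of step rewards."""
    return sum(r * gamma**t for t, r in enumerate(rewards))


# HYPOTHETICAL routes out of state D, named by their first action.
# "toG2": one step with reward 10; "toE": reward 0, then reward 20.
ROUTES = {"toG2": [10.0], "toE": [0.0, 20.0]}


def best_route(gamma):
    """First action of the route with the highest discounted return."""
    return max(ROUTES, key=lambda name: discounted_return(ROUTES[name], gamma))


low = best_route(0.4)   # delayed 20 is worth 8, so the immediate 10 wins
high = best_route(0.6)  # delayed 20 is worth 12, so the longer route wins
```

With these invented numbers the preferred first action switches at a discount factor of 0.5: below it the agent is short-sighted and takes the immediate reward, above it the delayed reward dominates.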
