Question
(a) Consider a simple game where your character is a sailor carrying passengers across a
river that separates two towns, A and B. Each day you can decide to stay in the town
where you are or cross the river once, carrying a number of passengers of your choice,
between one and three. Each passenger pays a fare in points before boarding. Every
time you attempt to cross the river with n passengers, there is a probability, depending
on n, of the boat sinking, which ends the game. Each day points are deducted to cover your
living costs, whether you cross the river or not. Describe how the game can be
modelled as a Markov Decision Process (MDP) and, in particular, determine the
values of the elements of the tuple used to formally define an MDP.
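As a starting point for part (a), here is a minimal sketch of how the game could be written down as an MDP. The numeric values did not survive in the question text, so `FARE`, `LIVING_COST` and `SINK_RATE` (with sinking probability assumed to be `SINK_RATE * n`) are hypothetical placeholders, as is the extra terminal state `SUNK`. It also assumes that fares and living costs are still counted on the day the boat sinks.

```python
from typing import List, Tuple

# Hypothetical values: the question text omits the actual numbers.
FARE = 1.0         # points paid by each passenger (assumed)
LIVING_COST = 1.0  # points deducted every day (assumed)
SINK_RATE = 0.1    # assumed: P(sink) = SINK_RATE * n for n passengers

STATES = ["A", "B", "SUNK"]  # SUNK is an assumed terminal state
ACTIONS = ["stay", "cross1", "cross2", "cross3"]


def transitions(state: str, action: str) -> List[Tuple[float, str, float]]:
    """Return (probability, next_state, reward) triples for one day."""
    if state == "SUNK":
        # Absorbing terminal state: the game is over, no further reward.
        return [(1.0, "SUNK", 0.0)]
    if action == "stay":
        # Staying only costs the daily living expenses.
        return [(1.0, state, -LIVING_COST)]
    n = int(action[-1])  # number of passengers, 1 to 3
    # Fares are collected before boarding, so they count even if the boat sinks.
    reward = n * FARE - LIVING_COST
    other = "B" if state == "A" else "A"
    p_sink = SINK_RATE * n
    return [(1.0 - p_sink, other, reward), (p_sink, "SUNK", reward)]
```

Here `STATES`, `ACTIONS` and `transitions` together play the role of the state set, action set, transition model and reward function of the tuple. A discount factor would complete it.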
(b) Explain the following equation in the context of reinforcement learning:
[Equation missing; only the fragments "max" and "in" survive.]
(c) Consider a reinforcement learning problem modelled as an MDP with deterministic
transitions and actions. The states are S = {A, B, C, D, E, G1, G2, G3}, while the
actions are trivially A = {toA, toB, toC, toD, toE, toG1, toG2, toG3}. The possible
transitions and the corresponding rewards, if any, are indicated in the state transition
diagram shown in the figure below. Assuming a discount factor, calculate the
discounted cumulative value of each state, also providing a brief explanation of the
procedure followed.
[Figure: state transition diagram]
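The procedure asked for in part (c) can be sketched as repeated backups of V(s) = max over actions of [r + γ V(s')], which is exact for deterministic transitions. The diagram is not reproduced here, so the `GRAPH` below is a hypothetical stand-in with made-up edges and rewards, and `gamma = 0.9` is an assumed discount factor. This is not the graph from the question.

```python
from typing import Dict, Tuple

# Hypothetical deterministic graph: state -> {action: (next_state, reward)}.
# The real edges and rewards are in the figure, which is not available.
GRAPH: Dict[str, Dict[str, Tuple[str, float]]] = {
    "A": {"toB": ("B", 0.0), "toC": ("C", 0.0)},
    "B": {"toG1": ("G1", 10.0)},
    "C": {"toG2": ("G2", 4.0)},
    "G1": {},  # terminal states have no outgoing actions
    "G2": {},
}


def state_values(graph, gamma: float, sweeps: int = 100) -> Dict[str, float]:
    """Value iteration for a deterministic MDP; terminal states stay at 0."""
    values = {s: 0.0 for s in graph}
    for _ in range(sweeps):
        for s, edges in graph.items():
            if edges:
                # Back up the best one-step reward plus discounted successor value.
                values[s] = max(r + gamma * values[nxt] for nxt, r in edges.values())
    return values


values = state_values(GRAPH, gamma=0.9)
```

With these made-up numbers, B is worth 10, C is worth 4, and A is worth 0.9 × 10 = 9 because the best action from A is `toB`.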
(d) Discuss the role of the discount factor in the context of reinforcement learning. In
particular, consider your answer to part (c) of this question and discuss how the
optimal policy from a given state, for example D, changes depending on the choice of
the discount factor.
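To see why the optimal choice from a state can flip with the discount factor, it may help to compare two reward sequences directly. The sketch below uses two hypothetical paths, `SHORT` and `LONG`, with invented rewards. They are not taken from the question's diagram.

```python
from typing import List


def discounted_return(rewards: List[float], gamma: float) -> float:
    """Sum of gamma**t * r_t over a finite reward sequence."""
    return sum(r * gamma**t for t, r in enumerate(rewards))


# Hypothetical paths from one state (not from the question's figure):
SHORT = [5.0]            # small reward, received immediately
LONG = [0.0, 0.0, 10.0]  # larger reward, received two steps later


def preferred(gamma: float) -> str:
    """Name the path with the higher discounted return for this gamma."""
    if discounted_return(LONG, gamma) > discounted_return(SHORT, gamma):
        return "long"
    return "short"
```

In this example the long path is worth 10γ², so it wins only when γ² > 0.5, that is, for γ above roughly 0.71. A low discount factor favours the nearby reward and a high one favours the larger, more distant reward.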
