Question: DO NOT PROVIDE CODE. SOLVE AND SHOW MATH. Tammy is training for a marathon, which takes place in three weeks. She needs to train for the
marathon; however, she's recovering from an injury. Each week, she can either train or rest.
If she trains while injured, she gets a reward of
If she trains while fully recovered, she gets a reward of
If she rests, she gets a reward of
When Tammy trains while injured, she has a chance of being fully recovered the following week. If
she decides to rest instead, this probability increases to Suppose that once she recovers, she does not
get injured again. Tammy does not earn any additional rewards after week regardless of her being fully
recovered or not.
(a) Describe this problem as an MDP. Describe the states, the actions at each state, and the transition
model. Indicate which states are terminal nodes, and the reward function in each state.
Hint: When modeling the problem, it is useful to have states representing Tammy being recovered or not
on each week, except for week (she cannot be recovered then), and one terminal state for the marathon
itself.
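One way to write down the model the hint describes, as a sketch only. The numeric rewards and probabilities are missing from the problem text above, so the symbols $r_I$ (reward for training while injured), $r_R$ (reward for training while recovered), $r_0$ (reward for resting), $p$ (recovery probability after training injured), and $q$ (recovery probability after resting, with $q > p$) are placeholders introduced here, not values from the source. The week labels 1 to 3 and the state names $I_t$, $R_t$, $M$ are likewise assumptions.

```latex
\begin{align*}
\text{States: } & S = \{I_1, I_2, R_2, I_3, R_3, M\}, \quad M \text{ terminal (the marathon)} \\
\text{Actions: } & A(s) = \{\text{train}, \text{rest}\} \text{ for } s \neq M \\
\text{Rewards: } & R(I_t, \text{train}) = r_I, \quad R(R_t, \text{train}) = r_R, \quad R(\cdot, \text{rest}) = r_0 \\
\text{Transitions } (t = 1, 2)\text{: } & P(R_{t+1} \mid I_t, \text{train}) = p, \quad P(I_{t+1} \mid I_t, \text{train}) = 1 - p \\
& P(R_{t+1} \mid I_t, \text{rest}) = q, \quad P(I_{t+1} \mid I_t, \text{rest}) = 1 - q \\
& P(R_3 \mid R_2, a) = 1 \text{ for either action } a \\
\text{Week 3: } & P(M \mid I_3, a) = P(M \mid R_3, a) = 1 \text{ for either action } a
\end{align*}
```

The only stochastic transitions are out of the injured states in weeks 1 and 2; once recovered, Tammy stays recovered, and every week-3 state leads to $M$ with no further reward.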
(b) How many possible deterministic policies are there for this MDP?
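A short count, assuming the state layout suggested by the hint in part (a): five non-terminal states (injured in weeks 1 to 3, recovered in weeks 2 and 3), each with both actions available. The state names $I_t$ and $R_t$ are labels introduced here, not notation from the source.

```latex
\[
|\Pi_{\text{det}}| \;=\; \prod_{s \in \{I_1, I_2, R_2, I_3, R_3\}} |A(s)| \;=\; 2^5 \;=\; 32
\]
```

A deterministic policy picks one action per non-terminal state, so the count is the product of the action-set sizes. A different state model would give a different number.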
(c) Suppose Tammy decides to train every week. What is Tammy's value at every state for this
policy as a function of the discount factor?
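The evaluation of the train-every-week policy can be sketched by backing up from the marathon. Because the numeric values are missing from the text above, this uses placeholder symbols introduced here: $r_I$ and $r_R$ for the rewards of training while injured and while recovered, $p$ for the recovery probability after training injured, and $\gamma$ for the discount factor. It also assumes states $I_t$ (injured in week $t$), $R_t$ (recovered in week $t$), a terminal state $M$ with value 0, and that each reward is collected in the week the action is taken.

```latex
\begin{align*}
V(M) &= 0 \\
V(R_3) &= r_R, \qquad V(I_3) = r_I \\
V(R_2) &= r_R + \gamma V(R_3) = (1 + \gamma)\, r_R \\
V(I_2) &= r_I + \gamma \bigl[ p\, V(R_3) + (1 - p)\, V(I_3) \bigr]
        = r_I + \gamma \bigl[ p\, r_R + (1 - p)\, r_I \bigr] \\
V(I_1) &= r_I + \gamma \bigl[ p\, V(R_2) + (1 - p)\, V(I_2) \bigr] \\
       &= r_I + \gamma \bigl[ p\, r_R + (1 - p)\, r_I \bigr]
          + \gamma^2 \bigl[ \bigl(1 - (1 - p)^2\bigr)\, r_R + (1 - p)^2\, r_I \bigr]
\end{align*}
```

The $\gamma^2$ coefficient reads naturally: $(1 - p)^2$ is the probability of still being injured in week 3 after two weeks of training, and $1 - (1 - p)^2$ is the probability of having recovered by then.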
(d) Compare the reward of the policy of training every week to the policy of resting on the first
week and then training on the two remaining weeks. For what values of the discount factor is one policy better
than the other?
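A sketch of the comparison in symbolic form, since the numeric values are missing from the text above. The placeholders introduced here are $r_I$, $r_R$, $r_0$ (rewards for training injured, training recovered, and resting), $p$ and $q$ (recovery probabilities after training injured and after resting, with $q > p$), and $\gamma$ (discount factor). $V^{T}$ denotes training every week and $V^{R}$ denotes resting first; $I_t$ and $R_t$ are assumed state names for injured and recovered in week $t$. Both policies train from week 2 on, so they share the same $V(R_2)$ and $V(I_2)$.

```latex
\begin{align*}
V^{T}(I_1) &= r_I + \gamma \bigl[ p\, V(R_2) + (1 - p)\, V(I_2) \bigr] \\
V^{R}(I_1) &= r_0 + \gamma \bigl[ q\, V(R_2) + (1 - q)\, V(I_2) \bigr] \\
V(R_2) - V(I_2) &= (1 + \gamma)\, r_R - r_I - \gamma \bigl[ p\, r_R + (1 - p)\, r_I \bigr]
                 = (r_R - r_I) \bigl( 1 + \gamma (1 - p) \bigr) \\
\Delta(\gamma) &= V^{T}(I_1) - V^{R}(I_1)
                = (r_I - r_0) - \gamma\, (q - p)\, (r_R - r_I) \bigl( 1 + \gamma (1 - p) \bigr)
\end{align*}
```

Training every week is the better policy exactly when $\Delta(\gamma) > 0$, and resting first is better when $\Delta(\gamma) < 0$. Substituting the problem's actual rewards and probabilities turns this into a quadratic inequality in $\gamma$ to be solved on $0 \le \gamma \le 1$.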
