Question: 3. Markov Decision Processes (MDPs) and Reinforcement Learning (RL)

(a) Consider the following Markov Decision Process (MDP) of a robot running with an ice cream.

[Figure: state-transition diagram over the states 1S, 2S and 0S. Labels recoverable from the image: actions Walk, Run and Fast; probabilities 1.0 and 0.5; rewards +1, +2, -2 and -10.]

The actions are either to run or to walk. The three states are: having one scoop of ice cream (1S), having two scoops (2S), or having none (0S). Walking always gives the robot a reward of +1. Running with one scoop gives a reward of +2, and the robot might be rewarded with another scoop of ice cream. However, running with two scoops is risky, as it makes the robot drop both scoops, which results in a reward of -10. Assume no discounting of future rewards (γ = 1.0) and a living reward of zero.

Compute the time-limited values for 4 time steps using value iteration. Present the results in tabular format as shown below. (8)

        1S      2S
V0
V1
V2
V3
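The time-limited value iteration the question asks for can be sketched in Python as follows. This is a minimal sketch, not the expert answer, and the transition table is an assumption: the diagram did not survive extraction, so the code guesses that Walk from 1S stays in 1S, Walk from 2S leads to 1S, Run from 1S reaches 2S or stays in 1S with probability 0.5 each (reward +2 in both cases, following the text), and 0S is terminal. The names `value_iteration` and `ICE_CREAM_MDP` are hypothetical. Each step applies the update V_{k+1}(s) = max over a of the sum over s' of T(s, a, s') * (R(s, a, s') + γ * V_k(s')), starting from V_0 = 0.

```python
from typing import Dict, List, Tuple

# For each state and action: a list of (probability, next_state, reward) outcomes.
Transitions = Dict[str, Dict[str, List[Tuple[float, str, float]]]]


def value_iteration(mdp: Transitions, steps: int, gamma: float = 1.0) -> List[Dict[str, float]]:
    """Return [V0, V1, ..., V_steps], where V0 is zero for every state."""
    states = list(mdp)
    history = [{s: 0.0 for s in states}]
    for _ in range(steps):
        prev = history[-1]
        current = {}
        for s in states:
            actions = mdp[s]
            if not actions:
                # Terminal state: no actions available, so its value stays 0.
                current[s] = 0.0
                continue
            # Bellman update: best expected one-step reward plus discounted V_k.
            current[s] = max(
                sum(p * (r + gamma * prev[s2]) for p, s2, r in outcomes)
                for outcomes in actions.values()
            )
        history.append(current)
    return history


# ASSUMED transition model (the original diagram is not recoverable).
ICE_CREAM_MDP: Transitions = {
    "1S": {
        "walk": [(1.0, "1S", 1.0)],
        "run": [(0.5, "2S", 2.0), (0.5, "1S", 2.0)],
    },
    "2S": {
        "walk": [(1.0, "1S", 1.0)],
        "run": [(1.0, "0S", -10.0)],
    },
    "0S": {},  # assumed terminal
}

# V0..V3 are the 4 time steps requested; gamma = 1.0 as stated in the question.
table = value_iteration(ICE_CREAM_MDP, steps=3, gamma=1.0)
for k, values in enumerate(table):
    print(f"V{k}  1S={values['1S']:<6} 2S={values['2S']}")
```

If the actual diagram differs from these assumptions (for example, if one Run outcome from 1S carries the reward -2 that appears among the extracted labels), only the entries of `ICE_CREAM_MDP` need to change; the update loop is independent of the particular MDP.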
