
1. Consider a game in which a frog repeatedly jumps a random number of steps, equally likely to be 2, 3, or 4. While the total number of steps is less than 6, the frog can either Jump or Stop. If the total reaches 6 or more, the game ends automatically and the frog receives a reward of 0. When the frog Stops, it receives a reward equal to its total number of steps (at most 5) and the game ends. The Jump action itself earns no reward. Formulate this problem as an MDP with the states {0, 2, 3, 4, 5, Done}.
a) What is the transition function p(s' | s, a) for this MDP?
b) What is the reward function for this MDP?
c) Perform value iteration for 4 iterations with γ = 1 and report the value function in the following table:
States   0    2    3    4    5    Done
V0       0    0    0    0    0    0
V1
V2
V3
V4
d) Based on the above value function after 4 iterations, what is the current best policy?
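
One way to check an answer to parts a) through d) is to encode the MDP directly and run the Bellman updates. The following is a minimal sketch, not an official solution. It assumes that the missing symbol in part c) is the discount factor γ, that any jump taking the total to 6 or more leads to Done with zero reward, and that the Stop reward is collected on the transition into Done. All function and variable names (transitions, reward, q_value, value_iteration, greedy_policy) are made up for this illustration.

```python
from fractions import Fraction

STATES = [0, 2, 3, 4, 5, "Done"]
ACTIONS = ["Jump", "Stop"]
JUMPS = [2, 3, 4]  # each jump length has probability 1/3


def transitions(s, a):
    """Part a): return {next_state: probability} for action a in state s."""
    if s == "Done" or a == "Stop":
        # Done is absorbing, and Stop always ends the game.
        return {"Done": Fraction(1)}
    probs = {}
    for k in JUMPS:
        # Assumption: a total of 6 or more ends the game (state Done).
        nxt = s + k if s + k < 6 else "Done"
        probs[nxt] = probs.get(nxt, Fraction(0)) + Fraction(1, 3)
    return probs


def reward(s, a):
    """Part b): Stop pays the current total; everything else pays 0."""
    return s if (a == "Stop" and s != "Done") else 0


def q_value(s, a, V, gamma):
    """One-step lookahead: R(s, a) + gamma * sum over s' of p(s'|s,a) * V(s')."""
    return reward(s, a) + gamma * sum(p * V[n] for n, p in transitions(s, a).items())


def value_iteration(n_iters, gamma=1):
    """Part c): return [V0, V1, ..., Vn], starting from V0 = 0 everywhere."""
    V = {s: Fraction(0) for s in STATES}
    history = [V]
    for _ in range(n_iters):
        V = {s: max(q_value(s, a, V, gamma) for a in ACTIONS) for s in STATES}
        history.append(V)
    return history


def greedy_policy(V, gamma=1):
    """Part d): pick the action with the highest lookahead value.

    Ties would go to the first action in ACTIONS; Done needs no action.
    """
    return {
        s: max(ACTIONS, key=lambda a: q_value(s, a, V, gamma))
        for s in STATES
        if s != "Done"
    }


history = value_iteration(4)
policy = greedy_policy(history[-1])

for i, V in enumerate(history):
    print(f"V{i}:", {s: str(v) for s, v in V.items()})
print("Policy:", policy)
```

Under these assumptions, the sketch gives V1 = (0, 2, 3, 4, 5, 0) and V2 = (3, 3, 3, 4, 5, 0), while V3 and V4 are both (10/3, 3, 3, 4, 5, 0) over the states (0, 2, 3, 4, 5, Done). The greedy policy with respect to V4 is to Jump in states 0 and 2 and to Stop in states 3, 4, and 5. Exact fractions are used instead of floats so that the value 10/3 at state 0 is not rounded.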
