Q1. In micro-blackjack, you repeatedly draw a card (with replacement) that is equally likely to be a 1, 2, or 3. You can either Draw or Stop if the total score of the cards you have drawn is less than 5. If your total score is 5 or higher, the game ends and you receive a utility of 0. When you Stop, your utility is equal to your total score (up to 4), and the game ends. When you Draw, you receive no utility. There is no discount (γ = 1). Let's formulate this problem as an MDP with the states 0, 1, 2, 3, 4 and a Done state for when the game ends, so S = {0, 1, 2, 3, 4, Done}. There are two actions, Stop and Draw, so A = {st, dr}.

What are the transition function and the reward function for this MDP? (10 points)
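As a rough illustration, the dynamics described in the question can be written as a function that returns (next state, probability, reward) triples for each state and action. This is a minimal sketch assuming that a total of 5 or more sends the game to Done with reward 0, that the Stop reward is paid on the transition to Done, and that Done has no actions; the name `transitions` and the string labels are hypothetical choices for this example.

```python
from fractions import Fraction

# Running totals 0..4 plus a terminal "Done" state (assumed encoding).
STATES = [0, 1, 2, 3, 4, "Done"]
ACTIONS = ["st", "dr"]
CARDS = [1, 2, 3]   # each drawn with probability 1/3, with replacement
MAX_SCORE = 4       # assumption: a total of 5 or more busts


def transitions(s, a):
    """Return [(s', T(s, a, s'), R(s, a, s')), ...] for a non-terminal s."""
    if s == "Done":
        return []  # terminal state: no actions available
    if a == "st":
        # Stopping ends the game and pays the current total.
        return [("Done", Fraction(1), s)]
    # Drawing pays nothing now; a bust (total >= 5) goes to Done with reward 0.
    outcomes = {}
    for c in CARDS:
        nxt = s + c if s + c <= MAX_SCORE else "Done"
        outcomes[nxt] = outcomes.get(nxt, Fraction(0)) + Fraction(1, len(CARDS))
    return [(nxt, p, 0) for nxt, p in outcomes.items()]


for s in STATES[:-1]:
    for a in ACTIONS:
        print(s, a, transitions(s, a))
```

For instance, `transitions(3, "dr")` gives state 4 with probability 1/3 and Done with probability 2/3 (cards 2 and 3 both bust), each with reward 0.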

Fill in the V(2) row of the value-iteration table below. Show your work. (20 points)

States   0   1   2   3   4
V(0)     0   0   0   0   0
V(1)     0   1   2   3   4
V(2)
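The table can be checked with a short value-iteration sketch. Each new row is V_{k+1}(s) = max(s, (1/3) * sum over c in {1, 2, 3} of V_k(s + c)), where the first term is Stop, the second is Draw, and any draw that busts contributes 0. It assumes the rules as stated in the question (a total of 5 or more busts, V(Done) = 0, γ = 1) and uses `Fraction` so entries print exactly; `q_values` and `value_iteration` are hypothetical helper names.

```python
from fractions import Fraction

CARDS = (1, 2, 3)            # equally likely, drawn with replacement
MAX_SCORE = 4                # assumption: totals of 5 or more bust (utility 0)
STATES = range(MAX_SCORE + 1)


def q_values(s, V):
    """One-step lookahead with gamma = 1; V maps states 0..4 to values, V(Done) = 0."""
    stop = Fraction(s)  # Stop pays the current total and ends the game
    # Draw pays 0 now; a bust leads to Done, which is worth 0.
    draw = sum((V[s + c] if s + c <= MAX_SCORE else Fraction(0)) for c in CARDS) / len(CARDS)
    return {"st": stop, "dr": draw}


def value_iteration(k):
    """Return [V_0, V_1, ..., V_k], starting from V_0 = 0 in every state."""
    V = {s: Fraction(0) for s in STATES}
    history = [V]
    for _ in range(k):
        V = {s: max(q_values(s, V).values()) for s in STATES}
        history.append(V)
    return history


for i, V in enumerate(value_iteration(3)):
    print(f"V({i}):", [str(V[s]) for s in STATES])
```

Under this reading of the rules, printing one more row shows the entries for states 0 and 1 still changing after V(2), while states 2, 3 and 4 are already fixed.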

You should have noticed that value iteration converged above. Based on this result, what is the optimal policy for the MDP? Fill in the table below. (10 points)

States   0   1   2   3   4
π*
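The optimal policy can be read off with a greedy one-step lookahead on the converged values: π*(s) picks whichever of Stop and Draw has the larger expected utility. This self-contained sketch makes the same assumptions about the rules (bust at 5 or more, V(Done) = 0, γ = 1). Iterating until two sweeps agree exactly works here because the running total only increases, so the values stop changing after finitely many sweeps. The helper names and the tie-break toward Stop are hypothetical choices.

```python
from fractions import Fraction

CARDS = (1, 2, 3)
MAX_SCORE = 4                # assumption: totals of 5 or more bust (utility 0)
STATES = range(MAX_SCORE + 1)


def lookahead(s, V):
    """Q-values for Stop and Draw under gamma = 1, with V(Done) = 0."""
    stop = Fraction(s)
    draw = sum((V[s + c] if s + c <= MAX_SCORE else Fraction(0)) for c in CARDS) / len(CARDS)
    return {"st": stop, "dr": draw}


def converged_values():
    """Run value iteration until two successive iterates are exactly equal."""
    V = {s: Fraction(0) for s in STATES}
    while True:
        new_V = {s: max(lookahead(s, V).values()) for s in STATES}
        if new_V == V:
            return V
        V = new_V


def greedy_policy(V):
    """pi*(s) = argmax_a Q(s, a); ties are broken toward Stop."""
    policy = {}
    for s in STATES:
        q = lookahead(s, V)
        policy[s] = "dr" if q["dr"] > q["st"] else "st"
    return policy


V_star = converged_values()
print({s: str(v) for s, v in V_star.items()})
print(greedy_policy(V_star))
```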
