Question:


5. Here is a slightly more complicated version of the game we discussed in class. Suppose there are four questions Q1, Q2, Q3, and Q4, which are associated with rewards of $300, $3,000, $30,000, and $90,000, respectively. There is a challenger, denoted by CH. The rules are as follows:

(a) CH starts in state Q1 with $0 in hand.
(b) When presented with a question, CH has two choices: quit or accept. If she quits, she takes all the money she has earned so far and the game is over. If she accepts and passes the challenge, she is presented with the next question; if she accepts but fails, she gets $0 and the game is over.
(c) The game is over if CH passes the last question Q4, in which case she earns all the rewards over the four questions.

Assume that CH knows in advance that she will pass Q1, Q2, Q3, and Q4 with respective probabilities 4/5, 2/3, 1/2, and 1/2.

5.1. Consider the simple policy π that always accepts the challenge. Compute the value function V^π. State explicitly the values V^π(s) for the four states s = Q1, Q2, Q3, and Q4. (10 points)

5.2. Compute an optimal policy π* and the value function V^π*. Again, state explicitly the values V^π*(s) for the four states s = Q1, Q2, Q3, and Q4. (20 points)
Step by Step Solution
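One way to work both parts is backward induction from Q4, sketched below in Python. This is a minimal sketch under stated assumptions rather than the official solution: the value of a state is taken to be the expected final payout with no discounting, quitting at a question pays the sum of the rewards for the questions already passed, and failing pays $0 in total. The names `rewards`, `pass_prob`, `cash_in_hand`, `evaluate_always_accept`, and `solve_optimal` are illustrative, not from the original problem.

```python
from fractions import Fraction

# Problem data from the question statement (index 0 is Q1, index 3 is Q4).
rewards = [300, 3_000, 30_000, 90_000]
pass_prob = [Fraction(4, 5), Fraction(2, 3), Fraction(1, 2), Fraction(1, 2)]


def cash_in_hand(i):
    """Money already earned on reaching question i (assumed quit payout)."""
    return Fraction(sum(rewards[:i]))


def evaluate_always_accept():
    """Value of each state under the policy that always accepts."""
    values = [Fraction(0)] * len(rewards)
    # Passing Q4 ends the game with all four rewards.
    next_value = Fraction(sum(rewards))
    for i in reversed(range(len(rewards))):
        # Pass with probability p and continue; fail and receive $0.
        values[i] = pass_prob[i] * next_value
        next_value = values[i]
    return values


def solve_optimal():
    """Backward induction: take the better of quitting and accepting."""
    values = [Fraction(0)] * len(rewards)
    policy = [""] * len(rewards)
    next_value = Fraction(sum(rewards))
    for i in reversed(range(len(rewards))):
        accept_value = pass_prob[i] * next_value
        quit_value = cash_in_hand(i)
        if accept_value >= quit_value:
            values[i], policy[i] = accept_value, "accept"
        else:
            values[i], policy[i] = quit_value, "quit"
        next_value = values[i]
    return values, policy


always_accept_values = evaluate_always_accept()
optimal_values, optimal_policy = solve_optimal()

for i, name in enumerate(["Q1", "Q2", "Q3", "Q4"]):
    print(name, always_accept_values[i], optimal_values[i], optimal_policy[i])
```

Under these assumptions, the always-accept policy gives values of $16,440 at Q1, $20,550 at Q2, $30,825 at Q3, and $61,650 at Q4. Accepting beats quitting at every state (quitting would pay $0, $300, $3,300, and $33,300, respectively), so the optimal policy is also to always accept, and the optimal value function takes the same four values.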
