You are given an N-sided die, along with a corresponding Boolean mask vector, is_bad_side (i.e., a...
Question:
You are given an N-sided die, along with a corresponding Boolean mask vector, is_bad_side (i.e., a vector of ones and zeros). You can assume that 1 < N ≤ 30, and the vector is_bad_side is also of size N and 1-indexed (since there is no 0 side on the die).

The game of DieN is played as follows:

1. You start with 0 dollars.
2. At any time you have the option to roll the die or to quit the game.
   A. ROLL:
      a. If you roll a number not in is_bad_side, you receive that many dollars (e.g., you roll the number 2 and 2 is not a bad side, meaning the second element of the vector is_bad_side is 0, then you receive 2 dollars). Repeat step 2.
      b. If you roll a number in is_bad_side, then you lose all the money obtained in previous rolls and the game ends.
   B. QUIT:
      a. You keep all the money gained from previous rolls and the game ends.

Procedure

• You will implement your solution using the solve() method in the code below.
• Your return value should be the number of dollars you expect to win for a specific value of is_bad_side, if you follow an optimal policy. That is, what is the value of the optimal state-value function for the initial state of the game (starting with 0 dollars)? Your answer must be correct to 3 decimal places, truncated (e.g., 3.14159265 becomes 3.141).
• To solve this problem, you will need to determine an optimal policy for the game of DieN, given a particular configuration of the die. As you will see, the action that is optimal will depend on your current bankroll (i.e., how much money you've won so far).
• You can try solving this problem by creating an MDP of the game (states, actions, transition function, reward function, and assume a discount rate of γ = 1) and then calculating the optimal state-value function.

In [5]:

    ################
    # DO NOT REMOVE
    # Versions
    # numpy==1.18.0
    ################
    import numpy as np

    class MDPAgent(object):
        def __init__(self):
            pass

        def solve(self, is_bad_side):
            """Implement the agent"""
            pass
            return True

In [ ]:

    ## DO NOT MODIFY THIS CODE. This code will ensure that your submission
    ## will work properly with the autograder

    import unittest

    class TestDieNNotebook(unittest.TestCase):
        def test_case_1(self):
            agent = MDPAgent()
            np.testing.assert_almost_equal(
                agent.solve(is_bad_side=[1, 1, 1, 0, 0, 0]),
                2.583,
                decimal=3
            )

        def test_case_2(self):
            agent = MDPAgent()
            np.testing.assert_almost_equal(
                agent.solve(
                    is_bad_side=[1, 1, 1, 1, 0, 0, 0, 0, 1, 0, 1, 0, 1, 1, ...  # list cut off in the source
                ),
                7.379,
                decimal=3
            )

        def test_case_3(self):
            agent = MDPAgent()
            np.testing.assert_almost_equal(
                agent.solve(
                    is_bad_side=[1, 1, 1, 1, 1, 1, 0, 1, 0, 1, 1, 0, 1, 0, ...  # list cut off in the source
                ),
                6.314,
                decimal=3
            )

    unittest.main(argv=[''], verbosity=2, exit=False)
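One way to write down the MDP that the last hint points to is sketched here. It is one possible formulation, not necessarily the intended one. The symbols are assumptions introduced for illustration: b is the current bankroll, G is the set of good sides (those with is_bad_side equal to 0), and V*(b) is the expected final payout under an optimal policy when the bankroll is b. Quitting pays b, rolling a bad side pays 0, and rolling a good side g moves the game to bankroll b + g. With γ = 1 the Bellman optimality equation would then read:

```latex
V^*(b) \;=\; \max\Bigl(\, \underbrace{b}_{\text{QUIT}},\;
  \underbrace{\frac{1}{N} \sum_{g \in G} V^*(b + g)}_{\text{ROLL}} \,\Bigr),
\qquad b = 0, 1, 2, \dots
```

The requested return value is V*(0). The bad sides do not appear in the sum because each of them contributes a payout of 0.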
Expert Answer:
Solution: To solve this problem, you can create an MDP (Markov Decision Process) of the ...
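Since the answer above is cut off, here is a minimal sketch of how such an MDP could be solved. It is not the original expert solution. The function name expected_winnings and all variable names are hypothetical. The sketch treats the bankroll as the state and relies on one observation: a single extra roll changes the expected bankroll from b to (|G|·b + sum of good sides) / N, which is no better than b once b ≥ (sum of good sides) / (number of bad sides). Bankrolls only grow, so quitting is assumed optimal from that point on, and the values below it can be filled in backwards.

```python
import math


def expected_winnings(is_bad_side):
    """Expected payout of DieN from a bankroll of 0 under an optimal policy.

    Hypothetical helper, sketched under the assumptions in the text above.
    """
    n = len(is_bad_side)
    # Sides are 1-indexed: side k is good when is_bad_side[k - 1] == 0.
    good = [side for side, bad in enumerate(is_bad_side, start=1) if not bad]
    n_bad = n - len(good)
    if n_bad == 0:
        # Assumption: with no bad side, rolling forever is free, so the
        # expected winnings are unbounded and no finite answer exists.
        raise ValueError("at least one bad side is required")
    if not good:
        return 0.0  # every roll ends the game with nothing

    # At or above this bankroll, one more roll does not pay in expectation.
    limit = math.ceil(sum(good) / n_bad)
    top = limit + max(good)  # largest bankroll that is ever looked up

    # Start from "quit everywhere": the value of bankroll b is b itself.
    value = [float(b) for b in range(top + 1)]

    # Fill in the states below the limit from high bankroll to low.
    for b in range(limit - 1, -1, -1):
        roll = sum(value[b + g] for g in good) / n  # bad sides contribute 0
        value[b] = max(float(b), roll)

    return value[0]
```

For the first test case, expected_winnings([1, 1, 1, 0, 0, 0]) evaluates to 15.5 / 6 ≈ 2.58333. The problem asks for truncation rather than rounding, so inside solve() the result could be passed through math.floor(x * 1000) / 1000, which gives 2.583 here.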