Consider a 2-armed bandit instance B in which the rewards from the arms come from uniform...
Question:
Consider a 2-armed bandit instance B in which the rewards from the arms come from uniform distributions (recall that the lectures assumed they came from Bernoulli distributions). The rewards of arm 1 are drawn uniformly at random from [a, b], and the rewards of arm 2 are drawn uniformly at random from [c, d], where 0 < a < c < b < d < 1. Observe that this means there is an overlap: both arms produce some rewards from the interval [c, b]. An algorithm L proceeds as follows. First it pulls arm 1; then it pulls arm 2; whichever of these arms produced a higher reward (or arm 1 in case of a tie) is then pulled a further 20 times. In other words, the algorithm performs round-robin exploration for 2 steps and greedily picks an arm for the subsequent exploitation phase, during which that arm is blindly pulled 20 times.
What is the expected cumulative regret of L on B after 22 pulls? (If you have worked out an answer but are not sure about it, consider writing a small program to simulate L and run it many times for fixed a, b, c, d. Is the average regret from these runs close to your answer? The program is for your own sake; no need to submit or to explain to us.)
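The simulation suggested in the question could look like the minimal Python sketch below. It is not part of the original page. It assumes the usual definition of expected cumulative regret, namely 22 times the larger arm mean minus the expected total reward of L. The function names, the default number of runs, and the sample values a = 0.2, b = 0.6, c = 0.4, d = 0.8 are all illustrative choices. The closed-form candidate it compares against rests on three observations that you should check yourself: arm 2 has the larger mean because a < c and b < d, so the gap is (c + d - a - b)/2; the first pull always costs one gap and the second costs nothing; and arm 1 is chosen for the 20 exploitation pulls exactly when its first reward is at least arm 2's, which has probability (b - c)^2 / (2 (b - a)(d - c)).

```python
import random


def expected_regret_closed_form(a, b, c, d, exploit_pulls=20):
    """Candidate analytic answer (an assumption to be verified, not from the source)."""
    gap = (c + d - a - b) / 2.0  # mean of arm 2 minus mean of arm 1
    # Probability that the single exploration reward of arm 1 is >= that of arm 2.
    p_arm1 = (b - c) ** 2 / (2.0 * (b - a) * (d - c))
    # One gap for the forced pull of arm 1, plus one gap per exploitation
    # pull whenever the suboptimal arm 1 was selected.
    return gap * (1 + exploit_pulls * p_arm1)


def simulate_regret(a, b, c, d, runs=100_000, exploit_pulls=20, seed=0):
    """Monte Carlo estimate of the expected cumulative regret of L on B."""
    rng = random.Random(seed)
    best_mean = max((a + b) / 2.0, (c + d) / 2.0)
    total_pulls = 2 + exploit_pulls
    total_reward = 0.0
    for _ in range(runs):
        r1 = rng.uniform(a, b)  # exploration pull of arm 1
        r2 = rng.uniform(c, d)  # exploration pull of arm 2
        # Greedy choice; ties go to arm 1, as the question specifies.
        lo, hi = (a, b) if r1 >= r2 else (c, d)
        reward = r1 + r2
        for _ in range(exploit_pulls):
            reward += rng.uniform(lo, hi)
        total_reward += reward
    # Regret = reward of always playing the best arm minus average reward of L.
    return total_pulls * best_mean - total_reward / runs


if __name__ == "__main__":
    a, b, c, d = 0.2, 0.6, 0.4, 0.8  # illustrative values with a < c < b < d
    print("closed form:", expected_regret_closed_form(a, b, c, d))
    print("simulated:  ", simulate_regret(a, b, c, d))
```

For the sample values the gap is 0.2 and the selection probability is 0.125, so the closed-form candidate evaluates to 0.2 × (1 + 20 × 0.125) = 0.7. The simulated average should land close to that if the reasoning holds. Trying several other choices of a, b, c, d is a cheap way to catch an error in either the formula or the simulation.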