Question: For the multi-armed bandit problem we discussed in class, suppose that we get return $G_n$ the $n$-th time we do action $a$, and $E[G_n] = r$ for all $n$. Let $Q_n$ be our estimate of $r$ after we do action $a$ the $n$-th time, with the update rule $Q_n = Q_{n-1} + \alpha_n (G_n - Q_{n-1})$, starting from an initial estimate $Q_0$. We define $V_n = E[(Q_n - r)^2]$.

(a) Decreasing step size. Let $\alpha_n = 1/n$. Show that
(i) $Q_n = \frac{1}{n} \sum_{i=1}^{n} G_i$;
(ii) $\lim_{n \to \infty} V_n = 0$.

(b) Constant step size. Let $\alpha_n = \alpha$. Show that
(i) $V_n = (1 - \alpha)^2 V_{n-1} + \alpha^2 \mathrm{Var}(G_n)$, where $\mathrm{Var}(G_n) = E[(G_n - r)^2]$;
(ii) $\lim_{n \to \infty} V_n = \frac{\alpha}{2 - \alpha} \mathrm{Var}(G_n)$.
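The identities in parts (a) and (b) can be sanity-checked numerically before attempting the proofs. The sketch below is not a solution, only a check under stated assumptions: the returns are taken to be i.i.d. Gaussian (which makes $G_n$ independent of $Q_{n-1}$), the initial estimate is assumed to be $Q_0 = 0$, and the values of `R_TRUE`, `SIGMA`, and `ALPHA` are hypothetical choices made purely for illustration.

```python
import random


def run_estimates(returns, step_size, q0=0.0):
    """Apply Q_n = Q_{n-1} + alpha_n * (G_n - Q_{n-1}); return [Q_1, ..., Q_N].

    q0 = 0.0 is an assumed initial estimate, not taken from the question.
    """
    q = q0
    history = []
    for n, g in enumerate(returns, start=1):
        q += step_size(n) * (g - q)
        history.append(q)
    return history


def mse_recursion(alpha, variance, v0, n_steps):
    """Iterate the part (b)(i) recursion V_n = (1 - alpha)^2 V_{n-1} + alpha^2 Var(G)."""
    v = v0
    for _ in range(n_steps):
        v = (1 - alpha) ** 2 * v + alpha ** 2 * variance
    return v


rng = random.Random(0)       # fixed seed so the run is repeatable
R_TRUE, SIGMA = 1.5, 2.0     # hypothetical true mean r and standard deviation
ALPHA = 0.1                  # hypothetical constant step size

# Part (a)(i): with alpha_n = 1/n the estimate should equal the sample mean.
returns = [rng.gauss(R_TRUE, SIGMA) for _ in range(1000)]
q_decreasing = run_estimates(returns, lambda n: 1.0 / n)[-1]
sample_mean = sum(returns) / len(returns)

# Part (b)(ii): the recursion should settle at alpha / (2 - alpha) * Var(G).
v_limit = ALPHA / (2 - ALPHA) * SIGMA ** 2
v_recursion = mse_recursion(ALPHA, SIGMA ** 2, v0=R_TRUE ** 2, n_steps=500)

# Monte Carlo estimate of V_n = E[(Q_n - r)^2] for the constant step size.
n_runs, n_steps = 2000, 200
squared_errors = []
for _ in range(n_runs):
    episode = [rng.gauss(R_TRUE, SIGMA) for _ in range(n_steps)]
    q_final = run_estimates(episode, lambda n: ALPHA)[-1]
    squared_errors.append((q_final - R_TRUE) ** 2)
v_monte_carlo = sum(squared_errors) / n_runs

print("decreasing-step estimate:", q_decreasing, "sample mean:", sample_mean)
print("recursion:", v_recursion, "closed form:", v_limit)
print("Monte Carlo:", v_monte_carlo)
```

With a constant step size the simulated mean squared error levels off near $\frac{\alpha}{2-\alpha}\mathrm{Var}(G_n)$ rather than shrinking to zero, which is the contrast with the $1/n$ schedule that the question is built around.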
