Question: [9_2_B] Please answer this question step by step 2. Assume a system with four states S1, S2, S3, and S4 with rewards of R1, R2,

2. Assume a system with four states S1, S2, S3, and S4 with rewards of R1, R2, R3, and R4, respectively. There are three possible actions a1, a2, and a3 from each state. Use the system to answer the following questions about reinforcement learning:

(a) What is a policy?

(b) What is a Q-function (in Q-learning), and how is it related to the policy?

(c) Assume that the episode below is executed:

S1 (action a2) → S4 (action a) → S3

Which Q-values are updated after this episode? What are their new values? You can assume the original Q-values are all zero. Use α and γ to represent the learning rate and discount factor, respectively.

(d) What is the effect of the discount factor in general?
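For readers working through part (c), the standard tabular Q-learning update can be sketched as below. This is a minimal illustration, not a supplied solution. It assumes the reward is the one attached to the state being entered, which the question does not state, and the numeric values for the learning rate, discount factor, and R4 are placeholders chosen only so the code runs. The function name q_update and the dictionary layout are hypothetical. Only the first transition is shown, because the second action in the episode is not legible in the question text.

```python
from collections import defaultdict


def q_update(Q, state, action, reward, next_state, actions, alpha, gamma):
    """Apply one tabular Q-learning update in place and return the new value."""
    # Greedy estimate of the value of the next state under the current table.
    best_next = max(Q[(next_state, a)] for a in actions)
    # Temporal-difference target: immediate reward plus discounted future value.
    td_target = reward + gamma * best_next
    # Move the old estimate a fraction alpha toward the target.
    Q[(state, action)] += alpha * (td_target - Q[(state, action)])
    return Q[(state, action)]


actions = ["a1", "a2", "a3"]
Q = defaultdict(float)   # every Q-value starts at zero, as the question allows
alpha, gamma = 0.5, 0.9  # placeholder learning rate and discount factor
R4 = 2.0                 # placeholder reward; assumes reward is given on entering S4

# First transition of the episode: S1 --a2--> S4.
new_value = q_update(Q, "S1", "a2", R4, "S4", actions, alpha, gamma)
print(new_value)
```

Because every entry in the table starts at zero, the discounted term vanishes on this first update and the new value reduces to the learning rate times the reward, under the reward convention assumed here.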
