Question: [9_2_B] Please answer this question step by step
![[9_2_B] Please answer this question step by step 2. Assume a system](https://dsd5zvtm8ll6.cloudfront.net/si.experts.images/questions/2024/09/66f52cb378838_79566f52cb315172.jpg)
2. Assume a system with four states S1, S2, S3, and S4 with rewards of R1, R2, R3, and R4, respectively. There are three possible actions a1, a2, and a3 from each state. Use the system to answer the following questions about reinforcement learning:

(a) What is a policy?

(b) What is a Q-function (in Q-learning), and how is it related to the policy?

(c) Assume that the episode below is executed:

S1 (action a2) → S4 (action a) → S3

Which Q-values are updated after this episode? What are their new values? You can assume the original Q-values are all zero. Use α and γ to represent the learning rate and discount factor, respectively.

(d) What is the effect of the discount factor in general?
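To make part (c) concrete, here is a minimal sketch of the standard tabular Q-learning update, Q(s, a) ← Q(s, a) + α [r + γ max over a' of Q(s', a') − Q(s, a)], applied to the two transitions of the episode. Several things in it are assumptions, not part of the question: the reward for a transition is taken to be the reward of the state being entered (some courses instead use the reward of the state being left), the numbers standing in for R1..R4, α, and γ are arbitrary placeholders, and the second action is given the hypothetical label `a1` because its subscript is not legible in the question.

```python
from collections import defaultdict

ACTIONS = ["a1", "a2", "a3"]

# Placeholder numbers standing in for the symbolic rewards R1..R4 (assumption).
REWARDS = {"S1": 1.0, "S2": 2.0, "S3": 3.0, "S4": 4.0}

ALPHA = 0.5  # learning rate (placeholder value)
GAMMA = 0.9  # discount factor (placeholder value)


def q_update(Q, state, action, reward, next_state):
    """Apply one tabular Q-learning update to Q in place."""
    # Greedy value of the next state under the current Q estimates.
    best_next = max(Q[(next_state, a)] for a in ACTIONS)
    td_target = reward + GAMMA * best_next
    Q[(state, action)] += ALPHA * (td_target - Q[(state, action)])


# All Q-values start at zero, as the question states.
Q = defaultdict(float)

# (state, action, next_state); "a1" in the second step is a hypothetical label.
episode = [("S1", "a2", "S4"), ("S4", "a1", "S3")]

for state, action, next_state in episode:
    # Assumption: the reward received is that of the state being entered.
    q_update(Q, state, action, REWARDS[next_state], next_state)

# Only the (state, action) pairs visited in the episode can have changed.
updated = {key: value for key, value in Q.items() if value != 0.0}
print(updated)
```

Under these assumptions only two entries change: Q(S1, a2) becomes α·R4 and Q(S4, a) becomes α·R3. The γ term contributes nothing in either update because every Q-value of the next state is still zero when the update is applied. Under the other reward convention the same code would give α·R1 and α·R4 instead.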
