Question: Question 7 [15 pt: Consider a system with two states and two actions. You perform actions and observe the rewards and transitions listed below Step

Question 7 [15 pt: Consider a system with two states and two actions. You perform actions and observe the rewards and transitions listed below Step 1: Start-Si, Action = al, Reward =-10. End Step 2: Start-Si, Action-a2, Reward =-10. End-S2 Step 3: Start-S2, Action-ai, Reward = +20. End-Si Step 4: Start-Si, Action-a2, Reward--10. End-S2 1. Perform Q-learning. The discount factor is = 0.5 and the learning rate is = 0.5. Assume that your all Q values are initialized to 0. 2. What is the policy that Q-learning has learned at this point? Question 7 [15 pt: Consider a system with two states and two actions. You perform actions and observe the rewards and transitions listed below Step 1: Start-Si, Action = al, Reward =-10. End Step 2: Start-Si, Action-a2, Reward =-10. End-S2 Step 3: Start-S2, Action-ai, Reward = +20. End-Si Step 4: Start-Si, Action-a2, Reward--10. End-S2 1. Perform Q-learning. The discount factor is = 0.5 and the learning rate is = 0.5. Assume that your all Q values are initialized to 0. 2. What is the policy that Q-learning has learned at this point
Step by Step Solution
There are 3 Steps involved in it
Get step-by-step solutions from verified subject matter experts
