Question 7 [15 pt]: Consider a system with two states and two actions. You perform actions and observe the rewards and transitions listed below.

Step 1: Start = S1, Action = a1, Reward = -10, End = [not given]
Step 2: Start = S1, Action = a2, Reward = -10, End = S2
Step 3: Start = S2, Action = a1, Reward = +20, End = S1
Step 4: Start = S1, Action = a2, Reward = -10, End = S2

1. Perform Q-learning. The discount factor is γ = 0.5 and the learning rate is α = 0.5. Assume that all Q values are initialized to 0.
2. What is the policy that Q-learning has learned at this point?
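
One way to check the arithmetic is to replay the four observed transitions through the standard tabular Q-learning update, Q(s, a) <- Q(s, a) + α [r + γ max_a' Q(s', a') - Q(s, a)]. The sketch below assumes that update rule, since the question does not spell it out. The end state of Step 1 is not given in the question, so the code assumes S1 there; because every Q value is still 0 at that point, the choice does not change the result. The names `run_q_learning` and `greedy_policy` are illustrative, not part of the question.

```python
ALPHA = 0.5  # learning rate from the question
GAMMA = 0.5  # discount factor from the question
STATES = ("S1", "S2")
ACTIONS = ("a1", "a2")

# Observed transitions: (start state, action, reward, end state).
TRANSITIONS = [
    ("S1", "a1", -10, "S1"),  # end state ASSUMED; it is missing in the question
    ("S1", "a2", -10, "S2"),
    ("S2", "a1", 20, "S1"),
    ("S1", "a2", -10, "S2"),
]


def run_q_learning(transitions, alpha=ALPHA, gamma=GAMMA):
    """Apply one tabular Q-learning update per transition, starting from Q = 0."""
    q = {(s, a): 0.0 for s in STATES for a in ACTIONS}
    for state, action, reward, next_state in transitions:
        # Best value currently estimated for the state we landed in.
        best_next = max(q[(next_state, a)] for a in ACTIONS)
        td_target = reward + gamma * best_next
        q[(state, action)] += alpha * (td_target - q[(state, action)])
    return q


def greedy_policy(q):
    """Pick, for each state, the action with the highest Q value."""
    return {s: max(ACTIONS, key=lambda a: q[(s, a)]) for s in STATES}


q_table = run_q_learning(TRANSITIONS)
policy = greedy_policy(q_table)

if __name__ == "__main__":
    for key in sorted(q_table):
        print(key, q_table[key])
    print(policy)
```

Under these assumptions the table ends at Q(S1, a1) = -5, Q(S1, a2) = -5.3125, Q(S2, a1) = 8.75 and Q(S2, a2) = 0, so the greedy policy picks a1 in both states. Note that Q(S2, a2) is still at its initial value only because that action was never tried in S2.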
