Question: Consider a Q-learning agent in a world containing actions {Train, Rest} and states {Weekday, Weekend}. Suppose the Q-values are currently, at time t, as follows.
Q(Weekday, Train)    Q(Weekday, Rest)
Q(Weekend, Train)    Q(Weekend, Rest)
Assume learning rate α and discount factor γ. Suppose that at time t the agent is in state Weekday.
(i) If the agent uses ε-greedy exploration with exploration rate ε, what is the probability of choosing action Train?
(ii) Suppose instead that the agent uses softmax action selection with temperature τ. What is the probability of choosing action Train?
(iii) If at time t the agent performs action Train, receives a reward, and ends in state Weekday, how is the table of Q-values updated using Q-learning?
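The three parts can be worked through with a short Python sketch. The numbers below (the Q-table entries, epsilon, tau, alpha, gamma, and the reward) are hypothetical placeholders, not the values from the question, and the function names are made up for illustration. The ε-greedy function assumes the common convention that, with probability ε, the agent picks uniformly among all actions (including the greedy one); some courses instead explore only among non-greedy actions, which changes the answer to (i).

```python
import math

ACTIONS = ("Train", "Rest")

# Hypothetical placeholder Q-values; substitute the table from the question.
Q = {
    ("Weekday", "Train"): 1.0,
    ("Weekday", "Rest"): 2.0,
    ("Weekend", "Train"): 0.0,
    ("Weekend", "Rest"): 3.0,
}


def epsilon_greedy_prob(q, state, action, epsilon):
    """P(action | state) when exploring uniformly over ALL actions with prob. epsilon."""
    values = {a: q[(state, a)] for a in ACTIONS}
    best = max(values.values())
    greedy = [a for a in ACTIONS if values[a] == best]
    prob = epsilon / len(ACTIONS)            # share from the random branch
    if action in greedy:
        prob += (1.0 - epsilon) / len(greedy)  # share from the greedy branch (ties split evenly)
    return prob


def softmax_prob(q, state, action, tau):
    """P(action | state) = exp(Q(s,a)/tau) / sum_b exp(Q(s,b)/tau)."""
    # Subtracting the maximum leaves the ratio unchanged and avoids overflow.
    m = max(q[(state, a)] for a in ACTIONS)
    weights = {a: math.exp((q[(state, a)] - m) / tau) for a in ACTIONS}
    return weights[action] / sum(weights.values())


def q_learning_update(q, state, action, reward, next_state, alpha, gamma):
    """Return a copy of q with one Q-learning update applied to (state, action).

    Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_b Q(s',b) - Q(s,a))
    """
    updated = dict(q)
    best_next = max(q[(next_state, b)] for b in ACTIONS)
    td_error = reward + gamma * best_next - q[(state, action)]
    updated[(state, action)] = q[(state, action)] + alpha * td_error
    return updated


if __name__ == "__main__":
    # All parameter values here are placeholders chosen for illustration.
    print("(i)  ", epsilon_greedy_prob(Q, "Weekday", "Train", epsilon=0.1))
    print("(ii) ", softmax_prob(Q, "Weekday", "Train", tau=1.0))
    new_q = q_learning_update(Q, "Weekday", "Train", reward=1.0,
                              next_state="Weekday", alpha=0.5, gamma=0.9)
    print("(iii)", new_q)
```

Only the entry for (Weekday, Train) changes in part (iii); every other entry of the table is carried over unchanged, because Q-learning updates just the state-action pair that was actually visited.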
