Question:

Consider a Markov chain with three states {1,2,3}. In each state, we can choose one of the two
possible actions {1,2}. The transition probability matrices under the two actions are given below:
$$P(1) = \begin{pmatrix} 0.5 & 0.3 & 0.2 \\ 0.1 & 0.4 & 0.5 \\ 0.3 & 0.3 & 0.4 \end{pmatrix}, \qquad P(2) = \begin{pmatrix} 0.3 & 0.3 & 0.4 \\ 0.5 & 0.1 & 0.4 \\ 0.2 & 0.5 & 0.3 \end{pmatrix}.$$
The cost for a given (state, action) pair is a Bernoulli random variable. The mean costs are given
below, with rows indexed by state and columns by action:
$$C = \begin{pmatrix} 0.1 & 0.9 \\ 0.8 & 0.1 \\ 0 & 0 \end{pmatrix}$$
We are interested in solving the following discounted cost problem
$$\min_{\pi} \lim_{N \to \infty} E\left[ \sum_{k=0}^{N} 0.9^k \, c(x_k, u_k) \,\middle|\, x_0 = 1, u_0 = 1 \right]$$
where $x_k$ is the state at time $k$, $u_k$ is the action at time $k$, and $\pi$ denotes a policy.
Assume we do not know the model but are given the following trace $(x_k, u_k, c(x_k, u_k))$ instead:
(1,1,1), (2,1,0), (3,2,1), (2,2,0).
Consider the Q-learning algorithm with
$$Q_0 = \begin{pmatrix} 0 & 0.5 \\ 0.3 & 0 \\ 0.2 & 0.1 \end{pmatrix}$$
and step size $\epsilon = 0.1$. Please calculate the
sequence of Q-values under Q-learning with the trace given above.
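
The requested computation can be sketched in a few lines of Python. This is a minimal sketch, not an official solution, and it rests on three assumptions that the question does not spell out: the update is the standard cost-minimizing rule Q(x,u) <- Q(x,u) + step * (c + 0.9 * min_v Q(x',v) - Q(x,u)); the next state x' of each tuple is the state of the following tuple in the trace; and the last tuple (2,2,0) triggers no update because its successor state is not observed. The names `q_learning_trace`, `Q0`, `TRACE`, `GAMMA`, and `STEP` are illustrative.

```python
GAMMA = 0.9  # discount factor from the problem statement
STEP = 0.1   # step size from the problem statement
ACTIONS = (1, 2)

# Initial Q-values keyed by (state, action), using the 1-based labels of the question.
Q0 = {
    (1, 1): 0.0, (1, 2): 0.5,
    (2, 1): 0.3, (2, 2): 0.0,
    (3, 1): 0.2, (3, 2): 0.1,
}

# Observed trace of (state, action, cost) tuples.
TRACE = [(1, 1, 1), (2, 1, 0), (3, 2, 1), (2, 2, 0)]


def q_learning_trace(q0, trace, gamma=GAMMA, step=STEP):
    """Return the list [Q_0, Q_1, ...] of Q-tables produced along the trace.

    Assumption: the successor of tuple k is the state of tuple k+1, so the
    final tuple yields no update.
    """
    q = dict(q0)
    history = [dict(q)]
    for (x, u, c), (x_next, _, _) in zip(trace, trace[1:]):
        # Cost-minimizing target: immediate cost plus discounted best next value.
        target = c + gamma * min(q[(x_next, v)] for v in ACTIONS)
        q[(x, u)] += step * (target - q[(x, u)])
        history.append(dict(q))
    return history


history = q_learning_trace(Q0, TRACE)
for k, q in enumerate(history):
    rows = [[round(q[(s, a)], 4) for a in ACTIONS] for s in (1, 2, 3)]
    print(f"Q_{k} = {rows}")
```

Under these assumptions only three entries change, one per observed transition: Q(1,1) moves from 0 to 0.1, Q(2,1) from 0.3 to 0.279, and Q(3,2) from 0.1 to 0.19. If the intended convention differs (for example, a reward-maximizing update with max instead of min), the numbers change accordingly.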