Question: Consider a Markov chain with three states $\{1, 2, 3\}$. In each state, we can choose one of the two possible actions $\{1, 2\}$.

The transition probability matrices under the two actions are given below:

$$P(1) = \begin{pmatrix} 0.5 & 0.3 & 0.2 \\ 0.1 & 0.4 & 0.5 \\ 0.3 & 0.3 & 0.4 \end{pmatrix}$$

and

$$P(2) = \begin{pmatrix} 0.3 & 0.3 & 0.4 \\ 0.5 & 0.1 & 0.4 \\ 0.2 & 0.5 & 0.3 \end{pmatrix}.$$

The cost for a given (state, action) pair is a Bernoulli random variable. The mean costs are given below:

$$C = \begin{pmatrix} 0.1 & 0.9 \\ 0.8 & 0.1 \\ 0 & 0 \end{pmatrix}$$

We are interested in solving the following discounted cost problem:

$$\min_{\pi} \lim_{N \to \infty} E\left[\sum_{k=0}^{N} 0.9^{k}\, c(x_{k}, u_{k}) \,\middle|\, x_{0} = 1, u_{0} = 1\right]$$

where $x_{k}$ is the state at time $k$, $u_{k}$ is the action at time $k$, and $\pi$ denotes a policy.

Assume we do not know the model but are given the following trace of $(x_{k}, u_{k}, c(x_{k}, u_{k}))$ instead:

$(1, 1, 1)$, $(2, 1, 0)$, $(3, 2, 1)$, $(2, 2, 0)$.

Consider the Q-learning algorithm with

$$Q_{0} = \begin{pmatrix} 0 & 0.5 \\ 0.3 & 0 \\ 0.2 & 0.1 \end{pmatrix}$$

and step size $\epsilon = 0.1$.

Please calculate the sequence of Q-values under Q-learning with the trace given above.
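One way to carry out the calculation is sketched below. It is not the hidden expert answer. It assumes the standard tabular Q-learning update for a cost-minimization problem, $Q(x_k, u_k) \leftarrow Q(x_k, u_k) + \epsilon\,[c(x_k, u_k) + 0.9 \min_{v} Q(x_{k+1}, v) - Q(x_k, u_k)]$. It also assumes that the rows of $Q_0$ index states and the columns index actions, and that the last sample $(2, 2, 0)$ produces no update because the trace contains no successor state for it. The function name `q_learning_trace` is hypothetical.

```python
GAMMA = 0.9  # discount factor taken from the objective
STEP = 0.1   # step size given in the question


def q_learning_trace(q0, trace, gamma=GAMMA, step=STEP):
    """Return the list [Q_0, Q_1, ...] produced by replaying the trace.

    q0    : list of rows, one per state; each row holds one value per action
    trace : list of (state, action, cost) samples with 1-based indices
    """
    q = [row[:] for row in q0]          # work on a copy of Q_0
    history = [[row[:] for row in q]]
    # Pair each sample with its successor; the final sample has none.
    for (x, u, c), (x_next, _, _) in zip(trace, trace[1:]):
        # Costs are minimized, so bootstrap from the smallest next-state value.
        target = c + gamma * min(q[x_next - 1])
        q[x - 1][u - 1] += step * (target - q[x - 1][u - 1])
        history.append([row[:] for row in q])
    return history


Q0 = [[0.0, 0.5], [0.3, 0.0], [0.2, 0.1]]
TRACE = [(1, 1, 1), (2, 1, 0), (3, 2, 1), (2, 2, 0)]

HISTORY = q_learning_trace(Q0, TRACE)
for k, q in enumerate(HISTORY):
    print(f"Q_{k} =", [[round(v, 4) for v in row] for row in q])
```

Under these assumptions only one entry changes per step: $Q_1(1,1) = 0 + 0.1\,(1 + 0.9 \cdot 0 - 0) = 0.1$, then $Q_2(2,1) = 0.3 + 0.1\,(0 + 0.9 \cdot 0.1 - 0.3) = 0.279$, then $Q_3(3,2) = 0.1 + 0.1\,(1 + 0.9 \cdot 0 - 0.1) = 0.19$. All other entries keep their values from $Q_0$.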

