Question:

Please indicate whether the following statements are true or false.

a. If the only difference between two MDPs is the value of the discount factor, then they must have the same optimal policy.

b. When using features to represent the Q-function (rather than having a tabular representation), it is possible that Q-learning does not find the optimal Q-function Q*.

c. For an infinite horizon MDP with a finite number of states and actions and with a discount factor γ, with 0 < γ < 1, value iteration is guaranteed to converge. 

d. When getting to act only for a finite number of steps in an MDP, the optimal policy is stationary. (A stationary policy is a policy that takes the same action in a given state, independent of at what time the agent is in that state.)
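Statements (a) and (c) can be explored numerically. The following is a minimal sketch of value iteration, not part of the original question: the three-state MDP, the array layout `P[a, s, s']` / `R[a, s]`, and the function name `value_iteration` are all hypothetical choices made for illustration. In this toy MDP, state 0 offers a small immediate reward (action 0) or a delayed larger one (action 1), so the greedy action there depends on γ.

```python
import numpy as np

def value_iteration(P, R, gamma, tol=1e-10, max_iters=10_000):
    """Value iteration for a finite MDP (illustrative sketch).

    P[a, s, s2] is the probability of moving from s to s2 under action a.
    R[a, s] is the expected immediate reward for taking action a in state s.
    Returns (V, greedy_policy, iterations_used).
    """
    n_actions, n_states, _ = P.shape
    V = np.zeros(n_states)
    for i in range(max_iters):
        # Bellman backup: Q[a, s] = R[a, s] + gamma * sum_s2 P[a, s, s2] * V[s2]
        Q = R + gamma * (P @ V)
        V_new = Q.max(axis=0)
        if np.max(np.abs(V_new - V)) < tol:
            return V_new, Q.argmax(axis=0), i + 1
        V = V_new
    raise RuntimeError("value iteration did not converge within max_iters")

# Hypothetical toy MDP: state 2 is absorbing with zero reward.
# Action 0 in state 0: reward 1, go to state 2.
# Action 1 in state 0: reward 0, go to state 1.
# Either action in state 1: reward 10, go to state 2.
P = np.array([
    [[0.0, 0.0, 1.0], [0.0, 0.0, 1.0], [0.0, 0.0, 1.0]],  # action 0
    [[0.0, 1.0, 0.0], [0.0, 0.0, 1.0], [0.0, 0.0, 1.0]],  # action 1
])
R = np.array([
    [1.0, 10.0, 0.0],  # action 0
    [0.0, 10.0, 0.0],  # action 1
])

V_low, pi_low, iters_low = value_iteration(P, R, gamma=0.05)
V_high, pi_high, iters_high = value_iteration(P, R, gamma=0.9)

print("gamma=0.05: V =", V_low, "greedy action in state 0 =", pi_low[0])
print("gamma=0.9:  V =", V_high, "greedy action in state 0 =", pi_high[0])
```

With γ = 0.05 the greedy action in state 0 is action 0 (value 1 versus 0.05 × 10 = 0.5), while with γ = 0.9 it is action 1 (value 9 versus 1). Both runs terminate because, for 0 < γ < 1 and a finite MDP, the Bellman backup is a γ-contraction in the max norm.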


Step by Step Answer:
