Consider the reinforcement learning problem posed by the gridworld example shown in Figure 5, and assume...

Question:

Consider the reinforcement learning problem posed by the gridworld example shown in Figure 5, and assume that we want to use approximate Q-learning to find a policy for the agent: instead of keeping track of all Q-values Q(s, a) in a table (we will call this tabular Q-learning), we use machine learning techniques to estimate Q(s, a) from data. We gather training data by letting the agent take sequences of actions and observing the outcomes.

Figure 5: An agent is presented with a sequential decision-making problem in a 4x3 gridworld (figure a). Starting from the START state in (1,1), its goal is to reach one of the exit states, (4,3) or (4,2), while maximizing utility. At each timestep, the agent takes an action to move left, right, up or down. Given an action, the outcome is probabilistic: the agent might move in the intended direction or in one of the two directions that are at a right angle to the intended direction (figure b). The reward and transition functions of the underlying MDP are unknown.

1. Give one example of a feature that can be used to characterize Q-states to predict Q(s, a) using a linear model. (2 points)
2. Estimating Q(s, a) using neural networks is a better strategy than tabular Q-learning in problems where the number of states and actions is small. True or False? Provide a brief justification. (3 points)
3. Estimating Q(s, a) using neural networks can improve over tabular Q-learning by capturing better generalizations across states. True or False? Provide a brief justification. (3 points)
4. If we use a linear model to estimate Q(s, a) and obtain a high error on the training data, adding features that capture properties of states s is a good strategy to improve the training error. True or False? Provide a brief justification. (3 points)
5. If we use a linear model to estimate Q(s, a) and obtain a high error on the training data, replacing the linear model with a neural network with one hidden layer (and keeping everything else constant) will not help improve the training error. True or False? Provide a brief justification. (3 points)
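To make the setup above concrete, here is a minimal sketch of approximate Q-learning with a linear model, Q(s, a) = w · f(s, a), on the 4x3 grid. Everything beyond the grid size and the two exit cells is an assumption made for illustration: the function names, the two features (a bias term and a scaled Manhattan distance to the nearest exit after the intended move), the learning rate, the discount factor, and the reward used in the usage note. It ignores any walls the figure may contain and is not offered as the expected answer to the questions.

```python
# Hypothetical illustration: feature choice, names, and constants are assumptions.
ACTIONS = ["left", "right", "up", "down"]
EXITS = [(4, 3), (4, 2)]  # exit states named in the problem statement
MOVES = {"left": (-1, 0), "right": (1, 0), "up": (0, 1), "down": (0, -1)}


def features(state, action):
    """Return a feature vector f(s, a) for a Q-state (assumed features)."""
    x, y = state
    dx, dy = MOVES[action]
    # Intended next cell, clamped to the 4x3 grid (columns 1..4, rows 1..3).
    nx = min(max(x + dx, 1), 4)
    ny = min(max(y + dy, 1), 3)
    # Manhattan distance to the nearest exit; at most 4 on this grid.
    dist = min(abs(nx - ex) + abs(ny - ey) for ex, ey in EXITS)
    return [1.0, dist / 4.0]  # bias term, scaled distance


def q_value(weights, state, action):
    """Linear estimate Q(s, a) = w . f(s, a)."""
    return sum(w * f for w, f in zip(weights, features(state, action)))


def update(weights, state, action, reward, next_state, terminal,
           alpha=0.1, gamma=0.9):
    """One approximate Q-learning step on an observed transition."""
    target = reward
    if not terminal:
        # Bootstrap from the best estimated action in the next state.
        target += gamma * max(q_value(weights, next_state, a) for a in ACTIONS)
    diff = target - q_value(weights, state, action)
    # Move each weight in proportion to its feature's activation.
    return [w + alpha * diff * f
            for w, f in zip(weights, features(state, action))]
```

As a usage example, starting from weights [0.0, 0.0] and observing the agent move right from (3,2) into the exit (4,2) with an assumed reward of 1.0, update([0.0, 0.0], (3, 2), "right", 1.0, (4, 2), True) raises only the bias weight, because the distance feature is 0 for that Q-state. Since the weights are shared across all Q-states, a single observed transition changes the estimates for states the agent has never visited, which is the generalization the questions above refer to.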

Posted Date: Jan 09, 2024 01:09 AM