Question: Reinforcement Learning Homework 3: Model-Free Monte Carlo Prediction

Objective

In this homework assignment, you will apply the Monte Carlo prediction method to estimate the state values for a four-state problem. You will be provided with four episodes. Your task is to calculate the state values using the Monte Carlo method with a specified discount factor (γ) and initial values for the states.

Problem Setup

- States (S): Four states, labeled S1, S2, S3, and S4.
- Rewards (R): Provided within each episode, including a final reward.
- Discount Factor (γ): 0.9
- Initial State Values (V):
  - V(S1) = 0
  - V(S2) = 0
  - V(S3) = 0
  - V(S4) = 0

Episodes

- S1, 0, S2, 1, S3, 0, S4, 10
- S1, 0, S2, 0, S2, 0, S3, 0, S4, -5
- S1, 0, S1, 1, S2, 0, S3, 0, S4, 8
- S1, 0, S2, 0, S2, 0, S3, 1, S4, 12
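Each episode alternates states and rewards, so one way to work with it is as a list of (state, reward) pairs. The sketch below is only an illustration and rests on an assumption the problem statement does not spell out: the reward listed after a state is taken to be the reward received on leaving that state. Under that reading, the return from a visit is G_t = r_t + γ·G_{t+1}, computed backward from the end of the episode. The names `EPISODES`, `GAMMA`, and `discounted_returns` are hypothetical and chosen for this example.

```python
GAMMA = 0.9  # discount factor from the problem setup

# Each episode is a list of (state, reward) pairs.
# Assumption: the reward after a state is received on leaving that state.
EPISODES = [
    [("S1", 0), ("S2", 1), ("S3", 0), ("S4", 10)],
    [("S1", 0), ("S2", 0), ("S2", 0), ("S3", 0), ("S4", -5)],
    [("S1", 0), ("S1", 1), ("S2", 0), ("S3", 0), ("S4", 8)],
    [("S1", 0), ("S2", 0), ("S2", 0), ("S3", 1), ("S4", 12)],
]


def discounted_returns(episode, gamma=GAMMA):
    """Return [(state, G_t), ...] in visit order.

    Works backward through the episode using G_t = r_t + gamma * G_{t+1},
    with the return after the final step taken to be 0.
    """
    g = 0.0
    returns = []
    for state, reward in reversed(episode):
        g = reward + gamma * g
        returns.append((state, g))
    returns.reverse()
    return returns


if __name__ == "__main__":
    for i, episode in enumerate(EPISODES, start=1):
        rounded = [(s, round(g, 4)) for s, g in discounted_returns(episode)]
        print(f"Episode {i}: {rounded}")
```

If the course uses a different convention (for example, the reward belonging to the transition into the next state), the pairs would need to be shifted by one position before computing the returns.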

Tasks

1. Calculate the returns (G) for each state in each episode.
2. Use the Every-Visit Monte Carlo method to update the state values (V) based on the returns and the discount factor (γ).
3. Calculate the updated values for each state after processing all four episodes.
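The three tasks can be sketched together in a few lines of Python. This is a minimal illustration, not the official solution, and it makes two assumptions: the reward listed after a state is received on leaving that state, and the state values are updated with the incremental sample mean V(s) ← V(s) + (G − V(s)) / N(s), where N(s) counts every visit to s across all episodes. The function name `every_visit_mc` and the data layout are hypothetical.

```python
from collections import defaultdict

GAMMA = 0.9  # discount factor from the problem setup

# (state, reward) pairs; assumption: reward is received on leaving the state.
EPISODES = [
    [("S1", 0), ("S2", 1), ("S3", 0), ("S4", 10)],
    [("S1", 0), ("S2", 0), ("S2", 0), ("S3", 0), ("S4", -5)],
    [("S1", 0), ("S1", 1), ("S2", 0), ("S3", 0), ("S4", 8)],
    [("S1", 0), ("S2", 0), ("S2", 0), ("S3", 1), ("S4", 12)],
]


def every_visit_mc(episodes, gamma=GAMMA, initial_value=0.0):
    """Estimate state values with Every-Visit Monte Carlo prediction."""
    values = defaultdict(lambda: initial_value)  # V(s), starting at 0
    visits = defaultdict(int)                    # N(s), visit counts

    for episode in episodes:
        g = 0.0
        # Task 1: walk backward so that G_t = r_t + gamma * G_{t+1}.
        for state, reward in reversed(episode):
            g = reward + gamma * g
            # Task 2: every visit contributes its return to the running mean.
            visits[state] += 1
            values[state] += (g - values[state]) / visits[state]

    # Task 3: values after all episodes have been processed.
    return dict(values)


if __name__ == "__main__":
    for state, value in sorted(every_visit_mc(EPISODES).items()):
        print(f"V({state}) = {value:.4f}")
```

Because the visit counts start at zero, the first update for each state overwrites the initial value, so the result equals the plain average of all returns observed for that state. A First-Visit variant would instead skip any return whose state appears earlier in the same episode.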
