
python help:
Part 2: Comparing Control Methods I
In Part 2, you will compare the performance of Monte Carlo control and Q-learning by running both algorithms on a small environment and then analyzing their progress toward finding the optimal policy and optimal state value function. For the sake of comparison, we will use value iteration to find the optimal policy.
________________________________________
2.A - Value Iteration
Create a 3x3 instance of the FrozenPlatform environment with sp_range=[0.2,0.6], a start position of 1 (the default), no holes, and with random_state=1.
Create an instance of the DPAgent class with gamma=1 and random_state=1 and use it to run value iteration to find the optimal policy for the environment.
Display the environment, setting fill to shade the cells according to their value under the optimal policy, and setting contents to display the optimal policy.
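The FrozenPlatform environment and DPAgent class come from the course's own library, whose API isn't reproduced here. As a self-contained sketch of what value iteration computes, here is the algorithm on a small hypothetical MDP (the states, transitions, and rewards below are made up for illustration and are NOT the FrozenPlatform dynamics):

```python
import numpy as np

# Hypothetical 3-state, 2-action MDP (NOT the FrozenPlatform environment).
# P[s][a] is a list of (probability, next_state, reward, done) tuples.
P = {
    0: {0: [(1.0, 1, 0.0, False)], 1: [(1.0, 0, 0.0, False)]},
    1: {0: [(0.5, 2, 1.0, True), (0.5, 0, 0.0, False)], 1: [(1.0, 2, 1.0, True)]},
    2: {0: [(1.0, 2, 0.0, True)], 1: [(1.0, 2, 0.0, True)]},
}
n_states, n_actions = 3, 2

def value_iteration(P, n_states, n_actions, gamma=1.0, tol=1e-8):
    """Iterate the Bellman optimality backup until V stops changing."""
    V = np.zeros(n_states)
    while True:
        Q = np.zeros((n_states, n_actions))
        for s in range(n_states):
            for a in range(n_actions):
                for prob, s2, r, done in P[s][a]:
                    # Do not bootstrap past terminal transitions.
                    Q[s, a] += prob * (r + gamma * V[s2] * (not done))
        V_new = Q.max(axis=1)
        if np.max(np.abs(V_new - V)) < tol:
            return V_new, Q.argmax(axis=1)  # optimal V and a greedy policy
        V = V_new

V_opt, pi_opt = value_iteration(P, n_states, n_actions, gamma=1)
print(V_opt)   # optimal state values
print(pi_opt)  # greedy action per state
```

In the assignment itself, the DPAgent instance performs this computation for the FrozenPlatform environment; the display call then shades each cell by its optimal value and prints the greedy action.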
________________________________________
2.B - MC Control
Create an instance of the MCAgent class for the environment created in Step 2.A, setting gamma=1 and random_state=1. Do NOT set a policy for the agent, instead allowing the initial policy to be randomly generated.
Run Monte Carlo control with 20,000 episodes, setting epsilon=0.1 and alpha=0.001. Then calculate the mean absolute difference between the optimal state-value function found by value iteration and the current Monte Carlo estimate. Print the following message with the blank filled in with the appropriate value, rounded to 2 decimal places.
The mean absolute difference in V is ____.
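The mean-absolute-difference calculation itself is a one-liner with NumPy. The arrays below are placeholders; in the assignment the real values come from the DPAgent's value-iteration result and the MCAgent's current estimate:

```python
import numpy as np

# Placeholder value functions standing in for the DPAgent (value iteration)
# and MCAgent (MC control) results -- the numbers are made up.
V_opt = np.array([0.84, 0.91, 1.00, 0.76])
V_mc  = np.array([0.80, 0.95, 0.97, 0.71])

# Mean absolute difference between the two estimates.
mad = np.mean(np.abs(V_opt - V_mc))
print(f"The mean absolute difference in V is {round(mad, 2)}.")
```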
________________________________________
2.C - Display the Policy
Display the environment from 2.A, setting fill to shade the cells according to their value under the policy found by MC control, and setting contents to display that policy.
________________________________________
2.D - History Plot
Replace the first blank in the cell below with the instance of MCAgent created in 2.B. Set the target parameter of the method to be equal to the state-value function for the optimal policy (as found by value iteration) and then run the cell to show the history plot for the MC estimate of the state-value function.
________________________________________
2.E - Q-Learning
Create an instance of the TDAgent class for the environment created in Step 2.A, setting gamma=1 and random_state=1. Do NOT set a policy for the agent, instead allowing the initial policy to be randomly generated.
Run Q-learning with 20,000 episodes, setting epsilon=0.1 and alpha=0.001. Then calculate the mean absolute difference between the optimal state-value function found by value iteration and the current Q-learning estimate. Print the following message with the blank filled in with the appropriate value, rounded to 2 decimal places.
The mean absolute difference in V is ____.
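The TDAgent's internals aren't shown here, but the core of tabular Q-learning is a one-line backup. Below is a hedged, standalone sketch of the epsilon-greedy choice and the update rule on a bare Q-table; the transition values are made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)
n_states, n_actions = 3, 2
Q = np.zeros((n_states, n_actions))

def epsilon_greedy(Q, s, epsilon, rng):
    """With probability epsilon explore uniformly; otherwise act greedily."""
    if rng.random() < epsilon:
        return int(rng.integers(n_actions))
    return int(np.argmax(Q[s]))

def q_update(Q, s, a, r, s2, done, alpha=0.001, gamma=1.0):
    """Q-learning backup: move Q[s, a] toward r + gamma * max_a' Q[s2, a']."""
    target = r + gamma * (0.0 if done else np.max(Q[s2]))
    Q[s, a] += alpha * (target - Q[s, a])

# One illustrative transition: state 0, action 1, reward 1.0, terminal state 2.
q_update(Q, 0, 1, 1.0, 2, True)

# Pick the next action epsilon-greedily (exploration rate 0.1, as in 2.E).
a = epsilon_greedy(Q, 0, 0.1, rng)

# The state-value estimate compared against value iteration is the greedy value.
V_est = Q.max(axis=1)
```

With epsilon=0.1 the agent explores on roughly 10% of steps, and alpha=0.001 keeps each backup small, which is consistent with the assignment's use of 20,000 episodes to let the estimate approach the value-iteration result.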
________________________________________
2.F - Display the Policy
Display the environment from 2.A, setting fill to shade the cells according to their value under the policy found by Q-learning, and setting contents to display that policy.
________________________________________
2.G - History Plot
Replace the first blank in the cell below with the instance of TDAgent created in 2.E. Set the target parameter of the method to be equal to the state-value function for the optimal policy (as found by value iteration) and then run the cell to show the history plot for the Q-learning estimate of the state-value function.
