Question: We decide to structure Jeff's training as an MDP, with Jeff as the agent and the house as the environment: Each room represents...

We decide to structure Jeff's training as an MDP, with Jeff as the agent and the house as the environment:

Each room represents a different state s.

Jeff's starting position is the Front Porch, with the Garage as the sole terminal state.

Jeff's happiness is the reward function. When entering a room, the reward for room s is the reward for going into that room itself plus the reward for hunger in that room:

$R(s) = R_{\text{room}}(s) + R_{\text{hunger}}(s)$

The rewards for different rooms are as follows.

$R_{\text{room}}(\text{prohibited room}) = -3$

$R_{\text{room}}(\text{allowed room}) = 0$

$R_{\text{room}}(\text{Garage}) = 8$

$R_{\text{hunger}}(\text{any room}) = -3$

For example:

$R(\text{Family Room}) = R_{\text{room}}(\text{Family Room}) + R_{\text{hunger}}(\text{Family Room}) = 0 + (-3) = -3$
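The reward definition above can be sketched in Python as a quick sanity check. The category labels (`"prohibited"`, `"allowed"`, `"garage"`) and the function name `reward` are illustrative assumptions; only the numeric values come from the problem statement.

```python
# Room-entry rewards from the problem statement, keyed by room category.
# The category names are illustrative labels, not part of the original problem.
R_ROOM = {"prohibited": -3, "allowed": 0, "garage": 8}
R_HUNGER = -3  # hunger penalty applied on entering any room


def reward(category: str) -> int:
    """Total reward R(s) = Rroom(s) + Rhunger(s) for entering a room."""
    return R_ROOM[category] + R_HUNGER


# The Family Room example: an allowed room gives 0 + (-3).
print(reward("allowed"))
```

Which rooms count as prohibited depends on the floor plan, which is not reproduced here, so the sketch works on categories rather than room names.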

Assume your MDP is undiscounted (that is, $\gamma = 1.0$).
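With a discount factor of 1.0, the return of a trajectory is simply the sum of the rewards collected along it. A minimal sketch, assuming a hypothetical helper `discounted_return` and a made-up reward sequence (two allowed rooms followed by the Garage):

```python
def discounted_return(rewards, gamma=1.0):
    """Sum of gamma**t * r_t over a sequence of per-step rewards."""
    total = 0.0
    for t, r in enumerate(rewards):
        total += (gamma ** t) * r
    return total


# Hypothetical trajectory: enter two allowed rooms (-3 each),
# then the Garage (8 - 3 = 5).
example_rewards = [-3, -3, 5]
print(discounted_return(example_rewards))       # undiscounted, gamma = 1.0
print(discounted_return(example_rewards, 0.9))  # discounted, for comparison
```

Because nothing is discounted, every extra room Jeff enters costs the full hunger penalty, so longer routes to the Garage are strictly worse unless they avoid a prohibited room.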

Using the information above, answer the following questions:

1a) Define a set of actions A that would allow Jeff to travel throughout the house. Give a brief qualitative description of the transition function $P(s' \mid s, a)$ when s = Dining Room for each action a in A.

Actions (A):

$P(s' \mid s = \text{Dining Room}, a)$ for each action a in Actions:

1b) Imagine an optimal policy sending Jeff to the Garage that has Jeff go to the Family Room when he is either in the Mud Room or the Dining Room. Why might this same policy send him from the Hallway to the Mud Room rather than to the Dining Room? Hint: consider your answer from 1a.

For questions 1c-1d, let's say that we give Jeff a pair of earplugs and he no longer fears the vacuum. That is, there is now a 0.0 probability that he thinks he hears a vacuum and goes in the opposite direction.

1c) In this revised scenario, is there more than one optimal policy that sends Jeff from the Front Porch to the Garage?

Answer (select one):

Explanation:

1d) How can you change either $R_{\text{hunger}}(\text{any room})$ or $R_{\text{room}}(\text{prohibited room})$ such that there is only one optimal policy and it sends Jeff from the Front Porch up to the Family Room and then through the Storage Room into the Garage?

For the remaining question, suppose Jeff loses his earplugs and returns to a 0.1 probability that he thinks he hears a vacuum and goes in the opposite direction.
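The noisy movement described above can be sketched as a probability distribution over the direction Jeff actually moves. The compass-direction action set and the names `OPPOSITE` and `executed_action_distribution` are assumptions made for illustration; the house layout is not reproduced here, so the sketch stops at directions rather than rooms.

```python
# Assumed action set: four compass directions (an illustration, not given
# in the problem statement).
OPPOSITE = {"north": "south", "south": "north", "east": "west", "west": "east"}


def executed_action_distribution(action: str, p_opposite: float = 0.1) -> dict:
    """Return {direction: probability} for the direction Jeff actually moves.

    With probability p_opposite he thinks he hears a vacuum and moves in the
    opposite direction; otherwise he moves as intended.
    """
    if not 0.0 <= p_opposite <= 1.0:
        raise ValueError("p_opposite must be a probability")
    dist = {action: 1.0 - p_opposite}
    if p_opposite > 0.0:
        dist[OPPOSITE[action]] = p_opposite
    return dist


print(executed_action_distribution("north"))       # default 0.1 slip probability
print(executed_action_distribution("north", 0.0))  # earplugs: deterministic
```

What happens when no room lies in the opposite direction depends on the floor plan and is deliberately left out of this sketch.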

1e) Say Jeff turns into a cyborg and no longer feels hunger (no penalty for hunger) and wants to wander freely while avoiding prohibited rooms. What needs to be changed about $R_{\text{room}}(s)$ and/or $R_{\text{hunger}}(s)$ to let him wander instead of heading to the Garage (no longer a terminal state)?
