Question: We decide to structure Jeff's training as an MDP

We decide to structure Jeff's training as an MDP, with Jeff as the agent and the house as the environment:
Each room represents a different state s. Jeff's starting position is the Front Porch, and the Garage is the sole terminal state.
Jeff's happiness is the reward function. When Jeff enters a room s, the reward is the reward for going into that room itself plus the reward for hunger in that room:
R(s) = R_room(s) + R_hunger(s)
The rewards for the different rooms are as follows:
R_room(prohibited room) = -3
R_room(allowed room) = 0
R_room(Garage) = 8
R_hunger(any room) = -3
For example:
R(Family Room) = R_room(Family Room) + R_hunger(Family Room) = 0 + (-3) = -3
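The reward definition above can be sketched in a few lines of Python. This is a minimal illustration, not part of the original problem: the category labels "prohibited", "allowed", and "garage" and the function name `reward` are assumptions introduced here, and the sketch does not say which rooms of the house fall into which category.

```python
# Reward values taken from the problem statement.
# The category keys are hypothetical labels chosen for this sketch.
R_ROOM = {"prohibited": -3, "allowed": 0, "garage": 8}
R_HUNGER = -3  # the hunger penalty applies in every room


def reward(category: str) -> int:
    """Return R(s) = R_room(s) + R_hunger(s) for a room of the given category."""
    return R_ROOM[category] + R_HUNGER


print(reward("allowed"))     # -3, matches the Family Room example
print(reward("prohibited"))  # -6
print(reward("garage"))      # 5
```

Keeping R_room and R_hunger as separate values makes it easy to vary one of them independently, which is what parts d) and e) ask about.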
Assume your MDP is undiscounted (that is, γ = 1.0).
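With γ = 1.0, the return of a trajectory is simply the sum of the rewards of the rooms Jeff enters. The sketch below illustrates this under stated assumptions: the house layout is not reproduced in this text, so the path is a hypothetical sequence of room categories rather than an actual route, and the name `undiscounted_return` is introduced only for this example.

```python
# Reward values from the problem statement; category keys are hypothetical labels.
R_ROOM = {"prohibited": -3, "allowed": 0, "garage": 8}
R_HUNGER = -3


def undiscounted_return(categories, gamma=1.0):
    """Sum gamma**t * R(s_t) over the rooms entered, in order."""
    total = 0.0
    for t, category in enumerate(categories):
        total += (gamma ** t) * (R_ROOM[category] + R_HUNGER)
    return total


# Hypothetical path: two allowed rooms, then the Garage.
path = ["allowed", "allowed", "garage"]
print(undiscounted_return(path))  # (-3) + (-3) + 5 = -1.0
```

Because every step costs at least the hunger penalty, longer paths accumulate more negative reward even when they avoid prohibited rooms.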
Using the information above, answer the following questions:
a) Define a set of actions A that would allow Jeff to travel throughout the house. Give a brief qualitative description of the transition function P(s' | s, a) when s = Dining Room for each action a in A.
Actions (A):
P(s' | s = Dining Room, a) for each action a in A:
b) Imagine an optimal policy sending Jeff to the Garage that has Jeff go to the Family Room when he is in either the Mud Room or the Dining Room. Why might this same policy send him from the Hallway to the Mud Room rather than to the Dining Room? Hint: consider your answer to 1a.
For questions 1c-1d, let's say that we give Jeff a pair of earplugs and he no longer fears the vacuum. That is, there is now a 0.0 probability that he thinks he hears a vacuum and goes in the opposite direction.
c) In this revised scenario, is there more than one optimal policy that sends Jeff from the Front Porch to the Garage?
Answer (select one):
Explanation:
d) How can you change either R_hunger(any room) or R_room(prohibited room) such that there is only one optimal policy, and it sends Jeff from the Front Porch up to the Family Room and then through the Storage Room into the Garage?
For the remaining question, suppose Jeff loses his earplugs and returns to a 0.1 probability that he thinks he hears a vacuum and goes in the opposite direction.
e) Say Jeff turns into a cyborg and no longer feels hunger (no penalty for hunger) and wants to wander freely while avoiding prohibited rooms. What needs to be changed about R_room(s) and/or R_hunger(s) to let him wander instead of heading to the Garage (which is no longer a terminal state)?
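The "vacuum" noise described in the question (a 0.1 probability of going in the opposite direction, or 0.0 with earplugs) can be modeled as a distribution over the direction Jeff actually moves. The sketch below is only one possible modeling choice: the four direction names, the `OPPOSITE` table, and the function `action_distribution` are all assumptions made for illustration, and the problem leaves the actual action set for part a) to the reader.

```python
from typing import Dict

# Hypothetical action set: four directions, each paired with its opposite.
OPPOSITE = {"up": "down", "down": "up", "left": "right", "right": "left"}


def action_distribution(action: str, p_vacuum: float = 0.1) -> Dict[str, float]:
    """Return the probability of each direction Jeff actually moves.

    With probability p_vacuum he thinks he hears a vacuum and moves in the
    opposite direction; otherwise he moves in the intended direction.
    """
    if not 0.0 <= p_vacuum <= 1.0:
        raise ValueError("p_vacuum must be between 0 and 1")
    return {action: 1.0 - p_vacuum, OPPOSITE[action]: p_vacuum}


print(action_distribution("up"))                # without earplugs
print(action_distribution("up", p_vacuum=0.0))  # with earplugs: deterministic
```

Mapping the resulting direction to a next room s' would additionally require the house layout, which is not reproduced in this text.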