(X points) Consider the deterministic reinforcement environment drawn below (let $\gamma = 0.5$). The agent can choose to follow any outgoing edge from any node and will arrive at the other end of the edge 100% of the time. The numbers on the edges indicate the immediate rewards. Once the agent reaches the 'end' state, it is magically transported to the 'start' state. A one-step, tabular Q-learner with $\alpha = 1$ follows the path start -> b -> end. Compute the values of all entries in the Q table that change. Show your work. Assume that for all legal actions the initial values in the Q table are 6. When writing Q(s, a), let the action be the name of the target state. For example, Q(start, b) denotes starting in the start state and taking the action that moves to state b.
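The environment figure is not reproduced here, so the edge rewards are unknown. As a rough sketch of how the computation could be set up, the Python below applies the standard one-step Q-learning update, Q(s, a) <- Q(s, a) + alpha * (r + gamma * max_a' Q(s', a') - Q(s, a)), along the path start -> b -> end. With alpha = 1 this reduces to Q(s, a) = r + gamma * max_a' Q(s', a'). Everything in `edges` (the extra node `a` and all four reward values) is a hypothetical placeholder, not taken from the question. The sketch also assumes that the transport from 'end' to 'start' makes 'start' the successor state for the update on the edge b -> end; a different reading of the problem would change that step.

```python
GAMMA = 0.5   # discount factor from the question
ALPHA = 1.0   # learning rate from the question
INIT_Q = 6.0  # initial value of every legal Q entry, from the question


def q_update(q, state, action, reward, next_state, alpha=ALPHA, gamma=GAMMA):
    """Apply one tabular Q-learning update in place and return the new value."""
    # Actions are named after their target state, so q is keyed by (state, target).
    next_values = [v for (s, _), v in q.items() if s == next_state]
    best_next = max(next_values) if next_values else 0.0
    target = reward + gamma * best_next
    q[(state, action)] += alpha * (target - q[(state, action)])
    return q[(state, action)]


def successor(target):
    """ASSUMPTION: reaching 'end' transports the agent to 'start'."""
    return "start" if target == "end" else target


# HYPOTHETICAL graph and rewards: replace with the edges from the actual figure.
edges = {
    ("start", "a"): 1.0,
    ("start", "b"): 2.0,
    ("a", "end"): 3.0,
    ("b", "end"): 4.0,
}

q = {edge: INIT_Q for edge in edges}

# Follow the path start -> b -> end, updating one entry per step.
for state, action in [("start", "b"), ("b", "end")]:
    new_value = q_update(q, state, action, edges[(state, action)], successor(action))
    print(f"Q({state}, {action}) = {new_value}")
```

Only the two entries along the path, Q(start, b) and Q(b, end), are touched; every other entry keeps its initial value of 6. Note that the second update uses the Q(start, b) value already changed by the first, which matters whenever that value ends up being the largest entry for 'start'.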
