Question: solve and explain please cheak this is a COE topics HW Question 4: (6 Points) Consider a (2X3) game world that has 6 states {A,B,C,D,E,F}

solve and explain please
solve and explain please cheak this is a COE topics HW Question
cheak this is a COE topics HW
4: (6 Points) Consider a (2X3) game world that has 6 states

Question 4: (6 Points) Consider a (2X3) game world that has 6 states {A,B,C,D,E,F} and four actions (up, down, left, right) as shown below. For every new episode, the game starts by choosing a random state and it ends at the terminal state (F). When node F is reached, the player receives a reward of +10 and the game ends. For all other actions that do not lead to state F, the reward is -1. Assume that the greedy policy is used after training. Also, assume that =1 and =0.9. Assume that the Q-leaming algorithm was applied, and the following is the initial Q function Q(s,a), where s is a state and a is an action. State \action Up down left right A. Using the initial Q function, perform one action ( B, up) and update the Q function [2 bts! B. Using the initial Q function, perform one episode and update the Q table starting from state A. Note that an episode is defined as full game from a given state until the game ends. [4 pts] Question 4: (6 Points) Consider a (2X3) game world that has 6 states {A,B,C,D,E,F} and four actions (up, down, left, right) as shown below. For every new episode, the game starts by choosing a random state and it ends at the terminal state (F). When node F is reached, the player receives a reward of +10 and the game ends. For all other actions that do not lead to state F, the reward is -1. Assume that the greedy policy is used after training. Also, assume that =1 and =0.9. Assume that the Q-leaming algorithm was applied, and the following is the initial Q function Q(s,a), where s is a state and a is an action. State \action Up down left right A. Using the initial Q function, perform one action ( B, up) and update the Q function [2 bts! B. Using the initial Q function, perform one episode and update the Q table starting from state A. Note that an episode is defined as full game from a given state until the game ends. [4 pts] Question 4: (6 Points) Consider a (2X3) game world that has 6 states {A,B,C,D,E,F} and four actions (up, down, left, right) as shown below. For every new episode, the game starts by choosing a random state and it ends at the terminal state (F). When node F is reached, the player receives a reward of +10 and the game ends. For all other actions that do not lead to state F, the reward is -1. Assume that the greedy policy is used after training. Also, assume that =1 and =0.9. Assume that the Q-leaming algorithm was applied, and the following is the initial Q function Q(s,a), where s is a state and a is an action. State \action Up down left right A. Using the initial Q function, perform one action ( B, up) and update the Q function [2 bts! B. Using the initial Q function, perform one episode and update the Q table starting from state A. Note that an episode is defined as full game from a given state until the game ends. [4 pts] Question 4: (6 Points) Consider a (2X3) game world that has 6 states {A,B,C,D,E,F} and four actions (up, down, left, right) as shown below. For every new episode, the game starts by choosing a random state and it ends at the terminal state (F). When node F is reached, the player receives a reward of +10 and the game ends. For all other actions that do not lead to state F, the reward is -1. Assume that the greedy policy is used after training. Also, assume that =1 and =0.9. Assume that the Q-leaming algorithm was applied, and the following is the initial Q function Q(s,a), where s is a state and a is an action. State \action Up down left right A. Using the initial Q function, perform one action ( B, up) and update the Q function [2 bts! B. Using the initial Q function, perform one episode and update the Q table starting from state A. Note that an episode is defined as full game from a given state until the game ends. [4 pts]

Step by Step Solution

There are 3 Steps involved in it

1 Expert Approved Answer
Step: 1 Unlock blur-text-image
Question Has Been Solved by an Expert!

Get step-by-step solutions from verified subject matter experts

Step: 2 Unlock
Step: 3 Unlock

Students Have Also Explored These Related Accounting Questions!