Question:

A.) Consider a UAV performing reconnaissance in a 4 × 4 grid of sectors, as depicted in the figure above. The UAV can fly north, south, west, or east, with each action moving it by one sector. Each action succeeds in its intended direction with a probability of 0.85; the remaining probability (0.15) is divided equally between the two directions perpendicular to the intended action, i.e., 0.075 each. The UAV prefers the sectors with a green circle and would like to avoid the red sector. The patterned sector is out of bounds. Write a program in C or C++ that models this problem as an MDP, i.e., a tuple of states, actions, a transition function, and a reward function. Assign a reward of +1 to the sectors with a green circle and a cost of 1 (a reward of −1) to the red sector. All other sectors have a cost of 0.05 (a reward of −0.05).
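A minimal C++ sketch of the part A model follows. The figure is not available here, so the positions of the green, red, and patterned sectors (`GREEN`, `RED`, `BLOCKED`) are hypothetical placeholders. Three further choices are assumptions borrowed from the AIMA grid world rather than the problem statement: (x, y) coordinates with x growing east, y growing north, and (0,0) as the start sector; green and red sectors treated as terminal; and a move into the grid edge or the patterned sector leaving the UAV in place. All function names are illustrative.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// States: sectors (x, y) of the 4 x 4 grid, x growing east, y growing north.
// State index = y * W + x, so the start sector (0, 0) is index 0.
constexpr int W = 4, H = 4;
constexpr int idx(int x, int y) { return y * W + x; }

// Actions: fly one sector north, east, south or west (clockwise order, so
// the two perpendicular directions of a are (a + 1) % 4 and (a + 3) % 4).
enum Action { NORTH = 0, EAST = 1, SOUTH = 2, WEST = 3 };
constexpr int NUM_ACTIONS = 4;
constexpr int DX[NUM_ACTIONS] = {0, 1, 0, -1};
constexpr int DY[NUM_ACTIONS] = {1, 0, -1, 0};

// HYPOTHETICAL layout: placeholders until the real figure's positions are known.
const std::vector<int> GREEN = {idx(0, 3), idx(3, 3)};  // green-circle sectors
const std::vector<int> RED = {idx(3, 2)};               // red sector
const int BLOCKED = idx(1, 1);                          // patterned, out of bounds

bool contains(const std::vector<int>& v, int s) {
  for (int x : v)
    if (x == s) return true;
  return false;
}

bool isBlocked(int s) { return s == BLOCKED; }

// Assumption: entering a green or red sector ends the mission (terminal state).
bool isTerminal(int s) { return contains(GREEN, s) || contains(RED, s); }

// Reward R(s): +1 for green, a cost of 1 for red, a cost of 0.05 elsewhere.
double reward(int s) {
  if (contains(GREEN, s)) return 1.0;
  if (contains(RED, s)) return -1.0;
  return -0.05;
}

// Deterministic effect of a move; flying off the grid or into the
// patterned sector leaves the UAV where it was (assumption).
int nextSector(int s, int a) {
  int x = s % W + DX[a], y = s / W + DY[a];
  if (x < 0 || x >= W || y < 0 || y >= H || isBlocked(idx(x, y))) return s;
  return idx(x, y);
}

// Transition model P(s' | s, a) as (s', probability) pairs: 0.85 for the
// intended direction and 0.075 for each perpendicular one. Entries may
// share the same s' (e.g. two slips that both hit a wall).
std::vector<std::pair<int, double>> transitions(int s, int a) {
  assert(!isBlocked(s) && !isTerminal(s));  // only defined for live states
  return {{nextSector(s, a), 0.85},
          {nextSector(s, (a + 1) % NUM_ACTIONS), 0.075},
          {nextSector(s, (a + 3) % NUM_ACTIONS), 0.075}};
}
```

With this placeholder layout, `transitions(idx(0, 0), NORTH)` yields (0,1) with 0.85, (1,0) with 0.075, and (0,0) with 0.075, because the westward slip runs into the grid edge.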
B.) In the program, implement policy iteration for MDPs, using the algorithm given in Fig. 17.7 (reproduced below), under the discounted infinite-horizon optimality criterion with a discount factor of γ = 0.99. Display the converged policy of the UAV as output. Use the policy to generate a trajectory from the start state (0,0), and determine whether it leads to any of the green sectors. Show this trajectory in the text.
function POLICY-ITERATION(mdp) returns a policy
  inputs: mdp, an MDP with states S, actions A(s), transition model P(s′ | s, a)
  local variables: U, a vector of utilities for states in S, initially zero
                   π, a policy vector indexed by state, initially random

  repeat
    U ← POLICY-EVALUATION(π, U, mdp)
    unchanged? ← true
    for each state s in S do
      if max_{a ∈ A(s)} Σ_{s′} P(s′ | s, a) U[s′] > Σ_{s′} P(s′ | s, π[s]) U[s′] then do
        π[s] ← argmax_{a ∈ A(s)} Σ_{s′} P(s′ | s, a) U[s′]
        unchanged? ← false
  until unchanged?
  return π
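The policy-iteration loop above can be sketched in C++ for any finite MDP stored as tables, together with a simple rollout for the trajectory requested in part B. This is a sketch under stated assumptions, not a reference implementation. POLICY-EVALUATION is done iteratively (in-place sweeps of U[s] = R(s) + γ Σ_{s′} P(s′ | s, π[s]) U[s′] until the largest change is below 1e-12) rather than by solving the linear system exactly. The 1e-9 margin in the improvement test is an addition that keeps floating-point noise from flipping between tied actions. The "nominal" trajectory follows the most probable outcome of each action (the intended move), which is one reasonable reading of the task. `TabularMDP`, `corridorExample`, and the other names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <random>
#include <utility>
#include <vector>

using Outcomes = std::vector<std::pair<int, double>>;  // (s', P(s' | s, a)) pairs

// Finite MDP stored as tables; terminal states have no actions and U[s] = R[s].
struct TabularMDP {
  int numStates = 0, numActions = 0;
  std::vector<std::vector<Outcomes>> P;  // P[s][a]
  std::vector<double> R;                 // R(s)
  std::vector<bool> terminal;
};

// Sum over s' of P(s' | s, a) U[s'].
double expectedUtility(const TabularMDP& m, const std::vector<double>& U, int s, int a) {
  double sum = 0.0;
  for (const auto& o : m.P[s][a]) sum += o.second * U[o.first];
  return sum;
}

// POLICY-EVALUATION (iterative variant): sweep
// U[s] = R(s) + gamma * sum_s' P(s' | s, pi[s]) U[s'] in place until converged.
void policyEvaluation(const TabularMDP& m, const std::vector<int>& pi,
                      std::vector<double>& U, double gamma, double tol = 1e-12) {
  double delta;
  do {
    delta = 0.0;
    for (int s = 0; s < m.numStates; ++s) {
      double u = m.R[s];
      if (!m.terminal[s]) u += gamma * expectedUtility(m, U, s, pi[s]);
      delta = std::max(delta, std::fabs(u - U[s]));
      U[s] = u;
    }
  } while (delta > tol);
}

// POLICY-ITERATION: evaluate pi, then move each state to a better action
// whenever one beats pi[s] by more than 1e-9 (rounding guard, our addition).
std::vector<int> policyIteration(const TabularMDP& m, double gamma, unsigned seed = 1) {
  assert(gamma > 0.0 && gamma < 1.0);  // discounted criterion keeps evaluation convergent
  std::mt19937 rng(seed);
  std::uniform_int_distribution<int> randomAction(0, m.numActions - 1);
  std::vector<int> pi(m.numStates, -1);  // -1 = no action (terminal state)
  for (int s = 0; s < m.numStates; ++s)
    if (!m.terminal[s]) pi[s] = randomAction(rng);  // initially random
  std::vector<double> U(m.numStates, 0.0);           // initially zero
  bool unchanged;
  do {
    policyEvaluation(m, pi, U, gamma);
    unchanged = true;
    for (int s = 0; s < m.numStates; ++s) {
      if (m.terminal[s]) continue;
      int best = pi[s];
      for (int a = 0; a < m.numActions; ++a)
        if (expectedUtility(m, U, s, a) > expectedUtility(m, U, s, best) + 1e-9) best = a;
      if (best != pi[s]) { pi[s] = best; unchanged = false; }
    }
  } while (!unchanged);
  return pi;
}

// Follow pi from start, always taking the most probable outcome of the chosen
// action, until a terminal state is reached or maxSteps moves have been made.
std::vector<int> nominalTrajectory(const TabularMDP& m, const std::vector<int>& pi,
                                   int start, int maxSteps) {
  std::vector<int> path{start};
  int s = start;
  for (int k = 0; k < maxSteps && !m.terminal[s]; ++k) {
    const Outcomes& out = m.P[s][pi[s]];
    s = std::max_element(out.begin(), out.end(),
                         [](const auto& x, const auto& y) { return x.second < y.second; })
            ->first;
    path.push_back(s);
  }
  return path;
}

// Toy check (hypothetical): a 4-state corridor, actions 0 = left and 1 = right,
// each succeeding with probability 0.85 (otherwise the agent stays put);
// state 3 is a +1 terminal and every other state costs 0.05.
TabularMDP corridorExample() {
  TabularMDP m;
  m.numStates = 4;
  m.numActions = 2;
  m.R = {-0.05, -0.05, -0.05, 1.0};
  m.terminal = {false, false, false, true};
  m.P.assign(4, std::vector<Outcomes>(2));
  for (int s = 0; s < 3; ++s) {
    m.P[s][0] = {{std::max(s - 1, 0), 0.85}, {s, 0.15}};
    m.P[s][1] = {{s + 1, 0.85}, {s, 0.15}};
  }
  return m;
}
```

On the corridor, `policyIteration(corridorExample(), 0.99)` settles on "right" in states 0 to 2, and the nominal trajectory from state 0 is 0 → 1 → 2 → 3. For the UAV, the tables could be filled with numStates = 16 and numActions = 4, the 0.85/0.075/0.075 outcomes in P[s][a] for every live sector, and the green, red, and (unreachable) patterned sectors marked terminal. Printing pi with N/E/S/W symbols then shows the converged policy, and checking whether the last state of the trajectory from (0,0) is a green sector answers the final question.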
