Question: reward = function(policy, MDP){ rew = 0; state = 1; for(i in 1:10){

reward = function(policy, MDP){
  rew = 0
  state = 1
  for(i in 1:10){
    if(state == 9){
      break
    }

# Prior probabilities, in the order MDP_AB, MDP_AC, MDP_BC, MDP_BA, MDP_CA, MDP_CB
p_mdp = c(1/3, 1/4, 1/12, 1/6, 1/12, 1/12)
# Here are the assignment MDPs
assignments = list(MDP_AB, MDP_AC, MDP_BC, MDP_BA, MDP_CA, MDP_CB)
# Here are the labels for each assignment
assigns = c('MDP_AB', 'MDP_AC', 'MDP_BC', 'MDP_BA', 'MDP_CA', 'MDP_CB')
# For each assignment's best policy
for(pi in 1:length(assignments)){
  # Find the optimal policy
  pol = # YOUR CODE HERE
  # Create a variable to store the expected reward
  er = 0
  # For each assignment
  for(mdp in 1:length(assignments)){
    # Calculate the reward R(pi, MDP)
    r = # YOUR CODE HERE
    # Update the expected rewards E[R(pi, MDP)]
    er = # YOUR CODE HERE
  }
  message(assigns[pi], ' ', er)
}

Complete the three code snippets marked YOUR CODE HERE.
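The structure of the nested loop can be illustrated with a minimal, self-contained sketch. It is not the assignment's solution: the real MDP objects and the policy solver are not shown in the question, so the payoff vectors, `optimal_policy()` and `toy_reward()` below are hypothetical stand-ins. Only `p_mdp` and `assigns` come from the question. The point is the update rule: the expected reward of a policy is the sum, over all candidate MDPs, of the prior probability times the reward earned in that MDP.

```r
# Prior over the six candidate MDPs and their labels (from the question).
p_mdp <- c(1/3, 1/4, 1/12, 1/6, 1/12, 1/12)
assigns <- c('MDP_AB', 'MDP_AC', 'MDP_BC', 'MDP_BA', 'MDP_CA', 'MDP_CB')

# HYPOTHETICAL stand-ins: each "MDP" is reduced to a payoff vector over 3 actions.
assignments <- list(
  c(3, 1, 0),  # stands in for MDP_AB
  c(3, 0, 1),  # stands in for MDP_AC
  c(0, 3, 1),  # stands in for MDP_BC
  c(1, 3, 0),  # stands in for MDP_BA
  c(1, 0, 3),  # stands in for MDP_CA
  c(0, 1, 3)   # stands in for MDP_CB
)

# HYPOTHETICAL helpers: the assignment presumably supplies its own solver
# and the reward(policy, MDP) function from the question.
optimal_policy <- function(mdp) which.max(mdp)   # "policy" = index of best action
toy_reward <- function(policy, mdp) mdp[policy]  # payoff of that action in an MDP

expected_rewards <- numeric(length(assignments))
names(expected_rewards) <- assigns

for (k in seq_along(assignments)) {
  # Policy that is optimal if MDP k were the true one
  pol <- optimal_policy(assignments[[k]])
  er <- 0
  for (m in seq_along(assignments)) {
    # Reward of that policy when MDP m is the true one
    r <- toy_reward(pol, assignments[[m]])
    # Accumulate the probability-weighted reward
    er <- er + p_mdp[m] * r
  }
  expected_rewards[k] <- er
  message(assigns[k], ' ', er)
}
```

The inner loop is just a weighted sum, so under the same assumptions it could also be written as `sum(p_mdp * sapply(assignments, function(m) toy_reward(pol, m)))`. The sketch uses `k` rather than `pi` as the loop index to avoid masking R's built-in constant `pi`.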
