Question:
def create_mlp(input_dim: int, output_dim: int, architecture: List[int],
               squash=False, activation: Type[nn.Module] = nn.ReLU) -> List[nn.Module]:
    '''Creates a list of modules that define an MLP.'''
    if len(architecture) > 0:
        layers = [nn.Linear(input_dim, architecture[0]), activation()]
    else:
        layers = []
    for i in range(len(architecture) - 1):
        layers.append(nn.Linear(architecture[i], architecture[i + 1]))
        layers.append(activation())
    if output_dim > 0:
        last_dim = architecture[-1] if len(architecture) > 0 else input_dim
        layers.append(nn.Linear(last_dim, output_dim))
    if squash:
        # squashes output down to (-1, 1)
        layers.append(nn.Tanh())
    return layers

def create_net(input_dim: int, output_dim: int, squash=False):
    layers = create_mlp(input_dim, output_dim, architecture=[64, 64], squash=squash)
    net = nn.Sequential(*layers)
    return net
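For reference, a hypothetical usage of create_net for this assignment's Q network, assuming LunarLander-v2's 8-dimensional observations and 4 discrete actions (the variable name q_net is illustrative, not from the notebook):

    # Q network for LunarLander-v2: 8-dimensional state in, 4 action logits out.
    q_net = create_net(input_dim=8, output_dim=4)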

def argmax_policy(net):
    # TODO: Return a FUNCTION that takes in a state, and outputs the maximum Q value of said state.
    # Inputs:
    # - net: (type nn.Module). A neural network module, going from state dimension to number of actions. Q network.
    # Wanted output:
    # - argmax_fn: A function which takes in a state, and outputs the maximum Q value of said state.
    pass

def expert_policy(expert, s):
    '''Returns a one-hot encoded action of what the expert predicts at state s.'''
    action = expert.predict(s)[0]
    one_hot_action = np.eye(4)[action]
    return one_hot_action

We first ask that you implement some simple utilities that will go toward training all of our policies. Because LunarLander-v2 is an environment with a finite number of actions, we can represent our policy by a neural network that takes in the state and outputs a vector whose dimension is the number of actions. In imitation learning, we want to be able to match the expert actions at all states. In the discrete-action case, this boils down to maximizing the log likelihood of the expert actions taken in those particular states (i.e. by maximizing logits). But in evaluation/deployment, we want to use a greedy version of our learnt policy: we want to exploit what we have learned throughout training by choosing the action the learner thinks is best (i.e. the maximum logit). Please implement the argmax policy method in the Part 1: Utils section of the notebook, following the instructions in the argmax_policy stub shown above.

Behavioral cloning: Behavioral cloning is the simplest imitation learning algorithm, where we perform supervised learning on the given (offline) expert dataset. We do this either via log likelihood maximization (cross-entropy minimization) in the discrete-action case, or mean-squared-error minimization (MLE is also possible) in the continuous-control setting. Please implement the following learn() function for BC.

def learn(self, env, states, actions, n_steps=1e4, ...):
    # TODO: Implement this method. Return the final greedy policy (argmax policy).
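One possible way to fill in argmax_policy is sketched below. This is not the notebook's reference solution: it assumes states arrive as NumPy arrays (or anything torch.as_tensor accepts) and interprets "maximum Q value" as selecting the greedy action, i.e. the index of the largest output logit, as the surrounding prose describes.

    import torch

    def argmax_policy(net):
        # Returns a function that maps a state to the greedy (argmax-Q) action.
        def argmax_fn(state):
            state_t = torch.as_tensor(state, dtype=torch.float32)
            if state_t.dim() == 1:
                state_t = state_t.unsqueeze(0)   # add a batch dimension
            with torch.no_grad():
                q_values = net(state_t)          # shape: (1, num_actions)
            return q_values.argmax(dim=-1).item()
        return argmax_fn

Likewise, here is a minimal sketch of the BC learn() step. It assumes the policy network is stored as self.net, that the expert actions may be one-hot encoded (as expert_policy suggests), and that a plain Adam + cross-entropy loop with an illustrative batch_size parameter is acceptable; these attribute and parameter names are assumptions, not part of the original notebook.

    import numpy as np
    import torch
    import torch.nn as nn

    def learn(self, env, states, actions, n_steps=1e4, batch_size=64):
        # Supervised learning on the expert dataset: maximize the log likelihood
        # of the expert actions, i.e. minimize cross-entropy on the logits.
        states_t = torch.as_tensor(states, dtype=torch.float32)
        actions_t = torch.as_tensor(actions, dtype=torch.long)
        if actions_t.dim() > 1:                  # handle one-hot encoded expert actions
            actions_t = actions_t.argmax(dim=-1)

        optimizer = torch.optim.Adam(self.net.parameters(), lr=1e-3)  # assumed hyperparameters
        loss_fn = nn.CrossEntropyLoss()

        for _ in range(int(n_steps)):
            idx = np.random.randint(0, len(states_t), size=batch_size)
            logits = self.net(states_t[idx])
            loss = loss_fn(logits, actions_t[idx])
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()

        # Deploy the greedy version of the learned policy.
        return argmax_policy(self.net)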
