Question: All code should be turned in when you submit your assignment. The code can only use numpy; you cannot use any other machine learning packages, like sklearn.

To better visualize random variables and get some intuition for sampling, this question involves some simple simulations, which are a central theme in machine learning. You will also get some experience using Julia and Pluto notebooks, which you will need in later assignments. Complete the attached notebook A1.jl and follow instructions.md to get set up.

For the first two questions, the goal is to understand how much estimators themselves can vary: how different our estimate would have been under a different randomly sampled dataset. In the real world we do not get to obtain different estimators, we only have one; in this controlled setting, though, we can actually simulate how different the estimators could be. For the second two questions, the goal is to understand how to obtain confidence intervals for our single sample average estimator.

(a) [5 MARKS] Fill in the code to calculate the sample mean, variance, and standard deviation of a vector of numbers. Do not use any packages not already loaded! Note that for the remainder of this question you will only use the sample mean output by your code, and will reason about the variability in this sample mean estimator. However, we get you to implement all three for a bit of practice.

(b) [7 MARKS] Run the code for 10 samples with $\mu = 0$ and $\sigma^2 = 1.0$. Write down the sample average that you obtain. Now do this another 4 times, giving you 5 estimates of the sample average: $M_1, M_2, M_3, M_4$ and $M_5$. What is the sample variance of these 5 estimates? Use the unbiased sample variance formula, $V = \frac{1}{n-1} \sum_{i=1}^{n} (M_i - \bar{M})^2$, where $\bar{M}$ is the average of the $n = 5$ estimates. Note that here we want to understand the variability of the mean estimator itself, if it had been run on different datasets; conveniently, we can simulate this using synthetic data.

(c) [7 MARKS] Now run the same experiment, but use 100 samples for each sample average estimate. What is the sample variance of these 5 estimates? How is it different from the variance when you used 10 samples to compute the estimates?

(d) [8 MARKS] Now let us consider a higher variance situation, where $\sigma^2 = 10.0$. Imagine you know this variance, and that the data comes from a Gaussian, but that you do not know the true mean. Run the code to get 30 samples, and compute one sample average $M$. What is the 95% confidence interval around this $M$? Give actual numbers.

(e) [8 MARKS] Now assume you know less: you do not know the data is Gaussian, though you still know the variance is $\sigma^2 = 10.0$. Use the same 30 samples from (d) and the resulting sample average $M$. Give a 95% confidence interval around $M$, now without assuming the samples are Gaussian.
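The quantities asked for in parts (a) through (e) can be sketched in a few lines. What follows is a minimal illustration in plain Python using only the standard library; it is not the course's A1.jl notebook, and every function name here (`sample_mean`, `variance_of_mean_estimates`, `gaussian_ci`, `chebyshev_ci`) is hypothetical. It assumes that part (d) is meant to use the known-variance Gaussian interval and that part (e) is meant to use Chebyshev's inequality, which is one common distribution-free choice; the assignment itself does not name the method.

```python
import math
import random
from statistics import NormalDist


def sample_mean(xs):
    """Sample average of a sequence of numbers."""
    return sum(xs) / len(xs)


def sample_variance(xs):
    """Unbiased sample variance: divides by n - 1."""
    m = sample_mean(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)


def sample_std(xs):
    """Sample standard deviation, the square root of the unbiased variance."""
    return math.sqrt(sample_variance(xs))


def variance_of_mean_estimates(n, runs=5, mu=0.0, sigma2=1.0, rng=random):
    """Draw `runs` Gaussian datasets of size n; return the variance of their means."""
    sigma = math.sqrt(sigma2)
    means = [
        sample_mean([rng.gauss(mu, sigma) for _ in range(n)])
        for _ in range(runs)
    ]
    return sample_variance(means)


def gaussian_ci(m, sigma2, n, delta=0.05):
    """Interval m +/- z * sqrt(sigma2 / n), assuming Gaussian data, known variance."""
    z = NormalDist().inv_cdf(1 - delta / 2)  # about 1.96 for delta = 0.05
    half = z * math.sqrt(sigma2 / n)
    return m - half, m + half


def chebyshev_ci(m, sigma2, n, delta=0.05):
    """Interval from Chebyshev's inequality: m +/- sqrt(sigma2 / (delta * n))."""
    half = math.sqrt(sigma2 / (delta * n))
    return m - half, m + half


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed only so the sketch is repeatable
    print("variance of 5 means, n=10: ", variance_of_mean_estimates(10, rng=rng))
    print("variance of 5 means, n=100:", variance_of_mean_estimates(100, rng=rng))

    # Parts (d) and (e): one dataset of 30 samples with sigma^2 = 10.
    data = [rng.gauss(0.0, math.sqrt(10.0)) for _ in range(30)]
    m = sample_mean(data)
    print("Gaussian 95% CI: ", gaussian_ci(m, 10.0, len(data)))
    print("Chebyshev 95% CI:", chebyshev_ci(m, 10.0, len(data)))
```

With $\sigma^2 = 10$ and $n = 30$, the Gaussian half-width is about 1.13 and the Chebyshev half-width is about 2.58, so the distribution-free interval is noticeably wider. In expectation the variance of the mean estimator is $\sigma^2 / n$, so the 100-sample runs should typically give a smaller variance than the 10-sample runs, although with only 5 estimates any single run is noisy.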

