Question: All code should be turned in when you submit your assignment. The code can only use numpy; you cannot use any other machine learning packages, like sklearn.

To better visualize random variables and get some intuition for sampling, this question involves some simple simulations, which are a central theme in machine learning. You will also get some experience using Julia and Pluto notebooks, which you will need in later assignments. Complete the attached notebook A1.jl and follow instructions.md to get set up.

For the first two questions, the goal is to understand how much estimators themselves can vary: how different our estimate would have been under a different randomly sampled dataset. In the real world, we do not get to obtain different estimators; we only have one. In this controlled setting, though, we can actually simulate how different the estimators could be. For the second two questions, the goal is to understand how to obtain confidence intervals for our single sample average estimator.

(a) [5 MARKS] Fill in the code to calculate the sample mean, variance, and standard deviation of a vector of numbers. Do not use any packages not already loaded! Note that for the remainder of this question you will only use the sample mean output by your code, and will reason about the variability in this sample mean estimator. However, we get you to implement all three for a bit of practice.

(b) [7 MARKS] Run the code for 10 samples with $\mu = 0$ and $\sigma^2 = 1.0$. Write down the sample average that you obtain. Now do this another 4 times, giving you 5 estimates of the sample average: $M_1, M_2, M_3, M_4$ and $M_5$. What is the sample variance of these 5 estimates? Use the unbiased sample variance formula, $V = \frac{1}{n-1} \sum_{i=1}^{n} (M_i - \bar{M})^2$. Note that here we want to understand the variability of the mean estimator itself, if it had been run on different datasets; conveniently, we can simulate this using synthetic data.

(c) [7 MARKS] Now run the same experiment, but use 100 samples for each sample average estimate. What is the sample variance of these 5 estimates? How is it different from the variance when you used 10 samples to compute the estimates?

(d) [8 MARKS] Now let us consider a higher-variance situation, where $\sigma^2 = 10.0$. Imagine you know this variance, and that the data comes from a Gaussian, but that you do not know the true mean. Run the code to get 30 samples, and compute one sample average $M$. What is the 95% confidence interval around this $M$? Give actual numbers.

(e) [8 MARKS] Now assume you know less: you do not know the data is Gaussian, though you still know the variance is $\sigma^2 = 10.0$. Use the same 30 samples from (d) and the resulting sample average $M$. Give a 95% confidence interval around $M$, now without assuming the samples are Gaussian.
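The quantities asked for in parts (a) through (e) can be sketched as follows. This is a minimal illustration in Python with numpy, chosen to match the numpy-only rule stated at the top; it is not the code from the A1.jl notebook, which is in Julia. All function names here are hypothetical. For part (e), the sketch assumes Chebyshev's inequality as the distribution-free bound, which is one common choice when only the variance is known; the course may intend a different bound.

```python
import numpy as np


def sample_stats(x):
    """Return (mean, unbiased variance, standard deviation) of a 1-D array."""
    x = np.asarray(x, dtype=float)
    n = x.size
    mean = x.sum() / n
    var = ((x - mean) ** 2).sum() / (n - 1)  # unbiased: divide by n - 1
    return mean, var, np.sqrt(var)


def variance_of_sample_means(rng, n_samples, n_repeats=5, mu=0.0, sigma2=1.0):
    """Draw n_repeats Gaussian datasets of size n_samples and return the
    unbiased sample variance of their sample means (parts b and c)."""
    means = [
        sample_stats(rng.normal(mu, np.sqrt(sigma2), n_samples))[0]
        for _ in range(n_repeats)
    ]
    return sample_stats(means)[1]


def gaussian_ci(mean, sigma2, n, z=1.96):
    """Part (d): Gaussian data, known variance. Interval is M +/- z*sigma/sqrt(n),
    where z = 1.96 is the usual two-sided 95% Gaussian quantile."""
    half = z * np.sqrt(sigma2 / n)
    return mean - half, mean + half


def chebyshev_ci(mean, sigma2, n, delta=0.05):
    """Part (e), assuming Chebyshev: P(|M - mu| >= eps) <= sigma2 / (n * eps^2).
    Setting the bound equal to delta gives eps = sqrt(sigma2 / (delta * n))."""
    half = np.sqrt(sigma2 / (delta * n))
    return mean - half, mean + half


if __name__ == "__main__":
    rng = np.random.default_rng()  # unseeded, so each run gives different numbers
    print("variance of 5 means, n=10: ", variance_of_sample_means(rng, 10))
    print("variance of 5 means, n=100:", variance_of_sample_means(rng, 100))

    data = rng.normal(0.0, np.sqrt(10.0), 30)
    m = sample_stats(data)[0]
    print("Gaussian 95% CI: ", gaussian_ci(m, 10.0, 30))
    print("Chebyshev 95% CI:", chebyshev_ci(m, 10.0, 30))
```

Because the variance of a sample mean of n independent draws is $\sigma^2 / n$, the value printed for n = 100 should typically be smaller than the one for n = 10, although with only 5 estimates any single run can deviate. With $\sigma^2 = 10.0$ and 30 samples, the Gaussian half-width is roughly 1.13 and the Chebyshev half-width roughly 2.58, so the distribution-free interval is wider, as expected when less is assumed.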
