Question: Before applying any algorithm to a real-world problem, it is common practice to first understand its behavior on synthetic data. Let's generate a synthetic data set for a regression problem: given input-output pairs, we aim to learn a function that maps inputs to outputs.
Suppose we have input $x$ and output $y$ that are related by the equation
However, in practice, we can only observe noisy data. That is, given input
we observe output
that is related to
by the equation
where
is a random noise term. A common model for
is a Gaussian random variable with mean and variance
Suppose
Generate data points for
uniformly randomly distributed in the range of
Make a plot that contains the following elements:
The ground truth function.
The noisy data points.
Add a legend to the plot, where the ground truth is labeled "Ground truth" and the noisy data points are labeled "Noisy data". Label the x-axis as "x" and the y-axis as "y".
A sample figure is shown below.
# code here
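One possible sketch of this exercise follows. The ground truth function, the noise level, the number of points, and the input range are not visible in the text above, so `ground_truth` (a sine), `sigma`, `n_points`, `x_min`, and `x_max` are placeholder assumptions, as is the zero-mean noise. Substitute the assignment's actual values.

```python
import numpy as np
import matplotlib.pyplot as plt

# ASSUMPTIONS: every value in this block is a placeholder, not the assignment's.
def ground_truth(x):
    return np.sin(x)

sigma = 0.2                      # noise standard deviation
n_points = 50                    # number of noisy observations
x_min, x_max = 0.0, 2.0 * np.pi  # input range

rng = np.random.default_rng(0)
x = rng.uniform(x_min, x_max, size=n_points)
# Noisy observation = ground truth + zero-mean Gaussian noise.
y_noisy = ground_truth(x) + rng.normal(0.0, sigma, size=n_points)

# A dense grid gives a smooth curve for the ground truth.
x_grid = np.linspace(x_min, x_max, 200)

fig, ax = plt.subplots()
ax.plot(x_grid, ground_truth(x_grid), label="Ground truth")
ax.scatter(x, y_noisy, color="tab:orange", label="Noisy data")
ax.set_xlabel("x")
ax.set_ylabel("y")
ax.legend()
```

Call `plt.show()` or `fig.savefig(...)` afterwards to display or save the figure.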
[Sample figure: the ground truth function and the noisy data points]
Q2
We usually denote a normal distribution with mean $\mu$ and variance $\sigma^2$ as $\mathcal{N}(\mu, \sigma^2)$.
Let's generate a synthetic data set for a classification problem. Let X be a random variable that follows a normal distribution
and Y be a random variable that follows a normal distribution
X and Y can represent the same feature measured in two groups, for example the height of high school students and the height of college students. Different groups can have different distributions of the same feature.
Generate samples for X and samples for Y. This models the scenario where we have more data for one group than the other.
Plot the histograms of X and Y in the same figure.
Add the label "X samples" to the histogram of X and the label "Y samples" to the histogram of Y. Add a legend to the plot.
The two histograms should have different color and set the transparency to so that we can see the overlap of the two histograms.
Hint: the transparency parameter is named alpha in most plotting libraries.
A sample figure is shown below.
# code here
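A minimal sketch of the two overlapping histograms is given below. The distribution parameters, the sample counts, and the transparency value are missing from the text above, so every number here is a placeholder assumption. Note that NumPy's normal sampler takes the standard deviation, while the N(mean, variance) notation specifies the variance.

```python
import numpy as np
import matplotlib.pyplot as plt

# ASSUMPTIONS: placeholder parameters, not the assignment's values.
mu_x, var_x, n_x = 0.0, 1.0, 1000   # X ~ N(mu_x, var_x), larger group
mu_y, var_y, n_y = 2.0, 4.0, 500    # Y ~ N(mu_y, var_y), smaller group
alpha = 0.5                         # histogram transparency

rng = np.random.default_rng(0)
# rng.normal expects a standard deviation, so take the square root of the variance.
x_samples = rng.normal(mu_x, np.sqrt(var_x), size=n_x)
y_samples = rng.normal(mu_y, np.sqrt(var_y), size=n_y)

fig, ax = plt.subplots()
ax.hist(x_samples, bins=30, color="tab:blue", alpha=alpha, label="X samples")
ax.hist(y_samples, bins=30, color="tab:red", alpha=alpha, label="Y samples")
ax.legend()
```

Passing `density=True` to both `hist` calls would put the two groups on a comparable scale despite the different sample counts; whether the assignment wants counts or densities is not stated.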
[Sample figure: overlapping histograms of the X samples and the Y samples]
Q3 Let's bring our data science skills to Wall Street. One model of stock price is the random walk model:
Suppose $X_t$ is the stock price at day $t$ and $X_0$ is the initial stock price. At each day, the change of stock price is a random variable $Z_t$ which is normally distributed with mean $\mu$ and variance $\sigma^2$. The stock price at day $t$ is
$X_t = X_{t-1} + Z_t$
Write a function stock_price_simulation that takes the following inputs:
X0: the initial stock price
mu: the mean of the normal distribution
sigma: the standard deviation of the normal distribution
n: the number of days
It should return a list or NumPy array of the stock prices on each day.
# code here
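A sketch of the simulation function under a few assumptions: the name is written with underscores (`stock_price_simulation`), the first argument is called `X0`, the returned array includes the initial price as day 0, and the optional `rng` argument is an addition for reproducibility that the problem statement does not ask for.

```python
import numpy as np

def stock_price_simulation(X0, mu, sigma, n, rng=None):
    """Simulate a random walk: each day's price is the previous day's
    price plus an independent N(mu, sigma^2) change.

    Returns an array of length n + 1 holding the initial price followed
    by the prices on days 1..n (including day 0 is an assumption).
    """
    rng = np.random.default_rng() if rng is None else rng
    daily_changes = rng.normal(mu, sigma, size=n)
    # The price on day t is X0 plus the sum of the first t daily changes.
    return X0 + np.concatenate(([0.0], np.cumsum(daily_changes)))
```

For example, `stock_price_simulation(100.0, 0.0, 1.0, 20)` would return 21 prices starting at 100.0.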
Take
Sample trajectories of the stock price and plot them in the same graph.
A sample figure is shown below.
# code here
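Several trajectories can be drawn at once by sampling a matrix of daily changes and taking a cumulative sum along each row. The sketch below does this without calling a separate function so that it runs on its own. The parameter values and the number of trajectories are placeholder assumptions because the originals are missing from the text above.

```python
import numpy as np
import matplotlib.pyplot as plt

# ASSUMPTIONS: placeholder parameters, not the assignment's values.
X0, mu, sigma, n = 100.0, 0.0, 1.0, 20
n_paths = 5

rng = np.random.default_rng(0)
# One row per trajectory, one column per day.
daily_changes = rng.normal(mu, sigma, size=(n_paths, n))
paths = X0 + np.concatenate(
    [np.zeros((n_paths, 1)), np.cumsum(daily_changes, axis=1)], axis=1
)

days = np.arange(n + 1)
fig, ax = plt.subplots()
for path in paths:
    ax.plot(days, path)
ax.set_xlabel("day")
ax.set_ylabel("stock price")
```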
[Sample figure: simulated stock price trajectories]
Estimate the expectation and standard deviation of the stock price on day using samples.
# code here
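A Monte Carlo estimate can be checked against the closed form for this model: the price on day n is the initial price plus a sum of n independent normal changes, so its mean is X0 + n * mu and its standard deviation is sigma * sqrt(n). The sketch below compares both. The target day, the sample count, and the model parameters are placeholder assumptions.

```python
import numpy as np

# ASSUMPTIONS: placeholder parameters, not the assignment's values.
X0, mu, sigma, n = 100.0, 0.1, 1.0, 20
n_samples = 10_000

rng = np.random.default_rng(0)
# Only the final price is needed, so sum the n daily changes per sample.
final_prices = X0 + rng.normal(mu, sigma, size=(n_samples, n)).sum(axis=1)

mean_est = final_prices.mean()
std_est = final_prices.std(ddof=1)   # sample standard deviation

# Closed-form values for a sum of n independent N(mu, sigma^2) changes.
mean_exact = X0 + n * mu
std_exact = sigma * np.sqrt(n)

print(f"mean: estimated {mean_est:.3f}, exact {mean_exact:.3f}")
print(f"std:  estimated {std_est:.3f}, exact {std_exact:.3f}")
```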
Challenge (not graded): A call option is a contract that allows you to buy a stock at a fixed price at a future date. Suppose you own a call option that allows you to buy a stock at day at price (this is called the strike price).
If the stock price at day is above Then you can exercise the option, pay to get the stock, and sell it at the market price to make a profit. Otherwise, you don't exercise the option and don't make a profit.
Estimate the probability that you can make a profit using the call option. Suppose you're the seller or the buyer of this call option. Estimate what the fair price of the call option should be.
# code here
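One way to approach the challenge is sketched below, under loudly simplifying assumptions: the expiry day `T`, the strike `K`, and the model parameters are placeholders, "profit" means the final price exceeds the strike, and the "fair price" is taken to be the undiscounted expected payoff under the simulated random walk. A real pricing model would also account for interest rates and risk preferences.

```python
import numpy as np

# ASSUMPTIONS: placeholder parameters, not the assignment's values.
X0, mu, sigma = 100.0, 0.0, 1.0
T = 20        # expiry day
K = 102.0     # strike price
n_samples = 100_000

rng = np.random.default_rng(0)
# Price on day T = initial price + sum of T independent daily changes.
X_T = X0 + rng.normal(mu, sigma, size=(n_samples, T)).sum(axis=1)

# The option is exercised only when the price exceeds the strike.
payoff = np.maximum(X_T - K, 0.0)

prob_profit = np.mean(X_T > K)
fair_price = payoff.mean()   # simplification: expected payoff, no discounting

print(f"P(profit) ~ {prob_profit:.3f}, fair price ~ {fair_price:.3f}")
```

Under this simplification, a buyer who pays `fair_price` and a seller who receives it both break even on average.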
Q4
One model of wealth inequality is the Pareto distribution.
Let's generate N samples from a Pareto distribution with parameter a using the following code:
import numpy as np
N
a
x = np.random.pareto(a, N)
You can think of x as samples of wealth of a population.
Plot the histogram of the samples.
# code here
The k-quantile of a distribution is the value such that a fraction k of the samples are less than or equal to the value. For example, the 0.5-quantile is the median.
What are the median and the mean of the samples? What percentage of the population has above-average wealth?
# code here
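The sketch below computes the three quantities. The values of `N` and `a` are placeholder assumptions because the originals are missing above, and the sampler is the newer `Generator.pareto` method seeded for reproducibility rather than the `np.random.pareto` call in the problem statement. Both draw from the same distribution.

```python
import numpy as np

# ASSUMPTIONS: placeholder parameters, not the assignment's values.
N = 100_000
a = 3.0

rng = np.random.default_rng(0)
x = rng.pareto(a, N)

median = np.quantile(x, 0.5)   # the 0.5-quantile is the median
mean = x.mean()
# Share of the population whose wealth exceeds the average wealth.
pct_above_mean = 100.0 * np.mean(x > mean)

print(f"median = {median:.3f}, mean = {mean:.3f}")
print(f"{pct_above_mean:.1f}% of the population is above average")
```

Because the distribution is right-skewed, the mean sits above the median, so fewer than half of the samples are above average.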
Estimate what percentage of the population owns more than of the wealth?
Hint: you can sort the array in descending order such that $x_1 \ge x_2 \ge \dots \ge x_N$ and compute the cumulative sum of the array: $s_i = \sum_{j=1}^{i} x_j$. Then $s_i$ is the total wealth of the top $i$ people.
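The hint can be turned into code roughly as follows. The wealth share in the question is missing above, so `share` is a placeholder assumption, as are `N` and `a`. The sketch finds the smallest group of richest people whose combined wealth exceeds that share.

```python
import numpy as np

# ASSUMPTIONS: placeholder parameters, not the assignment's values.
N = 100_000
a = 3.0
share = 0.5   # target fraction of total wealth (must be below 1)

rng = np.random.default_rng(0)
x = rng.pareto(a, N)

x_sorted = np.sort(x)[::-1]        # richest first
cum_wealth = np.cumsum(x_sorted)   # cum_wealth[i - 1] = wealth of the top i people
total = cum_wealth[-1]

# Index of the first cumulative sum that exceeds the target, plus one,
# gives the number of people in the smallest qualifying group.
k = int(np.searchsorted(cum_wealth, share * total, side="right")) + 1
pct_population = 100.0 * k / N

print(f"The top {pct_population:.2f}% own more than {share:.0%} of the wealth")
```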
