Question:

1. Before applying any algorithm to a real-world problem, it is common practice to first understand its behavior on synthetic data. Let's generate a synthetic data set for a regression problem: given input-output pairs, we aim to learn a function that maps inputs to outputs.

Suppose we have input $x$ and output $y$ that are related by the equation $y = f(x)$. However, in practice, we can only observe noisy data. That is, given input $x$, we observe output $\hat{y}$ that is related to $y$ by the equation $\hat{y} = y + \epsilon$, where $\epsilon$ is a random noise term. A common model for $\epsilon$ is a Gaussian random variable with mean $0$ and variance $\sigma^2$.

Suppose … .

Generate 10 data points for $x$, uniformly distributed at random in the range … .

Make a plot that contains the following elements:

(1) The ground truth function.

(2) The noisy data points.

(3) A legend, where the ground truth is labeled "Ground truth" and the noisy data points are labeled "Noisy data".

Label the x-axis as … and the y-axis as … .

A sample figure is shown below.

# code here
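A minimal sketch of the data generation and the plot, assuming NumPy and Matplotlib. The ground-truth function `f`, the noise level `SIGMA`, the interval `X_RANGE` and the axis labels "x" and "y" are hypothetical placeholders, because the source's actual values did not survive; `make_regression_data` and `plot_regression_data` are illustrative names, not part of the assignment.

```python
import numpy as np

# Hypothetical stand-ins: the source's f, noise level and x-range were lost in extraction.
SIGMA = 0.1            # assumed noise standard deviation
X_RANGE = (0.0, 1.0)   # assumed interval for x
N_POINTS = 10          # number of data points, from the problem statement


def f(x):
    """Assumed ground-truth function (placeholder for the one in the source)."""
    return np.sin(2 * np.pi * x)


def make_regression_data(n=N_POINTS, sigma=SIGMA, x_range=X_RANGE, seed=0):
    """Draw n inputs uniformly from x_range and observe f(x) plus N(0, sigma^2) noise."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(x_range[0], x_range[1], size=n)
    y_noisy = f(x) + rng.normal(0.0, sigma, size=n)
    return x, y_noisy


def plot_regression_data(x, y_noisy, x_range=X_RANGE):
    """Plot the ground-truth curve and the noisy samples with the requested legend labels."""
    import matplotlib.pyplot as plt  # local import: data generation does not need matplotlib

    grid = np.linspace(x_range[0], x_range[1], 200)  # dense grid for a smooth curve
    plt.plot(grid, f(grid), label="Ground truth")
    plt.scatter(x, y_noisy, color="red", label="Noisy data")
    plt.xlabel("x")  # assumed axis labels
    plt.ylabel("y")
    plt.legend()
    plt.show()


x, y_noisy = make_regression_data()
```

Calling `plot_regression_data(x, y_noisy)` draws the figure. The ground truth is evaluated on a dense grid rather than only at the 10 sampled points, so the curve looks smooth while the noisy data stay as discrete markers.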


2. We usually denote a normal distribution with mean $\mu$ and variance $\sigma^2$ by $\mathcal{N}(\mu, \sigma^2)$.

Let's generate a synthetic data set for a classification problem. Let X be a random variable that follows a normal distribution … and Y be a random variable that follows a normal distribution … .

X and Y can represent the same feature measured in two groups, for example, the heights of high school students and the heights of college students. Different groups can have different distributions of the same feature.

Generate 1000 samples for X and 500 samples for Y. This models the scenario where we have more data for one group than the other.

(1) Plot the histograms of X and Y in the same figure.

(2) Label the histogram of X as "X samples" and the histogram of Y as "Y samples". Add a legend to the plot.

(3) Give the two histograms different colors and set the transparency to 0.5 so that we can see where they overlap.

Hint: the transparency is usually named alpha in most plotting libraries.

A sample figure is shown below.

# code here
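A minimal sketch, assuming NumPy and Matplotlib. The means and standard deviations `MU_X`, `SIGMA_X`, `MU_Y`, `SIGMA_Y` are hypothetical, since the source's parameters for X and Y are missing. Note that NumPy's `normal` takes the standard deviation (the `scale` argument), not the variance, so $\mathcal{N}(\mu, \sigma^2)$ maps to `normal(mu, sigma)`.

```python
import numpy as np

# Hypothetical parameters: the source's distributions for X and Y were lost.
MU_X, SIGMA_X = 0.0, 1.0
MU_Y, SIGMA_Y = 2.0, 1.0


def make_two_groups(n_x=1000, n_y=500, seed=0):
    """Draw n_x samples of X ~ N(MU_X, SIGMA_X^2) and n_y samples of Y ~ N(MU_Y, SIGMA_Y^2)."""
    rng = np.random.default_rng(seed)
    x = rng.normal(MU_X, SIGMA_X, size=n_x)  # scale is the standard deviation
    y = rng.normal(MU_Y, SIGMA_Y, size=n_y)
    return x, y


def plot_histograms(x, y, bins=30):
    """Overlay the two histograms with different colors and alpha = 0.5."""
    import matplotlib.pyplot as plt  # local import: sampling does not need matplotlib

    plt.hist(x, bins=bins, alpha=0.5, color="tab:blue", label="X samples")
    plt.hist(y, bins=bins, alpha=0.5, color="tab:orange", label="Y samples")
    plt.legend()
    plt.show()


x, y = make_two_groups()
```

Calling `plot_histograms(x, y)` produces the figure. Because X has twice as many samples as Y, its bars are taller; passing `density=True` to `plt.hist` would put both groups on the same scale if the shapes matter more than the counts.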


3. Let's bring our data science skills to Wall Street. One model of stock prices is the random walk model.

Suppose $S_n$ is the stock price on day $n$, and $S_0$ is the initial stock price. Each day, the change in the stock price is a random variable $X_n$, which is normally distributed with mean $\mu$ and variance $\sigma^2$. The stock price on day $n$ is $S_n = S_{n-1} + X_n$.

(1) Write a function stock_price_simulation that takes the following inputs:

S0: the initial stock price
mu: the mean of the normal distribution
sigma: the standard deviation of the normal distribution
n: the number of days

It should return a list (or NumPy array) of the stock prices on each day.

# code here
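A minimal sketch of part (1), assuming NumPy. It draws all $n$ daily changes at once and takes a cumulative sum, which is equivalent to applying $S_k = S_{k-1} + X_k$ step by step. Two choices here are assumptions: the returned array includes day 0 (so it has $n + 1$ entries), and the optional `rng` argument is an addition for reproducibility that the assignment does not ask for.

```python
import numpy as np


def stock_price_simulation(S0, mu, sigma, n, rng=None):
    """Simulate one random-walk trajectory S_0, S_1, ..., S_n.

    Each daily change X_k ~ N(mu, sigma^2) and S_k = S_{k-1} + X_k.
    Returns a NumPy array of length n + 1 whose first entry is S0.
    """
    if rng is None:
        rng = np.random.default_rng()
    changes = rng.normal(mu, sigma, size=n)           # X_1, ..., X_n
    return np.concatenate(([S0], S0 + np.cumsum(changes)))


# Example: one 100-day path starting at 100 (hypothetical parameters).
path = stock_price_simulation(100.0, 0.0, 1.0, 100, rng=np.random.default_rng(0))
```

Using `np.cumsum` avoids a Python loop; if you prefer a plain list, `path.tolist()` converts the result.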

(2) Take … . Sample 10 trajectories of the stock price and plot them in the same graph.

A sample figure is shown below.

# code here
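A minimal sketch of part (2), assuming NumPy and Matplotlib. The values `S0 = 100`, `MU = 0`, `SIGMA = 1` and `N_DAYS = 100` are hypothetical, since the parameters given after "Take" are missing from this copy. The snippet includes a compact random-walk helper named after the function requested in part (1) so that it runs on its own.

```python
import numpy as np


def stock_price_simulation(S0, mu, sigma, n, rng=None):
    """Random walk S_k = S_{k-1} + X_k with X_k ~ N(mu, sigma^2); returns S_0, ..., S_n."""
    rng = np.random.default_rng() if rng is None else rng
    return np.concatenate(([S0], S0 + np.cumsum(rng.normal(mu, sigma, size=n))))


# Hypothetical parameters (the source's values are missing).
S0, MU, SIGMA, N_DAYS = 100.0, 0.0, 1.0, 100

rng = np.random.default_rng(0)
# One row per trajectory, one column per day (day 0 included).
trajectories = np.array(
    [stock_price_simulation(S0, MU, SIGMA, N_DAYS, rng) for _ in range(10)]
)


def plot_trajectories(trajectories):
    """Draw every trajectory on the same axes."""
    import matplotlib.pyplot as plt  # local import: simulation does not need matplotlib

    days = np.arange(trajectories.shape[1])
    for path in trajectories:
        plt.plot(days, path)
    plt.xlabel("Day")
    plt.ylabel("Stock price")
    plt.show()
```

Calling `plot_trajectories(trajectories)` draws the 10 paths. Sharing one `rng` across the calls keeps the paths independent while making the whole figure reproducible.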


(3) Estimate the expectation and standard deviation of the stock price on day 100, using 1000 samples.

# code here
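A minimal Monte Carlo sketch of part (3), assuming NumPy and the same hypothetical parameters as above (`S0 = 100`, `MU = 0`, `SIGMA = 1`). It simulates 1000 trajectories in one vectorized call instead of looping over a per-path function. Because $S_{100} = S_0 + \sum_{k=1}^{100} X_k$ with i.i.d. normal changes, the exact values $E[S_{100}] = S_0 + 100\mu$ and $\mathrm{SD}[S_{100}] = 10\sigma$ are available as a sanity check.

```python
import numpy as np

# Hypothetical parameters (missing in the source); day 100 and 1000 samples follow the problem.
S0, MU, SIGMA = 100.0, 0.0, 1.0
N_DAYS, N_SAMPLES = 100, 1000

rng = np.random.default_rng(0)
# Each row holds the 100 daily changes of one trajectory; a row sum is S_100 - S_0.
daily_changes = rng.normal(MU, SIGMA, size=(N_SAMPLES, N_DAYS))
final_prices = S0 + daily_changes.sum(axis=1)

est_mean = final_prices.mean()
est_std = final_prices.std(ddof=1)  # sample standard deviation

# Exact moments under the random walk model, for comparison.
exact_mean = S0 + N_DAYS * MU
exact_std = SIGMA * np.sqrt(N_DAYS)

print(f"Monte Carlo: mean = {est_mean:.2f}, std = {est_std:.2f}")
print(f"Exact:       mean = {exact_mean:.2f}, std = {exact_std:.2f}")
```

With 1000 samples, the standard error of the estimated mean is roughly $10\sigma / \sqrt{1000} \approx 0.32\sigma$, so the two printed means should agree to within about one unit of $\sigma$.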

(4) (Challenge, not graded)

A call option is a contract that allows you to buy a stock at a fixed price on a future date. Suppose you own a call option that allows you to buy the stock on day 100 at a price of 105 (this is called the strike price). If the stock price on day 100 is above 105, then you can exercise the option: pay 105 to get the stock, and sell it at the market price to make a profit. Otherwise, you don't exercise the option and make no profit.

Estimate the probability that you can make a profit using the call option. Suppose you're the seller or the buyer of this call option: estimate what the fair price of the call option should be.

# code here
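A minimal sketch of the challenge, under loudly stated assumptions: the hypothetical parameters `S0 = 100`, `MU = 0`, `SIGMA = 1`; "profit" means the option finishes above the strike (the premium is ignored, following the problem's framing); and the "fair price" is taken to be the expected payoff $E[\max(S_{100} - 105, 0)]$ with no interest-rate discounting. Since a sum of independent normals is normal, it samples $S_{100} \sim \mathcal{N}(S_0 + 100\mu, 100\sigma^2)$ directly rather than simulating whole paths, and cross-checks against the closed form for a normally distributed price.

```python
import math

import numpy as np

# Hypothetical parameters (missing in the source); strike 105 and day 100 come from the problem.
S0, MU, SIGMA, N_DAYS = 100.0, 0.0, 1.0, 100
STRIKE = 105.0
N_SAMPLES = 100_000

rng = np.random.default_rng(0)
# S_100 is exactly N(S0 + N_DAYS * MU, N_DAYS * SIGMA^2) under the random walk model.
final_prices = rng.normal(S0 + N_DAYS * MU, SIGMA * math.sqrt(N_DAYS), size=N_SAMPLES)

payoff = np.maximum(final_prices - STRIKE, 0.0)  # option value at expiry
prob_profit = np.mean(final_prices > STRIKE)     # chance the option is worth exercising
fair_price = payoff.mean()                       # expected payoff, no discounting (assumption)

# Closed-form check: if S ~ N(m, s^2) and d = (m - K) / s, then
# P(S > K) = Phi(d) and E[max(S - K, 0)] = (m - K) * Phi(d) + s * phi(d).
m = S0 + N_DAYS * MU
s = SIGMA * math.sqrt(N_DAYS)
d = (m - STRIKE) / s
Phi = 0.5 * (1.0 + math.erf(d / math.sqrt(2.0)))          # standard normal CDF at d
phi = math.exp(-0.5 * d * d) / math.sqrt(2.0 * math.pi)   # standard normal PDF at d
exact_prob = Phi
exact_price = (m - STRIKE) * Phi + s * phi

print(f"P(profit): MC = {prob_profit:.4f}, exact = {exact_prob:.4f}")
print(f"Fair price: MC = {fair_price:.4f}, exact = {exact_price:.4f}")
```

If the drift `MU` were nonzero, pricing by the real-world expected payoff would differ from standard risk-neutral pricing; with zero drift and zero interest rate the two coincide.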

4. One model of wealth inequality is the Pareto distribution.

Let's generate N = 1000 samples from a Pareto distribution with parameter a = 20 using the following code:

import numpy as np
N = 1000
a = 20
x = np.random.pareto(a, N)

You can think of x as samples of the wealth of individuals in a population.

(1) Plot the histogram of the samples.

# code here
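A minimal sketch of the histogram, assuming NumPy and Matplotlib. It uses a seeded `Generator.pareto`, which draws from the same distribution as `np.random.pareto`. As the NumPy documentation notes, this is the Pareto II (Lomax) distribution, whose support starts at 0; the classical Pareto with scale $m$ is obtained as `(x + 1) * m`. The log-scaled count axis is a design choice to make the long right tail visible; `plot_wealth_histogram` is an illustrative name.

```python
import numpy as np

N = 1000
a = 20
rng = np.random.default_rng(0)  # seeded for reproducibility
x = rng.pareto(a, N)            # Lomax (Pareto II) samples, all >= 0


def plot_wealth_histogram(x, bins=50):
    """Histogram of the wealth samples with a log-scaled count axis."""
    import matplotlib.pyplot as plt  # local import: sampling does not need matplotlib

    plt.hist(x, bins=bins)
    plt.yscale("log")  # rare, very large values stay visible
    plt.xlabel("Wealth")
    plt.ylabel("Count")
    plt.show()
```

Calling `plot_wealth_histogram(x)` draws the figure.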

(2) The k-th percentile of a distribution is the value such that k% of the samples are less than or equal to that value. For example, the 50th percentile is the median.

What are the median and the mean of the samples? What percentage of the population is above-average wealthy?

# code here
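A minimal sketch of part (2), assuming NumPy and regenerating the samples with a fixed seed so the snippet runs on its own. `np.median` equals `np.percentile(x, 50)`, matching the percentile definition above. For a right-skewed sample the mean should exceed the median, and whenever it does, at most half of the population can be above average.

```python
import numpy as np

N, a = 1000, 20
x = np.random.default_rng(0).pareto(a, N)

median = np.median(x)
mean = np.mean(x)
# Fraction of people whose wealth exceeds the sample mean ("above-average wealthy").
frac_above_mean = np.mean(x > mean)

print(f"median = {median:.4f}, mean = {mean:.4f}")
print(f"{100 * frac_above_mean:.1f}% of the population is above the mean")
```

Taking `np.mean` of a boolean array counts the `True` entries as 1, which gives the fraction directly.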

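(3) Estimate the percentage of the population (the wealthiest people) that together owns more than 80% of the wealth.

Hint: you can sort the array in decreasing order, $x_1 \ge x_2 \ge \dots \ge x_N$, and compute the cumulative sum of the array: $s_i = x_1 + x_2 + \dots + x_i$. Then $s_i$ is the total wealth of the top $i$ people.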
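A minimal sketch of the hint, assuming NumPy and the same seeded samples. After sorting in decreasing order, `cum[i - 1]` plays the role of $s_i$ (Python indices start at 0). `np.searchsorted(..., side="right")` returns the first index whose cumulative wealth strictly exceeds 80% of the total, and adding 1 turns that index into a head count.

```python
import numpy as np

N, a = 1000, 20
x = np.random.default_rng(0).pareto(a, N)

# Sort in decreasing order so that cum[i - 1] is the total wealth of the top i people.
sorted_desc = np.sort(x)[::-1]
cum = np.cumsum(sorted_desc)
total = cum[-1]

# First index where the cumulative wealth strictly exceeds 80% of the total.
k = int(np.searchsorted(cum, 0.8 * total, side="right")) + 1
top_fraction = k / N
print(f"The top {100 * top_fraction:.1f}% of the population owns more than 80% of the wealth")
```

`searchsorted` requires a sorted array; `cum` is nondecreasing because all samples are nonnegative, so the binary search is valid.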
