Question: write a function that takes a probability distribution sample (an unsorted numpy array of numbers that were sampled from a probability distribution) and calculate the
write a function that takes a probability distribution sample (an unsorted numpy array of numbers that were sampled from a probability distribution) and calculate the high probability density intervals (5%-100% percentile of the probability density). Report all continuous regions as separate intervals. Please test it with the bimodal distribution below and see whether you get two HPD regions instead of just one. The output should be in the form of a list of intervals (list of two values that brace the interval). For example: [[-4.0,0.0],[2.0,6.0]]. The first step is to estimate the probability density which you can get by binning the data into a histogram (here the challenge is to pick an appropriate binning density). Then you have to sort the data according to probability density and then picking the HPD data. After that you need to estimate the continuous intervals of data.
Some Sample Code
In [4]:
%matplotlib inline import numpy as np
In [11]:
gauss_a = stats.norm.rvs(loc=4, scale=0.9, size=3000) gauss_b = stats.norm.rvs(loc=-2, scale=1, size=2000) # mix norm is the input of your function mix_norm = np.concatenate((gauss_a, gauss_b)) HPD = np.percentile(mix_norm,[2.5,97.5]) plt.hist(mix_norm, density=True) plt.plot(HPD,[0,0],label='HPD {:.2f} {:.2f}'.format(*HPD), linewidth=8, color='k') plt.legend(fontsize=10) In [6]:
len(mix_norm)
Out[6]:
5000
In [7]:
#first sort the sample mix_norm.sort()
In [8]:
#hint # create histogram by binning hist, edges = np.histogram(mix_norm, bins=40)
In [9]:
# calculate cumulative sum of hist # the goal is to create a list of lists that contain the elements in the bins cum = np.cumsum(hist) cum = np.insert(cum,0,0) print(hist) print(cum)
In [10]:
# create element list element_list = [mix_norm[c:c+h] for h,c in zip(hist,cum)] element_list
import matplotlib.pyplot as plt from scipy import stats import seaborn as sns
In [ ]:
# now you need to eliminate the bins that are in the bottom 5 percentile
Step by Step Solution
There are 3 Steps involved in it
Get step-by-step solutions from verified subject matter experts
