Question: For Question 1e, what code should I write to create a histogram like this? Thank you so much!


[23]:
    def integer_distribution(arr_ints):
        unit_bins = np.arange(0.5, 9.6)
        # Note: this plots the global `faces`, not the `arr_ints` argument.
        plt.hist(faces, bins=unit_bins, ec='white', density=True, alpha=0.7)

    faces = range(1, 10)
    integer_distribution(faces)

1.3.3 Tutorial: Birthweights

Recall from statistics that you can perform hypothesis testing using the permutation test.

In a particular medical study, a sample of newborn babies was obtained from a large hospital system. We will treat the data as if it were a simple random sample, though the sampling was done in multiple stages. One of the aims of the study was to see whether maternal smoking was associated with birth weight.

Run the cell below, which assigns nonsmoker_births to a list of birth weights of the babies of the non-smoking mothers (of which there are 715), and smoker_births to a list of birth weights of the babies of the smoking mothers (of which there are 459).

    import pandas as pd
    births_df = pd.read_csv("births.csv")
    # Rows where 'Maternal Smoker' is False
    nonsmoker_births = births_df[~births_df['Maternal Smoker']]['Birth Weight'].values
    print(len(nonsmoker_births))
    # Rows where 'Maternal Smoker' is True
    smoker_births = births_df[births_df['Maternal Smoker']]['Birth Weight'].values
    print(len(smoker_births))

    715
    459

Suppose that we overlay the distributions of birthweights from the two groups of mothers:

[25]:
    # Just run this cell.
    # You will learn these functions soon!
    import seaborn as sns
    sns.histplot(births_df, x="Birth Weight", hue="Maternal Smoker");
    plt.title("Distribution of Birth Weight");

The distribution of the weights of the babies born to mothers who smoked appears to be shifted slightly to the left of the distribution corresponding to non-smoking mothers. The weights of the babies of the mothers who smoked seem lower on average than the weights of the babies of the non-smokers. We'd like to study whether this difference reflects just chance variation or perhaps a difference in the distributions in the larger population.
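As a rough illustration of the group split described above, here is a minimal sketch that uses a boolean column as a row mask. The data frame is synthetic (made-up weights, ten rows), and the names toy_df, toy_nonsmoker, and toy_smoker are hypothetical; only the column names are taken from the tutorial.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for births.csv; the weights are made up for illustration.
rng = np.random.default_rng(0)
toy_df = pd.DataFrame({
    "Maternal Smoker": [False] * 6 + [True] * 4,
    "Birth Weight": rng.integers(90, 140, size=10),
})

# A boolean column can be used directly as a row mask; ~ negates it.
is_smoker = toy_df["Maternal Smoker"]
toy_nonsmoker = toy_df.loc[~is_smoker, "Birth Weight"].to_numpy()
toy_smoker = toy_df.loc[is_smoker, "Birth Weight"].to_numpy()

print(len(toy_nonsmoker), len(toy_smoker))
# Signed difference in group means (non-smokers minus smokers).
print(toy_nonsmoker.mean() - toy_smoker.mean())
```

Passing the unnegated mask for both groups would return the same rows twice, so the two printed lengths are a quick sanity check that the split is right.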
Suppose we propose the following two hypotheses:

Null hypothesis (H0): In the population, the distribution of birth weights of babies is the same for mothers who don't smoke as for mothers who do. The (observed) difference in the sample is due to chance.

Alternative hypothesis (H1): In the population, the babies of the mothers who smoke have a different birth weight, on average, than the babies of the non-smokers.

We would like to perform hypothesis testing using the permutation test. One way to do so is to compute an observed test statistic and then compare it with multiple simulated test statistics, generated through random permutations.

1.3.4 Question 1b

Suppose that we choose a reasonable test statistic as the absolute difference between the average birth weights of the two groups (i.e., the babies of non-smokers and the babies of mothers who smoke). In the cell below, assign observed_difference to the observed test statistic given our original sample of nonsmoker_births and smoker_births.

[26]:
    observed_difference = np.average(nonsmoker_births) - np.average(smoker_births)
    observed_difference

[26]: 9.266142572024918

1.3.5 Question 1c

Before we write any code, let's review the idea of hypothesis testing with the permutation test. We first simulate the experiment many times (say, 10,000 times) through random permutation (i.e., without replacement). Assuming that the null hypothesis holds, this process will produce an empirical distribution of a predetermined test statistic. Then, we use this empirical distribution to compute an empirical p-value, which is then compared against a particular cutoff threshold in order to accept or reject our null hypothesis.

In the cell below, answer the following questions:

* What does an empirical p-value from a permutation test mean in this particular context of birthweights and maternal smoking habits?
* Suppose the resulting empirical p-value is p ≤ 0.01, where 0.01 is our p-value cutoff threshold.
  Do we accept or reject the null hypothesis? Why?

The empirical p-value is the p-value calculated from the observed data, which is the data in the file births.csv. It is a probability computed under the condition that H0 holds, that is, that the distribution of birth weights of babies is the same for mothers who don't smoke as for mothers who do.

Reject the null hypothesis. Because when p is less than or equal to 0.01, it means that there is less than 1% probability the null is correct. Hence, we reject the null and accept the alternative hypothesis.

1.3.6 Question 1d

Now, we begin the permutation test by generating an array called differences that contains simulated values of our test statistic from 10,000 permuted samples. Again, we are computing the absolute difference between the mean weight in the smoking group and the mean weight in the non-smoking group, where labels have been assigned at random.

Reminder: Some suggested references: NumPy array indexing/slicing documentation, np.random.choice documentation, np.append documentation.

Note: We have provided some optional skeleton code below, but you do not need to follow it. Please assign simulated differences to the array differences.

[27]:
    # Create an empty array to hold our simulated differences
    differences = np.array([])

    # Set number of repetitions
    repetitions = 10000

    # Combine the two arrays into a single array
    all_birthweights = np.append(nonsmoker_births, smoker_births)

    for i in np.arange(repetitions):
        # Permute all_birthweights
        shuffled_birthweights = np.random.permutation(all_birthweights)

        # Make the simulated smoker and non-smoker group
        sim_nonsmoker_births = shuffled_birthweights[:len(nonsmoker_births)]
        sim_smoker_births = shuffled_birthweights[len(nonsmoker_births):]

        # Calculate test statistics
        sim_difference = np.average(sim_nonsmoker_births) - np.average(sim_smoker_births)

        # Save the test statistics in differences
        # Note: this assignment replaces differences with a single number on every pass.
        differences = observed_difference - sim_difference

    differences

1.3.7 Question 1e

The array differences is an empirical distribution of the test statistic simulated under the null hypothesis. This is a prediction about the test statistic, based on the null hypothesis.

Use the integer_distribution function you defined in an earlier part to plot a histogram of this empirical distribution. Because you are using this function, your histogram should have unit bins, with bars centered at integers. No title or labels are required for this question.

Hint: This part should be very straightforward.

    integer_distribution(differences)

1.3.8 Question 1f

Compute empirical_p, the empirical p-value based on differences, the empirical distribution of the test statistic, and observed_difference, the observed value of the test statistic.

[19]:
    empirical_p = np.count_nonzero(differences
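The permutation procedure in Questions 1d and 1f can be sketched as follows. This is one possible arrangement, not the course's reference solution: it stores one simulated statistic per repetition, takes the absolute difference of group means as the question text specifies, and estimates the empirical p-value as the fraction of simulated statistics at least as large as the observed one. The helper names permutation_differences and empirical_p_value are hypothetical, and the two groups are made-up normal samples, not the birth weight data.

```python
import numpy as np

def permutation_differences(group_a, group_b, repetitions=10_000, seed=None):
    """Simulate |mean(A) - mean(B)| under random relabeling of the pooled data."""
    rng = np.random.default_rng(seed)
    pooled = np.concatenate([group_a, group_b])
    n_a = len(group_a)
    simulated = np.empty(repetitions)
    for i in range(repetitions):
        shuffled = rng.permutation(pooled)  # shuffle without replacement
        simulated[i] = abs(shuffled[:n_a].mean() - shuffled[n_a:].mean())
    return simulated

def empirical_p_value(simulated, observed):
    """Fraction of simulated statistics at least as large as the observed one."""
    return np.count_nonzero(simulated >= observed) / len(simulated)

# Made-up samples standing in for the two groups of birth weights.
rng = np.random.default_rng(42)
group_a = rng.normal(123, 17, size=200)
group_b = rng.normal(114, 18, size=150)

observed = abs(group_a.mean() - group_b.mean())
sims = permutation_differences(group_a, group_b, repetitions=2000, seed=0)
print(observed, empirical_p_value(sims, observed))
```

Preallocating the result array and writing to simulated[i] avoids repeated np.append calls. Appending inside the loop would also work, as long as each new statistic is added to the array and does not replace it.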

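For Question 1e, the function in cell [23] plots faces and hard-codes bin edges from 0.5 to 9.5, so it may not reflect whatever array is passed in. A minimal sketch of a more general version is below, assuming the intent is unit-width bins centered on integers for any numeric input. Deriving the edges from the data's minimum and maximum is an assumption here, not something the assignment states.

```python
import numpy as np
import matplotlib.pyplot as plt

def integer_distribution(arr_ints):
    """Histogram with unit-width bins centered on integers (edges at k - 0.5, k + 0.5)."""
    values = np.asarray(arr_ints)
    low = np.floor(values.min())
    high = np.ceil(values.max())
    # Edges run from low - 0.5 up to high + 0.5, in steps of 1.
    unit_bins = np.arange(low - 0.5, high + 1.5)
    plt.hist(values, bins=unit_bins, ec="white", density=True, alpha=0.7)
    return unit_bins

# For the values 1 through 9 this reproduces the edges 0.5, 1.5, ..., 9.5.
bins = integer_distribution(range(1, 10))
print(bins)
```

With this version, calling integer_distribution(differences) would bin each simulated statistic to its nearest integer, provided differences is an array with one entry per repetition and not a single number.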