

Module Four Discussion: Hypothesis Testing for the Difference in Two Population Proportions

This notebook contains the step-by-step directions for your Module Four discussion. It is very important to run through the steps in order. Some steps depend on the outputs of earlier steps. Once you have completed the steps in this notebook, be sure to answer the questions about this activity in the discussion for this module.

Reminder: If you have not already reviewed the discussion prompt, please do so before beginning this activity. That will give you an idea of the questions you will need to answer with the outputs of this script.

Initial post (due Thursday)

Step 1: Generating sample data

This block of Python code will generate two samples, both of size 50, that you will use in this discussion. The datasets will be unique to you and therefore your answers will be unique as well. The numpy module in Python allows you to create a data set using a Normal distribution. The data sets will be saved in Python dataframes and will be used in later calculations.

Click the block of code below and hit the Run button above.

import pandas as pd

import numpy as np

# create 50 randomly chosen values from a normal distribution (arbitrarily using mean = 2.48 and standard deviation = 0.500)
diameters_sample1 = np.random.normal(2.48, 0.500, 50)

# convert the array into a dataframe with the column name "diameters" using pandas library
diameters_sample1_df = pd.DataFrame(diameters_sample1, columns=['diameters'])
diameters_sample1_df = diameters_sample1_df.round(2)

# create 50 randomly chosen values from a normal distribution (arbitrarily using mean = 2.50 and standard deviation = 0.750)
diameters_sample2 = np.random.normal(2.50, 0.750, 50)

# convert the array into a dataframe with the column name "diameters" using pandas library
diameters_sample2_df = pd.DataFrame(diameters_sample2, columns=['diameters'])
diameters_sample2_df = diameters_sample2_df.round(2)

# print the dataframe to see the first 5 observations (note that the index of dataframe starts at 0)
print("Diameters data frame of the first sample (showing only the first five observations)")
print(diameters_sample1_df.head())
print()
print("Diameters data frame of the second sample (showing only the first five observations)")
print(diameters_sample2_df.head())

Diameters data frame of the first sample (showing only the first five observations)
   diameters
0       2.86
1       2.06
2       2.51
3       2.54
4       3.55

Diameters data frame of the second sample (showing only the first five observations)
   diameters
0       0.92
1       1.49
2       3.89
3       4.09
4       1.36
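Because the samples are drawn at random, rerunning the cell produces different values each time. The following is a minimal sketch, not part of the original notebook, of how repeatable draws could be obtained while experimenting. It assumes numpy's seeded generator (`np.random.default_rng`); the seed value and the helper name `draw_diameters` are hypothetical.

```python
import numpy as np

# Hypothetical seed, chosen only for illustration; it is not part of the notebook.
SEED = 42

def draw_diameters(mean, std_dev, size, seed):
    """Draw `size` values from a normal distribution using a seeded generator."""
    rng = np.random.default_rng(seed)
    return rng.normal(mean, std_dev, size).round(2)

first_run = draw_diameters(2.48, 0.500, 50, SEED)
second_run = draw_diameters(2.48, 0.500, 50, SEED)

# The same seed yields the same 50 values on every run.
print(np.array_equal(first_run, second_run))
```

A fixed seed would give every student identical data, so it does not suit the discussion itself, where each dataset is meant to be unique.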

Step 2: Performing hypothesis test for the difference in population proportions

The z-test for proportions can be used to test for the difference in proportions. The proportions_ztest method in the statsmodels.stats.proportion submodule runs this test. The input to this method is a list of counts meeting a certain condition (given in the problem statement) and a list of sample sizes for the two samples.

counts: Python list that is assigned the number of observations in each sample with diameter values less than 2.20.

n: Python list that is assigned the total number of observations in each sample.
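To make the two inputs concrete, here is a small sketch that builds them from only the five observations displayed for each sample in Step 1. It uses plain Python lists instead of dataframes, and the names `count_below`, `counts_head`, and `n_head` are hypothetical. The real test uses all 50 observations per sample, so these numbers are illustrative only.

```python
# First five observations of each sample, copied from the output shown in Step 1.
sample1_head = [2.86, 2.06, 2.51, 2.54, 3.55]
sample2_head = [0.92, 1.49, 3.89, 4.09, 1.36]

THRESHOLD = 2.20  # condition from the notebook: diameter values less than 2.20

def count_below(values, threshold):
    """Count how many values are strictly less than the threshold."""
    return sum(1 for v in values if v < threshold)

# One count and one sample size per sample, in the same order.
counts_head = [count_below(sample1_head, THRESHOLD), count_below(sample2_head, THRESHOLD)]
n_head = [len(sample1_head), len(sample2_head)]

print(counts_head, n_head)
```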

Click the block of code below and hit the Run button above.

from statsmodels.stats.proportion import proportions_ztest

# number of observations in the first sample with diameter values less than 2.20
count1 = len(diameters_sample1_df[diameters_sample1_df['diameters'] < 2.20])

# number of observations in the second sample with diameter values less than 2.20
count2 = len(diameters_sample2_df[diameters_sample2_df['diameters'] < 2.20])

# counts Python list
counts = [count1, count2]

# number of observations in the first sample
n1 = len(diameters_sample1_df)

# number of observations in the second sample
n2 = len(diameters_sample2_df)

# n Python list
n = [n1, n2]

# perform the hypothesis test. output is a Python tuple that contains test_statistic and the two-sided p_value.
test_statistic, p_value = proportions_ztest(counts, n)

print("test-statistic =", round(test_statistic, 2))
print("two tailed p-value =", round(p_value, 4))

test-statistic = -1.06
two tailed p-value = 0.2876
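For readers who want to see what the test computes, the sketch below works through a pooled two-proportion z-test by hand using only the standard library. To my understanding this is the form proportions_ztest applies by default, but treat that as an assumption and not a statement about the library's internals. The function name `two_proportion_ztest` and the counts 14 and 19 are hypothetical and are not taken from the notebook's data.

```python
import math

def two_proportion_ztest(counts, nobs):
    """Pooled two-sample z-test for equal proportions (two-sided)."""
    x1, x2 = counts
    n1, n2 = nobs
    p1, p2 = x1 / n1, x2 / n2
    # Pooled proportion under the null hypothesis that both proportions are equal.
    pooled = (x1 + x2) / (n1 + n2)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / std_err
    # Two-sided p-value: P(|Z| >= |z|) = erfc(|z| / sqrt(2)) for a standard normal Z.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# Hypothetical counts of observations below 2.20 in two samples of size 50.
example_counts = [14, 19]
example_n = [50, 50]

z_stat, p_val = two_proportion_ztest(example_counts, example_n)
print("test-statistic =", round(z_stat, 2))
print("two tailed p-value =", round(p_val, 4))
```

With a significance level of 0.05 (an assumed value; use the one given in your discussion prompt), a two-tailed p-value of 0.2876 would not lead to rejecting the null hypothesis of equal proportions.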
