Question:
In this lab, you will analyze data to learn about the distribution of different income groups in Edmonton and Calgary. You will display and summarize categorical variables and explore the relationship between them with contingency tables. The significance of the bivariate relationships will also be assessed. Tests and confidence intervals for proportions will be used to compare the distribution of people in each income group.

Various Income Groups in Edmonton and Calgary

The distribution of income groups is one of the important economic factors that a capitalist, an economist, and the government want to know. For example, the distribution of income groups helps a capitalist determine what type of business is most suitable for investment in a certain city, province, and/or country. The economist and the government, in turn, want to know whether the distribution of income groups is at the right level in order to decide whether any policy needs to be put in place. Therefore, from time to time, the government conducts a census and gathers this information from its citizens. A census is too expensive to conduct every year, however, so Statistics Canada usually completes one every 5 years, the latest conducted in the year 2021. Since the data for this assignment was collected in January 2020 to verify the effect of the COVID pandemic, the information from the 2016 census is used in this assignment for comparisons. The information about households for the 2016 census can be obtained from the Statistics Canada website. The website offers census data for the year 2015 (since the census was conducted in 2016).
According to the census data provided by Statistics Canada, household income is divided into 19 different income groups, but suppose a researcher is only interested in observing the distribution of 3 income groups (group 1: households with income under $50,000; group 2: households with income between $50,000 and $99,999; and group 3: households with income $100,000 and over). Based on census 2016, the distributions of household income for the 3 income groups for Edmonton and Calgary are given in the following table:

Household Income
City     | Under $50,000   | Between $50,000 and $99,999 | $100,000 and over | Total
Edmonton | 115100 (22.92%) | 151565 (30.18%)             | 235485 (46.90%)   | 502150 (100%)
Calgary  | 109375 (21.05%) | 151590 (29.17%)             | 258720 (49.78%)   | 519685 (100%)

The researcher randomly selects 150 households in Edmonton and another 150 households in Calgary, calling and asking each household whether they belong to income group 1 (low: under $50,000 in total household income), income group 2 (medium: with total household income from $50,000 to $99,999), or income group 3 (high: with total household income at least $100,000). The dataset (Lab3-Data.txt and/or Lab3-Data.csv) relates to this study. The dataset is available in the Data link located in the Lab 3 tab display in the Labs section on eClass. The data are not to be printed in your submission. The following is a description of the variables in the data file:

Column | Variable Name | Description of Variable
1      | Household     | The household number
2      | City          | Name of city where the household is located
3      | Income        | Income group to which household belongs
4      | Level         | Income level (Low, Medium, and High)

2. Use data for Edmonton to answer the following questions.

(a) Create frequency tables (showing both frequency and percentages) to summarize the distribution of income groups for households in Edmonton. Paste the table into your report. Compare this sample distribution with the distribution of the 3 income groups for Edmonton in the census.
Specifically, provide the exact differences in the distribution of each of the 3 income groups in Edmonton between the census 2016 and the sample data in 2021.

(b) Carry out an appropriate hypothesis test at α = 0.01 to see whether the distribution of household income in 2021 is different from the distribution of household income in 2016. State the null and alternative hypotheses in terms of parameters, that is, the population proportions for the three household income groups. Report the value of the appropriate test statistic, the distribution of the test statistic under the null hypothesis, and the P-value of the test to answer the question. State your conclusion.

(c) Regardless of your results in part (b), carry out an appropriate hypothesis test at α = 0.01 to see whether the proportion of households in Edmonton with income under $50,000 is now higher than 22.92% (which is the rounded percentage of households with income under $50,000 in the 2016 census). State the null and alternative hypotheses in terms of parameters. Report the value of the appropriate z-test statistic, the distribution of the test statistic under the null hypothesis, and the P-value of the test to answer the question. State your conclusion.

(d) Find a 98% two-sided confidence interval for the proportion of households in Edmonton with income under $50,000. (Hint: Although a one-sided confidence interval can be obtained in R, this type of confidence interval is not discussed in STAT 151 classes. Therefore, students must use a two-sided confidence interval to answer this question.) Interpret the confidence interval. Use the confidence interval to answer the question in part (c). Compare the result from the confidence interval with the conclusion from part (c).
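
The arithmetic behind parts (b) to (d) can be sketched as follows. This is a minimal Python illustration using only the standard library, not the required workflow (the lab itself refers to R). The sample counts are hypothetical placeholders, since the lab data file is not reproduced here, and the function names are made up for this sketch. It assumes that the test in part (b) is a chi-square goodness-of-fit test against the 2016 census proportions and that the interval in part (d) is the large-sample (Wald) interval; check both assumptions against the course notes.

```python
import math
from statistics import NormalDist

# Census 2016 proportions for Edmonton, taken from the table in the question.
CENSUS_P = {"Low": 0.2292, "Medium": 0.3018, "High": 0.4690}


def chi_square_gof(observed, expected_p):
    """Chi-square goodness-of-fit statistic, degrees of freedom, and P-value.

    The closed-form P-value exp(-x/2) holds only for df = 2,
    which is the case with three categories.
    """
    n = sum(observed.values())
    stat = sum(
        (observed[k] - n * expected_p[k]) ** 2 / (n * expected_p[k])
        for k in expected_p
    )
    df = len(expected_p) - 1
    if df != 2:
        raise ValueError("closed-form P-value here assumes df = 2")
    return stat, df, math.exp(-stat / 2)


def one_prop_z_test_greater(x, n, p0):
    """One-sample z-test of H0: p = p0 against Ha: p > p0."""
    p_hat = x / n
    z = (p_hat - p0) / math.sqrt(p0 * (1 - p0) / n)
    return z, 1 - NormalDist().cdf(z)


def wald_ci(x, n, level):
    """Two-sided large-sample (Wald) confidence interval for a proportion."""
    p_hat = x / n
    z_star = NormalDist().inv_cdf(1 - (1 - level) / 2)
    margin = z_star * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin


# HYPOTHETICAL counts for 150 Edmonton households; replace with the real tallies.
sample = {"Low": 45, "Medium": 50, "High": 55}
n = sum(sample.values())

# Part (a): differences between sample and census percentages.
for level, count in sample.items():
    diff = 100 * (count / n - CENSUS_P[level])
    print(f"{level}: sample {100 * count / n:.2f}%, difference {diff:+.2f} points")

# Part (b): goodness-of-fit test against the census distribution.
stat, df, p_value = chi_square_gof(sample, CENSUS_P)
print(f"chi-square = {stat:.3f}, df = {df}, P-value = {p_value:.4f}")

# Part (c): is the low-income proportion now higher than 22.92%?
z, p_upper = one_prop_z_test_greater(sample["Low"], n, CENSUS_P["Low"])
print(f"z = {z:.3f}, one-sided P-value = {p_upper:.4f}")

# Part (d): 98% two-sided confidence interval for the low-income proportion.
low, high = wald_ci(sample["Low"], n, 0.98)
print(f"98% CI: ({low:.4f}, {high:.4f})")
```

In each test, the P-value is compared with α = 0.01. Note that a 98% two-sided interval leaves 1% in each tail, which is why it can be set against the one-sided test at α = 0.01 in part (c).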