Question:
***R STUDIO***

#Part II. One-sample z-tests
############
# In this demo, we will practice using the one-sample z-test on the following hypothesis:
# Irregular verbs in English are on average much more frequent compared to all verbs in general.
#
# We will get a list of verbs with their frequencies from the CELEX database, and assume
# that this list is exhaustive and represents the population of English verbs.
#
# We will then take various samples from this list, including
# random samples and a specific sample that includes a subset of irregular verbs.
#
# We will perform a one-sample z-test to find out whether our samples differ significantly from the general population.

# To extract information from the CELEX database (http://celex.mpi.nl) by yourself, you have to learn how to use the web interface for CELEX.
# If you're interested, watch this tutorial to learn how to do this: https://www.youtube.com/watch?v=BkEk3h3R1cU.
# For this assignment, I have extracted the list of English verbs with their lemma frequencies, and stored it in the file English.verb.lemmas.csv
# which you can find on Sakai in the Resources/Data folder. Upload this file (after you save it on your hard drive) into R like this:
#=============================================
#For the command below to work, your working directory must be set to the directory where the data file is stored.
#The second column of the file contains verb frequencies in instances per million.
celexLemmas <- read.csv("EnglishVerbLemmas.csv", stringsAsFactors = FALSE)

#Plot the distribution of frequencies. Is it normal? (What plot should you use?)
...

#What is the mean lemma frequency of all English verbs (population mean)?
pmean <- ...

#What is the population variance for all verbs? Remember, you cannot use the var function here, since it's only for samples.
# Use the pop_var function you wrote in the first few weeks of this course ...
pvar = pop_var(celexLemmas$CobMln)
p_std = sqrt(pvar)
#standard deviation

#Let's now look at a subset of all verbs, namely some irregular verbs.
#Research hypothesis: a sample of 25 irregular verbs has higher frequencies
#compared to all verbs in general. Note that we chose a large enough sample so that even if the population is not normally distributed,
#we can still use the z-test according to the CLT.

#State the null and the alternative hypotheses.
#Hypothesis 0:
#Alternative 1:

#We will be comparing a sample to the population with known mean and variance.
#Here's a list of 25 irregular verbs:
irregVerbs = c("bring","buy","teach","sleep","drive","wring","sting","drink","sing","wake","arise","dream",
               "spin","put","swing","string","be","beat","become","begin","bear","bend","bind","bleed","build")

# Select the subset of the celexLemmas dataframe that corresponds to these verbs.
# Hint: use the command below. Note that the operator %in% tests for membership in a vector.
which(celexLemmas$Head %in% irregVerbs)
ind <- which(celexLemmas$Head %in% irregVerbs)

#Subset the dataframe to contain just the irregular verbs.
iv <- ...

#Carry out the appropriate steps for the statistical test.
#Step 1. What is the sample mean?
s_mean <- ...
#Is it lower or higher than the population mean? Is it what you expected?

#Step 2. What is the Standard Error for a sample of this size?
# Recall that Standard Error is the standard deviation of the sampling distribution of means from the population.
se <- ...

#Step 3. What is the z-score?
z = ...

# Step 4. What is the probability of observing a sample mean this extreme if our sample is representative of the population?
pnorm(s_mean, mean=pmean, sd=se, lower.tail = FALSE)
#According to this test, should we reject or fail to reject the null hypothesis?

###########
#Now let's do the same thing, but take a truly random sample of the same size.
###########
r_sample_i <- sample(1:length(celexLemmas$Head), 25)
random_verbs <- ...
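The arithmetic behind Steps 1 to 4 can be sketched on a small made-up population. Everything here is an assumption for illustration: the vectors `toy_pop` and `toy_sample` are invented numbers rather than CELEX data, and `pop_var_sketch` is only one plausible way to write a population-variance function (dividing by N instead of N - 1). A sample of 4 is far too small to lean on the CLT; the point is only to show how the quantities relate.

```r
# Made-up population of frequencies (not CELEX data)
toy_pop <- c(2, 4, 4, 4, 5, 5, 7, 9)

# Hypothetical population variance: divides by N, unlike var(), which divides by N - 1
pop_var_sketch <- function(x) mean((x - mean(x))^2)

toy_pmean <- mean(toy_pop)                  # population mean
toy_psd   <- sqrt(pop_var_sketch(toy_pop))  # population standard deviation

# Hand-picked sample of 4 values from the toy population
toy_sample <- c(5, 5, 7, 9)

toy_smean <- mean(toy_sample)                    # Step 1: sample mean
toy_se    <- toy_psd / sqrt(length(toy_sample))  # Step 2: standard error
toy_z     <- (toy_smean - toy_pmean) / toy_se    # Step 3: z-score

# Step 4: upper-tail p-value, computed two equivalent ways
p_from_z    <- pnorm(toy_z, lower.tail = FALSE)
p_from_mean <- pnorm(toy_smean, mean = toy_pmean, sd = toy_se, lower.tail = FALSE)
```

Both calls to `pnorm` give the same probability, because standardizing the sample mean and then using the standard normal is equivalent to evaluating the sample mean directly against a normal distribution with mean `toy_pmean` and standard deviation `toy_se`.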
#Repeat all the steps you did before for a sample of irregular verbs to test the new hypothesis that the random sample is not representative of the general population.
...

#Report your findings: the z-score, the p-value associated with it, and your conclusion.

#########################
# Write a general function "z-test" that performs a one-sample z-test comparing a sample to a population.
# Your function should take the following arguments: population mean and variance, a vector representing the sample data,
# a variable "twoTailed" that indicates whether the test will be one- or two-tailed, and "alpha" indicating
# what level of significance you assume. The function should output a z-score and a p-value corresponding to the results of the test,
# and print a statement saying whether the null hypothesis was rejected or failed to be rejected.
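One possible shape for such a function is sketched below, under a few stated assumptions. The name `z_test` is used because a hyphen is not valid in an ordinary R identifier. The argument names `pop_mean`, `pop_var`, and `x`, and the defaults for `twoTailed` and `alpha`, are illustrative choices. The one-tailed branch assumes the alternative hypothesis is that the sample mean is higher than the population mean, matching the research hypothesis about irregular verbs; a lower-tail alternative would need `lower.tail = TRUE` instead.

```r
# Sketch of a one-sample z-test against a population with known mean and variance.
# Names and defaults are illustrative assumptions, not a prescribed interface.
z_test <- function(pop_mean, pop_var, x, twoTailed = TRUE, alpha = 0.05) {
  se <- sqrt(pop_var / length(x))  # standard error of the mean
  z  <- (mean(x) - pop_mean) / se  # z-score of the sample mean

  if (twoTailed) {
    # Probability of a z-score at least this far from 0 in either direction
    p <- 2 * pnorm(abs(z), lower.tail = FALSE)
  } else {
    # Assumed direction: sample mean greater than population mean
    p <- pnorm(z, lower.tail = FALSE)
  }

  if (p < alpha) {
    cat("Reject the null hypothesis at alpha =", alpha, "\n")
  } else {
    cat("Fail to reject the null hypothesis at alpha =", alpha, "\n")
  }

  list(z = z, p = p)
}

# Usage on made-up numbers: population mean 5, population variance 4
res_one <- z_test(pop_mean = 5, pop_var = 4, x = c(5, 5, 7, 9), twoTailed = FALSE)
res_two <- z_test(pop_mean = 5, pop_var = 4, x = c(5, 5, 7, 9), twoTailed = TRUE)
```

Returning a list keeps the z-score and p-value available for later reporting, while the printed statement covers the decision at the chosen `alpha`.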
