Question: R STUDIO #Part II. One-sample z-tests ############ # In this demo, we will practice using the one sample z-test on the following hypothesis: # Irregular

***R STUDIO***

# Part II. One-sample z-tests ############
# In this demo, we will practice using the one-sample z-test on the following hypothesis:
# Irregular verbs in English are, on average, much more frequent than verbs in general.
#
# We will get a list of verbs with their frequencies from the CELEX database, and assume
# that this list is exhaustive and represents the population of English verbs.
#
# We will then take various samples from this list, including
# random samples and a specific sample that includes a subset of irregular verbs.
#
# We will perform a one-sample z-test to find out whether our samples differ significantly from the general population.
# To extract information from the CELEX database (http://celex.mpi.nl) yourself, you have to learn how to use the CELEX web interface.
# If you're interested, watch this tutorial to learn how to do this: https://www.youtube.com/watch?v=BkEk3h3R1cU.
# For this assignment, I have extracted the list of English verbs with their lemma frequencies and stored it in the file English.verb.lemmas.csv,
# which you can find on Sakai in the Resources/Data folder. Load this file (after you save it on your hard drive) into R like this:
#=============================================
# For the command below to work, your working directory must be set to the directory where the data file is stored.
# The second column of the file contains verb frequencies in instances per million.
celexLemmas <- read.csv("EnglishVerbLemmas.csv", stringsAsFactors = FALSE)

# Plot the distribution of frequencies. Is it normal? (What plot should you use?)
...

# What is the mean lemma frequency of all English verbs (population mean)?
pmean <- ...

# What is the population variance for all verbs? Remember, you cannot use the var function here,
# since var computes the sample variance (it divides by n - 1, not by N).
# Use the pop_var function you wrote in the first few weeks of this course.
...
pvar <- pop_var(celexLemmas$CobMln)
p_std <- sqrt(pvar)  # population standard deviation
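The script relies on a pop_var function written earlier in the course, which is not shown here. A minimal sketch of what such a function could look like, assuming it computes the population variance by dividing the sum of squared deviations by N (whereas R's built-in var divides by N - 1). The toy vector is made up for illustration and is not CELEX data; note that running this would overwrite any pop_var you already have in your session.

```r
# Hypothetical re-implementation of the course's pop_var(): the population
# variance divides the sum of squared deviations by N (not N - 1).
pop_var <- function(x) {
  sum((x - mean(x))^2) / length(x)
}

# Toy vector (not CELEX data) to compare with R's built-in sample variance.
x <- c(2, 4, 4, 4, 5, 5, 7, 9)
pop_var(x)                            # population variance: 4
var(x)                                # sample variance: 32 / 7, about 4.571
var(x) * (length(x) - 1) / length(x)  # rescaling var() by (N - 1) / N also gives 4
sqrt(pop_var(x))                      # population standard deviation: 2
```

The rescaling line shows why var cannot be used directly for the population: the two estimates differ only by the factor (N - 1) / N, which matters for small N and becomes negligible for a list as long as the full CELEX verb list.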
# Let's now look at a subset of all verbs, namely some irregular verbs.
# Research hypothesis: a sample of 25 irregular verbs has higher frequencies
# than verbs in general. Note that we chose a large enough sample so that, even if the population is not normally distributed,
# we can still use the z-test according to the Central Limit Theorem (CLT).
# State the null and the alternative hypotheses.
# Hypothesis 0:
# Alternative 1:
# We will be comparing a sample to a population with known mean and variance.
# Here's a list of 25 irregular verbs:
irregVerbs <- c("bring", "buy", "teach", "sleep", "drive", "wring", "sting", "drink", "sing", "wake", "arise", "dream",
                "spin", "put", "swing", "string", "be", "beat", "become", "begin", "bear", "bend", "bind", "bleed", "build")

# Select the subset of the celexLemmas data frame that corresponds to these verbs.
# Hint: use the command below. Note that the %in% operator tests for membership in a vector.
which(celexLemmas$Head %in% irregVerbs)
ind <- which(celexLemmas$Head %in% irregVerbs)
# Subset the data frame to contain just the irregular verbs.
iv <- ...

# Carry out the appropriate steps for the statistical test.
# Step 1. What is the sample mean?
s_mean <- ...
# Is it lower or higher than the population mean? Is it what you expected?

# Step 2. What is the standard error for a sample of this size?
# Recall that the standard error is the standard deviation of the sampling distribution of means from the population.
se <- ...

# Step 3. What is the z-score?
z <- ...

# Step 4. What is the probability of observing a sample mean this extreme if our sample is representative of the population?
pnorm(s_mean, mean = pmean, sd = se, lower.tail = FALSE)
# According to this test, should we reject or fail to reject the null hypothesis?

###########
# Now let's do the same thing, but take a truly random sample of the same size.
###########
r_sample_i <- sample(1:length(celexLemmas$Head), 25)
random_verbs <- ...
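Steps 1 to 4 of the test can be illustrated on made-up numbers. The sketch below assumes a population mean of 100 and a population standard deviation of 20 (hypothetical values, not computed from CELEX) and a hypothetical sample of 16 frequencies, chosen so that the arithmetic is easy to follow (sqrt(16) = 4). It also checks that calling pnorm on the z-score gives the same upper-tail probability as calling pnorm on the sample mean with mean = pmean and sd = se. It reuses the script's variable names, so run it in a fresh session to avoid overwriting your real values.

```r
# Assumed (made-up) population parameters, standing in for pmean and p_std.
pmean <- 100
p_std <- 20

# Made-up sample of n = 16 frequencies (not CELEX data); its mean is 110.
sample_freqs <- c(95, 100, 105, 110, 115, 120, 125, 110,
                  100, 120, 105, 115, 90, 130, 110, 110)
n <- length(sample_freqs)

# Step 1: sample mean.
s_mean <- mean(sample_freqs)

# Step 2: standard error of the mean = population SD / sqrt(n).
se <- p_std / sqrt(n)

# Step 3: z-score = distance of the sample mean from pmean, in SE units.
z <- (s_mean - pmean) / se

# Step 4: upper-tail p-value, computed two equivalent ways.
p_from_z    <- pnorm(z, lower.tail = FALSE)
p_from_mean <- pnorm(s_mean, mean = pmean, sd = se, lower.tail = FALSE)

# s_mean = 110, se = 5, z = 2, p is about 0.0228
c(s_mean = s_mean, se = se, z = z, p = p_from_z)
```

Standardizing first (z) and then using the standard normal is the textbook route; passing mean = pmean and sd = se to pnorm skips the explicit z but yields the same probability.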
# Repeat all the steps you carried out for the sample of irregular verbs to test the new hypothesis that the random sample is not representative of the general population.
...
# Report your findings: the z-score, the p-value associated with it, and your conclusion.

#########################
# Write a general function "z-test" that performs a one-sample z-test comparing a sample to a population.
# Your function should take the following arguments: the population mean and variance, a vector representing the sample data,
# a variable "twoTailed" that indicates whether the test will be one- or two-tailed, and "alpha", indicating
# the significance level you assume. The function should output a z-score and a p-value corresponding to the results of the test,
# and print a statement saying whether the null hypothesis was rejected or not rejected.
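One possible shape for the requested function, as a sketch rather than the expected solution. Because an R name cannot contain a hyphen without backticks, it is called z_test here (an assumed name). It also assumes that a one-tailed test checks the upper tail (H1: the population mean behind the sample is larger than pmean), which matches the irregular-verb hypothesis; a lower-tailed variant would need an extra argument that the assignment does not specify. The sample at the end is made up for illustration.

```r
# Sketch of a one-sample z-test. Arguments follow the assignment:
# population mean and variance, the sample vector, twoTailed, and alpha.
# Assumption: the one-tailed case tests the upper tail (H1: mean > pmean).
z_test <- function(pmean, pvar, sample, twoTailed = TRUE, alpha = 0.05) {
  n  <- length(sample)
  se <- sqrt(pvar) / sqrt(n)               # standard error of the mean
  z  <- (mean(sample) - pmean) / se        # one-sample z-score
  p  <- if (twoTailed) {
    2 * pnorm(-abs(z))                     # probability in both tails
  } else {
    pnorm(z, lower.tail = FALSE)           # probability in the upper tail only
  }
  if (p < alpha) {
    cat(sprintf("z = %.3f, p = %.4g < alpha = %g: reject the null hypothesis.\n",
                z, p, alpha))
  } else {
    cat(sprintf("z = %.3f, p = %.4g >= alpha = %g: fail to reject the null hypothesis.\n",
                z, p, alpha))
  }
  list(z = z, p = p)                       # return both values to the caller
}

# Usage with a made-up sample (mean 110, n = 16) and pvar = 400 (SD 20).
sample_freqs <- c(95, 100, 105, 110, 115, 120, 125, 110,
                  100, 120, 105, 115, 90, 130, 110, 110)
res_one <- z_test(pmean = 100, pvar = 400, sample = sample_freqs, twoTailed = FALSE)
res_two <- z_test(pmean = 100, pvar = 400, sample = sample_freqs, twoTailed = TRUE)
```

Here z = 2 in both calls; the one-tailed p-value is about 0.023 and the two-tailed one about 0.046 (exactly twice as large), so both reject at alpha = 0.05. The function takes the population variance rather than the standard deviation because that is what the assignment specifies.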
