Question: STAT 480 Project 1

STAT 480 Project 1

Instructions:

1. Please read this project quickly now and then again more carefully later, so that you understand what your group needs to do to manage this project. If you have any general questions, such as questions about the description of the problems below, you can send an email to me or the TA at least one day ahead of the due date.
2. Type your group report using Microsoft Word or LaTeX, and use single spacing and a 12 point font in the main body of your report.
3. Every report should have the title (STAT 480 Project 1), authors and date on the first page.
4. All pages should be numbered, and all Tables and Figures should be clearly labeled and numbered, too.
5. Make sure you proofread your report before submitting it.
6. For each group, you need to submit both your group report (in .pdf) and the corresponding R code through Blackboard. If the R code is all you hand in, with no output or explanation/discussion of your output, you will not receive full credit for the question!
7. Please do not ask me or the TA to debug your code.

1. (15 pts) The Central Limit Theorem (CLT). The CLT is considered to be one of the most important results in statistical theory. It states that means of samples from an arbitrary distribution with finite variance are approximately normally distributed, provided that the sample size, n, used for calculating the mean is large enough. To see how big n needs to be, we can use the following simulation idea: generate a sample of size n, drawn repeatedly (say 1000 times), from a Uniform(0, 1) distribution. We want to verify that $\bar{X} = \frac{1}{n} \sum_{i=1}^{n} X_i$ is normally distributed, and we can draw a QQ-plot for the 1000 $\bar{X}$s to judge the normality. (Hint: runif() generates values from a random uniform distribution between 0 and 1; use a for() loop to repeat the sampling.)

(a) (12 pts) Write an R function called CLTdemo(n) to illustrate the above CLT, with the number of observations, n, as the calling argument to the function.
When you draw your QQ-plot, make sure that the title shows the sample size, n. For example, when n = 100, the QQ-plot title should be "QQ-plot for sample size n=100". Hint: to make the title, you can use title(paste("QQ-plot for sample size n=", as.character(n)))

(b) (3 pts) Use the CLTdemo(n) function from part (a) for n = 2, 10, 25 and 100 and show your results.

2. (30 pts) A Random Walk. A symmetric simple random walk starting at the origin is defined as follows. Suppose $X_1, X_2, \ldots$ are independent and identically distributed random variables with the distribution: +1 with probability 1/2; -1 with probability 1/2. Define the sequence $\{S_n\}_{n \ge 0}$ by $S_0 = 0$ and $S_n = S_{n-1} + X_n$ for n = 1, 2, .... Then $\{S_n\}$ is a symmetric simple random walk starting at the origin. Note that the position of the walk at time n is just the sum of the previous n steps: $S_n = X_1 + \cdots + X_n$.

(a) (10 pts) Write a function rwalk(n) which takes a single argument n and returns a vector which is a realisation of $(S_0, S_1, \ldots, S_n)$, the first n positions of a symmetric random walk starting at the origin. Hint: the code sample(c(-1,1), n, replace=TRUE, prob=c(0.5,0.5)) simulates n steps.

(b) (10 pts) Now write a function rwalkPos(n) which simulates one occurrence of the walk which lasts for a length of time n and then returns the length of time the walk spends above the x-axis. (Note that a walk with length 6 and vertices at 0, 1, 0, -1, 0, 1, 0 spends 4 units of time above the axis and 2 units of time below the axis.)

(c) (10 pts) Now suppose we wish to investigate the distribution of the time the walk spends above the x-axis. This means we need a large number of replications of rwalkPos(n). Write two functions: rwalkPos1(nReps,n), which uses a loop, and rwalkPos2(nReps,n), which uses replicate or sapply. Compare the execution times of these two functions by using system.time.

3. (15 pts) Simple Linear Regression.
For simple linear regression, we know that we can use the least squares method to estimate the intercept and slope. According to statistical theory, our estimates will be more accurate when the sample size, n, is larger or the measurement error is smaller (i.e. the variance of the error is smaller). To see this, generate data $(X_i, Y_i)$, i = 1, 2, ..., n from the following linear model: $Y_i = \beta_0 + \beta_1 X_i + \epsilon_i$, i = 1, 2, ..., n, where $\beta_0 = 1.5$, $\beta_1 = 3$, the $X_i$'s are generated from $N(1, 0.5^2)$ and the errors $\epsilon_i$ are generated from $N(0, \sigma^2)$. For the generated dataset, do a linear regression of y on x and obtain the estimates of the coefficients, $\hat{\beta}_0$ and $\hat{\beta}_1$. Calculate the squared error: $(\hat{\beta}_0 - \beta_0)^2 + (\hat{\beta}_1 - \beta_1)^2$. Repeat the above 200 times, so that you have 200 squared errors $(\hat{\beta}_0 - \beta_0)^2 + (\hat{\beta}_1 - \beta_1)^2$. We would like to look at the average of these 200 squared errors (called the MSE, for mean squared error) to determine how accurate the estimators are. The MSE is expected to decrease when we increase n or decrease $\sigma$.

(a) (12 pts) Create an R function MSE(n,sigma) with the arguments n and sigma to illustrate the effect of the number of observations n and the standard deviation $\sigma$ in the above, and report the MSE.

(b) (3 pts) Use the R function from part (a) for different combinations of n and $\sigma$: n = 50, 100, 200, $\sigma$ = 0.1, 0.5. Report your MSE for each combination.

4. (40 pts) Softball Team Data. The STAT/MATH/CS co-ed softball team has participated in intramural co-ed softball for 32 years, playing over 450 games during that time. Considering the composition of the team, it is perhaps not surprising that the team has compiled some statistics from those games. The team captain was wondering what relationship, if any, there was between how many hits and errors the team made in a game and how well they did. He consulted the data set, and determined that the team had played 466 games over its history.
For each game are listed RUNS (his team's score), ORUN (the opposing team's score), HIT (the number of hits obtained by his team), ERR (the number of errors committed by his team), and RES (the result; 'W', 'L', or 'T', depending on whether his team won, lost, or tied the game). The data set is attached in a file called 'softball.txt'. It contains a header line and 466 lines of data.

(a) (5 pts) Create a new variable called DIFF, where DIFF=RUNS-ORUN. Then, execute appropriate commands to create and print out the 5-by-5 matrix of Pearson correlations between the five numerical variables (DIFF, RUNS, ORUN, HIT, ERR) and discuss briefly what this output reveals.

(b) (10 pts) Since HIT is a measure of offensive prowess, one would suspect that the team would score more runs as HIT increases. Run a simple linear regression to predict RUNS from HIT. Use it to predict how many runs would be scored in a game when the team obtained 15 hits, and obtain a 95% Prediction Interval for this estimate. Of the 45 games in which the team achieved exactly 15 hits, in what proportion of these games was the team's RUNS actually in the interval calculated in the previous sentence?

(c) (5 pts) Since ERR is a measure of defensive ineptitude, one would suspect that the opposing team would score more runs as ERR increases. Run a simple linear regression to predict ORUN from ERR to confirm this relationship.

(d) (10 pts) Run a multiple regression to predict DIFF from HIT and ERR, fitting the model: $\mathrm{DIFF} = \beta_0 + \beta_1 \mathrm{HIT} + \beta_2 \mathrm{ERR} + \epsilon$. Softball fans would expect $\beta_1$ to be significantly positive and $\beta_2$ to be significantly negative. Test each of these alternatives at the $\alpha = 0.01$ level.

(e) (10 pts) Suppose that one uses the regression equation of (d) to predict the outcome of a game by declaring the game a victory if the predicted DIFF > 0 and a loss if the predicted DIFF < 0. Compare these results with the actual results and fill in the table below.
(The 9 games which ended in ties should be counted as 1/2 wins and 1/2 losses). What % of all games are predicted correctly? (A game is predicted correctly if it falls in the upper left or lower right boxes in the table below.) RUNS ORUN HIT ERR RES 12 12 13 5 T 13 2 17 3 W 12 6 19 4 W 2 23 8 6 L 10 8 17 4 W 6 17 7 5 L 3 7 8 3 L 6 5 5 3 W 3 0 13 3 W 6 19 13 6 L 7 10 16 6 L 2 6 7 4 L 2 5 10 3 L 9 4 16 2 W 19 10 25 5 W 13 4 19 3 W 5 12 9 4 L 10 8 16 3 W 9 15 13 3 L 19 5 22 3 W 8 1 14 1 W 20 8 22 2 W 17 2 19 1 W 8 7 14 0 W 2 9 5 3 L 14 5 18 2 W 9 8 15 5 W 18 5 19 3 W 20 7 22 1 W 8 6 15 5 W 14 7 13 2 W 18 2 14 0 W 13 5 17 4 W 12 11 18 4 W 5 11 9 7 L 30 3 31 0 W 21 8 25 1 W 15 10 19 3 W 6 8 13 4 L 11 14 11 3 L 5 2 7 2 W 1 8 3 5 L 9 3 8 7 W 8 9 11 5 L 11 10 10 3 W 2 8 6 4 L 4 12 11 4 L 12 4 13 3 W 10 5 17 2 W 12 15 14 4 L 14 3 14 0 W 7 2 12 4 W 11 5 8 0 W 14 6 17 1 W 13 7 15 6 W 3 4 11 6 L 2 4 6 6 L 4 7 12 3 L 6 13 12 6 L 12 10 23 9 W 11 6 14 3 W 10 9 15 4 W 13 4 19 1 W 2 9 11 6 L 3 6 7 5 L 4 3 10 3 W 4 4 13 3 T 9 2 15 1 W 6 12 15 7 L 6 6 15 8 T 15 8 20 3 W 3 6 6 6 L 16 4 19 1 W 12 10 18 3 W 15 14 23 7 W 5 13 11 4 L 13 1 16 1 W 4 11 9 7 L 5 8 12 6 L 4 6 12 5 L 5 1 9 4 W 2 6 12 6 L 9 8 12 3 W 16 4 21 3 W 15 5 17 3 W 8 11 14 1 L 12 14 15 9 L 7 14 15 7 L 14 2 20 2 W 11 16 18 10 L 10 1 18 0 W 15 3 16 1 W 2 6 9 5 L 14 7 16 4 W 16 2 20 1 W 12 7 16 3 W 10 9 11 3 W 1 17 5 2 L 7 4 10 2 W 6 1 12 3 W 4 9 12 5 L 1 13 5 6 L 7 3 11 1 W 13 7 15 3 W 2 10 11 5 L 7 5 12 4 W 10 2 11 5 W 4 12 9 6 L 8 9 12 4 L 7 8 12 6 L 7 12 15 6 L 10 19 8 9 L 5 4 11 4 W 5 13 7 4 L 13 9 16 5 W 6 1 14 2 W 9 7 15 4 W 2 14 8 5 L 17 2 14 3 W 18 2 15 2 W 13 13 17 3 T 13 12 18 8 W 25 5 22 1 W 6 4 9 3 W 20 0 19 2 W 24 2 25 4 W 4 0 9 2 W 7 17 11 6 L 10 10 15 2 T 15 0 17 2 W 10 13 13 5 L 8 4 16 3 W 12 6 16 3 W 12 1 10 3 W 19 12 23 6 W 8 1 13 1 W 7 9 14 3 L 9 4 10 4 W 7 6 13 3 W 18 7 19 4 W 10 7 12 4 W 12 10 17 4 W 11 3 14 0 W 13 5 16 3 W 10 4 17 4 W 6 12 6 2 L 12 2 11 2 W 6 15 12 3 L 14 10 17 2 W 3 14 5 4 L 12 0 16 
0 W 17 6 17 0 W 10 2 16 2 W 22 0 28 2 W 11 3 13 0 W 5 10 16 5 L 10 6 15 4 W 8 6 12 5 W 5 4 11 4 W 5 4 13 5 W 18 11 20 5 W 14 13 17 8 W 12 3 12 2 W 3 7 6 1 L 8 5 18 2 W 10 9 17 4 W 14 13 18 3 W 13 14 16 5 L 14 0 17 0 W 11 10 13 8 W 16 0 13 1 W 16 3 18 2 W 13 3 18 2 W 8 6 10 2 W 16 5 21 6 W 12 4 16 4 W 19 3 22 2 W 16 5 21 3 W 10 5 14 2 W 14 6 21 4 W 7 6 15 2 W 9 5 13 1 W 6 8 9 2 L 11 10 14 4 W 8 9 13 6 L 8 3 16 4 W 19 0 19 0 W 14 6 17 3 W 1 16 6 4 L 10 11 13 6 L 17 9 24 4 W 7 14 12 5 L 19 1 25 1 W 13 8 19 4 W 14 8 20 0 W 11 10 19 5 W 13 4 16 3 W 9 10 14 5 L 9 0 13 2 W 29 7 37 5 W 10 0 13 1 W 14 7 22 0 W 13 3 17 2 W 18 6 25 2 W 21 12 25 1 W 30 1 28 1 W 14 12 22 3 W 6 1 13 0 W 6 5 11 4 W 17 6 19 5 W 16 1 13 2 W 14 6 24 2 W 12 4 16 4 W 8 7 15 2 W 6 4 12 1 W 12 1 15 2 W 13 3 14 2 W 13 20 17 6 L 8 7 10 2 W 7 9 13 5 L 10 0 13 1 W 9 7 11 4 W 15 1 22 2 W 13 11 20 6 W 15 5 13 3 W 15 4 22 0 W 7 9 15 3 L 20 11 21 6 W 20 9 20 2 W 19 15 17 4 W 2 8 10 1 L 9 12 15 4 L 12 13 9 4 L 2 15 8 3 L 11 5 13 2 W 15 7 23 1 W 8 13 13 6 L 15 5 19 4 W 6 11 11 3 L 10 6 15 3 W 15 6 15 2 W 9 13 10 5 L 9 4 13 2 W 15 12 18 2 W 7 8 16 6 L 22 12 20 6 W 18 17 21 6 W 1 15 7 5 L 7 8 12 4 L 10 13 10 6 L 6 15 16 8 L 14 11 20 5 W 6 3 12 3 W 13 8 18 3 W 10 9 10 2 W 14 15 13 2 L 4 14 9 4 L 21 4 16 1 W 6 12 15 3 L 5 4 11 2 W 12 13 16 4 L 11 13 14 2 L 16 1 10 0 W 16 4 17 2 W 10 2 11 1 W 9 4 12 3 W 7 6 9 2 W 9 6 13 1 W 10 11 10 4 L 12 11 17 4 W 11 0 10 3 W 17 7 25 2 W 4 8 11 1 L 8 8 14 3 T 8 13 10 4 L 17 2 16 2 W 11 1 15 2 W 13 1 18 2 W 16 4 19 1 W 10 7 12 4 W 10 3 11 4 W 16 0 15 2 W 14 9 15 3 W 26 9 27 2 W 8 9 18 1 L 10 3 15 1 W 13 3 15 1 W 17 22 20 11 L 11 5 17 5 W 12 11 13 8 W 9 10 15 2 L 13 7 14 3 W 13 8 12 2 W 10 6 14 5 W 16 5 20 4 W 13 2 15 3 W 10 13 19 5 L 12 15 19 9 L 17 5 18 3 W 8 14 14 5 L 20 4 17 0 W 4 7 8 3 L 24 7 29 2 W 8 9 12 1 L 6 3 14 6 W 4 9 10 2 L 9 5 12 5 W 7 9 12 4 L 17 7 19 2 W 11 1 9 3 W 16 6 19 3 W 12 10 18 3 W 8 9 10 2 L 10 13 17 5 L 2 16 8 5 L 7 4 11 3 W 14 4 15 1 W 0 11 3 6 L 15 0 14 0 W 
20 4 21 3 W 13 2 16 2 W 6 3 8 3 W 23 0 18 0 W 21 4 22 4 W 11 5 16 5 W 13 2 18 1 W 17 2 20 1 W 27 1 31 1 W 11 2 17 2 W 18 2 18 1 W 18 0 18 0 W 15 9 22 1 W 16 1 14 0 W 18 8 18 3 W 15 5 19 5 W 3 2 6 3 W 11 1 12 3 W 7 7 15 5 T 9 2 13 3 W 13 5 16 4 W 13 10 16 4 W 5 3 9 1 W 17 2 16 3 W 1 15 4 8 L 19 18 22 7 W 16 7 21 3 W 12 15 22 4 L 12 5 16 6 W 19 4 19 0 W 18 4 17 3 W 17 9 16 3 W 15 2 18 1 W 20 9 23 2 W 18 8 22 1 W 16 10 18 1 W 10 3 18 3 W 17 18 14 5 L 14 15 18 4 L 10 0 14 2 W 7 8 13 4 L 17 12 20 2 W 15 9 17 4 W 7 14 10 3 L 18 8 22 3 W 11 0 14 1 W 15 4 13 2 W 16 0 15 0 W 16 4 19 2 W 32 2 27 1 W 12 11 24 5 W 17 1 21 4 W 15 1 17 0 W 12 3 12 1 W 11 1 18 2 W 16 1 16 1 W 14 3 20 1 W 22 7 23 4 W 14 8 17 6 W 11 6 15 2 W 25 4 28 0 W 28 6 29 1 W 14 14 18 7 T 5 4 10 2 W 14 9 19 0 W 15 8 13 1 W 14 9 15 5 W 7 9 11 3 L 8 11 10 7 L 3 5 7 3 L 9 6 14 3 W 10 7 12 5 W 14 4 17 1 W 2 15 2 1 L 5 1 8 1 W 10 2 18 1 W 24 6 26 3 W 14 4 18 5 W 13 14 16 2 L 9 13 15 7 L 9 8 16 3 W 9 6 15 2 W 15 21 22 7 L 6 9 17 3 L 6 14 11 7 L 10 9 16 3 W 8 11 15 3 L 3 13 10 3 L 15 9 12 3 W 9 5 10 2 W 10 9 14 3 W 16 15 19 6 W 14 13 17 4 W 8 7 16 3 W 2 12 6 3 L 12 10 9 5 W 14 8 16 1 W 18 2 20 1 W 13 8 11 1 W 14 16 24 4 L 25 3 24 3 W 3 13 7 6 L 8 9 13 4 L 9 10 15 3 L 12 20 17 6 L 9 11 16 3 L 22 11 23 4 W 9 19 15 2 L 7 19 8 5 L 8 5 7 2 W 13 12 15 7 W 3 22 7 7 L 10 15 10 2 L 16 6 15 2 W 14 13 14 5 W 9 8 10 2 W 3 12 10 1 L 15 10 16 3 W 8 2 13 1 W 15 3 20 1 W 19 0 21 0 W 19 7 23 1 W 21 6 24 0 W 21 10 30 4 W 5 3 11 1 W 21 12 22 3 W 11 14 15 6 L 10 12 12 3 L 18 5 17 4 W 18 11 13 4 W 19 5 18 3 W 7 10 14 3 L 13 9 16 2 W 13 10 15 2 W 5 12 12 4 L 4 9 7 2 L 19 9 16 3 W 9 10 11 8 L 14 9 18 2 W 17 2 22 2 W 14 14 14 5 T 7 9 11 4 L 12 0 16 1 W 8 7 16 2 W 7 3 15 1 W 7 6 12 3 W 7 6 14 3 W
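
For Problem 1, the simulation idea (repeat the sampling with a for() loop, store each sample mean, then draw a QQ-plot) could be sketched roughly as follows. This is only one possible reading of the task, not a reference solution: the extra argument nSim, the use of qqnorm()/qqline(), and returning the simulated means invisibly are assumptions made for illustration and are not prescribed by the assignment.

```r
# Sketch of CLTdemo(n). Assumptions (not from the assignment): the nSim
# argument, the qqline() reference line, and the invisible return value.
CLTdemo <- function(n, nSim = 1000) {
  xbar <- numeric(nSim)                 # storage for the nSim sample means
  for (i in seq_len(nSim)) {
    xbar[i] <- mean(runif(n))           # mean of n Uniform(0, 1) draws
  }
  # QQ-plot of the sample means against the normal distribution,
  # with the sample size shown in the title as the assignment asks
  qqnorm(xbar, main = paste("QQ-plot for sample size n=", as.character(n)))
  qqline(xbar)                          # straight reference line
  invisible(xbar)                       # return the means for inspection
}

# Part (b): run the demo for the requested sample sizes
set.seed(480)                           # arbitrary seed, for reproducibility only
for (n in c(2, 10, 25, 100)) CLTdemo(n)
```

The closer the points lie to the reference line, the closer the distribution of the sample means is to normal, so comparing the four plots gives a visual sense of how large n needs to be.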

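
For Problem 2, a minimal sketch of the random walk functions is given below. It builds on the sample() hint from the assignment and counts a unit of time as "above the axis" when either endpoint of that step is positive, which reproduces the worked example (0, 1, 0, -1, 0, 1, 0 gives 4). The helper timeAbove() is a hypothetical name introduced here for illustration, and the replication counts passed to system.time() are arbitrary choices.

```r
# rwalk(n): positions S_0, S_1, ..., S_n of a symmetric simple random walk
rwalk <- function(n) {
  steps <- sample(c(-1, 1), n, replace = TRUE, prob = c(0.5, 0.5))
  c(0, cumsum(steps))                   # prepend S_0 = 0
}

# timeAbove(s): hypothetical helper (not named in the assignment).
# Step k runs from S_{k-1} to S_k; it lies above the axis when the
# larger of its two endpoints is positive.
timeAbove <- function(s) {
  n <- length(s) - 1
  sum(pmax(s[-1], s[-(n + 1)]) > 0)
}

# rwalkPos(n): simulate one walk of length n and return its time above the axis
rwalkPos <- function(n) timeAbove(rwalk(n))

# Part (c): the same replication written with a loop and with replicate()
rwalkPos1 <- function(nReps, n) {
  out <- numeric(nReps)                 # preallocate the result vector
  for (i in seq_len(nReps)) out[i] <- rwalkPos(n)
  out
}
rwalkPos2 <- function(nReps, n) replicate(nReps, rwalkPos(n))

# Timing comparison; nReps = 1000 and n = 100 are arbitrary illustration values
system.time(rwalkPos1(1000, 100))
system.time(rwalkPos2(1000, 100))
```

Timings from system.time vary between runs and machines, so it is worth repeating the comparison a few times before drawing conclusions about the two versions.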