Question: In this exercise we see how the default settings for producing boxplots in Minitab and in R can be misleading because they do not...

In this exercise we see how the default settings for producing boxplots in Minitab and in $\mathrm{R}$ can be misleading because they do not take the sample size into account. We will generate three samples of different sizes from the same distribution, and compare their boxplots.


(a) What do you notice from the resulting boxplot?

(b) Which sample seems to have a heavier tail?

(c) Why is this misleading?

(d) [Minitab:] Click on the boxplot. Then pull down the Editor menu to Select Item and over to Outlier Symbols. Click on Custom in the dialog box, and select Dot.
[Minitab version 17.2:] Left-click any one of the outlying points in the boxplot. Then right-click to bring up the context menu and select Edit Outlier Symbols. Change the symbols to Custom and use the dropdown box to select the Dot symbol.
[R:] In $\mathrm{R}$ it is easy to make the box width proportional to the square root of the sample size by using the varwidth parameter. Simply type:

    boxplot(y ~ x, varwidth = TRUE)

(e) Is the graph still as misleading as the original?

[Minitab:] Generate 250 normal(0, 1) observations and put them in column c1 by pulling down the Calc menu to the Random Data command, over to Normal, and filling in the dialog box. Generate 1,000 normal(0, 1) observations the same way and put them in column c2, and generate 4,000 normal(0, 1) observations the same way and put them in column c3. Stack these three columns by pulling down the Data menu to Stack and over to Columns, and filling in the dialog box to put the stacked column into c4, with subscripts into c5. Form stacked boxplots by pulling down the Graph menu to the Boxplot command and filling in the dialog box. The Graph variable is c4 and the Categorical variable is c5.

[R:]

    # We could just use y = rnorm(5250),
    # but this makes the three group sizes clear
    y = rnorm(sum(c(250, 1000, 4000)))
    x = rep(1:3, c(250, 1000, 4000))
    boxplot(y ~ x)
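The visual impression from the boxplots can be checked numerically by counting the points that the default 1.5 x IQR rule flags in each group. The R sketch below is not part of the original exercise. The names `sizes`, `flagged`, `upper_fence`, and `p_out`, and the seed, are assumptions made for illustration. It relies on `boxplot.stats()`, which applies the same fence rule that `boxplot()` uses by default. For a normal distribution the fences lie roughly 2.7 standard deviations from the median, so about 0.7% of observations are expected to fall outside them whatever the sample size.

```r
# Illustrative sketch (not from the exercise): count the points that a
# default boxplot would draw as outliers in each of the three groups.
set.seed(1)                      # arbitrary seed, only for reproducibility

sizes <- c(250, 1000, 4000)      # the three sample sizes from the exercise
y <- rnorm(sum(sizes))           # every observation is normal(0, 1)
x <- rep(1:3, sizes)             # group labels 1, 2, 3

# boxplot.stats()$out holds the points beyond the 1.5 * IQR fences
flagged <- sapply(split(y, x), function(g) length(boxplot.stats(g)$out))

# Theoretical fence for a normal(0, 1) distribution and the
# probability that a single observation lands beyond either fence
upper_fence <- qnorm(0.75) + 1.5 * (qnorm(0.75) - qnorm(0.25))
p_out <- 2 * pnorm(-upper_fence)

# Observed counts next to the counts expected from sample size alone
print(data.frame(n = sizes,
                 flagged = flagged,
                 expected = round(sizes * p_out, 1)))
```

Comparing the `flagged` and `expected` columns gives a rough sense of how much of the difference between the three boxplots is explained by sample size alone, since all three samples share the same distribution.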

