Question: In this exercise we see how the default settings for producing boxplots in Minitab and in \(\mathrm{R}\) can be misleading because they do not take the sample size into account. We will generate three samples of different sizes from the same distribution and compare their boxplots.
![[Minitab:] Generate 250 normal (0 1) observations and put them in col-](https://dsd5zvtm8ll6.cloudfront.net/images/question_images/1711/3/5/5/92366013813278881711355921299.jpg)
(a) What do you notice from the resulting boxplot?
(b) Which sample seems to have a heavier tail?
(c) Why is this misleading?
(d) [Minitab:] Click on the boxplot. Then pull down the Editor menu to Select Item and over to Outlier Symbols. Click on Custom in the dialog box, and select Dot.
[Minitab version 17.2:] Left click any one of the outlying points in the boxplot. Then right click to bring up the context menu and select Edit Outlier Symbols. Change the symbols to Custom and use the dropdown box to select the Dot symbol.
[R:] In \(\mathrm{R}\) it is easy to make the box width proportional to the square root of the sample size by using the varwidth parameter. Simply type:

    boxplot(y ~ x, varwidth = TRUE)
(e) Is the graph still as misleading as the original?
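A minimal R sketch of the adjusted plot from part (d), for readers who want to compare it with the default version. The seed, the `sizes` vector, and the group labels are assumptions added here for reproducibility and readability; they are not part of the exercise. The `outpch` and `outcex` arguments are one way to draw the outlying points as small dots, roughly mirroring the Minitab Dot symbol.

```r
# Hypothetical seed, used only so the sketch is reproducible.
set.seed(1)

# Three samples of different sizes from the same N(0, 1) distribution.
sizes <- c(250, 1000, 4000)
y <- rnorm(sum(sizes))
x <- rep(1:3, sizes)

# varwidth = TRUE makes each box width proportional to the square root
# of its group size; outpch and outcex draw outliers as small solid dots.
bp <- boxplot(y ~ x, varwidth = TRUE, outpch = 20, outcex = 0.5,
              names = paste("n =", sizes))

# The value returned by boxplot() records the number of observations per group.
print(bp$n)
```

With the widths scaled this way, the larger number of flagged points in the third sample sits next to a visibly wider box, which signals that the sample is larger.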
[Minitab:] Generate 250 normal (0, 1) observations and put them in column c1 by pulling down the Calc menu to the Random Data command, over to Normal, and filling in the dialog box. Generate 1,000 normal (0, 1) observations the same way and put them in column c2, and generate 4,000 normal (0, 1) observations the same way and put them in column c3. Stack these three columns by pulling down the Data menu to Stack and over to Columns, and filling in the dialog box to put the stacked column into c4, with subscripts into c5. Form stacked boxplots by pulling down the Graph menu to the Boxplot command and filling in the dialog box. The Graph variable is c4 and the Categorical variable is c5.

[R:]

    # We could just use y = rnorm(5250),
    # but this makes the three group sizes clear
    y = rnorm(sum(c(250, 1000, 4000)))
    x = rep(1:3, c(250, 1000, 4000))
    boxplot(y ~ x)
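As a rough check on parts (b) and (c), the sketch below counts how many points the default 1.5 × IQR rule flags in each sample and compares that with what a normal distribution would predict. The seed and the variable names (`sizes`, `flagged`, `p_out`, `expected`) are assumptions introduced for illustration. The expected counts use the population quartiles of the standard normal, so they only approximate what the sample-based fences will flag.

```r
# Hypothetical seed, used only so the sketch is reproducible.
set.seed(1)

sizes <- c(250, 1000, 4000)
y <- rnorm(sum(sizes))
x <- rep(1:3, sizes)

# Number of points beyond the default 1.5 * IQR whiskers in each sample.
flagged <- tapply(y, x, function(v) length(boxplot.stats(v)$out))

# Probability that a N(0, 1) observation falls outside the population
# fences Q1 - 1.5 * IQR and Q3 + 1.5 * IQR (about 0.7%).
iqr <- qnorm(0.75) - qnorm(0.25)
p_out <- 2 * pnorm(qnorm(0.25) - 1.5 * iqr)

# Expected number of flagged points grows in proportion to the sample size.
expected <- sizes * p_out

print(data.frame(n = sizes,
                 flagged = as.vector(flagged),
                 expected = round(expected, 1)))
```

Because the chance of being flagged is the same for every observation, the expected number of flagged points is roughly 2, 7, and 28 for the three samples. The largest sample shows the most outlying points even though all three come from the same distribution.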
