An Introduction to Statistical Inference and Data Analysis


central 50% of the data. Thus, the length of the rectangle equals the sample
interquartile range. The location of the sample median is also identified, and
its location within the rectangle often provides insight into whether or not
the population from which the sample was drawn is symmetric. Whiskers
extend from the ends of the rectangle, either to the extreme values of the
data or a distance of 1.5 times the sample interquartile range, whichever is
shorter. Values
that lie beyond the whiskers are called outliers and are individually identified
by additional lines.
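The construction just described can be sketched in Python so that it runs stand-alone. The quartile rule, `statistics.quantiles(..., method="inclusive")`, is an assumption: several slightly different quartile conventions are in use, so the numbers may differ a little from what S-Plus reports. The whisker rule follows the description above; R's `boxplot`, for instance, instead stops each whisker at the most extreme observation inside the 1.5-IQR limit. The function name `box_plot_summary` is hypothetical.

```python
import statistics


def box_plot_summary(data):
    """Compute the ingredients of a box plot as described in the text.

    Quartiles come from statistics.quantiles(method="inclusive"), which is
    one of several quartile conventions (an assumption of this sketch).
    """
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    lower_limit = q1 - 1.5 * iqr
    upper_limit = q3 + 1.5 * iqr
    # Each whisker stops at the extreme observation or 1.5 IQRs from the
    # box, whichever is shorter.
    lower_whisker = max(min(data), lower_limit)
    upper_whisker = min(max(data), upper_limit)
    # Observations beyond the whiskers are flagged individually as outliers.
    outliers = sorted(x for x in data if x < lower_limit or x > upper_limit)
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "iqr": iqr,
        "whiskers": (lower_whisker, upper_whisker),
        "outliers": outliers,
    }


sample = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]
print(box_plot_summary(sample))
```

For this toy sample the quartiles are 3.25 and 7.75, the upper whisker stops 1.5 · 4.5 = 6.75 above the third quartile (at 14.5), and the value 100 is flagged as an outlier.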
Figure 7.2: Box Plot of a Sample from χ²(3)
Example 5 The pdf of the asymmetric distribution χ²(3) was graphed
in Figure 4.8. The following S-Plus commands draw a random sample of
n = 100 observed values from this population, then construct a box plot of
the sample:
> x <- rchisq(100,df=3)
> boxplot(x)
An example of a box plot produced by these commands is displayed in Figure
7.2. In this box plot, the numerical values in the sample are represented by
the vertical axis.
The third quartile of the box plot in Figure 7.2 is farther above the
median than the first quartile is below it. The short lower whisker extends
from the first quartile to the minimal value in the sample, whereas the long
upper whisker extends 1.5 interquartile ranges beyond the third quartile.
Furthermore, there are 4 outliers beyond the upper whisker. Once we learn
to discern these key features of the box plot, we can easily recognize that
the population from which the sample was drawn is not symmetric.
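A Python sketch of Example 5, assuming a χ²(3) value is generated as the sum of squares of three independent standard normal values (which is how χ²(3) is defined). The seed is arbitrary, so the exact quartiles and outlier count will not match Figure 7.2.

```python
import random
import statistics

# Stand-in for the S-Plus sample: each chi-squared(3) value is the sum of
# squares of three independent standard normal values.
rng = random.Random(2024)  # arbitrary seed, for reproducibility only
x = [sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(3)) for _ in range(100)]

q1, median, q3 = statistics.quantiles(x, n=4, method="inclusive")
iqr = q3 - q1

# Asymmetry check: compare the two halves of the box.
upper_half = q3 - median
lower_half = median - q1

# Points more than 1.5 IQRs beyond the box are the outliers a box plot marks.
outliers = [v for v in x if v < q1 - 1.5 * iqr or v > q3 + 1.5 * iqr]

print(f"median - Q1 = {lower_half:.3f}, Q3 - median = {upper_half:.3f}")
print(f"outliers: {len(outliers)} of {len(x)}")
```

For a right-skewed population such as χ²(3), Q3 - median typically exceeds median - Q1, and a few points beyond the upper limit are common at this sample size; the precise figures depend on the seed.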
The frequency of outliers in a sample often provides useful diagnostic
information. Recall that, in Section 5.3, we computed that the interquartile
range of a standard normal distribution is 1.34898; by symmetry, its third
quartile lies 1.34898/2 above the mean. A value is therefore an outlier if it
lies more than

$$z = \frac{1.34898}{2} + 1.5 \cdot 1.34898 = 2.69796$$

standard deviations from the mean. Hence, the probability that an observation
drawn from a normal distribution is an outlier is
> 2*pnorm(-2.69796)
[1] 0.006976582
and we would expect a sample drawn from a normal distribution to contain
approximately 7 outliers per 1000 observations. A sample that contains a
dramatically different proportion of outliers, as in Example 5 (4 outliers
among 100 observations, where fewer than 1 would be expected), is not likely
to have been drawn from a normal distribution.
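The same arithmetic can be checked with Python's standard-library `statistics.NormalDist` in place of S-Plus `pnorm`. Note that z simplifies to 2 · IQR (equivalently 4 · Q3), because IQR/2 + 1.5 · IQR = 2 · IQR.

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, standard deviation 1

q3 = std_normal.inv_cdf(0.75)  # third quartile, about 0.67449
iqr = 2 * q3                   # by symmetry Q1 = -Q3, so IQR is about 1.34898
z = iqr / 2 + 1.5 * iqr        # distance from the mean to a whisker limit

# Two-sided tail probability, the analogue of 2*pnorm(-z) in S-Plus.
p_outlier = 2 * std_normal.cdf(-z)

print(f"IQR = {iqr:.5f}, z = {z:.5f}")
print(f"P(outlier) = {p_outlier:.9f}")
print(f"expected outliers per 1000 observations: {1000 * p_outlier:.2f}")
```

Because this computes z without rounding, the probability agrees closely with, but not digit-for-digit with, the S-Plus value 0.006976582.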
Box plots are especially useful for comparing several populations.
Example 6 We drew samples of 100 observations from three normal
populations: Normal(0, 1), Normal(2, 1), and Normal(1, 4). To see whether the
differences in population means and standard deviations can be discerned in
the samples, we examined side-by-side box plots, produced by the following
S-Plus commands:
> z1 <- rnorm(100)
> z2 <- rnorm(100,mean=2)
> z3 <- rnorm(100,mean=1,sd=2) # Normal(1, 4): variance 4, so sd = 2
> boxplot(z1,z2,z3)
An example of the output of these commands is displayed in Figure 7.3.
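A Python counterpart to Example 6, assuming the Normal(μ, σ²) convention, so that Normal(1, 4) has standard deviation 2. Since the interquartile range of a normal population is 1.34898σ, the sample medians should land roughly near 0, 2, and 1, and the sample IQRs roughly near 1.35, 1.35, and 2.70; the seed is arbitrary.

```python
import random
import statistics

rng = random.Random(7)  # arbitrary seed

# Assumption: Normal(mu, sigma^2) notation, so Normal(1, 4) has sd 2.
populations = {"z1": (0.0, 1.0), "z2": (2.0, 1.0), "z3": (1.0, 2.0)}
samples = {
    name: [rng.gauss(mu, sd) for _ in range(100)]
    for name, (mu, sd) in populations.items()
}

# In side-by-side box plots, each box is located by the sample median and
# sized by the sample interquartile range.
for name, values in samples.items():
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    print(f"{name}: median = {median:6.3f}, IQR = {q3 - q1:5.3f}")
```

The shift in the medians reflects the differences in population means; the wider box for the third sample reflects its larger standard deviation.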
Figure 7.3: Box Plots of Samples from Three Normal Distributions
7.3.2 Normal Probability Plots
Another powerful graphical technique that relies on quantiles is the
quantile-quantile (QQ) plot, which plots the quantiles of one distribution
against the quantiles of another. QQ plots are used to compare the shapes of
two distributions, most commonly by plotting the observed quantiles of an
empirical distribution against the corresponding quantiles of a theoretical
normal distribution. In this case, a QQ plot is often called a normal
probability plot. If the shape of the empirical distribution resembles a
normal distribution, then the points in a normal probability plot should tend
to fall on a straight line. If they do not, then we should be skeptical that
the sample was drawn from a normal distribution. Extracting useful
information from normal probability plots requires some practice, but the
patient data analyst will be richly
rewarded.
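To make the construction concrete, here is a Python sketch of how normal probability plot coordinates can be computed. The plotting positions (i - 0.5)/n and the correlation-based "straightness" summary are illustrative assumptions rather than the text's definitions, and the function names are hypothetical; S-Plus `qqnorm` makes its own choice of plotting positions.

```python
import math
from statistics import NormalDist, fmean


def normal_probability_points(data):
    """Pair each sorted observation with a standard normal quantile.

    Plotting positions (i - 0.5)/n are an assumption; other conventions
    exist and give very similar pictures.
    """
    n = len(data)
    std_normal = NormalDist()
    theoretical = [std_normal.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(theoretical, sorted(data)))


def straightness(points):
    """Pearson correlation of the plotted points; 1 means a perfect line."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    mx, my = fmean(xs), fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Data shaped exactly like a normal distribution (a linear transform of
# normal quantiles) fall on a line; right-skewed data bend away from it.
bell = [3 + 2 * NormalDist().inv_cdf((i - 0.5) / 10) for i in range(1, 11)]
skewed = [2 ** k for k in range(10)]
print(round(straightness(normal_probability_points(bell)), 6))
print(round(straightness(normal_probability_points(skewed)), 3))
```

The first correlation is essentially 1; the powers of two, which are strongly right-skewed, give a correlation of roughly 0.82.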
Example 5 (continued) A normal probability plot of the sample generated
in Example 5 against a theoretical normal distribution is displayed in
Figure 7.4. This plot was created using the following S-Plus command:
> qqnorm(x)
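The bending in Figure 7.4 can be anticipated from the population itself by pairing standard normal quantiles with χ²(3) quantiles. Python's standard library has no chi-squared quantile function (S-Plus has `qchisq`), so this sketch inverts the closed-form χ²(3) cdf, F(x) = erf(√(x/2)) - √(2x/π) e^(-x/2), by bisection; the helper names `chisq3_cdf` and `chisq3_quantile` are hypothetical.

```python
import math
from statistics import NormalDist


def chisq3_cdf(x):
    """CDF of the chi-squared distribution with 3 degrees of freedom."""
    if x <= 0:
        return 0.0
    return math.erf(math.sqrt(x / 2)) - math.sqrt(2 * x / math.pi) * math.exp(-x / 2)


def chisq3_quantile(p, lo=0.0, hi=100.0, tol=1e-10):
    """Invert chisq3_cdf by bisection on [lo, hi]."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if chisq3_cdf(mid) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


std_normal = NormalDist()
for p in (0.05, 0.25, 0.5, 0.75, 0.95):
    z = std_normal.inv_cdf(p)
    print(f"p = {p:4.2f}: normal {z:7.3f}, chi2(3) {chisq3_quantile(p):6.3f}")
```

The normal quantiles are symmetric about 0, whereas the χ²(3) quantiles reach only about 0.35 below a median of about 2.37 but stretch to about 7.81 above it, so plotted against each other they curve upward instead of following a straight line.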
Notice the systematic and asymmetric bending away from linearity in this