central 50% of the data. Thus, the length of the rectangle equals the sample interquartile range. The location of the sample median is also identified, and its location within the rectangle often provides insight into whether or not the population from which the sample was drawn is symmetric. Whiskers extend from the ends of the rectangle, either to the extreme values of the data or to 1.5 times the sample interquartile range, whichever is less. Values that lie beyond the whiskers are called outliers and are individually identified by additional lines.

CHAPTER 7. DATA
7.3. PLUG-IN ESTIMATES OF QUANTILES

Figure 7.2: Box Plot of a Sample from χ²(3)

Example 5 The pdf of the asymmetric distribution χ²(3) was graphed in Figure 4.8. The following S-Plus commands draw a random sample of n = 100 observed values from this population, then construct a box plot of the sample:

> x
> boxplot(x)

An example of a box plot produced by these commands is displayed in Figure 7.2. In this box plot, the numerical values in the sample are represented by the vertical axis. The third quartile of the box plot in Figure 7.2 is farther above the median than the first quartile is below it. The short lower whisker extends from the first quartile to the minimal value in the sample, whereas the long upper whisker extends 1.5 interquartile ranges beyond the third quartile. Furthermore, there are 4 outliers beyond the upper whisker. Once we learn to discern these key features of the box plot, we can easily recognize that the population from which the sample was drawn is not symmetric.

The frequency of outliers in a sample often provides useful diagnostic information. Recall that, in Section 5.3, we computed that the interquartile range of a normal distribution is 1.34898 standard deviations. Each quartile therefore lies half of that distance from the mean, and a whisker can extend at most 1.5 interquartile ranges beyond a quartile. A value is an outlier if it lies more than

$$z = \frac{1.34898}{2} + 1.5 \cdot 1.34898 = 2.69796$$

standard deviations from the mean.
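The cutoff can be checked directly from the normal quantile function. The following is a minimal sketch in R, whose syntax closely follows that of S-Plus; the variable names q3, iqr, and cutoff are our own and do not appear in the text.

```r
# Third quartile of the standard normal distribution, in standard deviations.
q3 <- qnorm(0.75)

# Interquartile range: third quartile minus first quartile.
iqr <- q3 - qnorm(0.25)

# Upper outlier cutoff: third quartile plus 1.5 interquartile ranges.
cutoff <- q3 + 1.5 * iqr

# Display both values rounded to five decimal places.
print(round(c(iqr = iqr, cutoff = cutoff), 5))
```

Rounded to five decimal places, the two values should agree with the 1.34898 and 2.69796 used above.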
Hence, the probability that an observation drawn from a normal distribution is an outlier is

> 2*pnorm(-2.69796)
[1] 0.006976582

and we would expect a sample drawn from a normal distribution to contain approximately 7 outliers per 1000 observations. A sample that contains a dramatically different proportion of outliers, as in Example 5, is not likely to have been drawn from a normal distribution.

Box plots are especially useful for comparing several populations.

Example 6 We drew samples of 100 observations from three normal populations: Normal(0, 1), Normal(2, 1), and Normal(1, 4). To attempt to discern in the samples the various differences in population mean and standard deviation, we examined side-by-side box plots. This was accomplished by the following S-Plus commands:

> z1
> z2
> z3
> boxplot(z1,z2,z3)

An example of the output of these commands is displayed in Figure 7.3.

Figure 7.3: Box Plots of Samples from Three Normal Distributions

7.3.2 Normal Probability Plots

Another powerful graphical technique that relies on quantiles is the quantile-quantile (QQ) plot, which plots the quantiles of one distribution against the quantiles of another. QQ plots are used to compare the shapes of two distributions, most commonly by plotting the observed quantiles of an empirical distribution against the corresponding quantiles of a theoretical normal distribution. In this case, a QQ plot is often called a normal probability plot. If the shape of the empirical distribution resembles a normal distribution, then the points in a normal probability plot should tend to fall on a straight line. If they do not, then we should be skeptical that the sample was drawn from a normal distribution. Extracting useful information from normal probability plots requires some practice, but the patient data analyst will be richly rewarded.
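To see what a normal probability plot computes, the construction can be sketched by hand. The following R code is an illustration under our own assumptions, not the text's method: the sample is drawn with rchisq(100, df = 3) to match the description in Example 5, the seed is arbitrary, and the plotting positions come from ppoints, which is the convention used by R's qqnorm (S-Plus may differ in detail).

```r
set.seed(1)                        # arbitrary seed, only for reproducibility
x <- rchisq(100, df = 3)           # assumed: a sample like the one in Example 5
n <- length(x)

observed <- sort(x)                # sample quantiles (the order statistics)
theoretical <- qnorm(ppoints(n))   # standard normal quantiles at (i - 0.5)/n

# Plot observed quantiles against theoretical normal quantiles.
plot(theoretical, observed,
     xlab = "Quantiles of Standard Normal", ylab = "Sample Quantiles")

# Dashed reference line: quantiles of a normal distribution with the
# same mean and standard deviation as the sample.
abline(mean(x), sd(x), lty = 2)
```

If the sample had been drawn from a normal population, the points would scatter about the dashed line. For a right-skewed sample such as this one, we would instead expect the points to trace a curve that bends away from the line.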
Example 5 (continued) A normal probability plot of the sample generated in Example 5 against a theoretical normal distribution is displayed in Figure 7.4. This plot was created using the following S-Plus command:

> qqnorm(x)

Notice the systematic and asymmetric bending away from linearity in this