Creating Box Charts from Raw Data

[See SHWBOXA in the SAS/QC Sample Library] A petroleum company uses a turbine to heat water into steam that is pumped into the ground to make oil less viscous and easier to extract. This process occurs 20 times daily, and the amount of power (in kilowatts) used to heat the water to the desired temperature is recorded each time. The following statements create a SAS data set that contains the power output measurements for 20 days:

data Turbine;
   informat Day date7.;
   format Day date5.;
   label KWatts='Average Power Output';
   input Day @;
   do i=1 to 10;
      input KWatts @;
      output;
      end;
   drop i;
   datalines;
04JUL94 3196 3507 4050 3215 3583 3617 3789 3180 3505 3454
04JUL94 3417 3199 3613 3384 3475 3316 3556 3607 3364 3721
05JUL94 3390 3562 3413 3193 3635 3179 3348 3199 3413 3562
05JUL94 3428 3320 3745 3426 3849 3256 3841 3575 3752 3347
06JUL94 3478 3465 3445 3383 3684 3304 3398 3578 3348 3369
06JUL94 3670 3614 3307 3595 3448 3304 3385 3499 3781 3711
07JUL94 3448 3045 3446 3620 3466 3533 3590 3070 3499 3457
07JUL94 3411 3350 3417 3629 3400 3381 3309 3608 3438 3567
08JUL94 3568 2968 3514 3465 3175 3358 3460 3851 3845 2983
08JUL94 3410 3274 3590 3527 3509 3284 3457 3729 3916 3633
09JUL94 3153 3408 3741 3203 3047 3580 3571 3579 3602 3335
09JUL94 3494 3662 3586 3628 3881 3443 3456 3593 3827 3573
10JUL94 3594 3711 3369 3341 3611 3496 3554 3400 3295 3002
10JUL94 3495 3368 3726 3738 3250 3632 3415 3591 3787 3478
11JUL94 3482 3546 3196 3379 3559 3235 3549 3445 3413 3859
11JUL94 3330 3465 3994 3362 3309 3781 3211 3550 3637 3626
12JUL94 3152 3269 3431 3438 3575 3476 3115 3146 3731 3171
12JUL94 3206 3140 3562 3592 3722 3421 3471 3621 3361 3370
13JUL94 3421 3381 4040 3467 3475 3285 3619 3325 3317 3472
13JUL94 3296 3501 3366 3492 3367 3619 3550 3263 3355 3510
14JUL94 3795 3872 3559 3432 3322 3587 3336 3732 3451 3215
14JUL94 3594 3410 3335 3216 3336 3638 3419 3515 3399 3709
15JUL94 3850 3431 3460 3623 3516 3810 3671 3602 3480 3388
15JUL94 3365 3845 3520 3708 3202 3365 3731 3840 3182 3677
16JUL94 3711 3648 3212 3664 3281 3371 3416 3636 3701 3385
16JUL94 3769 3586 3540 3703 3320 3323 3480 3750 3490 3395
17JUL94 3596 3436 3757 3288 3417 3331 3475 3600 3690 3534
17JUL94 3306 3077 3357 3528 3530 3327 3113 3812 3711 3599
18JUL94 3428 3760 3641 3393 3182 3381 3425 3467 3451 3189
18JUL94 3588 3484 3759 3292 3063 3442 3712 3061 3815 3339
19JUL94 3746 3426 3320 3819 3584 3877 3779 3506 3787 3676
19JUL94 3727 3366 3288 3684 3500 3501 3427 3508 3392 3814
20JUL94 3676 3475 3595 3122 3429 3474 3125 3307 3467 3832
20JUL94 3383 3114 3431 3693 3363 3486 3928 3753 3552 3524
21JUL94 3349 3422 3674 3501 3639 3682 3354 3595 3407 3400
21JUL94 3401 3359 3167 3524 3561 3801 3496 3476 3480 3570
22JUL94 3618 3324 3475 3621 3376 3540 3585 3320 3256 3443
22JUL94 3415 3445 3561 3494 3140 3090 3561 3800 3056 3536
23JUL94 3421 3787 3454 3699 3307 3917 3292 3310 3283 3536
23JUL94 3756 3145 3571 3331 3725 3605 3547 3421 3257 3574
;

A partial listing of Turbine is shown in Figure 15.3. This data set is said to be in "strung-out" form because each observation contains the day and power output for a single heating. Each day's 20 measurements are entered on two data lines of 10 values each, so the first 20 observations contain the outputs for the first day, the second 20 observations contain the outputs for the second day, and so on. Because the variable Day classifies the observations into rational subgroups, it is referred to as the subgroup-variable. The variable KWatts contains the output measurements and is referred to as the process variable (or process for short).

Figure 15.3 Partial Listing of the Data Set Turbine
Kilowatt Power Output Data

Obs Day KWatts
1 04JUL 3196
2 04JUL 3507
3 04JUL 4050
4 04JUL 3215
5 04JUL 3583
6 04JUL 3617
7 04JUL 3789
8 04JUL 3180
9 04JUL 3505
10 04JUL 3454
11 04JUL 3417
12 04JUL 3199
13 04JUL 3613
14 04JUL 3384
15 04JUL 3475
16 04JUL 3316
17 04JUL 3556
18 04JUL 3607
19 04JUL 3364
20 04JUL 3721
21 05JUL 3390
22 05JUL 3562
23 05JUL 3413
24 05JUL 3193
25 05JUL 3635
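
A listing like the one in Figure 15.3 could be produced with a PROC PRINT step along the following lines. This is a minimal sketch, not necessarily the statements used to create the figure; the OBS=25 data set option is an assumption chosen to match the 25 rows shown.

```sas
/* Sketch: print the first 25 observations of Turbine.            */
/* OBS=25 is an assumption chosen to match the rows in the figure. */
title 'Kilowatt Power Output Data';
proc print data=Turbine(obs=25);
run;
```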

You can use a box chart to examine the distribution of power output for each day and to determine whether the mean level of the heating process is in control. The following statements create the box chart shown in Figure 15.4:

ods graphics off;
title 'Box Chart for Power Output';
symbol v=dot;
proc shewhart data=Turbine;
   boxchart KWatts*Day;
run;

This example illustrates the basic form of the BOXCHART statement. After the keyword BOXCHART, you specify the process to analyze (in this case, KWatts), followed by an asterisk and the subgroup-variable (Day).

Figure 15.4 Box Chart for Power Output Data (Traditional Graphics)

The input data set is specified with the DATA= option in the PROC SHEWHART statement.
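
The chart in Figure 15.4 uses traditional graphics because the example begins with ODS GRAPHICS OFF. As a hedged sketch, assuming your SAS release produces ODS Graphics output for PROC SHEWHART, the same chart could be requested in that form by enabling ODS Graphics before the procedure step. The SYMBOL statement is omitted because it applies only to traditional graphics.

```sas
/* Sketch: same box chart, requested as ODS Graphics output.     */
/* Assumes ODS Graphics is available for PROC SHEWHART in your   */
/* SAS release; the SYMBOL statement is not used in this mode.   */
ods graphics on;
title 'Box Chart for Power Output';
proc shewhart data=Turbine;
   boxchart KWatts*Day;
run;
```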

By default, the BOXCHART statement requests an X-bar chart (a chart of subgroup means) superimposed with box-and-whisker plots for each subgroup. Table 15.3 lists the summary statistics represented by each plot. For details on the computation of percentiles, see Percentile Definitions.

Table 15.3 Summary Statistics Represented by Box-and-Whisker Plots

Subgroup Summary Statistic          Feature of Box-and-Whisker Plot
Maximum                             Endpoint of upper whisker
Third quartile (75th percentile)    Upper edge of box
Median (50th percentile)            Line inside box
Mean                                Symbol marker (in this example, a dot)
First quartile (25th percentile)    Lower edge of box
Minimum                             Endpoint of lower whisker

The within-subgroup variation in power output is stable, as indicated in Figure 15.4 by the edges of the boxes and the endpoints of the whiskers. Since the subgroup means, indicated by the dots, lie within the control limits, you can conclude that the heating process is in statistical control.
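
To inspect the numbers behind each box-and-whisker plot, you could compute the same six summary statistics for each day with PROC MEANS. This is only an illustrative sketch: PROC MEANS is a Base SAS procedure separate from the BOXCHART statement, and its quartiles might not match the plotted ones exactly if the two use different percentile definitions.

```sas
/* Sketch: per-day summary statistics corresponding to Table 15.3. */
/* Quartiles here follow the PROC MEANS percentile definition,     */
/* which may differ from the one used by the BOXCHART statement.   */
title 'Subgroup Summary Statistics for Power Output';
proc means data=Turbine min q1 median mean q3 max;
   class Day;
   var KWatts;
run;
```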

The skeletal style of the box-and-whisker plots shown in Figure 15.4 is the default. You can request different styles, as illustrated in Example 15.2. By default, the control limits shown are limits estimated from the data; the formulas for the limits are given in Table 15.5 and Table 15.6.

You can also create box charts in which the control limits apply to the subgroup medians. For example, the following statements create the chart shown in Figure 15.5:

title 'Box Chart for Power Output';
proc shewhart data=Turbine;
   boxchart KWatts*Day / controlstat = median;
run;

The CONTROLSTAT=MEDIAN option requests control limits that apply to the medians. Alternatively, you can specify the NOLIMITS option to suppress the display of control limits and create ordinary side-by-side box-and-whisker plots. See Example 15.2.

Options such as CONTROLSTAT= and NOLIMITS are specified after the slash (/) in the BOXCHART statement. A complete list of options is presented in the section Syntax: BOXCHART Statement.
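
As a minimal sketch of the NOLIMITS option described above, the following statements would request side-by-side box-and-whisker plots without control limits. The title text is an assumption chosen for illustration.

```sas
/* Sketch: suppress control limits to get ordinary side-by-side */
/* box-and-whisker plots. The title text is hypothetical.       */
title 'Box-and-Whisker Plots for Power Output';
proc shewhart data=Turbine;
   boxchart KWatts*Day / nolimits;
run;
```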

Figure 15.5 Box Chart for Power Output Data (Traditional Graphics)