# The BOXPLOT Procedure

### Example 28.2 Using Box Plots to Compare Groups

This example uses box plots to compare the delay times of airline flights during the Christmas holidays with the delay times before the holiday period. The following statements create a data set named `Times` that contains the delay times, in minutes, for 25 flights on each day. When a flight is canceled, the delay is recorded as a missing value.

```
data Times;
   informat Day date7. ;
   format   Day date7. ;
   input Day @ ;
   do Flight=1 to 25;
      input Delay @ ;
      output;
   end;
   datalines;
16DEC88   4  12   2   2  18   5   6  21   0   0
0  14   3   .   2   3   5   0   6  19
7   4   9   5  10
17DEC88   1  10   3   3   0   1   5   0   .   .
1   5   7   1   7   2   2  16   2   1
3   1  31   5   0
18DEC88   7   8   4   2   3   2   7   6  11   3
2   7   0   1  10   2   3  12   8   6
2   7   2   4   5
19DEC88  15   6   9   0  15   7   1   1   0   2
5   6   5  14   7  20   8   1  14   3
10   0   1  11   7
20DEC88   2   1   0   4   4   6   2   2   1   4
1  11   .   1   0   6   5   5   4   2
2   6   6   4   0
21DEC88   2   6   6   2   7   7   5   2   5   0
9   2   4   2   5   1   4   7   5   6
5   0   4  36  28
22DEC88   3   7  22   1  11  11  39  46   7  33
19  21   1   3  43  23   9   0  17  35
50   0   2   1   0
23DEC88   6  11   8  35  36  19  21   .   .   4
6  63  35   3  12  34   9   0  46   0
0  36   3   0  14
24DEC88  13   2  10   4   5  22  21  44  66  13
8   3   4  27   2  12  17  22  19  36
9  72   2   4   4
25DEC88   4  33  35   0  11  11  10  28  34   3
24   6  17   0   8   5   7  19   9   7
21  17  17   2   6
26DEC88   3   8   8   2   7   7   8   2   5   9
2   8   2  10  16   9   5  14  15   1
12   2   2  14  18
;
```

In the following statements, the MEANS procedure counts the canceled flights for each day (the NMISS= keyword counts the missing values of `Delay`). The counts are then merged into the data set `Times` by `Day`.

```
proc means data=Times noprint;
   var Delay;
   by Day;
   output out=Cancel nmiss=ncancel;
run;

data Times;
   merge Times Cancel;
   by Day;
run;
```
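As a quick check on the step above, the per-day counts can be listed before plotting. The sketch below is an addition, not part of the original example. It is a variant of the PROC MEANS step that uses the `DROP=` data set option so that the automatic variables `_TYPE_` and `_FREQ_`, which PROC MEANS writes to an OUT= data set, are not carried into `Times` by the merge. It then prints one row per day.

```
proc means data=Times noprint;
   var Delay;
   by Day;
   /* DROP= removes the automatic variables before Cancel is written */
   output out=Cancel(drop=_type_ _freq_) nmiss=ncancel;
run;

/* One row per day: ncancel is the number of missing Delay values */
proc print data=Cancel noobs;
   var Day ncancel;
run;
```

The BY statement assumes that `Times` is already in `Day` order, which holds here because the days are entered in sequence.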

The following statements create a data set named `Weather` containing information about possible causes for delays, and then merge this data set with the data set `Times`:

```
data Weather;
   informat Day date7. ;
   format   Day date7. ;
   length Reason $ 16 ;
   input Day Flight Reason & ;
   datalines;
datalines;
16DEC88  8   Fog
17DEC88  18  Snow Storm
17DEC88  23  Sleet
21DEC88  24  Rain
21DEC88  25  Rain
22DEC88  7   Mechanical
22DEC88  15  Late Arrival
24DEC88  9   Late Arrival
24DEC88  22  Late Arrival
;

data Times;
   merge Times Weather;
   by Day Flight;
run;
```
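One way to confirm that the merge attached each reason to the intended flight is to list only the observations with a nonmissing `Reason`. This is a minimal sketch added for illustration, not part of the original example. If the merge worked as intended, the listing contains the nine flights entered in `Weather`.

```
/* List only the flights that picked up a Reason from Weather */
proc print data=Times noobs;
   where Reason ne ' ';
   var Day Flight Delay Reason;
run;
```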

The following statements create a box plot for the complete set of data:

```
ods graphics off;
symbol1 value=dot            c=salmon h=2.0 pct;
symbol2 value=squarefilled   c=vigb   h=2.0 pct;
symbol3 value=trianglefilled c=vig    h=2.0 pct;
title 'Box Plot for Airline Delays';
proc boxplot data=Times;
   plot Delay*Day = ncancel /
      nohlabel
      symbollegend = legend1;
   legend1 label = ('Cancellations:');
   label Delay = 'Delay in Minutes';
run;
goptions reset=symbol;
```

The level of the symbol variable `ncancel` determines the symbol marker for each group mean. In these data `ncancel` takes the values 0, 1, and 2, which is why three SYMBOL statements are defined. The SYMBOLLEGEND= option controls the appearance of the legend for the symbols, and the NOHLABEL option suppresses the horizontal axis label. The resulting box plot is shown in Output 28.2.1.

Output 28.2.1: Box Plot for Airline Data

The delay distributions from December 22 through December 25 are drastically different from the delay distributions during the pre-holiday period. Both the mean delay and the variability of the delays are much greater during the holiday period.
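
The comparison above can also be checked numerically. The following is a minimal sketch, not part of the original example, that tabulates the number of nonmissing delays, the number of canceled flights, the mean, and the standard deviation of `Delay` for each day:

```
/* Per-day summary of Delay; CLASS does not require sorted input */
proc means data=Times n nmiss mean std maxdec=1;
   class Day;
   var Delay;
run;
```

Comparing the MEAN and STD columns for December 22 through December 25 with those for the earlier days gives a tabular counterpart to the box plot.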