The BOXPLOT Procedure

Example 1.1: Putting Overlays on Box Plots

The following statements create two data sets. The first, Scores, contains student scores on each of four exams given during the spring 2000 semester. The second, History, contains the average scores that classes earned on the same exams in the two previous spring semesters. History is then merged with Scores by exam to produce the input data set for the BOXPLOT procedure.

   proc format;
      /* Display labels for the numeric exam codes */
      value examfmt
         1='Midterm 1'
         2='Midterm 2'
         3='Midterm 3'
         4='Final'
         ;
   run;

   data Scores;
      input exam @;
      do i=1 to 18;
         input score @;
         output;
         end;
      drop i;
      datalines;
   1 78 92 83 81 65 90 72 80 63 88 85 79 94 71 76 84 77 82
   2 82 88 94 90 73 79 75 84 68 82 86 73 89 75 77 98 81 86
   3 88 85 95 91 76 77 78 88 80 76 83 84 91 75 83 88 85 85
   4 85 90 91 93 68 82 82 97 77 83 85 89 95 77 84 90 89 89
   ;
   run;

   data History;
      label avg98 = '1998';
      label avg99 = '1999';
      input exam avg98 avg99;
      datalines;
   1  78.4 83.6
   2  84.1 80.8
   3  82.3 85.2
   4  80.1 88.3
   ;
   run;

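   /* One-to-many match-merge: each exam's past averages are
      copied onto every score observation for that exam */
   data Scores;
      merge Scores History;
      by exam;
   run;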
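
As a quick check on the merge, you might print the first few observations of the combined data set. The following statements are a minimal sketch and are not part of the original example; the OBS= value is arbitrary. Because History has one observation per exam and Scores has eighteen, each past average should repeat on every score observation for that exam.

   proc print data=Scores(obs=5);
      var exam score avg98 avg99;
   run;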

You can create a box plot of scores on the four exams from 2000, overlaid with averages from the two previous years. The following statements produce the box plot with overlays shown in Output 1.1.1.

   symbol1 v=dot c=salmon;
   title 'Spring 2000 Exam Scores';
   proc boxplot data=Scores;
      plot score * exam /
         cframe        = vligb
         cboxes        = dagr
         cboxfill      = ywh
         overlay       = (avg98 avg99)
         coverlay      = (vigb vig)
         overlaysym    = (square circle)
         ccoverlay     = (none none)
         overlayleglab = 'Past Averages:'
         nohlabel
         ;
      format exam examfmt.;
   run;

Output 1.1.1: Box Plot of Exam Scores

The OVERLAY= option specifies the variables to plot as overlays on the box plot. The COVERLAY= and OVERLAYSYM= options select the colors and shapes of the symbols used to plot the overlays. CCOVERLAY= specifies the colors of the line segments used to connect the overlay points. In this case the keyword NONE is specified for each overlay, suppressing the connecting lines.

The order in which variables appear in the OVERLAY= list determines which values from the related option lists are used to plot each overlay. The variable avg98 appears first in the OVERLAY= list, so it is plotted using the first value from each related list: COVERLAY= VIGB, OVERLAYSYM= SQUARE, and CCOVERLAY= NONE. The second overlay variable, avg99, uses the second value from each list: VIG, CIRCLE, and NONE.
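
To connect the overlay points instead of suppressing the line segments, you could replace the NONE keywords with colors. The following variation is a sketch only. It assumes the merged Scores data set and the examfmt. format created above, and reusing the symbol colors VIGB and VIG for the connecting lines is an illustrative choice that is not part of the original example.

   proc boxplot data=Scores;
      plot score * exam /
         overlay    = (avg98 avg99)
         coverlay   = (vigb vig)
         overlaysym = (square circle)
         ccoverlay  = (vigb vig)   /* line color for each overlay, in list order */
         nohlabel
         ;
      format exam examfmt.;
   run;

As with the other lists, the first CCOVERLAY= value applies to avg98 and the second to avg99.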


Copyright © 2001 by SAS Institute Inc., Cary, NC, USA. All rights reserved.