Working with SAS Data Sets


Produce Summary Statistics

You can use the SUMMARY statement to compute summary statistics of numeric variables in a SAS data set. The statistics can be computed for subgroups of the data by using the CLASS clause. The SAVE option in the OPT clause enables you to save the computed statistics in matrices. For example, consider the following statements:

proc iml;
use Sashelp.class;
summary class {sex} var {height weight};

Figure 7.21: Summary Statistics

      Sex  Nobs  Variable         MIN         MAX        MEAN         STD       
      -------------------------------------------------------------------       
      F       9  Height      51.30000    66.50000    60.58889     5.01833       
                 Weight      50.50000   112.50000    90.11111    19.38391       
                                                                                
      M      10  Height      57.30000    72.00000    63.91000     4.93794       
                 Weight      83.00000   150.00000   108.95000    22.72719       
                                                                                
      All    19  Height      51.30000    72.00000    62.33684     5.12708       
                 Weight      50.50000   150.00000   100.02632    22.77393       
      -------------------------------------------------------------------       
                                                                                



As shown in Figure 7.21, the default statistics are the minimum, maximum, mean, and standard deviation of each variable specified in the VAR clause. The default behavior is to display the summary statistics in a table.

You can also store the summary statistics in SAS/IML matrices. For example, the following statement creates four matrices: Sex, _NOBS_, Height, and Weight:

summary class {sex} var {height weight}
        stat {mean std var} opt {noprint save};

Because the SAVE option was specified, the statistics for each variable are stored in a matrix that has the same name as the variable: each column corresponds to a statistic, and each row corresponds to a subgroup. Two other vectors, Sex and _NOBS_, are also created. The vector Sex contains the two distinct values of the CLASS variable. The vector _NOBS_ contains the number of observations in each subgroup. The following statements list the SAS/IML matrices that are defined and print the Height and Weight matrices:

show names;
/* print matrices that show the stats */
print height[r=sex c={"Mean" "Std" "Var"}],
      weight[r=sex c={"Mean" "Std" "Var"}];

Figure 7.22: Summary Statistics

 SYMBOL   ROWS   COLS TYPE   SIZE                                               
 ------ ------ ------ ---- ------                                               
 Height      2      3 num       8                                               
 Sex         2      1 char      1                                               
 Weight      2      3 num       8                                               
 _NOBS_      2      1 num       8                                               
  Number of symbols = 6  (includes those without values)                        
                                                                                

              Height
         Mean        Std        Var
 F  60.588889  5.0183275  25.183611
 M      63.91   4.937937  24.383222

              Weight
         Mean        Std        Var
 F  90.111111  19.383914  375.73611
 M     108.95  22.727186    516.525
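
Because each saved matrix has one row per subgroup and one column per statistic, you can combine the columns in ordinary matrix expressions. The following statements are a minimal sketch that computes the coefficient of variation (standard deviation divided by mean, as a percentage) for each subgroup. The matrix names cvHeight, cvWeight, and cv are hypothetical names chosen for this illustration, and the column positions assume the STAT order {mean std var} that is used above:

```sas
proc iml;
use Sashelp.class;
/* save the mean, std, and variance by Sex without printing the table */
summary class {sex} var {height weight}
        stat {mean std var} opt {noprint save};
close Sashelp.class;

/* columns follow the STAT order: 1 = mean, 2 = std, 3 = var */
cvHeight = 100 # (Height[,2] / Height[,1]);   /* elementwise division */
cvWeight = 100 # (Weight[,2] / Weight[,1]);

/* one row per subgroup, one column per analysis variable */
cv = cvHeight || cvWeight;
print cv[r=Sex c={"Height" "Weight"} label="CV (percent)"];
quit;
```

The Sex vector is reused as the row labels, so the rows of the derived matrix stay aligned with the subgroups.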



You can specify more than one CLASS variable, in which case subgroups are defined by the joint combinations of the values of the CLASS variables.
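
As an illustration of joint subgroups, the following statements are a sketch that adds Age (another variable in the Sashelp.class data set) as a second CLASS variable. The choice of Age and of the statistics in the STAT clause is an assumption made for this example. Each observed combination of Sex and Age should appear as a separate subgroup in the table:

```sas
proc iml;
use Sashelp.class;
/* subgroups are the observed combinations of Sex and Age */
summary class {sex age} var {height weight}
        stat {min max mean};
close Sashelp.class;
quit;
```

Some combinations contain only one or two students, so statistics such as the standard deviation are less informative for those subgroups.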