You can use the SUMMARY statement to compute summary statistics of numeric variables in a SAS data set. The statistics can be computed for subgroups of the data by using the CLASS clause. The SAVE option in the OPT clause enables you to save the computed statistics in matrices. For example, consider the following statements:
proc iml;
use Sashelp.class;
summary class {sex} var {height weight};
Figure 7.21: Summary Statistics
Sex  Nobs  Variable          MIN          MAX         MEAN          STD
-----------------------------------------------------------------------
F       9  Height       51.30000     66.50000     60.58889      5.01833
           Weight       50.50000    112.50000     90.11111     19.38391
M      10  Height       57.30000     72.00000     63.91000      4.93794
           Weight       83.00000    150.00000    108.95000     22.72719
All    19  Height       51.30000     72.00000     62.33684      5.12708
           Weight       50.50000    150.00000    100.02632     22.77393
-----------------------------------------------------------------------
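As shown in Figure 7.21, the default statistics are the minimum, maximum, mean, and standard deviation of each variable in the VAR clause. By default, the SUMMARY statement displays these statistics in a table. The table also reports the number of observations (Nobs) in each subgroup, and the All rows give the statistics for the entire data set.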
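For reference, the MIN, MAX, MEAN, and STD columns in Figure 7.21 can be written as follows. This is a sketch that assumes the conventional sample definitions: for a subgroup g with observations x_1, ..., x_{n_g} of an analysis variable, the standard deviation uses the n_g - 1 divisor, and the VAR statistic is the square of STD.

```latex
\min_g = \min_{1 \le i \le n_g} x_i, \qquad
\max_g = \max_{1 \le i \le n_g} x_i, \qquad
\bar{x}_g = \frac{1}{n_g} \sum_{i=1}^{n_g} x_i,

s_g = \sqrt{\frac{1}{n_g - 1} \sum_{i=1}^{n_g} \left( x_i - \bar{x}_g \right)^2},
\qquad \mathrm{VAR}_g = s_g^2
```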
You can also store the summary statistics in SAS/IML matrices. For example, the following statement creates four matrices: Sex, _NOBS_, Height, and Weight:
summary class {sex} var {height weight} stat {mean std var} opt {noprint save};
Because the NOPRINT option is specified, no table is displayed. Because the SAVE option is specified, the statistics for each analysis variable are stored in a matrix that has the same name as that variable (Height and Weight). In each matrix, each column corresponds to a statistic, in the order listed in the STAT clause, and each row corresponds to a subgroup. Two other vectors, Sex and _NOBS_, are also created. The vector Sex contains the two distinct values of the CLASS variable, and the vector _NOBS_ contains the number of observations in each subgroup. The following statements display the names of the SAS/IML matrices that are defined and print the Height and Weight matrices:
show names;                           /* list the matrices that are defined */
print height[r=sex c={"Mean" "Std" "Var"}],   /* rows labeled by Sex values */
      weight[r=sex c={"Mean" "Std" "Var"}];
Figure 7.22: Summary Statistics
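To make the layout of the saved matrices concrete, the following sketch fills in the Height matrix by using the subgroup values from Figure 7.21. It assumes that the rows appear in the same order as the values in the Sex vector (F, then M) and that the Var column is the square of the Std column. The Weight matrix has the same 2 x 3 layout, and _NOBS_ holds the subgroup counts.

```latex
\mathrm{Height} =
\begin{pmatrix}
60.58889 & 5.01833 & 5.01833^2 \\
63.91000 & 4.93794 & 4.93794^2
\end{pmatrix},
\qquad
\mathrm{Sex} = \begin{pmatrix} \mathrm{F} \\ \mathrm{M} \end{pmatrix},
\qquad
\texttt{\_NOBS\_} = \begin{pmatrix} 9 \\ 10 \end{pmatrix}
```

Here the columns of Height are Mean, Std, and Var, matching the STAT clause {mean std var}.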
You can specify more than one CLASS variable, in which case subgroups are defined by the joint combinations of the values of the CLASS variables.