SUMMARY Statement

SUMMARY <CLASS operand> <VAR operand> <WEIGHT operand> <STAT operand> <OPT operand> <WHERE(expression)> ;

The SUMMARY statement computes statistics for numeric variables for an entire data set or a subset of observations in the data set. The statistics can be stratified by the use of CLASS variables. The computed statistics are displayed in tabular form and optionally can be saved in matrices. Like most other data processing statements, the SUMMARY statement works on the current data set.

You can specify the following options:

CLASS operand

specifies the variables in the current input SAS data set to be used to group the summaries. The operand is a character matrix that contains the names of the variables. For example:

   summary class {age sex} ;

Both numeric and character variables can be used as CLASS variables.

VAR operand

computes statistics for a set of numeric variables from the current input data set. The operand is a character matrix that contains the names of the variables. Also, the special keyword _NUM_ can be used as a VAR operand to specify all numeric variables. If the VAR clause is missing, the SUMMARY statement produces only the number of observations in each classification group.
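The two cases described above can be sketched as follows. This is a minimal illustration that assumes the Sashelp.Class sample data set is available; the first SUMMARY statement uses the _NUM_ keyword, and the second omits the VAR clause so that only the group counts are reported.

```sas
proc iml;
use Sashelp.Class;

/* _NUM_ requests statistics for every numeric variable in the data set */
summary class {sex} var _num_;

/* no VAR clause: only the number of observations per group is produced */
summary class {sex};

close Sashelp.Class;
quit;
```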

WEIGHT operand

specifies a character value that contains the name of a numeric variable in the current data set whose values are to be used to weight each observation. Only one variable can be specified.
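A small sketch of the WEIGHT clause follows. The data set Demo and its variables x and w are hypothetical names created only for this illustration; they are not part of any SAS sample library.

```sas
proc iml;
/* build a tiny illustrative data set (hypothetical names) */
x = {1, 2, 3, 4};
w = {1, 1, 2, 2};
create Demo var {x w};
append;
close Demo;

/* weight each observation of x by the corresponding value of w */
use Demo;
summary var {x} weight {w} stat {mean sumwgt};
close Demo;
quit;
```

Because a WEIGHT variable is specified, SUMWGT reports the sum of the w values rather than a count of observations.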

STAT operand

computes the specified statistics. The operand is a character matrix that contains the names of statistics. For example, to get the mean and standard deviation, specify the following:

   summary stat{mean std};

You can specify the following keywords as the STAT operand:

CSS

computes the corrected sum of squares.

MAX

computes the maximum value.

MEAN

computes the mean.

MIN

computes the minimum value.

N

computes the number of observations in the subgroup that are used in the computation of the various statistics for the corresponding analysis variable.

NMISS

computes the number of observations in the subgroup that have missing values for the analysis variable.

STD

computes the standard deviation.

SUM

computes the sum.

SUMWGT

computes the sum of the WEIGHT variable values if WEIGHT is specified; otherwise, computes the number of observations used in the computation of statistics.

USS

computes the uncorrected sum of squares.

VAR

computes the variance.

When the STAT clause is omitted, the SUMMARY statement computes the MIN, MAX, MEAN, and STD statistics for each variable in the VAR clause.

NOBS, the number of observations in each CLASS group, is always displayed.
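For reference, the sum-of-squares keywords relate to the variance and standard deviation as sketched below. The formulas cover the unweighted case only, where $x_1, \dots, x_n$ are the nonmissing values of an analysis variable in a subgroup and $\bar{x}$ is their mean. The $n-1$ divisor is an assumption (the usual sample definition), not something this section states.

```latex
\begin{aligned}
\mathrm{USS} &= \sum_{i=1}^{n} x_i^2 \\
\mathrm{CSS} &= \sum_{i=1}^{n} (x_i - \bar{x})^2 = \mathrm{USS} - n\,\bar{x}^2 \\
\mathrm{VAR} &= \frac{\mathrm{CSS}}{n-1} \\
\mathrm{STD} &= \sqrt{\mathrm{VAR}}
\end{aligned}
```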

OPT operand

sets the PRINT or NOPRINT and SAVE or NOSAVE options. The NOPRINT option suppresses the printing of the results from the SUMMARY statement. The SAVE option requests that the SUMMARY statement save the resultant statistics in matrices. The operand is a character matrix that contains one or more of the options.

When the SAVE option is set, the SUMMARY statement creates a CLASS vector for each CLASS variable, a statistic matrix for each analysis variable, and a column vector named _NOBS_. The CLASS vectors are named by the corresponding CLASS variable and have an equal number of rows. There are as many rows as there are subgroups defined by the interaction of all CLASS variables. The statistic matrices are named by the corresponding analysis variable. Each column of the statistic matrix corresponds to a requested statistic, and each row corresponds to the statistics of the subgroup that is defined by the CLASS variables. If no CLASS variable is specified, each matrix has one row that contains the statistics. The _NOBS_ vector contains the number of observations for each subgroup.

The default is PRINT NOSAVE.
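The case with no CLASS variable can be sketched as follows, again assuming that the Sashelp.Class sample data set is available. With the SAVE option, each analysis variable yields a matrix that has a single row, and the dimensions can be checked with the NROW and NCOL functions.

```sas
proc iml;
use Sashelp.Class;

/* save the statistics in matrices named Height and Weight; print nothing */
summary var {height weight} stat {n mean std} opt {noprint save};
close Sashelp.Class;

/* one row (no CLASS groups) and one column per requested statistic */
print (nrow(height))[label="Rows"] (ncol(height))[label="Columns"];
print height, weight, _NOBS_;
quit;
```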

WHERE expression

conditionally selects observations according to conditions given in expression. For details about the WHERE clause, see the section Process Data by Using the WHERE Clause.
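As a brief sketch of the WHERE clause, the following statements restrict the summary to a subset of observations. The example assumes the Sashelp.Class sample data set, and the cutoff of 12 is an arbitrary value chosen for illustration.

```sas
proc iml;
use Sashelp.Class;

/* summarize Height by Sex, using only observations with Age greater than 12 */
summary class {sex} var {height} where(age > 12);

close Sashelp.Class;
quit;
```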

The following example demonstrates the use of the SUMMARY statement:

proc iml;
use Sashelp.class;
summary class {sex}
        var {height weight}
        opt {noprint save};

/* print vectors that contain the stats */
print sex _NOBS_;
print height[r=sex c={Min Max Mean Std}], 
      weight[r=sex c={Min Max Mean Std}];

Figure 23.347: Summary Statistics

Sex  _NOBS_
F         9
M        10

Height
      MIN     MAX        MEAN         STD
F    51.3    66.5   60.588889   5.0183275
M    57.3      72       63.91    4.937937

Weight
      MIN     MAX        MEAN         STD
F    50.5   112.5   90.111111   19.383914
M      83     150      108.95   22.727186


See Chapter 7 for further details.