SUMMARY Statement
computes summary statistics for SAS data sets
SUMMARY <CLASS operand>
        <VAR operand>
        <WEIGHT operand>
        <STAT operand>
        <OPT operand>
        <WHERE(expression)>;
where the
operands used by most clauses
take either a matrix name, a matrix literal, or
an expression yielding a matrix name or value.
A discussion of the clauses and
operands follows.
The SUMMARY statement computes statistics for numeric variables
for an entire data set or a subset of observations in the data set.
The statistics can be stratified by the use of CLASS variables.
The computed statistics are displayed in tabular
form and optionally can be saved in matrices.
Like most other IML data processing statements, the
SUMMARY statement works on the current data set.
The following options are available with the SUMMARY statement:
- CLASS operand
- specifies the variables in the current input
SAS data set to be used to group the summaries.
The operand is a character matrix containing
the names of the variables, for example:
summary class {age sex};
Both numeric and character variables
can be used as CLASS variables.
- VAR operand
- calculates statistics for a set of numeric
variables from the current input data set.
The operand is a character matrix
containing the names of the variables.
Also, the special keyword _NUM_ can be used as
a VAR operand to specify all numeric variables.
If the VAR clause is missing, the SUMMARY statement produces
only the number of observations in each class group.
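As a minimal sketch of the CLASS and VAR clauses, the following statements assume that the SASHELP.CLASS sample data set (not referenced elsewhere in this section) is available and contains the character variable Sex and the numeric variables Height and Weight. The data set must first be opened so that it becomes the current input data set.

```sas
proc iml;
   /* Assumption: SASHELP.CLASS is installed; USE makes it the current input data set */
   use sashelp.class;

   /* No VAR clause: only the number of observations in each Sex group */
   summary class {sex};

   /* Statistics for two named numeric variables, grouped by Sex */
   summary class {sex} var {height weight};

   /* _NUM_ requests all numeric variables in the data set */
   summary class {sex} var _num_;

   close sashelp.class;
quit;
```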
- WEIGHT operand
- specifies a character value containing the name of
a numeric variable in the current data set whose
values are to be used to weight each observation.
Only one variable can be specified.
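The WEIGHT clause can be sketched as follows. The data set WORK.SCORES and its variables Group, Score, and W are hypothetical and are created here only for illustration.

```sas
/* Hypothetical data: W holds the weight for each observation */
data work.scores;
   input group $ score w;
   datalines;
A 10 1
A 20 3
B 30 2
B 50 2
;
run;

proc iml;
   use work.scores;
   /* Weight each observation by W; SUMWGT reports the sum of the weights per group */
   summary class {group} var {score} weight {w} stat {mean sumwgt};
   close work.scores;
quit;
```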
- STAT operand
- computes the statistics specified.
The operand is a character matrix
containing the names of statistics.
For example, to get the mean and standard deviation, specify
the following:
summary stat {mean std};
The following list describes the keywords that can
be specified as the STAT operand:
- CSS
- computes the corrected sum of squares.
- MAX
- computes the maximum value.
- MEAN
- computes the mean.
- MIN
- computes the minimum value.
- N
- computes the number of observations in the
subgroup used in the computation of the various
statistics for the corresponding analysis variable.
- NMISS
- computes the number of observations in the subgroup
having missing values for the analysis variable.
- STD
- computes the standard deviation.
- SUM
- computes the sum.
- SUMWGT
- computes the sum of the WEIGHT variable values if WEIGHT
is specified; otherwise, IML computes the number of
observations used in the computation of statistics.
- USS
- computes the uncorrected sum of squares.
- VAR
- computes the variance.
When the STAT clause is omitted, the SUMMARY statement
computes these statistics for each variable in the VAR clause:
MAX, MEAN, MIN, and STD.
Note that NOBS, the number of observations
in each CLASS group, is always given.
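For illustration, a STAT clause that requests a specific set of statistics might look like the following sketch. It assumes the SASHELP.CLASS sample data set, which is not part of this section.

```sas
proc iml;
   use sashelp.class;   /* assumption: sample data set with Sex, Height, Weight */

   /* Request N, NMISS, MEAN, and VAR for each analysis variable in each Sex group */
   summary class {sex} var {height weight} stat {n nmiss mean var};

   close sashelp.class;
quit;
```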
- OPT operand
- sets the PRINT or NOPRINT and SAVE or NOSAVE options.
The NOPRINT option suppresses the printing
of the results from the SUMMARY statement.
The SAVE option requests that the SUMMARY statement
save the resultant statistics in matrices.
The operand is a character matrix
containing one or more of the options.
When the SAVE option is set, the SUMMARY statement creates a
CLASS vector for each CLASS variable, a statistic matrix for
each analysis variable, and a column vector named _NOBS_.
The CLASS vectors are named by the corresponding
CLASS variable and have an equal number of rows.
There are as many rows as there are subgroups
defined by the interaction of all CLASS variables.
The statistic matrices are named by
the corresponding analysis variable.
Each column of the statistic matrix corresponds to a
statistic requested, and each row corresponds to the
statistics of the subgroup defined by the CLASS variables.
If no CLASS variable has been specified, each statistic matrix
has one row, containing the statistics of the entire population.
The _NOBS_ vector contains the number
of observations for each subgroup.
The default is PRINT NOSAVE.
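The effect of the SAVE option can be sketched as follows, again assuming the SASHELP.CLASS sample data set. After the SUMMARY statement runs, the matrices described above should exist under the names of the CLASS and analysis variables.

```sas
proc iml;
   use sashelp.class;   /* assumption: sample data set with Sex, Height, Weight */

   /* Suppress the printed table and save the results in matrices instead */
   summary class {sex} var {height weight} stat {mean std} opt {noprint save};
   close sashelp.class;

   /* Sex is the CLASS vector; Height and Weight are statistic matrices with   */
   /* one row per subgroup and one column per requested statistic (MEAN, STD); */
   /* _NOBS_ holds the number of observations in each subgroup                 */
   print sex _nobs_ height weight;
quit;
```

Using NOPRINT together with SAVE is a convenient pattern when the statistics are needed only for further matrix computations.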
- WHERE expression
- conditionally selects observations, within the range
specification, according to conditions given in expression.
The general form of the WHERE clause is
- WHERE(variable comparison-op operand)
In the preceding statement,
- variable
- is a variable in the SAS data set.
- comparison-op
- is one of the following comparison operators:
- <
- less than
- <=
- less than or equal to
- =
- equal to
- >
- greater than
- >=
- greater than or equal to
- ^=
- not equal to
- ?
- contains a given string
- ^?
- does not contain a given string
- =:
- begins with a given string
- =*
- sounds like or is spelled like a given string
- operand
- is a literal value, a matrix name, or an expression in parentheses.
WHERE comparison arguments can be matrices.
For the following operators, the WHERE clause succeeds if
all the elements in the matrix satisfy the condition:
^=  ^?  <  <=  >  >=
For the following operators, the WHERE clause succeeds if
any of the elements in the matrix satisfy the condition:
=  ?  =:  =*
Logical expressions can be specified within the WHERE
clause, by using the AND (&) and OR (|) operators.
The general form is
clause & clause    (for an AND clause)
clause | clause    (for an OR clause)
where clause can be a comparison, a
parenthesized clause, or a logical expression
clause that is evaluated by using operator precedence.
Note: The expression on the left-hand side refers
to values of the data set variables, and the expression
on the right-hand side refers to matrix values.
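The following sketch shows two WHERE clauses, assuming the SASHELP.CLASS sample data set with the variables Age, Sex, Height, and Weight. The matrix name ages is hypothetical. The first clause combines two comparisons with the AND operator. The second compares a data set variable with a matrix on the right-hand side; in SAS/IML the = operator is understood to select an observation if any element of the matrix matches.

```sas
proc iml;
   use sashelp.class;   /* assumption: sample data set */

   /* Literal operands combined with AND */
   summary var {height weight} where(age >= 13 & sex = "F");

   /* Matrix operand on the right-hand side: observations whose Age is 11 or 12 */
   ages = {11 12};
   summary class {sex} var {height} where(age = ages);

   close sashelp.class;
quit;
```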
See Chapter 6 for an example that uses the SUMMARY statement.
Copyright © 2009 by SAS Institute Inc., Cary, NC, USA. All rights reserved.