MEANS Procedure

CLASS Statement

Specifies the variables whose values define the subgroup combinations for the analysis.
Tips: You can use multiple CLASS statements.

Some CLASS statement options are also available in the PROC MEANS statement. They affect all CLASS variables. Options that you specify in a CLASS statement apply only to the variables in that CLASS statement.

See: For information about how the CLASS statement groups formatted values, see Formatted Values.
Examples: Computing Descriptive Statistics with Class Variables

Using the BY Statement with Class Variables

Using a CLASSDATA= Data Set with Class Variables

Using Multilabel Value Formats with Class Variables

Using Preloaded Formats with Class Variables

Computing a Confidence Limit for the Mean

Computing Output Statistics

Computing Different Output Statistics for Several Variables

Computing Output Statistics with Missing Class Variable Values

Identifying an Extreme Value with the Output Statistics

Identifying the Top Three Extreme Values with the Output Statistics

Syntax

CLASS variable(s) </ options>;

Required Argument

variable(s)
specifies one or more variables that the procedure uses to group the data. Variables in a CLASS statement are referred to as class variables. Class variables are numeric or character. Class variables can have continuous values, but they typically have a few discrete values that define levels of the variable. You do not have to sort the data by class variables.
Interaction: Use the TYPES statement or the WAYS statement to control which class variables PROC MEANS uses to group the data.
Tip: To reduce the number of class variable levels, use a FORMAT statement to combine variable values. When a format combines several internal values into one formatted value, PROC MEANS outputs the lowest internal value.
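The tip above can be illustrated with a minimal sketch. It assumes that the SASHELP.CLASS sample data set is available. The format name AGEGRP and its two ranges are hypothetical and exist only for this illustration.

```sas
/* Hypothetical format that collapses individual ages into two levels. */
proc format;
   value agegrp
      low-12  = 'Child'
      13-high = 'Teen';
run;

proc means data=sashelp.class n mean;
   class sex age;         /* the data does not have to be sorted by Sex or Age */
   format age agegrp.;    /* PROC MEANS groups Age by its formatted value */
   var height;
run;
```

With the format applied, Age should contribute at most two levels within each value of Sex instead of one level for each distinct age.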

Optional Arguments

ASCENDING
specifies to sort the class variable levels in ascending order.
Alias: ASCEND
Interaction: PROC MEANS issues a warning message if you specify both ASCENDING and DESCENDING and ignores both options.
DESCENDING
specifies to sort the class variable levels in descending order.
Alias: DESCEND
Interaction: PROC MEANS issues a warning message if you specify both ASCENDING and DESCENDING and ignores both options.
EXCLUSIVE
excludes from the analysis all combinations of the class variables that are not found in the preloaded range of user-defined formats.
Requirement: You must specify PRELOADFMT to preload the class variable formats.
GROUPINTERNAL
specifies not to apply formats to the class variables when PROC MEANS groups the values to create combinations of class variables.
Interactions: If you specify the PRELOADFMT option, then PROC MEANS ignores the GROUPINTERNAL option and uses the formatted values.

If you specify the ORDER=FORMATTED option, then PROC MEANS ignores the GROUPINTERNAL option and uses the formatted values.

Tip: This option saves computer resources when the numeric class variables contain discrete values.
MISSING
considers missing values as valid values for the class variable levels. Special missing values that represent numeric values (the letters A through Z and the underscore (_) character) are each considered as a separate value.
Default: If you omit MISSING, then PROC MEANS excludes the observations with a missing class variable value from the analysis.
See: SAS Language Reference: Concepts for a discussion of missing values with special meanings.

Computing Output Statistics with Missing Class Variable Values

MLF
enables PROC MEANS to use the primary and secondary format labels for a given range or overlapping ranges to create subgroup combinations when a multilabel format is assigned to a class variable.
Requirement: You must use PROC FORMAT and the MULTILABEL option in the VALUE statement to create a multilabel format.
Interactions: If you use the OUTPUT statement with MLF, then the class variable contains a character string that corresponds to the formatted value. Because the formatted value becomes the internal value, the length of this variable is the number of characters in the longest format label.

Using MLF with ORDER=FREQ might not produce the order that you expect for the formatted values. You might not get the expected results when you use MLF with CLASSDATA and EXCLUSIVE because MLF processing requires that each TYPE be computed independently. Types other than NWAY might contain more levels than expected.

Note: When the formatted values overlap, one internal class variable value maps to more than one class variable subgroup combination. Therefore, the sum of the N statistics for all subgroups is greater than the number of observations in the data set (the overall N statistic).
Tip: If you omit MLF, then PROC MEANS uses the primary format labels. This action corresponds to using the first external format value to determine the subgroup combinations.
See: The MULTILABEL option in the VALUE statement of the FORMAT procedure (Optional Arguments).

Using Multilabel Value Formats with Class Variables
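The following minimal sketch shows one way the MLF option might be used. It assumes that the SASHELP.CLASS sample data set is available. The format name AGEMLF and its labels are hypothetical and are not part of the original documentation.

```sas
/* Hypothetical multilabel format with overlapping ranges. */
proc format;
   value agemlf (multilabel)
      11-13 = '11 to 13'
      14-16 = '14 to 16'
      11-16 = 'All ages';
run;

proc means data=sashelp.class n mean;
   class age / mlf;       /* use primary and secondary labels to form subgroups */
   format age agemlf.;
   var height;
run;
```

Because the 'All ages' range overlaps the other two ranges, an observation can be counted in more than one level, so the N values across the levels can sum to more than the number of observations in the data set.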

ORDER=DATA | FORMATTED | FREQ | UNFORMATTED
specifies the order in which to group the levels of the class variables in the output, where
DATA
orders values according to their order in the input data set.
Interaction: If you use PRELOADFMT, then the order of the values of each class variable matches the order that PROC FORMAT uses to store the values of the associated user-defined format. If you use the CLASSDATA= option in the PROC statement, then PROC MEANS uses the order of the unique values of each class variable in the CLASSDATA= data set to order the output levels. If you use both options, then PROC MEANS first uses the user-defined formats to order the output. If you omit EXCLUSIVE in the PROC statement, then PROC MEANS appends after the user-defined format and the CLASSDATA= values the unique values of the class variables in the input data set based on the order in which they are encountered.
Tip: By default, PROC FORMAT stores a format definition in sorted order. Use the NOTSORTED option to store the values or ranges of a user-defined format in the order in which you define them.
FORMATTED
orders values by their ascending formatted values. This order depends on your operating environment. If no format has been assigned to a class variable, then the default format, BEST12., is used.
FREQ
orders values by descending frequency count so that levels with the most observations are listed first.
Interactions: For multiway combinations of the class variables, PROC MEANS determines the order of a level from the individual class variable frequencies.

Use the ASCENDING option to order values by ascending frequency count.

UNFORMATTED
orders values by their unformatted values, which yields the same order as PROC SORT. This order depends on your operating environment. This sort sequence is particularly useful for displaying dates chronologically.
Alias: UNFMT | INTERNAL
Default: UNFORMATTED
Tip: By default, all orders except FREQ are ascending. For descending orders, use the DESCENDING option.
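As a minimal sketch of the ordering options, the following step uses two CLASS statements so that each class variable gets its own options. It assumes that the SASHELP.CARS sample data set is available. The choice of variables is illustrative only.

```sas
proc means data=sashelp.cars n mean;
   class origin / order=freq;   /* levels with the most observations are listed first */
   class type   / descending;   /* default ORDER=UNFORMATTED, listed in descending order */
   var msrp;
run;
```

Options in each CLASS statement apply only to the variables named in that statement, so Origin and Type can be ordered independently.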
PRELOADFMT
specifies that all formats are preloaded for the class variables.
Requirement: PRELOADFMT has no effect unless you specify COMPLETETYPES, EXCLUSIVE, or ORDER=DATA and you assign formats to the class variables.
Interactions: To limit PROC MEANS output to the combinations of formatted class variable values present in the input data set, use the EXCLUSIVE option in the CLASS statement.

To include all ranges and values of the user-defined formats in the output, even when the frequency is zero, use COMPLETETYPES in the PROC statement.
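A minimal sketch of PRELOADFMT combined with COMPLETETYPES follows. It assumes that the SASHELP.CLASS sample data set is available. The format name $SEXFMT and the 'U' level are hypothetical; the 'U' level is included only because it does not occur in the data.

```sas
/* Hypothetical format with one level ('U') that is absent from the data. */
proc format;
   value $sexfmt
      'F' = 'Female'
      'M' = 'Male'
      'U' = 'Unknown';
run;

proc means data=sashelp.class completetypes n mean;
   class sex / preloadfmt;   /* preload every value defined by the format */
   format sex $sexfmt.;
   var height;
run;
```

With COMPLETETYPES in the PROC statement, the 'Unknown' level should appear in the output with a frequency of zero.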

Details

Comparison of the BY and CLASS Statements

Using the BY statement is similar to using the CLASS statement and the NWAY option in that PROC MEANS summarizes each BY group as an independent subset of the input data. Therefore, no overall summarization of the input data is available. However, unlike the CLASS statement, the BY statement requires that you previously sort BY variables.
When you use the NWAY option, PROC MEANS might encounter insufficient memory for the summarization of all the class variables. In that case, you can move some class variables to the BY statement. For maximum benefit, move the class variables that are already sorted or that have the greatest number of unique values.
You can use the CLASS and BY statements together to analyze the data by the levels of class variables within BY groups. See Using the BY Statement with Class Variables.
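The combination of BY and CLASS described above might look like the following minimal sketch. It assumes that the SASHELP.CARS sample data set is available. The output data set name WORK.CARS_SORTED is hypothetical.

```sas
/* The BY statement requires data that is sorted by the BY variable. */
proc sort data=sashelp.cars out=work.cars_sorted;
   by origin;
run;

proc means data=work.cars_sorted n mean;
   by origin;     /* each Origin value is summarized as an independent subset */
   class type;    /* Type levels are analyzed within each BY group */
   var msrp;
run;
```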

How PROC MEANS Handles Missing Values for Class Variables

By default, if an observation contains a missing value for any class variable, then PROC MEANS excludes that observation from the analysis. If you specify the MISSING option in the PROC statement, then the procedure considers missing values as valid levels for the combination of class variables.
Specifying the MISSING option in the CLASS statement enables you to control the acceptance of missing values for individual class variables.
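The per-variable control described above can be sketched as follows. The data set WORK.VISITS and its variables SITE, ARM, and SCORE are hypothetical and are created here only for illustration.

```sas
/* Hypothetical data with one missing SITE value and one missing ARM value. */
data work.visits;
   input site arm score;
   datalines;
1 1 10
1 2 20
. 1 30
2 . 40
2 2 50
;
run;

proc means data=work.visits n mean;
   class site / missing;   /* a missing SITE value is treated as a valid level */
   class arm;              /* default: observations with a missing ARM value are excluded */
   var score;
run;
```

In this sketch, the observation with a missing SITE value stays in the analysis, and the observation with a missing ARM value is dropped.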

Computer Resources

The total number of unique class values that PROC MEANS allows depends on the amount of computer memory that is available. See Computational Resources for more information.
The GROUPINTERNAL option can improve computer performance because the grouping process is based on the internal values of the class variables. If a numeric class variable is not assigned a format and you do not specify GROUPINTERNAL, then PROC MEANS uses the default format, BEST12., to format numeric values as character strings. Then PROC MEANS groups these numeric variables by their character values, which takes additional time and computer memory.
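A minimal sketch of the GROUPINTERNAL option follows. It assumes that the SASHELP.CLASS sample data set is available and that no format is assigned to the numeric variable Age.

```sas
proc means data=sashelp.class n mean;
   class age / groupinternal;   /* group by internal numeric values, not BEST12. strings */
   var height;
run;
```

Any resource savings depend on the data and the operating environment, so treat this as an illustration of the syntax rather than a measured result.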