IMSTAT Procedure (Analytics)

TOPK Statement

The TOPK statement calculates and selects the top-k and bottom-k distinct values of a variable based on a user-specified ranking order. The distinct values can be reported as raw or formatted values. The ranking can be based on the raw value, the formatted value, the frequency count, or a calculated score derived from the values of a weight variable. You can also specify aggregate functions to roll up multiple weight values into a single score for a distinct value.

Syntax

TOPK <variable-list> </ options>;

Optional Argument

variable-list

specifies one or more numeric variables. If you do not specify this option, then all numeric variables in the table are used.

TOPK Statement Options

AGGREGATE=(aggregation-methods)

specifies the aggregation methods by which the WEIGHT= variable values are rolled up into a rank order score for each distinct value. If no WEIGHT= variable is specified, then this option is ignored.

The available aggregation methods are as follows:
MAX specifies to use the maximum value of the weight values
MEAN specifies to use the arithmetic mean of the weight values
MIN specifies to use the minimum value of the weight values
SUM specifies to use the sum of the weight values
Alias AGG=
Default SUM
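
As an illustration of how AGGREGATE= combines with WEIGHT=, the following sketch ranks the distinct values of x1 by the mean of a weight variable rather than by the default sum. The table lasr1.table1 and the variable x1 follow the example used elsewhere in this section; the weight variable w is hypothetical.

```sas
proc imstat data=lasr1.table1;
   /* w is a hypothetical numeric weight variable in the table */
   /* score each distinct value of x1 by the mean of its w values */
   topk x1 / weight=w aggregate=(mean);
quit;
```

Without the WEIGHT= option, the AGGREGATE= setting in this request would be ignored.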

FORMATS=("format-specification",...)

specifies the formats for the variables. If you do not specify the FORMATS= option, or if you omit the entry for a variable, the default format is applied for that variable.

Enclose each format specification in quotation marks and separate each format specification with a comma.
Example
proc imstat data=lasr1.table1;
   topk x1 x2 / formats=("10.2", "10.2");
quit;

FREQ=variable-name

specifies the numeric frequency variable to use for calculating the rank order score for distinct values. This option is valid only when ORDER=FREQ or when AGGREGATE= is N, SUM, or MEAN.

K1=n

specifies the maximum number of distinct values to include in the top-k list.

Alias TOPK=
Default 1
Range 1 to 1000

K2=n

specifies the maximum number of distinct values to include in the bottom-k list.

Alias BOTTOMK=
Default 1
Range 1 to 1000

DESCENDING

specifies that the levels of the GROUPBY variables are to be arranged in descending order.

Alias DESC

ORDER= FREQ | VALUE | WEIGHT

specifies the rank ordering to apply to the distinct values when no WEIGHT= variable is specified.

The available ordering methods are as follows:
FREQ specifies to order by frequency count
VALUE specifies to order by raw or formatted values of the variable
WEIGHT specifies to order by the aggregate values of the WEIGHT= variable
Default FREQ
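
A minimal sketch of a request that overrides the default ordering and list sizes is shown here. It assumes the table lasr1.table1 and the variable x1 from the FORMATS= example; the values chosen for K1= and K2= are arbitrary.

```sas
proc imstat data=lasr1.table1;
   /* rank distinct values of x1 by value instead of frequency */
   /* report up to 5 top values and up to 3 bottom values */
   topk x1 / order=value k1=5 k2=3;
quit;
```

Because K1= and K2= set maximum counts, a variable with fewer distinct values than requested yields shorter lists.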

WEIGHT=variable-name

specifies the numeric weight variable to use for calculating the rank order score. If you specify both ORDER= and WEIGHT=, then the WEIGHT= variable takes priority over the ORDER= option.

SAVE=table-name

saves the result table so that you can use it in other IMSTAT procedure statements such as STORE, REPLAY, and FREE. The value for table-name must be unique within the scope of the procedure execution. The name of a table that has been freed with the FREE statement can be used again in subsequent SAVE= options.
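
The following sketch shows where the SAVE= option fits in a request. The table lasr1.table1 and the variable x1 follow the FORMATS= example; the result table name topx1 is hypothetical and only needs to be unique within the procedure execution.

```sas
proc imstat data=lasr1.table1;
   /* keep the TOPK results under the hypothetical name topx1 */
   /* so that later statements in this procedure can refer to them */
   topk x1 / k1=5 k2=5 save=topx1;
quit;
```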

TEMPEXPRESS="SAS-expressions"

TEMPEXPRESS=file-reference

specifies either a quoted string that contains the SAS expression that defines the temporary variables or a file reference to an external file with the SAS statements.

Alias TE=

TEMPNAMES=variable-name

TEMPNAMES=(variable-list)

specifies the list of temporary variables for the request. Each temporary variable must be defined through SAS statements that you supply with the TEMPEXPRESS= option.

Alias TN=
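
As a sketch of how TEMPNAMES= and TEMPEXPRESS= work together, the request below defines a temporary variable and analyzes it. The table lasr1.table1 and the variables x1 and x2 follow the FORMATS= example; the temporary variable name ratio is hypothetical, and the sketch assumes that a temporary variable can be listed as an analysis variable.

```sas
proc imstat data=lasr1.table1;
   /* ratio is a hypothetical temporary variable, */
   /* declared in TEMPNAMES= and defined in TEMPEXPRESS= */
   topk ratio / tempnames=ratio
                tempexpress="ratio = x1 / x2;";
quit;
```

Each name listed in TEMPNAMES= needs a matching assignment in the TEMPEXPRESS= code.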

Details

ODS Table Names

The TOPK statement generates the following ODS tables for each analysis variable.
ODS Table Name   Description                                   Option
TOPK             Top/Bottom K Distinct Values                  Default
BTMK             Top/Bottom K Distinct Values                  Default
TOPKMISC         Misc. Info for Top/Bottom K Distinct Values   Default

For information about using the ODS tables with the SAVE= option, see the Details section of the STORE statement.