The SUMMARY statement is used to calculate descriptive statistics such as the sample mean, sample variance, number of observations, sum of squares, and so on. If you specify one or more variables in the GROUPBY= option, the results are produced separately for each combination of the GROUPBY variables.
Examples: | Partitioning a Table into a Temporary Table |
specifies one or more numeric variables. If you do not specify this option, then all numeric variables in the table are used.
lists the aggregator on which the ordering of the result set is based.
CSS | corrected sum of squares |
CV | coefficient of variation |
MAX | maximum value |
MEAN | arithmetic mean |
MIN | minimum value |
N | number of observations |
NMISS | number of missing observations |
PROBT | p-value for the t-statistic |
STD | standard deviation |
STDERR | standard error |
SUM | sum of the nonmissing values |
TSTAT | t-statistic for the null hypothesis that the mean equals zero |
USS | uncorrected sum of squares |
VAR | sample variance |
Alias | AGG= |
specifies that the levels of the GROUPBY variables are to be arranged in descending order.
Alias | DESC |
specifies the formats for the GROUPBY= variables. If you do not specify the FORMATS= option, or if you omit the entry for a GROUPBY variable, the default format is applied for that variable.
Example | proc imstat data=lasr1.table1;
summary x*y / groupby=(a b) formats=("8.3", "$10");
quit;
|
specifies a list of variable names, or a single variable name, to use as GROUPBY variables in the order of the grouping hierarchy. If you do not specify any GROUPBY variable names, then the calculation is performed across the entire table—possibly subject to a WHERE clause.
specifies the maximum number of levels in a GROUPBY set. When the software determines that there are at least n levels in the GROUPBY set, it abandons the action, returns a message, and does not produce a result set. You can specify the GROUPBYLIMIT= option if you want to avoid creating excessively large result sets in GROUPBY operations.
specifies a section of the group-by hierarchy to be included in the computation. With this option, you can request that the server performs the analysis for only a subset of all possible groupings. The subset is determined by applying the group filter to a temporary table that you generate with the GROUPBY statement.
specifies the top section or the bottom section of the groupings to be collected. If the DESCENDING option is specified, the top LIMIT=n (where n > 0) groupings are collected. Otherwise, the bottom LIMIT=n groupings are collected.
Alias | DESC |
specifies the maximum number of distinct groupings to be collected, where integer n >= 0. If n is zero, then all distinct groupings (up to 231–1) that satisfy the boundary constraints, such as LOWERSCORE=f, are collected.
CAUTION: |
High Cardinality
Data Sets
Setting n to
zero with high-cardinality data sets can significantly delay the response
of the server.
|
specifies the exclusive lower bound for the numeric scores of the distinct groupings to collect.
Alias | SGT= |
specifies the exclusive upper bound for the numeric scores of the distinct groupings to collect.
Alias | SLT= |
specifies the exclusive lower bound of the group-by variable’s formatted values for the distinct groupings to collect.
Alias | VGT= |
specifies the exclusive upper bound of the group-by variable’s formatted values for the distinct groupings to collect.
Alias | VLT= |
specifies the in-memory table from which to load the group-by hierarchy. If the TABLE= option is not specified, then all other GROUPFILTER= options are ignored.
proc imstat; table example.cars_program_all; groupby state city trade_in_model / temptable weight=new_vehicle_msrp agg =(max) order =weight; run;
table example.cars_program_all;
distinct sales_type / groupfilter=(
table =mylasr.&_TEMPLAST_
scoregt=40000
valuelt=("FL","Ft Myers","")
limit =20
descending);
run;
Interaction | If you specify the GROUPFILTER= option, then the GROUPBY= and FORMATS= options have no effect. |
limits the size of the result set returned to the SAS client. For example, the following SUMMARY statement returns the size (in number of records) of the largest partition for Table1.
Example | proc imstat data=mylasr.Table1;
summary Amount / partition orderby=(Amount) desc
aggregate=(N) limit=1;
run; |
specifies the number of bins to create when a numeric GROUPBY variable exceeds the MERGELIMIT=n specification. If you specify a MERGELIMIT, but do not specify a value for the MERGEBINS= option, the server automatically calculates the number of bins.
specifies that when the number of unique values in a numeric GROUPBY variable exceeds n, the variable is automatically binned and the GROUPBY structure is determined based on the binned values of the variable, rather than the unique formatted values.
specifies that the levels of the GROUPBY variables are to be arranged in descending order.
Alias | DESC |
prevents the procedure from preparsing and pregenerating code for temporary expressions, scoring programs, and other user-written SAS statements.
Alias | NOPREP |
specifies that the temporary table generated by the TEMPTABLE option is not partitioned by the GROUPBY= variables. When you request a temporary table with the SUMMARY statement, by default, the server partitions the table and the size of a partition is equal to the number of analysis variables in the variable-list of the SUMMARY statement. When the number of groups is large, this can result in many small partitions, and requires extra memory resources to store the partition information for the temporary table. By specifying this option, the temporary table is organized similarly to the default table, but is not a partitioned table.
Alias | NOTP |
specifies the variables to use for ordering the result set. If a variable is not one of the numeric variables in the variable-list for the SUMMARY statement, it is assumed to be one of the GROUPBY variables.
specifies the sort order for the result set. The default is ascending order. Specify the ORDERBYDESC option to sort in descending order. Note that this option is different from setting the DESCENDING option. The DESCENDING option affects the order of the values for the GROUPBY variables.
When you specify this option and the table is partitioned, the results are calculated separately for each value of the partition key. In other words, the partition variables function as automatic GROUPBY variables. This mode of executing calculations by partition is more efficient than using the GROUPBY= option. With a partitioned table, the server takes advantage of knowing that observations for a partition cannot be located on more than one worker node.
statement / partition="F 11"; /* passed directly to the server */ statement / partition="F","11"; /* composed by the procedure */
Alias | PART= |
specifies that the ordering of the GROUPBY variables is based on the raw values of the variables instead of the formatted values.
saves the result table so that you can use it in other IMSTAT procedure statements like STORE, REPLAY, and FREE. The value for table-name must be unique within the scope of the procedure execution. The name of a table that has been freed with the FREE statement can be used again in subsequent SAVE= options.
requests that the server estimate the size of the result set. The procedure does not create a result table if the SETSIZE option is specified. Instead, the procedure reports the number of rows that are returned by the request and the expected memory consumption for the result set (in KB). If you specify the SETSIZE option, the SAS log includes the number of observations and the estimated result set size. See the following log sample:
NOTE: The LASR Analytic Server action request for the STATEMENT
statement would return 17 rows and approximately
3.641 kBytes of data.
specifies either a quoted string that contains the SAS expression that defines the temporary variables or a file reference to an external file with the SAS statements.
Alias | TE= |
specifies the list of temporary variables for the request. Each temporary variable must be defined through SAS statements that you supply with the TEMPEXPRESS= option.
Alias | TN= |
generates an in-memory temporary table from the result set. The IMSTAT procedure displays the name of the table and stores it in the &_TEMPLAST_ macro variable, provided that the statement executed successfully.
specifies to augment the summary results with a 12-bin histogram. This option has no effect when the summaries are computed in GROUPBY or partitioned mode.
ODS Table Name
|
Description
|
Option
|
---|---|---|
Summary
|
Descriptive Statistics
|
Default, when TEMPTABLE
is not specified
|
TempTable
|
Information about a
temporary table
|
TEMPTABLE
|