IMSTAT Procedure (Analytics)

CROSSTAB Statement

The CROSSTAB statement calculates one-dimensional and two-dimensional tables. You can use GROUPBY= variables and partitioned tables, use a WEIGHT= variable to calculate multiple statistics for each table cell, and calculate marginal versions of the statistics.

Example: Crosstabulation with Measures of Association and Chi-Square Tests

Syntax

CROSSTAB row-variable </ options>;

CROSSTAB row*column </ options>;

Required Argument

row-variable

specifies to create a one-dimensional table using the specified variable.

Optional Argument

row

column

specifies to create a two-dimensional table based on the two variables.
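
As a minimal sketch of the two syntax forms, the following requests a one-dimensional table and then a two-dimensional table. The table lasr1.table1 and the variables x and y are placeholders borrowed from the FORMATS= example in this section; substitute names that exist on your server.

```sas
proc imstat data=lasr1.table1;
   /* one-dimensional table: one cell per level of x */
   crosstab x;
   /* two-dimensional table: x supplies the rows, y the columns */
   crosstab x*y;
quit;
```

Because no WEIGHT= variable is specified, both requests report frequencies (the N aggregator).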

CROSSTAB Statement Options

ACROSSBY

specifies that the levels of row and column variables are the same across the GROUPBY= variables. If you specify this option, then the tables in each group are shown with the same row and column layout. If the ACROSSBY option is not specified, then the particular row and column levels might be different among the groups based on which values of the row and column variables occur in each group.

Alias

ACROSS

AGGREGATE=(statistic(s))

specifies the statistics to use as aggregation methods for which crosstabulations are computed when a WEIGHT variable is also specified. If no WEIGHT variable is specified, then the N aggregator is applied. In other words, the crosstabulation shows the frequency with which the values occur when no WEIGHT variable is specified. If you specify multiple aggregation methods, then the server computes a crosstabulation for each method.

The available aggregation methods are as follows:

CSS	corrected sum of squares
CV	coefficient of variation
MAX	maximum value
MEAN	arithmetic mean
MIN	minimum value
N	number of observations
PROBT	p-value for the t-statistic
STD	standard deviation
STDERR	standard error
SUM	sum of the nonmissing values
TSTAT	t-statistic for the null hypothesis that the mean equals zero
USS	uncorrected sum of squares
VAR	sample variance

Alias

AGG=
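
The following sketch requests two aggregation methods in one statement. The weight variable w is hypothetical, the table and the variables x and y are placeholders, and the example assumes that the statistics are separated by blanks in the same way as a variable list.

```sas
proc imstat data=lasr1.table1;
   /* w is a hypothetical numeric column; the server computes one
      crosstabulation for SUM and one for MEAN */
   crosstab x*y / weight=w aggregate=(sum mean);
quit;
```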

ASSOCIATION

specifies to calculate the measures of association between the row and column variables of the crosstabulation. The option generates the following measures: the Gamma statistic, Kendall's Tau-b, Stuart's Tau-c, Somers' measures, Lambda measures, and uncertainty measures.

Alias

MEASURES

CHISQ

computes Chi-square statistics for the test of independence of the row and column variables and their asymptotic p-values. The option calculates the Pearson Chi-square statistic as well as the likelihood-ratio test.
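
For illustration, the following sketch combines the CHISQ option with the ASSOCIATION option on a two-dimensional table. The table and variable names are placeholders.

```sas
proc imstat data=lasr1.table1;
   /* CHISQ requests the Pearson and likelihood-ratio tests;
      ASSOCIATION requests the measures of association */
   crosstab x*y / chisq association;
quit;
```

In addition to the CrossTab table, a request like this is expected to produce the ChiSq and Association ODS tables that are listed under ODS Table Names.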

COLBINS=b

specifies the number of bins to use for binning the column variable. If the values in the data are such that there are no contributing observations for a bin, then the server does not return the bin. As a result, the server can return fewer bins than you specified.

Alias

XBINS=

DESCENDING

specifies that the levels of the GROUPBY variables are to be arranged in descending order.

Alias

DESC

FORMATS=("format-specification",...)

specifies the formats for the GROUPBY= variables. If you do not specify the FORMATS= option, or if you omit the entry for a GROUPBY variable, the default format is applied for that variable.

Enclose each format specification in quotation marks and separate each format specification with a comma.

Example

proc imstat data=lasr1.table1;
   crosstab x*y / groupby=(a b) formats=("8.3", "$10");
quit;

GROUPBY=(variable-list)

specifies a list of variable names, or a single variable name, to use as GROUPBY variables in the order of the grouping hierarchy. If you do not specify any GROUPBY variable names, then the calculation is performed across the entire table—possibly subject to a WHERE clause.

GROUPBYLIMIT=n

specifies the maximum number of levels in a GROUPBY set. When the software determines that there are at least n levels in the GROUPBY set, it abandons the action, returns a message, and does not produce a result set. You can specify the GROUPBYLIMIT= option if you want to avoid creating excessively large result sets in GROUPBY operations.

MARGINS= ROW | COL | ALL | NONE

specifies whether to calculate marginal values in addition to the crosstabulation. The default is MARGINS=NONE, which specifies that no marginal values are calculated. The MARGINS=ROW setting calculates margins for the row variable only. The MARGINS=COL setting calculates margins for the column variable only. The MARGINS=ALL setting calculates row margins, column margins, and the overall margin.

These calculations are repeated for each aggregate method in the CROSSTAB request.

Default

NONE
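
The following sketch contrasts two MARGINS= settings. The table and variable names are placeholders, and no WEIGHT= variable is specified, so the margins are frequency margins.

```sas
proc imstat data=lasr1.table1;
   /* margins for the row variable only */
   crosstab x*y / margins=row;
   /* row margins, column margins, and the overall margin */
   crosstab x*y / margins=all;
quit;
```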

MERGEBINS=b

specifies the number of bins to create when a numeric GROUPBY variable exceeds the MERGELIMIT=n specification. If you specify a MERGELIMIT, but do not specify a value for the MERGEBINS= option, the server automatically calculates the number of bins.

MERGELIMIT=n

specifies that when the number of unique values in a numeric GROUPBY variable exceeds n, the variable is automatically binned and the GROUPBY structure is determined based on the binned values of the variable, rather than the unique formatted values.

For example, if you specify MERGELIMIT=500, any numeric GROUPBY variable with more than 500 unique formatted values is binned. Instead of returning results for more than 500 groups, the results are returned for the bins. You can specify the number of bins with the MERGEBINS= option.
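
A minimal sketch of the two options used together follows. It assumes that a is a numeric GROUPBY variable with many distinct values; the table name, the variable names, and the choice of 20 bins are illustrative only.

```sas
proc imstat data=lasr1.table1;
   /* if a has more than 500 unique formatted values, group by
      20 bins of a instead of by each individual value */
   crosstab x*y / groupby=(a) mergelimit=500 mergebins=20;
quit;
```

If you omit MERGEBINS=, the server calculates the number of bins automatically.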

NOEMPTY

specifies that empty cells are not returned to the SAS session (only full cells are returned). When this option is specified, the server arranges the results into a vector of nonzero values with row and column indices. This sparse storage is memory efficient when the table has many empty cells.

Alias

FULLCELL

NOPREPARSE

prevents the procedure from preparsing and pregenerating code for temporary expressions, scoring programs, and other user-written SAS statements.

When this option is specified, the user-written statements are sent to the server "as is" and the server then attempts to generate code from them. If the server detects problems with the code, the error messages might not be as detailed as the messages that are generated by the SAS client. If you are debugging your user-written program, then you might want to preparse and pregenerate code in the procedure. However, if your SAS statements compile and run as you want them to, then you can specify this option to avoid the work of parsing and generating code on the SAS client.

When you specify this option in the PROC IMSTAT statement, the option applies to all statements that can generate code. You can also exclude specific statements from preparsing by using the NOPREPARSE option in statements that allow temporary columns or the SCORE statement.

Alias

NOPREP

NOTEMPPART

specifies that the temporary table generated by the TEMPTABLE option is not partitioned by the GROUPBY= variables. When you request a temporary table with the CROSSTAB statement, the server creates a partitioned table by default. When the number of groups is large, this can result in many small partitions and requires extra memory to store the partition information for the temporary table. When you specify this option, the temporary table is organized similarly to the default table, but is not a partitioned table.

Alias

NOTP

PARTITION <=partition-key>

When you specify this option and the table is partitioned, the results are calculated separately for each value of the partition key. In other words, the partition variables function as automatic GROUPBY variables. This mode of executing calculations by partition is more efficient than using the GROUPBY= option. With a partitioned table, the server takes advantage of knowing that observations for a partition cannot be located on more than one worker node.

If you do not specify a partition-key, the analysis is performed for all partitions. If you do specify a partition-key, the analysis is carried out for the specified key value only. You can use the PARTITIONINFO statement to retrieve the valid partition key values for a table.

You can specify a partition-key in two ways. You can supply a single quoted string that is passed to the server, or you can specify the elements of a composite key separated by commas. For example, if you partition a table by variables GENDER and AGE, with formats $1 and BEST12, respectively, then the composite partition key has a length of 13. You can specify the partition for the 11-year-old females as follows:

statement / partition="F          11"; /* passed directly to the server */
statement / partition="F","11";        /* composed by the procedure */

If you choose the second format, the procedure composes a key based on formatting information from the server.

Alias

PART=

RAWORDER

specifies that the ordering of the GROUPBY variables is based on the raw values of the variables instead of the formatted values.

ROWBINS=n

specifies the number of bins to use for binning the row variable. If the values in the data are such that there are no contributing observations for a bin, then the server does not return the bin. As a result, the server can return fewer bins than you specified.

Alias

YBINS=

SAVE=table-name

saves the result table so that you can use it in other IMSTAT procedure statements like STORE, REPLAY, and FREE. The value for table-name must be unique within the scope of the procedure execution. The name of a table that has been freed with the FREE statement can be used again in subsequent SAVE= options.

SETSIZE

requests that the server estimate the size of the result set. The procedure does not create a result table if the SETSIZE option is specified. Instead, the procedure reports the number of rows that are returned by the request and the expected memory consumption for the result set (in KB). If you specify the SETSIZE option, the SAS log includes the number of observations and the estimated result set size. See the following log sample:

NOTE: The LASR Analytic Server action request for the STATEMENT
      statement would return 17 rows and approximately
      3.641 kBytes of data.

The typical use of the SETSIZE option is to get an estimate of the size of the result set in situations where you are unsure whether the SAS session can handle a large result set. Be aware that in order to determine the size of the result set, the server has to perform the work as if you were receiving the actual result set. Requesting the estimated size of the result set does consume resources on the server. The estimated number of KB is very close to the actual memory consumption of the result set. It might not be immediately obvious how this size relates to the displayed table, since many tables contain hidden columns. In addition, some elements of the result set might not be converted to tabular output by the procedure.
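
One possible workflow, sketched here with placeholder table and variable names, is to estimate the size first and then submit the same request without the SETSIZE option.

```sas
proc imstat data=lasr1.table1;
   /* estimate only: the log reports the number of rows and the
      approximate size; no result table is created */
   crosstab x*y / groupby=(a b) setsize;
run;
   /* the same request without SETSIZE returns the result set */
   crosstab x*y / groupby=(a b);
quit;
```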

TEMPEXPRESS="SAS-expressions"

TEMPEXPRESS=file-reference

specifies either a quoted string that contains the SAS expression that defines the temporary variables or a file reference to an external file with the SAS statements.

Alias

TE=

TEMPNAMES=variable-name

TEMPNAMES=(variable-list)

specifies the list of temporary variables for the request. Each temporary variable must be defined through SAS statements that you supply with the TEMPEXPRESS= option.

Alias

TN=

TEMPTABLE

generates an in-memory temporary table from the result set. The IMSTAT procedure displays the name of the table and stores it in the &_TEMPLAST_ macro variable, provided that the statement executed successfully.

When the IMSTAT procedure exits, all temporary tables created during the IMSTAT session are removed. Temporary tables are not displayed on a TABLEINFO request, unless the temporary table is the active table for the request.

Interaction

The TEMPTABLE option requires a group-by analysis or a partitioned analysis with this statement.
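
The following sketch requests a temporary table from a group-by analysis and then writes the generated table name to the SAS log. The table and variable names are placeholders, and the %PUT statement is only one way to inspect the macro variable.

```sas
proc imstat data=lasr1.table1;
   /* GROUPBY= satisfies the group-by requirement of TEMPTABLE */
   crosstab x*y / groupby=(a) temptable;
run;
   /* placed after RUN so that the request has already executed */
   %put Temporary table: &_templast_;
quit;
```

The temporary table is removed when the IMSTAT procedure exits, so any further work with it belongs before the QUIT statement.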

WEIGHT=variable-name

specifies the numeric weight variable to use for calculating the statistics in the table cells and the margins of the table. If no WEIGHT variable is specified, then the only aggregate method that is available to the CROSSTAB statement is N. In this case, the number of observations (frequency) is reported in each table cell.

Details

ODS Table Names

The CROSSTAB statement generates the following ODS tables.

ODS Table Name	Description	Option
Association	Measures of association in a crosstabulation	ASSOCIATION
ChiSq	Chi-Square statistic in a crosstabulation	CHISQ
CrossTab	Crosstabulation from a LASR Analytic Server table	Default
TempTable	Information about a temporary table	TEMPTABLE

For information about using the ODS tables with the SAVE= option, see the Details section of the STORE statement.