IMSTAT Procedure (Analytics)

ASSESS Statement

The ASSESS statement assesses one or more models. For a set of classification models, the ASSESS statement returns three types of assessments: lift-related assessments, assessments related to the receiver operating characteristic (ROC), and concordance statistics. For a set of regression models, the ASSESS statement bins the predictions by quantile and returns summary statistics of the response variable for each bin.

Syntax

ASSESS <variable-list> / Y=response-variable <options>;

Required Argument

Y=response-variable

specifies the response variable for model assessment.

Alias

RESPONSE=

ASSESS Statement Options

CUTSTEP=n

specifies a number between 0 and 1 that defines the step size in receiver operating characteristic (ROC) calculations.

Alias

STEP=

DESCENDING

specifies that the levels of the GROUPBY variables are to be arranged in descending order.

Alias

DESC

EPSILON=e

specifies the tolerance that is used in determining the convergence of the iterative algorithm for the percentile calculation.

Default

1E-5

EVENT="quoted-strings"

specifies the formatted value of the response variable that represents the event. When this option is not specified, the ASSESS statement performs model assessment for a regression model and the response variable must be numeric.
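The following sketch contrasts the two modes that the EVENT= option selects. The libref LASR, the table SCORED, and the variables BAD, AMOUNT, P_BAD, and P_AMOUNT are hypothetical names chosen for illustration. The libref is assumed to be already assigned with the SASIOLA engine.

```sas
/* Hypothetical names: libref LASR, table SCORED,               */
/* responses BAD and AMOUNT, predictions P_BAD and P_AMOUNT.    */
proc imstat;
   table lasr.scored;

   /* EVENT= specified: classification assessment.              */
   /* "1" is the formatted value of BAD that marks the event.   */
   assess p_bad / y=bad event="1";

   /* EVENT= omitted: regression assessment.                    */
   /* The response AMOUNT must be numeric in this case.         */
   assess p_amount / y=amount;
run;
quit;
```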

FORMATS=("format-specification",...)

specifies the formats for the GROUPBY= variables. If you do not specify the FORMATS= option, or if you do not specify the GROUPBY= option, the default format is applied for that variable.

Enclose each format specification in quotation marks and separate each format specification with a comma.

GROUPBY=(variable-list)

specifies a list of variable names, or a single variable name, to use as GROUPBY variables in the order of the grouping hierarchy. If you do not specify any GROUPBY variable names, then the calculation is performed across the entire table—possibly subject to a WHERE clause.
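As an illustration, the GROUPBY= and FORMATS= options might be combined as in the following sketch. The libref LASR, the table SCORED, and the variables BAD, P_BAD, REGION, and INCOME are hypothetical, and the libref is assumed to be assigned with the SASIOLA engine. The spelling of the format specifications ("$10" and "8.2") is an assumption; check it against the formats that are valid for your data.

```sas
/* Hypothetical names: libref LASR, table SCORED, response BAD, */
/* prediction P_BAD, GROUPBY variables REGION and INCOME.       */
proc imstat;
   table lasr.scored;

   /* One assessment per combination of REGION and INCOME.      */
   /* Formats are listed in the same order as the GROUPBY       */
   /* variables: "$10" for REGION, "8.2" for INCOME.            */
   assess p_bad / y=bad event="1"
                  groupby=(region income)
                  formats=("$10", "8.2");
run;
quit;
```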

GROUPBYLIMIT=n

specifies the maximum number of levels in a GROUPBY set. When the software determines that there are at least n levels in the GROUPBY set, it abandons the action, returns a message, and does not produce a result set. You can specify the GROUPBYLIMIT= option if you want to avoid creating excessively large result sets in GROUPBY operations.

GROUPFILTER=(groupfilter-options)

specifies a section of the GROUPBY= hierarchy to include in the ASSESS computation.

MAXITER=i

specifies a positive integer that determines the maximum number of iterations for the percentile algorithm.

Default

5 × the number of bins (NBINS= option).

MERGEBINS=b

specifies the number of bins to create when a numeric GROUPBY variable exceeds the MERGELIMIT=n specification. If you specify a MERGELIMIT, but do not specify a value for the MERGEBINS= option, the server automatically calculates the number of bins.

MERGELIMIT=n

specifies that when the number of unique values in a numeric GROUPBY variable exceeds n, the variable is automatically binned and the GROUPBY structure is determined based on the binned values of the variable, rather than the unique formatted values.

For example, if you specify MERGELIMIT=500, any numeric GROUPBY variable with more than 500 unique formatted values is binned. Instead of returning results for more than 500 groups, the results are returned for the bins. You can specify the number of bins with the MERGEBINS= option.
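A minimal sketch of the two merge options used together follows. The libref LASR, the table SCORED, and the variables BAD, P_BAD, and AGE are hypothetical, and the libref is assumed to be assigned with the SASIOLA engine. The value 20 for MERGEBINS= is arbitrary.

```sas
/* Hypothetical names: libref LASR, table SCORED, response BAD, */
/* prediction P_BAD, numeric GROUPBY variable AGE.              */
proc imstat;
   table lasr.scored;

   /* If AGE has more than 500 unique formatted values, it is   */
   /* binned into 20 bins and results are returned per bin.     */
   assess p_bad / y=bad event="1"
                  groupby=(age)
                  mergelimit=500
                  mergebins=20;
run;
quit;
```

If MERGEBINS= is omitted here, the server calculates the number of bins automatically.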

NBINS=n

specifies the number of bins to use in the lift calculations.
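The numeric tuning options (NBINS=, CUTSTEP=, EPSILON=, and MAXITER=) can be set in a single request, as sketched below. The libref LASR, the table SCORED, and the variables BAD and P_BAD are hypothetical, and the libref is assumed to be assigned with the SASIOLA engine. The option values are arbitrary and only illustrate the syntax.

```sas
/* Hypothetical names: libref LASR, table SCORED, response BAD, */
/* prediction P_BAD. Option values are for illustration only.   */
proc imstat;
   table lasr.scored;

   assess p_bad / y=bad event="1"
                  nbins=20       /* 20 bins for the lift calculations      */
                  cutstep=0.05   /* ROC step size, between 0 and 1         */
                  epsilon=1e-6   /* tighter tolerance than the default     */
                  maxiter=200;   /* default would be 5 x 20 = 100          */
run;
quit;
```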

NOMISSING

specifies that you do not want to include missing values in the determination of GROUPBY values. By default, levels with missing values are included.

PARTITION <=partition-key>

When you specify this option and the table is partitioned, the results are calculated separately for each value of the partition key. In other words, the partition variables function as automatic GROUPBY variables. This mode of executing calculations by partition is more efficient than using the GROUPBY= option. With a partitioned table, the server takes advantage of knowing that observations for a partition cannot be located on more than one worker node.

If you do not specify a partition-key, the analysis is performed for all partitions. If you do specify a partition-key, the analysis is carried out for the specified key value only. You can use the PARTITIONINFO statement to retrieve the valid partition key values for a table.

You can specify a partition-key in two ways. You can supply a single quoted string that is passed to the server, or you can specify the elements of a composite key separated by commas. For example, if you partition a table by variables GENDER and AGE, with formats $1 and BEST12, respectively, then the composite partition key has a length of 13. You can specify the partition for the 11-year-old females as follows:

statement / partition="F          11"; /* passed directly to the server */
statement / partition="F","11";        /* composed by the procedure */

If you choose the second format, the procedure composes a key based on formatting information from the server.

Alias

PART=
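Applied to the ASSESS statement, the PARTITION option might be used as in the following sketch. It assumes a hypothetical table SCORED, partitioned by GENDER and AGE as in the description above, in a libref LASR that is assigned with the SASIOLA engine. The variables BAD and P_BAD are also hypothetical, and calling PARTITIONINFO without options is an assumption.

```sas
/* Hypothetical names: libref LASR, table SCORED (partitioned   */
/* by GENDER and AGE), response BAD, prediction P_BAD.          */
proc imstat;
   table lasr.scored;

   /* List the valid partition key values for the table.        */
   partitioninfo;

   /* No key: one assessment for each partition.                */
   assess p_bad / y=bad event="1" partition;

   /* Composite key: assess only the 11-year-old females.       */
   /* The procedure composes the 13-character key.              */
   assess p_bad / y=bad event="1" partition="F","11";
run;
quit;
```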

RAWORDER

specifies that the ordering of the GROUPBY variables is based on the raw values of the variables instead of the formatted values.

SAVE=table-name

saves the result table so that you can use it in other IMSTAT procedure statements like STORE, REPLAY, and FREE. The value for table-name must be unique within the scope of the procedure execution. The name of a table that has been freed with the FREE statement can be used again in subsequent SAVE= options.
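The following sketch shows one way a saved result might be reused and then released. The libref LASR, the table SCORED, the variables BAD and P_BAD, and the result name ASSESS1 are hypothetical, and the libref is assumed to be assigned with the SASIOLA engine. The REPLAY and FREE statements are shown in their simplest assumed form, with only a table name.

```sas
/* Hypothetical names: libref LASR, table SCORED, response BAD, */
/* prediction P_BAD, saved result ASSESS1.                      */
proc imstat;
   table lasr.scored;

   /* Keep the result table under the name ASSESS1.             */
   assess p_bad / y=bad event="1" save=assess1;
run;

   /* Display the saved result again without recomputing it.    */
   replay assess1;

   /* Release the saved result; the name ASSESS1 can then be    */
   /* reused in a later SAVE= option.                           */
   free assess1;
run;
quit;
```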

SETSIZE

requests that the server estimate the size of the result set. The procedure does not create a result table if the SETSIZE option is specified. Instead, the procedure reports the number of rows that are returned by the request and the expected memory consumption for the result set (in KB). If you specify the SETSIZE option, the SAS log includes the number of observations and the estimated result set size. See the following log sample:

NOTE: The LASR Analytic Server action request for the STATEMENT
      statement would return 17 rows and approximately
      3.641 kBytes of data.

The typical use of the SETSIZE option is to get an estimate of the size of the result set in situations where you are unsure whether the SAS session can handle a large result set. Be aware that in order to determine the size of the result set, the server has to perform the work as if you were receiving the actual result set. Requesting the estimated size of the result set does consume resources on the server. The estimated number of KB is very close to the actual memory consumption of the result set. It might not be immediately obvious how this size relates to the displayed table, since many tables contain hidden columns. In addition, some elements of the result set might not be converted to tabular output by the procedure.
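For instance, before running a grouped assessment that could return many rows, you might first request only the size estimate, as in this sketch. The libref LASR, the table SCORED, and the variables BAD, P_BAD, and CUSTOMER_SEGMENT are hypothetical, and the libref is assumed to be assigned with the SASIOLA engine.

```sas
/* Hypothetical names: libref LASR, table SCORED, response BAD, */
/* prediction P_BAD, GROUPBY variable CUSTOMER_SEGMENT.         */
proc imstat;
   table lasr.scored;

   /* No result table is created. The SAS log reports the       */
   /* number of rows and the estimated size of the result set.  */
   assess p_bad / y=bad event="1"
                  groupby=(customer_segment)
                  setsize;
run;
quit;
```

If the reported size is acceptable, the same statement can be resubmitted without the SETSIZE option.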

TEMPEXPRESS="SAS-expressions"

TEMPEXPRESS=file-reference

specifies either a quoted string that contains the SAS expression that defines the temporary variables or a file reference to an external file with the SAS statements.

Alias

TE=

TEMPNAMES=variable-name

TEMPNAMES=(variable-list)

specifies the list of temporary variables for the request. Each temporary variable must be defined through SAS statements that you supply with the TEMPEXPRESS= option.

Alias

TN=
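The two options are used as a pair: TEMPNAMES= declares the temporary variable and TEMPEXPRESS= supplies the code that computes it. The sketch below assesses the average of two predictions. The libref LASR, the table SCORED, and the variables BAD, P_TREE, P_LOGIT, and P_AVG are hypothetical, and the libref is assumed to be assigned with the SASIOLA engine. Using a temporary variable in the variable-list of the ASSESS statement is an assumption based on the option descriptions above.

```sas
/* Hypothetical names: libref LASR, table SCORED, response BAD, */
/* predictions P_TREE and P_LOGIT, temporary variable P_AVG.    */
proc imstat;
   table lasr.scored;

   /* P_AVG exists only for the duration of this request.       */
   assess p_avg / y=bad event="1"
                  tempnames=(p_avg)
                  tempexpress="p_avg = (p_tree + p_logit) / 2;";
run;
quit;
```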

TEMPTABLE

generates an in-memory temporary table from the result set. The IMSTAT procedure displays the name of the table and stores it in the &_TEMPLAST_ macro variable, provided that the statement executed successfully.

When the IMSTAT procedure exits, all temporary tables created during the IMSTAT session are removed. Temporary tables are not displayed on a TABLEINFO request, unless the temporary table is the active table for the request.
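A sketch of working with the temporary table follows. The libref LASR, the table SCORED, and the variables BAD and P_BAD are hypothetical, and the libref is assumed to be assigned with the SASIOLA engine. The COLUMNINFO statement is used here only as one possible way to inspect the temporary table.

```sas
/* Hypothetical names: libref LASR, table SCORED, response BAD, */
/* prediction P_BAD.                                            */
proc imstat;
   table lasr.scored;

   /* Write the assessment results to an in-memory temp table.  */
   assess p_bad / y=bad event="1" temptable;
run;

   /* RUN must execute first so that &_TEMPLAST_ holds the      */
   /* name of the new temporary table.                          */
   table lasr.&_templast_;
   columninfo;
run;
quit;  /* the temporary table is removed when IMSTAT exits */
```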

YFORMAT="quoted-string"

specifies the format for the response variable. The value that you specify in the EVENT= option is matched against the response values as formatted with this format.

Alias

YFMT=

Details

You can compare multiple models by specifying predicted values from those models in the variable-list. You can compare models in different data segments with the GROUPBY= option. Note that you must specify the response variable for the ASSESS statement.

When variable-list is not provided, assessment statistics are computed for all numerical variables in the active table.
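The following sketch illustrates both cases. The libref LASR, the table SCORED, and the variables BAD, SEGMENT, P_TREE, P_LOGIT, and P_NNET are hypothetical, and the libref is assumed to be assigned with the SASIOLA engine.

```sas
/* Hypothetical names: libref LASR, table SCORED, response BAD, */
/* GROUPBY variable SEGMENT, and predictions P_TREE, P_LOGIT,   */
/* and P_NNET from three competing models.                      */
proc imstat;
   table lasr.scored;

   /* Compare three models, separately within each segment.     */
   assess p_tree p_logit p_nnet / y=bad event="1"
                                  groupby=(segment);

   /* No variable-list: every numeric variable in the active    */
   /* table is assessed against the response BAD.               */
   assess / y=bad event="1";
run;
quit;
```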