PROC QUANTSELECT <options>;
Table 96.1 lists the options available in the PROC QUANTSELECT statement.
Table 96.1: PROC QUANTSELECT Statement Options
| Option | Description |
|---|---|
| Data Set Options | |
| DATA= | Names a data set to use for the regression |
| MAXMACRO= | Sets the maximum number of macro variables to produce |
| TESTDATA= | Names a data set that contains test data |
| VALDATA= | Names a data set that contains validation data |
| ODS Graphics Options | |
| PLOTS= | Produces ODS Graphics displays |
| Other Options | |
| ALGORITHM= | Specifies an algorithm for estimating the regression parameters |
| NAMELEN= | Specifies the maximum length of effect names in tables and output data sets |
| NOPRINT | Suppresses displayed output (including plots) |
| OUTDESIGN= | Names a data set that contains the design matrix |
| PARMLABELSTYLE= | Sets the style of parameter names and labels for nested and crossed effects |
| SEED= | Sets the seed used for pseudorandom number generation |
You can specify the following options (shown in alphabetical order) in the PROC QUANTSELECT statement.
ALGORITHM=SIMPLEX | SMOOTH
specifies either the simplex algorithm (ALGORITHM=SIMPLEX) or the smoothing algorithm (ALGORITHM=SMOOTH) for estimating the regression parameters. The smoothing algorithm is computationally much more efficient than the simplex algorithm for fitting models on large data sets. You might consider specifying ALGORITHM=SMOOTH if your DATA= data set contains more than 5,000 observations and more than 50 regressors. The smoothing algorithm does not support quantile process effect selection or the LASSO selection method. By default, ALGORITHM=SIMPLEX.
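As a minimal sketch of the guidance above, the following step requests the smoothing algorithm for forward selection at a single quantile level. The data set `work.big` and the variables `y` and `x1`-`x100` are hypothetical placeholders. The sketch uses forward selection rather than LASSO or quantile process selection because the smoothing algorithm does not support those.

```sas
/* Hypothetical large data set: work.big, response y, regressors x1-x100 */
proc quantselect data=work.big algorithm=smooth;
   /* Single quantile level with forward selection */
   model y = x1-x100 / quantile=0.5 selection=forward;
run;
```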
DATA=SAS-data-set
names the SAS data set to be used by PROC QUANTSELECT. If the DATA= option is not specified, PROC QUANTSELECT uses the most recently created SAS data set. If the data set contains a variable named _ROLE_, then this variable is used to assign observations to training, validation, and testing roles. See the section Using Validation and Test Data for more information about using the _ROLE_ variable.
MAXMACRO=n
specifies the maximum number of macro variables with selected effects to create. By default, MAXMACRO=100.
PROC QUANTSELECT saves the list of selected effects in a macro variable, &_QRSIND. For example, suppose your input effect list consists of x1–x10. Then &_QRSIND would be set to x1 x3 x4 x10 if the first, third, fourth, and tenth effects were selected for the model. This list can be used in the MODEL statement of a subsequent procedure.
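The following sketch illustrates one way to reuse the selected-effects macro variable in a later step. The data set `work.train` and the variables `y` and `x1`-`x10` are hypothetical, and refitting with PROC QUANTREG is only one possible follow-up procedure.

```sas
/* Hypothetical data set: work.train, response y, candidates x1-x10 */
proc quantselect data=work.train;
   model y = x1-x10 / quantile=0.5 selection=forward;
run;

/* Write the list of selected effects to the log */
%put Selected effects: &_QRSIND;

/* Refit the selected model in a subsequent procedure */
proc quantreg data=work.train;
   model y = &_QRSIND / quantile=0.5;
run;
```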
If you specify the OUTDESIGN= option in the PROC QUANTSELECT statement, then PROC QUANTSELECT saves the list of columns in the design matrix in a macro variable named &_QRSMOD.
With multiple quantile levels and BY processing, one macro variable is created for each combination of quantile level and BY group, and the macro variables are indexed by the BY-group number and the quantile-level index. You can use the MAXMACRO= option to either limit or increase the number of these macro variables when you are processing data sets with many combinations of quantile level and BY group.
With a single quantile level and no BY-group processing, PROC QUANTSELECT creates the macro variables shown in Table 96.2.
Table 96.2: Macro Variables Created for a Single Quantile Level and No BY Processing
| Macro Variable Name | Contains |
|---|---|
| | Selected effects |
| | Selected effects |
| | Selected effects |
| | Selected effects |
| | Selected design matrix columns |
| | Selected design matrix columns |
| | Selected design matrix columns |
| | Selected design matrix columns |
With multiple quantile levels and BY-group processing, PROC QUANTSELECT creates the macro variables shown in Table 96.3.
Table 96.3: Macro Variables Created for Multiple Quantile Levels and BY-Group Processing
| Macro Variable Name | Contains |
|---|---|
| | Selected effects for quantile 1 and BY group 1 |
| | Selected effects for quantile 1 and BY group 1 |
| | Selected effects for quantile 2 and BY group 1 |
| ... | |
| | Selected effects for quantile 1 and BY group 1 |
| | Selected effects for quantile 1 and BY group 1 |
| | Selected effects for quantile 2 and BY group 1 |
| ... | |
| | Selected effects for quantile 1 and BY group 2 |
| | Selected effects for quantile 1 and BY group 2 |
| | Selected effects for quantile 2 and BY group 2 |
| ... | |
| | Selected effects for quantile n and BY group m |
If you specify the OUTDESIGN= option, PROC QUANTSELECT also creates the macro variables shown in Table 96.4.
Table 96.4: Macro Variables Created When the OUTDESIGN= Option Is Specified
| Macro Variable Name | Contains |
|---|---|
| | Selected design matrix columns for BY group 1 |
| | Selected design matrix columns for BY group 1 |
| | Selected design matrix columns for BY group 2 |
| ... | |
| | Selected design matrix columns for quantile n and BY group m |
The macro variables in Table 96.5 show the number of quantiles and BY groups:
Table 96.5: Macro Variables Showing the Number of Quantiles and BY Groups
| Macro Variable Name | Contains |
|---|---|
| | The number of BY groups |
| | The number of quantiles |
| | The number of |
| | The number of |
| ... | |
| | The number of |
See the section Macro Variables That Contain Selected Models for more information.
NAMELEN=number
specifies the maximum length of effect names. By default, NAMELEN=20. If you specify a value less than 20, the default is used.
OUTDESIGN<(options)>=SAS-data-set
creates a data set that contains the design matrix. By default, the QUANTSELECT procedure includes in the OUTDESIGN= data set the matrix that corresponds to all the effects in the selected models. Two schemes for naming the columns of the design matrix are available:
In the first scheme, names of the parameters are constructed from the parameter labels that appear in the parameter estimates table. This naming scheme is the default when you do not request BY processing, or when you specify the FULLMODEL option with BY processing.
In the second scheme, the design matrix column names consist of a prefix followed by an index. The default name prefix is _X. This scheme is used when you specify the PREFIX= option, or when you specify a BY statement without using the FULLMODEL option; otherwise the first scheme is used.
You can specify the following options in parentheses to control the contents of the OUTDESIGN= data set:
ADDINPUTVARS
includes all the input data set variables in the OUTDESIGN= data set.
ADDVALIDATION
includes the VALDATA= data set observations in the OUTDESIGN= data set. This option is ignored if the VALDATA= data set is not specified.
ADDTEST
includes the TESTDATA= data set observations in the OUTDESIGN= data set. This option is ignored if the TESTDATA= data set is not specified.
FULLMODEL
includes in the OUTDESIGN= data set parameters that correspond to all effects that are specified in the MODEL statement. By default, only parameters that correspond to the selected model are included.
NAMES
produces a table that associates columns in the OUTDESIGN= data set with the labels of the parameters they represent.
PREFIX<=prefix>
creates the design matrix column names from a prefix followed by an index. The default prefix is _X.
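As a rough illustration of the OUTDESIGN= option and its parenthesized suboptions, the sketch below writes the design matrix for the selected model to a data set and then reuses the &_QRSMOD macro variable in a follow-up fit. The data set names `work.train` and `work.design`, the classification variable `c`, and the variables `y` and `x1`-`x10` are all hypothetical.

```sas
/* Hypothetical input: work.train with CLASS variable c, response y, and x1-x10 */
proc quantselect data=work.train
                 outdesign(addinputvars names)=work.design;
   class c;
   model y = c x1-x10 / quantile=0.5 selection=forward;
run;

/* &_QRSMOD lists the design matrix columns saved by the step above. */
/* ADDINPUTVARS keeps the input variables, including y, in work.design. */
proc quantreg data=work.design;
   model y = &_QRSMOD / quantile=0.5;
run;
```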
PARMLABELSTYLE=options
specifies how parameter names and labels are constructed for nested and crossed effects.
The following options are available:
INTERLACED
forms parameter names and labels by positioning levels of classification variables and constructed effects adjacent to the associated variable or constructed effect name and using " * " as the delimiter for both crossed and nested effects. This style of naming parameters and labels is used in the TRANSREG procedure. You can request truncation of the classification variable names used in forming the parameter names and labels by using the CPREFIX= and LPREFIX= options in the CLASS statement. You can use the SEPARATOR= suboption to change the delimiter between the crossed variables in the effect. PARMLABELSTYLE=INTERLACED is not supported if you specify the SPLIT option in an EFFECT statement or a CLASS statement. The following are examples of the parameter labels in this style (Age is a continuous variable, Gender and City are classification variables):
   Age
   Gender male * City Beijing
   City London * Age
SEPARATE
specifies that in forming parameter names and labels, the effect name appears before the levels associated with the classification variables and constructed effects in the effect. You can control the length of the effect name by using the NAMELEN= option in the PROC QUANTSELECT statement. In forming parameter labels, the first level that is displayed is positioned so that it starts at the same offset in every parameter label. This enables you to easily distinguish the effect name from the levels when the parameter labels are displayed in a column in the "Parameter Estimates" table. The following are examples of the parameter labels in this style (Age is a continuous variable, Gender and City are classification variables):
   Age
   Gender*City   male Beijing
   Age*City      London
COMPACT
requests the same parameter naming and labeling scheme as PARMLABELSTYLE=SEPARATE except that the first level in the parameter label is separated from the effect name by a single blank. This style of labeling is used in the PLS procedure and is the default if you do not specify the PARMLABELSTYLE= option. The following are examples of the parameter labels in this style (Age is a continuous variable, Gender and City are classification variables):
   Age
   Gender*City male Beijing
   Age*City London
PLOTS<(global-plot-options)><=plot-request<(options)>>
controls the plots that are produced through ODS Graphics. When you specify only one plot-request, you can omit the parentheses around it. Here are some examples:
   plots=all
   plots=coefficients(unpack)
   plots(unpack)=(coef acl crit)
ODS Graphics must be enabled before plots can be requested. For example:
   ods graphics on;
   proc quantselect plots=all;
      class temp sex / split;
      model depVar = sex sex*temp;
   run;
For more information about enabling and disabling ODS Graphics, see the section Enabling and Disabling ODS Graphics in Chapter 21: Statistical Graphics Using ODS.
You can specify the following global-plot-options, which apply to all plots generated by the QUANTSELECT procedure, unless they are altered by specific plot options.
ENDSTEP=n
specifies that the step ranges shown on the horizontal axes of plots terminate at the specified step. By default, the step range shown terminates at the final step of the selection process. If you specify the ENDSTEP= option as both a global-plot-option and as an option for a specific plot-request, then PROC QUANTSELECT uses the ENDSTEP=n option for the specific plot-request.
LOGP
displays the natural logarithm of the entry and removal significance levels when the SELECT=SL option is specified in the MODEL statement.
MAXSTEPLABEL=n
specifies the maximum number of characters beyond which labels of effects on plots are truncated. The default is MAXSTEPLABEL=256.
MAXPARMLABEL=n
specifies the maximum number of characters beyond which parameter labels on plots are truncated. The default is MAXPARMLABEL=256.
STARTSTEP=n
specifies that the step ranges shown on the horizontal axes of plots start at the specified step. By default, the step range shown starts at the initial step of the selection process. If you specify the STARTSTEP= option as both a global-plot-option and as an option for a specific plot-request, then PROC QUANTSELECT uses the STARTSTEP=n option for the specific plot-request. The default is STARTSTEP=0.
STEPAXIS=EFFECT | NORMB | NUMBER
specifies the method for labeling the horizontal plot axis. This axis represents the sequence of entering or departing effects. The default is STEPAXIS=EFFECT.
EFFECT
labels each step by a prefix followed by the name of the effect that enters or leaves at that step. The prefix consists of the step number followed by a "+" sign or a "–" sign, depending on whether the effect enters or leaves at that step.
NORMB
labels the horizontal axis value at step i with the penalty on the parameter estimates at step i, normalized by the penalty on the parameter estimates at the final step. This option is valid only with regularization selection methods.
NUMBER
labels each step with the step number.
UNPACK
displays each graph separately. (By default, some graphs can appear together in a single panel.) You can also specify UNPACK as a suboption with the CRITERIA and COEFFICIENTS options for specific plot-requests.
The following list describes the specific plot-requests and their options.
ALL
displays all appropriate graphs.
ACL <(aclplot-option)>
plots the progression of the average check losses on the training data, and on the test and validation data when these data are provided with the TESTDATA= or VALDATA= options or are produced by using a PARTITION statement. When PROC QUANTSELECT is applied to multiple quantile levels, the ACL option and its suboptions apply to the ACL plots for each of the quantile levels.
You can specify the following aclplot-option:
STEPAXIS=EFFECT | NORMB | NUMBER
specifies the method for labeling the horizontal plot axis. See the STEPAXIS= option in the global-plot-options for more information.
COEFFICIENTS <(coefficient-panel-options)>
displays a panel of two plots for each quantile level. The upper plot shows the progression of the parameter values as the selection process proceeds. The lower plot shows the progression of the CHOOSE= criterion. If no CHOOSE= criterion is in effect, then the AICC criterion is displayed. You can specify the following coefficient-panel-options:
LABELGAP=percentage
specifies the percentage of the vertical axis range that forms the minimum gap between successive parameter labels at the final step of the coefficient progression plot. If the values of more than one parameter at the final step are closer than this gap, then the labels on all but one of these parameters are suppressed. The default is LABELGAP=5.
LOGP
displays the natural logarithm of the entry and removal significance levels when the SELECT=SL option is specified in the MODEL statement.
STEPAXIS=EFFECT | NORMB | NUMBER
specifies the horizontal axis to be used. See the STEPAXIS= option in the global-plot-options for more information.
UNPACK
displays the coefficient progression and the CHOOSE= criterion progression in separate plots.
CRITERIA <(criterion-panel-options)>
plots a panel of model fit criteria. If multiple quantile levels apply, the CRITERIA option plots a panel of model fit criteria for each quantile level. The criteria that are displayed are AIC, AICC, and SBC, in addition to any other criteria that are named in the CHOOSE=, SELECT=, STOP=, and STATS= options in the MODEL statement. You can specify the following criterion-panel-options:
STEPAXIS=EFFECT | NORMB | NUMBER
specifies the horizontal axis to be used. See the STEPAXIS= option in the global-plot-options for more information.
UNPACK
displays each criterion progression on a separate plot.
NONE
suppresses all plots.
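To show how global-plot-options and specific plot-requests combine, the following sketch applies a numbered step axis and a restricted step range to all plots, and unpacks only the coefficient panel. The data set `work.train`, the variables `y` and `x1`-`x10`, and the step values 1 and 8 are hypothetical choices made for illustration.

```sas
ods graphics on;

/* Hypothetical data set: work.train, response y, candidates x1-x10 */
proc quantselect data=work.train
                 plots(stepaxis=number startstep=1 endstep=8)=
                      (acl coefficients(unpack) criteria);
   model y = x1-x10 / quantile=0.5 selection=forward;
run;
```

Here the global options in the first set of parentheses apply to every requested plot, and the UNPACK suboption affects only the COEFFICIENTS request.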
SEED=number
specifies an integer that is used to start the pseudorandom number generator for random partitioning of data for training, testing, and validation. If you do not specify a seed or if you specify a value less than or equal to 0, the seed is generated by reading the time of day from the computer's clock.
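A fixed seed makes a random partition reproducible from run to run. The sketch below assumes that the PARTITION statement accepts a FRACTION option with VALIDATE= and TEST= fractions. The data set `work.all`, the variables, the seed value, and the fractions are hypothetical.

```sas
/* Hypothetical data set: work.all, response y, candidates x1-x10 */
proc quantselect data=work.all seed=12345;
   /* Assumed syntax: randomly reserve 30% for validation and 20% for testing */
   partition fraction(validate=0.3 test=0.2);
   model y = x1-x10 / quantile=0.5 selection=forward;
run;
```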
TESTDATA=SAS-data-set
names a SAS data set that contains test data. This data set must contain all the effects that are specified in the MODEL statement. Furthermore, when you also specify a BY statement and the TESTDATA= data set contains any of the BY variables, then the TESTDATA= data set must also contain all the BY variables sorted in the order of the BY variables. In this case, only the test data for a specific BY group are used with the corresponding BY group in the analysis data. If the TESTDATA= data set contains none of the BY variables, then the entire TESTDATA= data set is used with each BY group of the analysis data.
If you specify both a TESTDATA= data set and the PARTITION statement, then the testing observations from the DATA= data set are merged with the TESTDATA= data set for testing purposes.
VALDATA=SAS-data-set
names a SAS data set that contains validation data. This data set must contain all the effects that are specified in the MODEL statement. Furthermore, when you also specify a BY statement and the VALDATA= data set contains any of the BY variables, then the VALDATA= data set must also contain all the BY variables sorted in the order of the BY variables. In this case, only the validation data for a specific BY group are used with the corresponding BY group in the analysis data. If the VALDATA= data set contains none of the BY variables, then the entire VALDATA= data set is used with each BY group of the analysis data.
If you specify both a VALDATA= data set and the PARTITION statement, then the validation observations from the DATA= data set are merged with the VALDATA= data set for validation purposes.
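As a sketch of supplying separate validation and test data sets, the step below names all three data sets in the PROC QUANTSELECT statement. The data set names and variables are hypothetical, and the use of CHOOSE=VALIDATE as the model-choice criterion is an assumption about the MODEL statement that this section does not confirm.

```sas
/* Hypothetical data sets: work.train, work.valid, work.test,  */
/* each containing the response y and the candidates x1-x10    */
proc quantselect data=work.train valdata=work.valid testdata=work.test;
   /* Assumed criterion: choose the step with the best validation performance */
   model y = x1-x10 / quantile=0.5 selection=forward(choose=validate);
run;
```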