The GLMSELECT Procedure

MODELAVERAGE Statement

(Experimental)

MODELAVERAGE <options> ;

The experimental MODELAVERAGE statement requests that model selection be repeated on resampled subsets of the input data. An average model is produced by averaging the parameter estimates of the selected models that are obtained for each resampled subset of the input data.

Table 45.9 summarizes the options available in the MODELAVERAGE statement.

Table 45.9: MODELAVERAGE Statement Options

Option       Description

ALPHA=       Specifies the lower and upper quantiles of the sample parameter estimates that are displayed
DETAILS      Displays model selection details
NSAMPLES=    Specifies the number of samples to be used
REFIT        Performs a second round of model averaging
SAMPLING=    Specifies how to generate the samples taken from the training data
SUBSET       Uses only a subset of the selected models in forming the average model
TABLES       Controls the displayed tables


The following options are available:

ALPHA=$\alpha$

controls which lower and upper quantiles of the sample parameter estimates are displayed. The ALPHA= option also controls which quantiles of the predicted values are added to the output data set when the LOWER= and UPPER= options are specified in the OUTPUT statement. The lower and upper quantiles used are $\alpha/2$ and $1-\alpha/2$, respectively. The value specified must lie in the interval $[0,1]$. The default value is ALPHA=0.5, which corresponds to the 0.25 and 0.75 quantiles.
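As an illustration of how ALPHA= interacts with the OUTPUT statement, the following sketch requests the 0.05 and 0.95 quantiles ($\alpha = 0.1$). It assumes that the Sashelp.Baseball sample data set is available. The choice of regressors, the output data set name, and the variable names avgPred, q05, and q95 are hypothetical and chosen only for this example.

```sas
/* Sketch only: assumes the Sashelp.Baseball sample data set is available. */
/* The names work.avgPred, avgPred, q05, and q95 are illustrative.         */
proc glmselect data=sashelp.baseball seed=1;
   class league division;
   model logSalary = nAtBat nHits nHome nRuns nRBI nBB yrMajor
                     league division
                   / selection=lasso(stop=none choose=sbc);
   /* ALPHA=0.1 gives lower quantile 0.1/2 = 0.05 and upper 1 - 0.1/2 = 0.95 */
   modelaverage alpha=0.1;
   /* LOWER= and UPPER= store those quantiles of the predicted values */
   output out=work.avgPred p=avgPred lower=q05 upper=q95;
run;
```

The SEED= option in the PROC GLMSELECT statement is included here so that the resampling is reproducible from run to run.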

DETAILS

requests that model selection details be displayed for each sample of the data. The level of detail shown is controlled by the DETAILS= option in the MODEL statement.

NSAMPLES=n

specifies the number of samples to be used. The default value is NSAMPLES=100.

REFIT <(refit-options)>

requests that a second round of model averaging, referred to as the refit averaging, be performed. Usually, the initial round of model averaging produces a model that contains a large number of effects. You can use the REFIT option to obtain a more parsimonious model. For each data sample in the refit, a least squares model is fit with no effect selection. The effects that are used in the refit depend on the results of the initial round of model averaging. If you do not specify any refit-options, then effects that are selected in at least 20% of the samples in the initial round of model averaging are used in the refit model average. The following refit-options are available:

BEST=n

specifies that the n most frequently selected effects in the initial round of model averaging be used in the refit averaging.

MINPCT=percent

specifies that the effects that are selected at least the specified percentage of times in the initial round of model averaging be used in the refit averaging.

NSAMPLES=n

specifies the number of samples to be used for the refit averaging. The default value is the number of samples used in the initial round of model averaging.
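The following sketch shows one way the REFIT option might be combined with its refit-options. It assumes that the Sashelp.Baseball sample data set is available. The regressors, the 35% threshold, and the sample counts are arbitrary values chosen for illustration.

```sas
/* Sketch only: assumes the Sashelp.Baseball sample data set is available. */
proc glmselect data=sashelp.baseball seed=1;
   class league division;
   model logSalary = nAtBat nHits nHome nRuns nRBI nBB yrMajor
                     league division
                   / selection=lasso(stop=none choose=sbc);
   /* Initial round: 200 samples with effect selection.                  */
   /* Refit round: 500 samples, least squares fits that use only the     */
   /* effects selected in at least 35% of the initial samples.           */
   modelaverage nsamples=200 refit(minpct=35 nsamples=500);
run;
```

Replacing MINPCT=35 with BEST=n would instead carry the n most frequently selected effects into the refit.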

SAMPLING=SRS | URS <(sampling-options)>

specifies how the samples of the usable observations in the training data are generated. SAMPLING=SRS specifies simple random sampling in which samples are generated by randomly drawing without replacement. SAMPLING=URS specifies unrestricted random sampling in which samples are generated by randomly drawing with replacement. Model averaging with samples drawn with replacement corresponds to the bootstrap methodology. The default is SAMPLING=URS. If you specify a frequency variable by using a FREQ statement, then the $i$th observation is sampled $f_i$ times, where $f_i$ is the frequency of the $i$th observation.

You can specify one of the following sampling-options:

PERCENT=percent

specifies the percentage of the training data that is used in each sample. The default value is 75% for SAMPLING=SRS and 100% for SAMPLING=URS.

SIZE=n

specifies the sum of frequencies in each sample.
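As a sketch of the SAMPLING= syntax, the following step draws each sample without replacement and uses half of the training data per sample. It assumes that the Sashelp.Baseball sample data set is available. The regressors and the 50% figure are arbitrary choices for illustration.

```sas
/* Sketch only: assumes the Sashelp.Baseball sample data set is available. */
proc glmselect data=sashelp.baseball seed=1;
   class league division;
   model logSalary = nAtBat nHits nHome nRuns nRBI nBB yrMajor
                     league division
                   / selection=lasso(stop=none choose=sbc);
   /* SRS: draw without replacement; each sample uses 50% of the data */
   modelaverage nsamples=200 sampling=srs(percent=50);
run;
```

Omitting the SAMPLING= option gives the default, SAMPLING=URS, with samples that are the same size as the training data.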

SUBSET(subset-options)

specifies that only a subset of the selected models be used in forming the average model and producing predicted values. The following subset-options are available:

BEST=n

specifies that only the best n models be used, where the model ranking criterion used is the frequency score. See the section Model Selection Frequencies and Frequency Scores for the definition of the frequency score. If multiple models with the same frequency score correspond to the nth best model, then all these tied models are used in forming the average model and producing predicted values.

MINMODELFREQ=freq

specifies that only models that are selected at least freq times be used in forming the average model and producing predicted values.
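The SUBSET option can be sketched as follows, restricting the average model to the three models that have the best frequency scores. The example assumes that the Sashelp.Baseball sample data set is available. The regressors and the value 3 are arbitrary choices for illustration.

```sas
/* Sketch only: assumes the Sashelp.Baseball sample data set is available. */
proc glmselect data=sashelp.baseball seed=1;
   class league division;
   model logSalary = nAtBat nHits nHome nRuns nRBI nBB yrMajor
                     league division
                   / selection=lasso(stop=none choose=sbc);
   /* Average only the 3 best models by frequency score; */
   /* models tied with the third best are also included. */
   modelaverage subset(best=3);
run;
```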

TABLES <(ONLY)> <=table-request <(options)>>
TABLES <(ONLY)> <= (table-request <(options)> <... table-request <(options)>>)>

controls the displayed output that is produced in the initial round of model averaging. By default, the following tables are produced:

EFFECTSELECTPCT

displays the percentage of times that effects appear in the selected models.

MODELSELECTFREQ

displays the frequency with which models are selected.

AVGPARMEST

displays the mean, standard deviation, and quantiles of the parameter estimates of the parameters that appear in the selected models.

When you specify only one table-request, you can omit the outer parentheses. Here are some examples:

tables=none
tables=(all parmest(minpct=10))
tables(only)=effectselectpct(order=model minpct=15)

The following table-request options are available:

ALL

requests that all model averaging output tables be produced. You can specify other options with ALL; for example, to request all tables and to require that effects are displayed in decreasing order of selection frequency in the EffectSelectPct table, specify TABLES=(ALL EFFECTSELECTPCT(ORDER=DESCENDING)).

EFFECTSELECTPCT <(effectSelectPct-options)>

specifies how the effects in the EffectSelectPct table are displayed. The following effectSelectPct-options are available:

ALL

specifies that effects that appear in the selected model for any sample be displayed.

MINPCT=percent

specifies that the effects displayed must appear in the selected model for at least the specified percentage of the samples. By default, this table includes effects that appear in at least 20% of the selected models. The MINPCT= option is ignored if you also specify the ALL option as an effectSelectPct-option.

ORDER=ASCENDING | DESCENDING | MODEL

specifies the order in which the effects are displayed. ORDER=MODEL specifies that effects be displayed in the order in which they appear in the MODEL statement. ORDER=ASCENDING | DESCENDING specifies that the effects be displayed in ascending or descending order of their selection frequency.

MODELSELECTFREQ <(modelSelectFreq-options)>

specifies how the models in the ModelSelectFreq table are displayed. The following modelSelectFreq-options are available:

ALL

specifies that all selected models be displayed in the ModelSelectFreq table.

BEST=n

specifies that only the best n models be displayed, where the model ranking criterion used is the frequency score. See the section Model Selection Frequencies and Frequency Scores for the definition of the frequency score. The default value is BEST=20. The BEST= option is ignored if you also specify the ALL option as a modelSelectFreq-option.

ONLY

suppresses the default output. If you specify the ONLY option within parentheses after the TABLES option, then only the tables specifically requested are produced.

PARMEST <(parmEst-options)>

specifies how the parameters in the AvgParmEst table are displayed. The following parmEst-options are available:

ALL

specifies that parameters that are nonzero in the selected model for any sample be displayed.

MINPCT=percent

specifies that the parameters displayed must have nonzero estimates in the selected model for at least the specified percentage of the samples. By default, this table includes parameters that appear in at least 20% of the selected models. The MINPCT= option is ignored if you also specify the ALL option as a parmEst-option.

NONZEROPARMS

specifies that, for each parameter, the sample that is used to compute the estimate mean, standard deviation, and quantiles consists of just the nonzero values of that parameter in the selected models. If you do not specify the NONZEROPARMS option, then parameters that do not appear in a selected model are assigned the value zero in that model, and these zero values are retained when computing the estimate means, standard deviations, and quantiles.

ORDER=ASCENDING | DESCENDING | MODEL

specifies the order in which the effects are displayed. ORDER=MODEL specifies that effects are displayed in the order in which they appear in the MODEL statement. ORDER=ASCENDING | DESCENDING specifies that the effects are displayed in ascending or descending order of their selection frequency.
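To show how several table-requests can be combined in a complete step, the following sketch suppresses the default output and requests two tables. It assumes that the Sashelp.Baseball sample data set is available. The regressors and the 30% thresholds are arbitrary choices for illustration.

```sas
/* Sketch only: assumes the Sashelp.Baseball sample data set is available. */
proc glmselect data=sashelp.baseball seed=1;
   class league division;
   model logSalary = nAtBat nHits nHome nRuns nRBI nBB yrMajor
                     league division
                   / selection=lasso(stop=none choose=sbc);
   /* (ONLY) suppresses the default tables, so just these two appear.   */
   /* EffectSelectPct: effects in >= 30% of samples, most frequent first.*/
   /* ParmEst: summary statistics use only each parameter's nonzero      */
   /* estimates because NONZEROPARMS is specified.                       */
   modelaverage tables(only)=(effectselectpct(order=descending minpct=30)
                              parmest(nonzeroparms minpct=30));
run;
```

Without NONZEROPARMS, the zero values assigned to parameters that are absent from a selected model are retained, which pulls the reported means toward zero.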