MODEL dependent <(options)> = <effects> </ options> ;
MODEL events/trials = <effects> </ options> ;
The MODEL statement names the response variable and the explanatory effects, including covariates, main effects, interactions, and nested effects; see the section Specification of Effects in Chapter 44: The GLM Procedure, for more information. If you omit the explanatory effects, the procedure fits an intercept-only model. You must specify exactly one MODEL statement.
You can specify two forms of the MODEL statement. The first form, referred to as single-trial syntax, is applicable to binary, ordinal, and nominal response data. The second form, referred to as events/trials syntax, is restricted to binary response data. You use the single-trial syntax when each observation in the DATA= data set contains information about only a single trial, such as a single subject in an experiment. When each observation contains information about multiple binary response trials, such as the number of observed subjects and the number of subjects who respond, you can use the events/trials syntax.
In the events/trials syntax, you specify two variables that contain count data for a binomial experiment. These two variables are separated by
a slash. The value of the first variable, events, is the number of positive responses (or events). The value of the second variable, trials, is the number of trials. The values of both events and (trials–events) must be nonnegative and the value of trials must be positive for the response to be valid.
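The validity rule above can be illustrated with a short Python sketch. This is not SAS code, and the function name `valid_events_trials` is hypothetical. It checks (events, trials) pairs exactly as the preceding paragraph states.

```python
def valid_events_trials(events: float, trials: float) -> bool:
    """Check one events/trials pair against the validity rule.

    The response is valid when events >= 0, trials - events >= 0,
    and trials > 0.
    """
    return events >= 0 and (trials - events) >= 0 and trials > 0


# Each pair is (events, trials), e.g. (responders, subjects tested).
pairs = [(3, 10), (0, 5), (6, 4), (0, 0), (-1, 2)]
flags = [valid_events_trials(e, t) for e, t in pairs]
print(flags)  # [True, True, False, False, False]
```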
In the single-trial syntax, you specify one variable (on the left side of the equal sign) as the response variable. This variable can be character or numeric. You can specify options for the response variable by enclosing them in parentheses immediately after the response variable.
For both forms of the MODEL statement, explanatory effects follow the equal sign. Variables can be either continuous or classification variables. Classification variables can be character
or numeric, and they must be declared in the CLASS statement. When an effect is a classification variable, the procedure inserts a set of coded columns into the design matrix instead of directly entering
a single column that contains the values of the variable.
Table 25.3 summarizes the options available in the MODEL statement.
Table 25.3: MODEL Statement Options
Option          Description
Response Variable Options
  DESCENDING    Reverses the order of the response categories
  EVENT=        Specifies the event category for the binary response
  ORDER=        Specifies the sort order for the binary response
  REFERENCE=    Specifies the reference category for the binary response
Statistical Modeling Options
  ADDITIVE      Requests an additive model
  ALPHA=        Controls the knot selection
  CVMETHOD=     Specifies how subsets for cross validation are formed
  DFPERBASIS=   Specifies degrees of freedom per basis function
  DIST=         Specifies the distribution family
  FAST          Controls the fast forward selection algorithm
  FORWARDONLY   Requests that the backward selection process be skipped
  KEEP=         Specifies effects to be included in the final model
  LINEAR=       Specifies linear effects to be examined in model selection
  LINK=         Specifies the link function
  MAXBASIS=     Specifies the maximum number of basis functions allowed
  MAXORDER=     Specifies the maximum order of interactions allowed
  NOMISS        Requests removal of missing values from modeling
  OFFSET=       Specifies an offset for the linear predictor
  VARPENALTY=   Specifies the penalty for variable reentry
You can specify the following options in the MODEL statement.
Response Variable Options
Response variable options determine how the ADAPTIVEREG procedure models probabilities for binary data. You can specify the
following response variable options by enclosing them in parentheses after the response variable.

DESCENDING
DESC

reverses the order of the response categories.
If both the DESCENDING and ORDER= options are specified, PROC ADAPTIVEREG orders the response categories according to the
ORDER= option and then reverses that order.

EVENT='category' | FIRST | LAST

specifies the event category for the binary response model. PROC ADAPTIVEREG models the probability of the event category.
You can specify one of the following values for this option:
  'category'  specifies the formatted value of the event category.
  FIRST       designates the first ordered category as the event.
  LAST        designates the last ordered category as the event.
The default is EVENT=FIRST.
One of the most common sets of response levels is {0, 1}, with 1 representing the event whose probability is to be modeled. For example, suppose Y takes the value 1 for the event and 0 for the nonevent, and X is the explanatory variable. To specify the value 1 as the event category, use the following MODEL statement:
model Y (event='1') = X;
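The reason event='1' is needed is that the default EVENT=FIRST, applied to levels 0 and 1 sorted by internal value, selects 0 as the event. The Python sketch below shows this interaction. It assumes an unformatted numeric response, so levels are ordered by internal value, and it compares raw values rather than formatted strings. The helpers `ordered_levels` and `event_category` are hypothetical and are not part of SAS.

```python
def ordered_levels(values, descending=False):
    """Distinct response levels sorted by internal value, optionally reversed."""
    levels = sorted(set(values))
    return levels[::-1] if descending else levels


def event_category(values, event="FIRST", descending=False):
    """Return the level treated as the event (hypothetical helper)."""
    levels = ordered_levels(values, descending)
    if event == "FIRST":
        return levels[0]
    if event == "LAST":
        return levels[-1]
    if event not in levels:
        raise ValueError(f"{event!r} is not a response level")
    return event


y = [0, 1, 1, 0, 1]
print(event_category(y))                   # 0: the default EVENT=FIRST
print(event_category(y, event=1))          # 1: like event='1'
print(event_category(y, descending=True))  # 1: DESCENDING reverses the order
```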

ORDER=order-type

specifies the sort order for the categories of categorical variables. This ordering determines which parameters in the model
correspond to each level in the data. When the default ORDER=FORMATTED is in effect for numeric variables for which you have
supplied no explicit format, the levels are ordered by their internal values. Table 25.4 shows how PROC ADAPTIVEREG interprets values of the ORDER= option.
Table 25.4: Sort Order for Categorical Variables
order-type      Levels Sorted By
DATA            Order of appearance in the input data set
FORMATTED       External formatted value, except for numeric variables with no explicit format, which are sorted by their unformatted (internal) value
FREQ            Descending frequency count; levels with the most observations come first in the order
FREQDATA        Descending frequency count; ties are broken by order of appearance in the input data set
FREQFORMATTED   Descending frequency count; ties are broken by formatted value (as above)
FREQINTERNAL    Descending frequency count; ties are broken by unformatted value
INTERNAL        Unformatted value

For the FORMATTED and INTERNAL values, the sort order is machine-dependent. If you specify the ORDER= option in both the MODEL statement and the CLASS statement, the MODEL statement option takes precedence.
For more information about sort order, see the chapter on the SORT procedure in the Base SAS Procedures Guide and the discussion of BY-group processing in SAS Language Reference: Concepts.
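The tie-breaking rules in Table 25.4 can be sketched in Python for four of the order types. Formatted values are not modeled, so FORMATTED, FREQ, and FREQFORMATTED are left out. Python compares strings by code point, which may differ from a SAS collating sequence. The function `order_levels` is hypothetical.

```python
from collections import Counter


def order_levels(values, ordertype="INTERNAL"):
    """Order distinct levels for a subset of the ORDER= types in Table 25.4."""
    counts = Counter(values)
    first_seen = {}
    for i, v in enumerate(values):
        first_seen.setdefault(v, i)  # position of each level's first appearance
    levels = list(first_seen)  # distinct levels in order of appearance
    if ordertype == "DATA":
        return levels
    if ordertype == "INTERNAL":
        return sorted(levels)
    if ordertype == "FREQDATA":
        # Most frequent first; ties broken by first appearance.
        return sorted(levels, key=lambda v: (-counts[v], first_seen[v]))
    if ordertype == "FREQINTERNAL":
        # Most frequent first; ties broken by unformatted value.
        return sorted(levels, key=lambda v: (-counts[v], v))
    raise ValueError(f"{ordertype!r} is not covered by this sketch")


y = ["b", "a", "c", "a", "c", "b", "c"]
for t in ("DATA", "INTERNAL", "FREQDATA", "FREQINTERNAL"):
    print(t, order_levels(y, t))
```

Here "a" and "b" are tied at two observations each. FREQDATA puts "b" first because it appears first, and FREQINTERNAL puts "a" first because it sorts first.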

REFERENCE='category' | FIRST | LAST
REF='category' | FIRST | LAST

specifies the reference category for the binary or multinomial response model. For the binary response model, specifying one
response category as the reference is the same as specifying the other response category as the event category. You can specify
one of the following values for this option:
  'category'  specifies the formatted value of the reference category.
  FIRST       designates the first ordered category as the reference.
  LAST        designates the last ordered category as the reference.
The default is REFERENCE=LAST.
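For a binary response, choosing a reference is the same as choosing the other level as the event. The Python sketch below shows that the defaults REFERENCE=LAST and EVENT=FIRST therefore agree. The helper `event_from_reference` is hypothetical, and its input is assumed to be already in sort order.

```python
def event_from_reference(levels, reference="LAST"):
    """For a two-level response, the event is whichever level is not the reference."""
    if len(levels) != 2:
        raise ValueError("this sketch covers binary responses only")
    if reference == "FIRST":
        ref = levels[0]
    elif reference == "LAST":
        ref = levels[-1]
    elif reference in levels:
        ref = reference
    else:
        raise ValueError(f"{reference!r} is not a response level")
    return levels[1] if ref == levels[0] else levels[0]


ordered = [0, 1]  # levels already in their sort order
print(event_from_reference(ordered))               # 0, same as EVENT=FIRST
print(event_from_reference(ordered, reference=0))  # 1, same as EVENT='1'
```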
Statistical Modeling Options
You can specify the following statistical modeling options.

ADDITIVE

requests an additive model in which only main effects are included in the fitted model. If you do not specify the ADDITIVE option, PROC ADAPTIVEREG fits a model that has both main effects and two-way interaction terms.

ALPHA=number

specifies the parameter that controls the number of knots considered for each variable. Friedman (1991b) uses the following as the number of observations between interior knots:
$$-\frac{1}{2.5}\log_2\left[-\frac{1}{p N_m}\ln(1-\alpha)\right]$$
Friedman also uses the following as the number of observations between the extreme knots and the corresponding variable boundary values:
$$3 - \log_2\frac{\alpha}{p}$$
where p is the number of variables and N_m is the number of observations for which a parent basis function B_m is positive. The value of α should be greater than 0 and less than 1. The default is ALPHA=0.05.
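To see how ALPHA= changes knot spacing, the Python sketch below evaluates Friedman's (1991b) spacing rules. For interior knots the spacing is −log2[−ln(1 − α)/(p·N_m)]/2.5, and for the extreme knots it is 3 − log2(α/p). Here p is the number of variables and N_m is the number of observations where the parent basis is positive. The sketch does not round to whole observations, and it is not the procedure's internal code. Both function names are hypothetical.

```python
import math


def interior_knot_spacing(alpha: float, p: int, n_m: int) -> float:
    """Observations between interior knots: -log2[-ln(1 - alpha) / (p * n_m)] / 2.5."""
    if not 0 < alpha < 1:
        raise ValueError("alpha must lie strictly between 0 and 1")
    return -math.log2(-math.log(1 - alpha) / (p * n_m)) / 2.5


def end_knot_spacing(alpha: float, p: int) -> float:
    """Observations between the extreme knots and the boundary: 3 - log2(alpha / p)."""
    if not 0 < alpha < 1:
        raise ValueError("alpha must lie strictly between 0 and 1")
    return 3 - math.log2(alpha / p)


# One variable (p = 1) and a parent basis that is positive on 100 observations.
for a in (0.01, 0.05, 0.10):
    print(a, round(interior_knot_spacing(a, 1, 100), 2), round(end_knot_spacing(a, 1), 2))
```

Under these formulas, a larger α gives smaller spacings, so more candidate knots are examined.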

CVMETHOD=RANDOM <(n)>
CVMETHOD=INDEX (variable)

specifies the method for subdividing the training data into n parts when you request n-fold cross validation during backward selection. CVMETHOD=RANDOM assigns each training observation randomly to one of the n parts. CVMETHOD=INDEX(variable) assigns observations to parts based on the formatted value of the named variable. This input data set variable is treated as a classification variable, and the number of parts n is the number of distinct levels of this variable. By optionally naming this variable in a CLASS statement, you can use the ORDER= option in the CLASS statement to control how this variable is levelized.
The value of n defaults to 5 with CVMETHOD=RANDOM.
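The two assignment schemes can be sketched in Python. Random assignment draws a part label for each observation independently, so part sizes are not forced to be equal. Index assignment creates one part per distinct level. The seed, the level ordering (sorted by value), and the function names are assumptions of this sketch, not details of PROC ADAPTIVEREG.

```python
import random


def random_folds(n_obs: int, n: int = 5, seed: int = 12345) -> list:
    """CVMETHOD=RANDOM-style assignment: each observation gets a random part 0..n-1."""
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    return [rng.randrange(n) for _ in range(n_obs)]


def index_folds(index_values: list) -> list:
    """CVMETHOD=INDEX-style assignment: one part per distinct level of the variable."""
    levels = sorted(set(index_values))  # this sketch orders levels by value
    part_of = {level: k for k, level in enumerate(levels)}
    return [part_of[v] for v in index_values]


print(index_folds(["east", "west", "east", "north"]))  # [0, 2, 0, 1]
```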

DFPERBASIS=d
DF=d

specifies the degrees of freedom (d) that are “charged” for each basis function used in the lack-of-fit function for backward selection. Larger values of d lead to fewer spline knots and thus smoother function estimates. The default is DFPERBASIS=2.
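To show why a larger d favors smaller models, the Python sketch below computes a generalized cross validation (GCV) criterion in the style of Friedman (1991). It assumes that the effective number of parameters is (M + 1) + d·M, where M counts the non-intercept basis functions. The exact lack-of-fit formula that PROC ADAPTIVEREG uses may differ, and the function name `gcv` is hypothetical.

```python
import math


def gcv(rss: float, n_obs: int, n_basis: int, d: float = 2.0) -> float:
    """Friedman-style generalized cross validation (assumed form).

    n_basis counts the non-intercept basis functions. The effective number of
    parameters is taken to be (n_basis + 1) + d * n_basis.
    """
    c = (n_basis + 1) + d * n_basis
    if c >= n_obs:
        return math.inf  # too many charged parameters for this sample size
    return (rss / n_obs) / (1 - c / n_obs) ** 2


# Same fit (RSS = 50 on 100 observations, 5 basis functions), different d.
for d in (2.0, 3.0):
    print(d, round(gcv(50.0, 100, 5, d), 4))
```

For a fixed residual sum of squares, the criterion grows with d. Backward selection that minimizes it therefore drops basis functions more readily.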

DIST=distribution-id

specifies the distribution family used in the model.
If you do not specify a distribution-id, the ADAPTIVEREG procedure defaults to the normal distribution for continuous response variables and to the binary distribution for classification or character response variables, unless the events/trials syntax is used in the MODEL statement. If you use the events/trials syntax, the ADAPTIVEREG procedure defaults to the binomial distribution.
Table 25.5 lists the values of the DIST= option and the corresponding default link functions. For generalized linear models with these distributions, you can find expressions for the log-likelihood functions in the section Log-Likelihood Functions in Chapter 42: The GENMOD Procedure.
Table 25.5: Values of the DIST= Option
distribution-id  Aliases          Distribution       Default Link Function
BINOMIAL                          Binomial           Logit
GAMMA            GAM, G           Gamma              Reciprocal
GAUSSIAN         NORMAL, N, NOR   Normal             Identity
IGAUSSIAN        IG               Inverse Gaussian   Inverse squared (power(–2))
NEGBIN           NB               Negative binomial  Log
POISSON          POI              Poisson            Log
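Alias resolution, default links, and the default distribution rule can be expressed as a small Python lookup built from Table 25.5 and the preceding paragraph. The label "BINARY" stands for the binary distribution the text mentions, which does not appear in the table. Mapping the inverse-squared link to the POWERMINUS2 keyword of the LINK= option is this sketch's reading of the two tables. All names are hypothetical.

```python
DEFAULT_LINK = {
    "BINOMIAL": "LOGIT",
    "GAMMA": "RECIPROCAL",
    "GAUSSIAN": "IDENTITY",
    "IGAUSSIAN": "POWERMINUS2",  # inverse squared, power(-2)
    "NEGBIN": "LOG",
    "POISSON": "LOG",
}
DIST_ALIASES = {
    "GAM": "GAMMA", "G": "GAMMA",
    "NORMAL": "GAUSSIAN", "N": "GAUSSIAN", "NOR": "GAUSSIAN",
    "IG": "IGAUSSIAN", "NB": "NEGBIN", "POI": "POISSON",
}


def default_link(dist_id: str) -> str:
    """Resolve an alias from Table 25.5 and return the default link keyword."""
    key = dist_id.upper()
    return DEFAULT_LINK[DIST_ALIASES.get(key, key)]


def default_distribution(events_trials: bool, continuous_response: bool) -> str:
    """Distribution used when DIST= is omitted, per the rules in the text."""
    if events_trials:
        return "BINOMIAL"
    # "BINARY" labels the binary distribution; it is not a Table 25.5 entry.
    return "GAUSSIAN" if continuous_response else "BINARY"


print(default_link("nb"), default_distribution(True, False))
```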


FAST<(fast-options)>

improves the speed of the modeling. Because the original multivariate adaptive regression splines algorithm is computationally expensive, Friedman (1993) proposes modifications that improve its speed by tuning several parameters. See the section Fast Algorithm for more information about these improvements to the multivariate adaptive regression splines algorithm. You can specify the following fast-options:

BETA=beta

specifies the “aging” factor in the priority queue of candidate parent bases. Larger values of beta cause low-improvement parents to rise more quickly into the top list of candidates. The default value is BETA=1.

H=h

specifies the parameter that controls how often the improvement is recomputed for a parent basis over all candidate variables. Larger values of h cause fewer computations of improvement. The default value is H=1.

K=k

specifies the number of top candidates in the priority queue of parent bases for selecting new bases. Larger values of k cause more parent bases to be considered. By default, half of the eligible parent bases are considered at every iteration.
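The role of BETA= and K= can be illustrated with a Python sketch of a priority queue with aging. This section does not give Friedman's (1993) exact priority rule. The sketch therefore assumes that priority (smaller is better) equals a parent's rank by last computed improvement minus beta times the number of iterations since that improvement was recomputed. It also rounds "half of the eligible parents" down. The function `select_parents` is hypothetical.

```python
def select_parents(improvements, ages, beta=1.0, k=None):
    """Return indices of the k parent bases with the best aged priority.

    improvements[m]: last computed improvement for parent basis m.
    ages[m]: iterations since that improvement was recomputed.
    Assumed priority (smaller is better): rank by improvement - beta * age.
    """
    n = len(improvements)
    if k is None:
        k = max(1, n // 2)  # about half of the eligible parents (rounded down here)
    by_improvement = sorted(range(n), key=lambda m: -improvements[m])
    rank = {m: r for r, m in enumerate(by_improvement)}
    by_priority = sorted(range(n), key=lambda m: rank[m] - beta * ages[m])
    return sorted(by_priority[:k])


improvements = [5.0, 4.0, 3.0, 0.1]
ages = [0, 0, 0, 4]  # parent 3 has not been re-evaluated for 4 iterations
print(select_parents(improvements, ages, beta=0.0))  # [0, 1]
print(select_parents(improvements, ages, beta=1.0))  # [0, 3]
```

With beta=0 only the highest-improvement parents are chosen. With aging, a stale low-improvement parent enters the top list and gets its improvement recomputed.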

FORWARDONLY

skips the backward selection step after forward selection is finished.

KEEP=effects

specifies a list of variables to be included in the final model.

LINEAR=effects

specifies a list of variables to be considered without nonparametric transformation. If these variables are selected, they enter the model in linear form.

LINK=keyword

specifies the link function in the model. Not all link functions are available for all distribution families. The keywords and expressions for the associated link functions are shown in Table 25.6.
Table 25.6: Link Functions in MODEL Statement of the ADAPTIVEREG Procedure
keyword       Alias     Link Function
IDENTITY      ID        Identity
LOG                     Log
LOGIT                   Logit
POWERMINUS2             Power with exponent –2
PROBIT        NORMIT    Probit
RECIPROCAL    INVERSE   Reciprocal
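Each link function g maps the mean μ to the linear predictor. The Python sketch below writes the six links from Table 25.6 as plain functions and resolves the aliases. It uses the standard mathematical definitions, not SAS internals, and the names `LINKS` and `link` are hypothetical.

```python
import math
from statistics import NormalDist

# g(mu) for each keyword in Table 25.6.
LINKS = {
    "IDENTITY": lambda mu: mu,
    "LOG": math.log,
    "LOGIT": lambda mu: math.log(mu / (1 - mu)),
    "POWERMINUS2": lambda mu: mu ** -2,
    "PROBIT": lambda mu: NormalDist().inv_cdf(mu),  # standard normal quantile
    "RECIPROCAL": lambda mu: 1 / mu,
}
LINK_ALIASES = {"ID": "IDENTITY", "NORMIT": "PROBIT", "INVERSE": "RECIPROCAL"}


def link(keyword: str, mu: float) -> float:
    """Apply the link named by a keyword or alias to the mean mu."""
    key = keyword.upper()
    return LINKS[LINK_ALIASES.get(key, key)](mu)


print(link("logit", 0.5), link("inverse", 4.0), link("POWERMINUS2", 2.0))
```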



MAXBASIS=number

specifies the maximum number of basis functions that can be used in the final model. The default value is the larger of 21 and 2k + 1, where k is the number of non-intercept effects specified in the MODEL statement.
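The default MAXBASIS= value is the larger of 21 and one plus twice the number of non-intercept effects. The Python sketch below evaluates this rule. The function name is hypothetical.

```python
def default_maxbasis(n_effects: int) -> int:
    """Default MAXBASIS=: the larger of 21 and 2 * n_effects + 1."""
    return max(21, 2 * n_effects + 1)


for k in (3, 10, 11, 25):
    print(k, default_maxbasis(k))
```

The floor of 21 applies until the model has more than 10 non-intercept effects.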

MAXORDER=number

specifies the maximum interaction order for effects that can potentially enter the model. The default value is MAXORDER=2.
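MAXORDER= limits how many distinct variables can appear in one product of basis functions. The ADDITIVE option corresponds to an order of 1. The Python sketch below counts the variable sets available at each limit. This is a simplification: the procedure builds products incrementally from existing parent bases rather than enumerating all sets up front. The function name is hypothetical.

```python
from itertools import combinations


def candidate_variable_sets(variables, maxorder=2):
    """Sets of distinct variables whose basis functions may be multiplied together."""
    return [
        combo
        for order in range(1, maxorder + 1)
        for combo in combinations(variables, order)
    ]


xs = ["x1", "x2", "x3"]
print(len(candidate_variable_sets(xs, maxorder=1)))  # 3: main effects only (ADDITIVE)
print(len(candidate_variable_sets(xs, maxorder=2)))  # 6: adds the 3 two-way pairs
```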

NOMISS

excludes all observations with missing values from the model fitting. By default, the ADAPTIVEREG procedure takes the missingness
into account when an explanatory variable has missing values. For more information about how PROC ADAPTIVEREG handles missing
values, see the section Missing Values.

OFFSET=variable

specifies an offset for the linear predictor. An offset plays the role of a predictor
whose coefficient is known to be 1. For example, you can use an offset in a Poisson model when counts have been obtained in
time intervals of different lengths. With a log link function, you can model the counts as Poisson variables with the logarithm
of the time interval as the offset variable. The offset variable cannot appear in the CLASS statement or elsewhere in the
MODEL statement.
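The Poisson example can be made concrete with a short Python calculation. With a log link and offset log(t), log(μ) = η + log(t), so the offset enters with its coefficient fixed at 1 and μ = t·exp(η). The rate value and the function name are hypothetical.

```python
import math


def poisson_mean(eta: float, interval_length: float) -> float:
    """Mean count under a log link with offset log(interval_length).

    log(mu) = eta + log(interval_length), so the offset enters with a
    coefficient fixed at 1 and mu = interval_length * exp(eta).
    """
    return math.exp(eta + math.log(interval_length))


rate = 2.0  # hypothetical events per unit time, so eta = log(rate)
print(poisson_mean(math.log(rate), 3.0))  # about 6.0 expected events in 3 time units
```

Doubling the interval length doubles the expected count while η is unchanged. This is why the offset lets counts from intervals of different lengths share one model.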

VARPENALTY=

specifies the incremental penalty for increasing the number of variables in the adaptive regression model. To discourage a model with too many variables, at each iteration of the forward selection the model improvement is reduced by a factor of (1 + γ) for any new variable that is introduced, where γ is the value of the VARPENALTY= option.
For highly collinear designs, the VARPENALTY= option helps PROC ADAPTIVEREG produce models that are nearly equivalent in terms of residual sum of squares but have fewer independent variables. Friedman (1991b) suggests the following values for γ:
  0.0    no penalty (default value)
  0.05   moderate penalty
  0.1    heavy penalty
The best value depends on the specific situation, and some experimentation with different values is usually required. You should use this option with care.
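The effect of the penalty on a forward-selection step can be sketched in Python. The sketch assumes that "reduced by a factor of (1 + γ)" means dividing the improvement of a candidate that introduces a new variable by (1 + γ), while candidates that reuse variables already in the model are not penalized. The candidate names and improvement values are hypothetical.

```python
def best_candidate(candidates, gamma=0.0):
    """Pick the candidate with the largest penalized improvement.

    candidates: (name, improvement, introduces_new_variable) tuples.
    Assumption: a new variable's improvement is divided by (1 + gamma).
    """
    def penalized(c):
        _, improvement, is_new = c
        return improvement / (1 + gamma) if is_new else improvement

    return max(candidates, key=penalized)[0]


cands = [("x1", 10.0, False), ("x7", 10.4, True)]  # x7 is not yet in the model
print(best_candidate(cands, gamma=0.0))   # x7
print(best_candidate(cands, gamma=0.05))  # x1
```

With a moderate penalty, the slightly better new variable x7 (10.4 / 1.05 ≈ 9.90) loses to x1, which is already in the model. In collinear designs this is how the option trades a tiny loss in fit for fewer variables.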