Although the most commonly used link and probability distributions are available as built-in functions, the GENMOD procedure enables you to define your own link functions and response probability distributions by using the FWDLINK, INVLINK, VARIANCE, and DEVIANCE statements. The variables assigned in these statements can have values computed in programming statements.
These programming statements can occur anywhere between the PROC GENMOD statement and the RUN statement. Variable names used in programming statements must be unique. Variables from the input data set can be referenced in programming statements. The mean, linear predictor, and response are represented by the automatic variables _MEAN_, _XBETA_, and _RESP_, respectively, which can be referenced in your programming statements. Programming statements are used to define the functional dependencies of the link function, the inverse link function, the variance function, and the deviance function on the mean, linear predictor, and response variable.
The following statements illustrate the use of programming statements. Even though you usually request the Poisson distribution by specifying DIST=POISSON as a MODEL statement option, you can define the variance and deviance functions for the Poisson distribution by using the VARIANCE and DEVIANCE statements. For example, the following statements perform the same analysis as the Poisson regression example in the section Getting Started: GENMOD Procedure.
The statements must be in logical order for computation, just as in a DATA step.
   proc genmod;
      class car age;
      a = _MEAN_;
      y = _RESP_;
      d = 2 * ( y * log( y / a ) - ( y - a ) );
      variance var = a;
      deviance dev = d;
      model c = car age / link = log offset = ln;
   run;
The variables var and dev are dummy variables used internally by the procedure to identify the variance and deviance functions. Any valid SAS variable names can be used.
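For reference, the variance and deviance assignments in the preceding statements correspond to the standard Poisson variance function and unit deviance. The symbols are shorthand introduced here, not notation from this section: \mu stands for _MEAN_ (the variable a) and y stands for _RESP_.

```latex
V(\mu) = \mu,
\qquad
d(y, \mu) = 2\left[\, y \log\frac{y}{\mu} - (y - \mu) \right]
```

As y approaches 0, the term y log(y / \mu) tends to 0, so the deviance contribution tends to 2\mu. The expression as coded, however, evaluates log(0) when y = 0.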
Similarly, the log link function and its inverse could be defined with the FWDLINK and INVLINK statements, as follows:
   fwdlink link = log(_MEAN_);
   invlink ilink = exp(_XBETA_);
These statements are for illustration, and they work well for most Poisson regression problems. If, however, in the iterative fitting process, the mean parameter becomes too close to 0, or a 0 response value occurs, an error condition occurs when the procedure attempts to evaluate the log function. You can circumvent this kind of problem by using IF-THEN/ELSE clauses or other conditional statements to check for possible error conditions and appropriately define the functions for these cases.
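The following statements are a minimal sketch of such a guard, assuming the same Poisson model as in the preceding example. The variable eps and its value 1E-8 are hypothetical choices made for illustration, not values prescribed by the procedure. The branch for a zero response uses the limiting value of the Poisson deviance, 2 * a. Because the link, variance, and deviance are all user-defined here, the sketch omits the LINK= option in the MODEL statement.

```sas
proc genmod;
   class car age;
   eps = 1e-8;                      /* hypothetical lower bound for the mean */
   a = _MEAN_;
   y = _RESP_;
   if a < eps then a = eps;         /* keep log() arguments away from 0 */
   if y > 0 then d = 2 * ( y * log( y / a ) - ( y - a ) );
   else d = 2 * a;                  /* limit of the Poisson deviance as y -> 0 */
   variance var = a;
   deviance dev = d;
   fwdlink link = log(a);           /* uses the bounded mean */
   invlink ilink = exp(_XBETA_);
   model c = car age / offset = ln;
run;
```

As in a DATA step, the statements that compute a and d must appear before the VARIANCE, DEVIANCE, and FWDLINK statements that use them.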
Data set variables can be referenced in user definitions of the link function and response distributions by using programming statements and the FWDLINK, INVLINK, DEVIANCE, and VARIANCE statements.
See the DEVIANCE, VARIANCE, FWDLINK, and INVLINK statements for more information.
The syntax of programming statements used in PROC GENMOD is identical to that used in the NLMIXED and GLIMMIX procedures (see Chapter 64: The NLMIXED Procedure, and Chapter 41: The GLIMMIX Procedure) and in the MODEL procedure (see the SAS/ETS User's Guide). Most of the programming statements that can be used in the DATA step can also be used in the GENMOD procedure. See SAS Statements: Reference for a description of SAS programming statements. The following are some commonly used programming statements.
ABORT;
ARRAY arrayname <[ dimensions ]> <$> <variables-and-constants>;
CALL name <(expression <, expression …>)>;
DELETE;
DO <variable = expression <TO expression> <BY expression>> <, expression <TO expression> <BY expression>> …
   <WHILE expression> <UNTIL expression>;
END;
GOTO statement-label;
IF expression;
IF expression THEN program-statement;
ELSE program-statement;
variable = expression;
variable + expression;
LINK statement-label;
PUT <variable> <=> …;
RETURN;
SELECT <(expression)>;
STOP;
SUBSTR(variable, index, length)= expression;
WHEN (expression) program-statement;
OTHERWISE program-statement;