Unlike least squares estimation of normal-response models, maximum likelihood estimation of logistic, Poisson, and other generalized linear models does not assume equal variances. For these models there is usually a known relationship between the mean and the variance such that the variance cannot be constant. In the Poisson model, for instance, the mean and variance are equal, so the variance is larger in populations in which the mean is larger. Because variances are not equal in these models, there is no need to test for this condition when fitting the model in a procedure such as GENMOD, GLIMMIX, LOGISTIC, or PROBIT, which are designed to fit such models. However, if there is more variability than expected under the response distribution, the data are said to be overdispersed.
For generalized linear models in which the response distribution is not normal, the residuals are also not normal. So, again, there is no need to test for this condition when fitting the model in a procedure such as GENMOD, GLIMMIX, LOGISTIC, or PROBIT, which are designed to fit such models.
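As a rough illustration of how the overdispersion mentioned above might be checked, the following sketch fits a Poisson model and displays only the goodness-of-fit table. The data set COUNTS and the variables Y and X are hypothetical names that do not appear in this note. A Pearson chi-square divided by its degrees of freedom (the Value/DF column) that is much larger than 1 is a common informal sign of overdispersion.

```sas
/* COUNTS, Y, and X are hypothetical names used only for illustration */
ods select ModelFit;               /* show only the goodness-of-fit criteria */
proc genmod data=counts;
   model y = x / dist=poisson;     /* Poisson response with the default log link */
run;
```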
As in ordinary regression, independence of the observations is assumed. Autocorrelation or correlated clusters of observations may adversely affect the parameter variance estimates. No test of correlation is available. However, when the data are known to be correlated in clusters, the model can be fit using the Generalized Estimating Equations (GEE) method. The GEE method is available via the REPEATED statement in PROC GENMOD or (beginning in SAS 9.4 TS1M2) PROC GEE. The GEE method employs a White-like sandwich variance estimator that accounts for the correlation.
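A minimal sketch of a GEE fit through the REPEATED statement might look like the following. The data set CLUSTERED, the cluster identifier ID, and the variables Y and X are hypothetical, and the exchangeable working correlation (TYPE=EXCH) is only one possible choice made here for illustration.

```sas
/* CLUSTERED, ID, Y, and X are hypothetical names used only for illustration */
proc genmod data=clustered;
   class id;                          /* the SUBJECT= effect must be a CLASS variable */
   model y = x / dist=binomial;       /* marginal logistic model */
   repeated subject=id / type=exch;   /* GEE with an assumed exchangeable working correlation */
run;
```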
As in normal-response models, collinearity (sometimes called multicollinearity) among the predictor variables in generalized linear models can cause the information matrix to become ill-conditioned, which adversely affects the precision of the estimated model parameters. Inflated standard errors for the model parameters can cause Wald-based tests to be inappropriately nonsignificant. Unlike in normal-response models, the existence of collinearity does not necessarily imply ill-conditioning in generalized linear models. Extremely large standard errors for one or more of the estimated parameters and large off-diagonal values in the parameter covariance matrix (COVB option) or correlation matrix (CORRB option) both suggest an ill-conditioned information matrix. However, these conditions can also happen for reasons not related to collinearity among the raw predictors.
In normal-response models, you are concerned about the condition of the information matrix, X'X, and this is directly affected by collinearity among the predictors (X). However, in generalized linear models, the information matrix you are concerned about is X'WX. W is a diagonal matrix of weights that is determined by the fitting algorithm at each iteration. It is collinearity in the weighted predictors (W^{½}X) that directly affects the condition of the information matrix. Collinearity in the raw predictors (X) may not result in an ill-conditioned information matrix because the weights may reduce the effect of the collinearity. Conversely, the weights could cause the information matrix to become ill-conditioned even if the raw predictors are not collinear. Consequently, an assessment of collinearity in the raw predictors is not equivalent to an assessment of ill-conditioning, as it is in normal-response models. For generalized linear models fit using Fisher's scoring, the HESSWGT= option in the OUTPUT statement of PROC GENMOD provides the diagonals of the weight matrix, W. These can then be used in a weighted regression using PROC REG. By including options for assessing collinearity, PROC REG can then provide an assessment of the condition of the information matrix in generalized linear models.
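The link between the weighted predictors and the information matrix can be written out explicitly. The sketch below assumes W is diagonal with nonnegative entries w_i and writes x_i' for row i of X and b for the weighted least squares coefficient vector (notation introduced here only for illustration). The second line is the usual weighted least squares normal equation, which is why a weighted regression on the HESSWGT= values works with the same matrix X'WX.

```latex
\begin{aligned}
X'WX &= \left(W^{1/2}X\right)'\left(W^{1/2}X\right) = \sum_{i} w_i \, x_i x_i' \\
\left(X'WX\right) b &= X'Wy
\end{aligned}
```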
The following statements fit a logistic model to the cancer remission data presented in the stepwise logistic regression example in the PROC LOGISTIC documentation. For a logistic model fit using Fisher's scoring, W has values μ_{i}*(1-μ_{i}) along its diagonal, where μ_{i} is the binomial mean for the population defined by observation i. In the statements below, these weights are computed by the HESSWGT= option in PROC GENMOD and added to the OUTPUT OUT= data set. The SCORING=50 option ensures that GENMOD uses Fisher's scoring at every iteration (by default, GENMOD performs at most 50 iterations). The CORRB option is included to show the correlations among the parameter estimates of the fitted model. The ODS SELECT statement causes only the correlations and parameter estimates to be displayed.
ods select CorrB ParameterEstimates;
proc genmod data=remiss;
   model remiss = li temp cell / dist=binomial scoring=50 corrb;
   output out=out hesswgt=w;
run;
Notice that several parameter estimates and their standard errors are quite large, suggesting that the information matrix may be ill-conditioned. Also note the large correlation (0.99) between the Intercept and TEMP parameter estimates (parameters 1 and 3).
To assess the condition of the logistic model's information matrix, a weighted regression is done in PROC REG using the HESSWGT= values as weights and including the collinearity options COLLIN and COLLINOINT. With the WEIGHT statement, the collinearity options in PROC REG assess the information matrix from the final iteration of PROC GENMOD. The ODS SELECT statement displays only the collinearity diagnostics since all other results from PROC REG should be ignored. Note that any response variable, including random values, could be used since the response values are not involved in assessing the information matrix.
ods select CollinDiag CollinDiagNoInt;
proc reg data=out;
   weight w;
   model remiss = li temp cell / collin collinoint;
run;
quit;
The large final condition index (315) indicates that collinearity exists among the weighted predictors. The variation proportions associated with this large condition index suggest that TEMP is collinear with the intercept. In the collinearity results adjusted for the intercept (from the COLLINOINT option), the small condition numbers suggest that there is no other collinearity except with the intercept.
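For reference, the condition indices reported by the COLLIN option are the square roots of the ratios of the largest eigenvalue to each individual eigenvalue of the cross-products matrix after it is scaled to have ones on its diagonal. In the weighted analysis above, that matrix is the scaled X'WX. The symbols η_j and λ_j below are notation introduced here for illustration, with p the number of columns including the intercept. A commonly cited rule of thumb treats indices of roughly 30 or more as a sign of a notable dependency, although any such cutoff is only a guideline.

```latex
\eta_j = \sqrt{\frac{\lambda_{\max}}{\lambda_j}}, \qquad j = 1, \dots, p
```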

In logistic models, separation can also cause large parameter estimates and standard errors. To determine if separation is the issue, use PROC LOGISTIC to fit the model. By default, PROC LOGISTIC checks for separation and will display notes in the SAS® log and in the displayed results if separation is detected. For this model, PROC LOGISTIC does not detect separation, so the problem appears to be one of collinearity.
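The separation check described above could be run with a step along these lines. This is a sketch that reuses the REMISS data set and the same predictors as the GENMOD step and relies on the default settings of PROC LOGISTIC; the step is not shown in this note.

```sas
/* Fit the same model in PROC LOGISTIC; any separation that is detected   */
/* is reported in the log and in the displayed results                    */
proc logistic data=remiss;
   model remiss = li temp cell;
run;
```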
The following PROC REG step assesses the collinearity of the raw predictors. Again, any response variable could be used and only the collinearity diagnostics in the results are relevant.
ods select CollinDiag CollinDiagNoInt;
proc reg data=remiss;
   model remiss = li temp cell / collin collinoint;
run;
quit;
The COLLIN results again indicate collinearity between the raw TEMP predictor and the intercept. Large condition indices for both X'WX (315) and X'X (190) indicate collinearity among both the weighted and raw predictors. Assessments of both information matrices point to TEMP being collinear with the intercept. This similarity in the collinearity of the two matrices suggests that the collinearity in the raw predictors could be driving the collinearity in the weighted predictors.

The following analysis shows that the range of TEMP is very restricted, varying only from 0.98 to 1.038, and that its standard deviation is much smaller than those of the other predictors. The tiny amount of variability in TEMP relative to its mean is behind its collinearity with the intercept.
proc means data=remiss;
run;
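To look at variability relative to the mean more directly, one option is to request the coefficient of variation (CV, the standard deviation expressed as a percentage of the mean) along with the other summary statistics. This is an optional variation on the step above, not part of the original analysis.

```sas
/* CV = 100 * std / mean; a very small CV flags a predictor that is */
/* nearly constant relative to its mean                             */
proc means data=remiss n mean std cv min max;
   var li temp cell;
run;
```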

By rescaling the predictors, the collinearity with the intercept can be removed. In these statements, PROC STANDARD creates the data set STD containing the rescaled predictors. The S=1 option scales the predictors to have standard deviations of 1.
proc standard data=remiss s=1 out=std;
   var li temp cell;
run;
When the logistic model is refit in GENMOD using the rescaled predictors in the STD data set, the parameter estimates and standard errors no longer exhibit any inflation. The correlation between the intercept and CELL parameters is still noticeably large (0.91), but the results below indicate no problem with ill-conditioning.
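The refit is not shown in this note; a sketch that mirrors the earlier GENMOD step, with the STD data set substituted for REMISS, might look like this. The output data set name OUTSTD is a hypothetical name chosen here.

```sas
ods select CorrB ParameterEstimates;
proc genmod data=std;
   model remiss = li temp cell / dist=binomial scoring=50 corrb;
   output out=outstd hesswgt=w;   /* OUTSTD is a hypothetical name; W holds the Hessian weights */
run;
```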

When the assessment of X'WX is repeated using PROC REG on the rescaled data, the condition index is now small (10.7), indicating no collinearity among the weighted predictors.
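A sketch of this weighted assessment follows. It assumes that the refit on the rescaled data saved its HESSWGT= values in a variable W of a data set called OUTSTD (a hypothetical name), for example with an OUTPUT OUT=OUTSTD HESSWGT=W statement in the GENMOD refit.

```sas
/* OUTSTD is a hypothetical data set holding the rescaled predictors */
/* and the Hessian weights W from the refit                          */
ods select CollinDiag CollinDiagNoInt;
proc reg data=outstd;
   weight w;
   model remiss = li temp cell / collin collinoint;
run;
quit;
```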

The condition index of X'X is also small (3.8) indicating no collinearity among the raw predictors.
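The corresponding unweighted check on the rescaled predictors can be sketched in the same way as the earlier raw-predictor step, using the STD data set created by PROC STANDARD. As before, only the collinearity diagnostics are relevant.

```sas
ods select CollinDiag CollinDiagNoInt;
proc reg data=std;
   model remiss = li temp cell / collin collinoint;
run;
quit;
```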

Product Family:  SAS System
Product:  SAS/STAT
SAS Release (Reported / Fixed*):
System:
Microsoft Windows 2000 Server
Microsoft Windows 2000 Datacenter Server
Microsoft Windows 2000 Advanced Server
Microsoft Windows 95/98
OS/2
Microsoft Windows XP 64-bit Edition
Microsoft® Windows® for x64
Microsoft Windows Server 2003 Enterprise 64-bit Edition
Microsoft Windows Server 2003 Datacenter 64-bit Edition
Microsoft® Windows® for 64-Bit Itanium-based Systems
z/OS
OpenVMS VAX
Microsoft Windows 2000 Professional
Microsoft Windows NT Workstation
Microsoft Windows Server 2003 Datacenter Edition
Microsoft Windows Server 2003 Enterprise Edition
Microsoft Windows Server 2003 Standard Edition
Microsoft Windows XP Professional
Windows Millennium Edition (Me)
Windows Vista
64-bit Enabled AIX
64-bit Enabled HP-UX
64-bit Enabled Solaris
ABI+ for Intel Architecture
AIX
HP-UX
HP-UX IPF
IRIX
Linux
Linux for x64
Linux on Itanium
OpenVMS Alpha
OpenVMS on HP Integrity
Solaris
Solaris for x64
Tru64 UNIX
Type:  Usage Note 
Priority:  
Topic:  Analytics ==> Regression; SAS Reference ==> Procedures ==> PROBIT; SAS Reference ==> Procedures ==> REG; SAS Reference ==> Procedures ==> GENMOD; SAS Reference ==> Procedures ==> LOGISTIC
Date Modified:  2008-06-17 16:18:03
Date Created:  2008-06-17 16:05:23