Unlike in least squares estimation of normal-response models, variances are not assumed to be equal in the maximum likelihood estimation of logistic, Poisson, and other generalized linear models. For these models there is usually a known relationship between the mean and the variance such that the variance cannot be constant. In the Poisson model, for example, the mean and variance are equal. Consequently, the variance will be larger in populations in which the mean is larger. Because variances are not equal in these models, there is no need to test for this condition when fitting the model in a procedure such as GENMOD, GLIMMIX, LOGISTIC, or PROBIT, which is designed to fit such models. However, if the data show more or less variability than the response distribution implies, they are said to be overdispersed or underdispersed (SAS Note 22630), and other models can be used.
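As a rough illustration of checking dispersion, the following sketch simulates overdispersed counts and fits a Poisson model. The data set COUNTS, its variables, and the simulation settings are hypothetical and are not part of the original note. A Pearson chi-square divided by its degrees of freedom that is well above 1 in the fit criteria table is commonly read as a sign of overdispersion, in which case a model such as the negative binomial might be considered.

```sas
/* Hypothetical data: Poisson counts whose means are mixed over a gamma
   distribution, which induces extra-Poisson variation */
data counts;
   call streaminit(2718);
   do i = 1 to 200;
      x  = rand('uniform');
      mu = exp(0.5 + x) * rand('gamma', 2) / 2;   /* gamma factor has mean 1 */
      y  = rand('poisson', mu);
      output;
   end;
run;

/* Poisson fit: inspect Pearson Chi-Square / DF in the fit criteria table */
ods select ModelFit;
proc genmod data=counts;
   model y = x / dist=poisson;
run;

/* One possible alternative when overdispersion is suspected */
ods select ModelFit ParameterEstimates;
proc genmod data=counts;
   model y = x / dist=negbin;
run;
```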
For generalized linear models in which the response distribution is not normal, the residuals from these models are also not normal. So, again there is no need to test for this condition when fitting the model in a procedure such as GENMOD, GLIMMIX, LOGISTIC, or PROBIT, which is designed to fit such models.
As in ordinary regression, independence of the observations is assumed. Autocorrelation or correlated clusters of observations might adversely affect the parameter variance estimates. No test of correlation is available. However, when the data are known to be correlated in clusters, the model can be fit using the Generalized Estimating Equations (GEE) method. The GEE method is available via the REPEATED statement in PROC GENMOD or (beginning in SAS® 9.4M2) PROC GEE. The GEE method employs a White-like sandwich variance estimator that accounts for the correlation.
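A minimal sketch of a GEE fit through the REPEATED statement is shown next. The data set CLUSTERED, the cluster identifier ID, and the simulated values are hypothetical. The exchangeable working correlation (TYPE=EXCH) is only one possible choice and is an assumption of this sketch.

```sas
/* Hypothetical clustered binary data: 50 clusters of 4 observations
   that share a random cluster effect U */
data clustered;
   call streaminit(1234);
   do id = 1 to 50;
      u = rand('normal');
      do rep = 1 to 4;
         x = rand('normal');
         p = 1 / (1 + exp(-(-0.5 + 0.8*x + u)));
         y = rand('bernoulli', p);
         output;
      end;
   end;
run;

/* GEE fit: SUBJECT= names the cluster, TYPE= the working correlation */
proc genmod data=clustered descending;
   class id;
   model y = x / dist=binomial;
   repeated subject=id / type=exch;
run;
```

The standard errors reported in the GEE parameter estimates table are the empirical (sandwich) standard errors.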
As in normal-response models, collinearity (sometimes called multicollinearity) among the predictor variables in generalized linear models can cause the information matrix to become ill-conditioned, which adversely affects the precision of the estimated model parameters. Inflated standard errors for the model parameters can result in Wald-based tests being inappropriately insignificant. Unlike in normal-response models, the existence of collinearity does not necessarily imply ill-conditioning in generalized linear models. Extremely large standard errors for one or more of the estimated parameters and large off-diagonal values in the parameter covariance matrix (COVB option) or correlation matrix (CORRB option) both suggest an ill-conditioned information matrix. However, these conditions can also happen for reasons not related to collinearity among the raw predictors.
In normal-response models, you are concerned about the condition of the information matrix, X'X, and this is directly affected by collinearity among the predictors (X). However, in generalized linear models, the information matrix you are concerned about is X'WX, where W is a diagonal matrix of weights that is determined by the fitting algorithm at each iteration. It is collinearity in the weighted predictors, W^(1/2)X, that directly affects the condition of the information matrix. Collinearity in the raw predictors (X) might not result in an ill-conditioned information matrix because the weights might reduce the effect of the collinearity. Conversely, the weights could cause the information matrix to become ill-conditioned even if the raw predictors are not collinear. Consequently, an assessment of collinearity in the raw predictors is not equivalent to an assessment of ill-conditioning, as it is in normal-response models.
For generalized linear models fit using Fisher's scoring, the HESSWGT= option in the OUTPUT statement of PROC GENMOD provides the diagonal elements of the weight matrix, W. These values can then be used as weights in a weighted regression in PROC REG. When its collinearity options are specified, PROC REG then provides an assessment of the condition of the information matrix of the generalized linear model.
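The link between the information matrix and the weighted predictors can be written out explicitly. The following is a sketch in the notation used above. The symbols w_i (the ith diagonal element of W, assumed nonnegative), lambda_j (the eigenvalues of X'WX after its columns are scaled to unit length), and eta_j (the condition indices) are introduced here for illustration only; \operatorname assumes the amsmath package.

```latex
\[
X'WX \;=\; \bigl(W^{1/2}X\bigr)'\bigl(W^{1/2}X\bigr),
\qquad
W^{1/2} \;=\; \operatorname{diag}\bigl(\sqrt{w_1},\,\dots,\,\sqrt{w_n}\bigr)
\]
\[
\eta_j \;=\; \sqrt{\lambda_{\max}/\lambda_j}
\]
```

Because X'WX is the cross-products matrix of W^(1/2)X, a weighted regression with weights w_i works with the same matrix, which is why the collinearity diagnostics of a weighted PROC REG fit describe the condition of the information matrix.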
The following statements fit a logistic model to the cancer remission data presented in the stepwise logistic regression example in the PROC LOGISTIC documentation (SAS Note 22930). For a logistic model fit using Fisher's scoring, W has the values μi*(1-μi) along its diagonal, where μi is the binomial mean for the population defined by observation i. In the statements below, these weights are computed by the HESSWGT= option in PROC GENMOD and added to the OUTPUT OUT= data set. The SCORING=50 option ensures that GENMOD uses Fisher's scoring at each iteration (by default, GENMOD performs at most 50 iterations). The CORRB option is included to show the correlations among the parameter estimates of the fitted model. The ODS SELECT statement causes only the correlations and parameter estimates to be displayed.
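ods select CorrB ParameterEstimates;
proc genmod data=remiss;
   /* Fisher's scoring at every iteration; print parameter correlations */
   model remiss = li temp cell / dist=binomial scoring=50 corrb;
   /* save the diagonal of W as variable W */
   output out=out hesswgt=w;
run;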
Notice that several parameter estimates and their standard errors are quite large, suggesting that the information matrix might be ill-conditioned. Also, note the large correlation (-0.99) between the Intercept and TEMP parameter estimates (parameters 1 and 3).
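As a quick sanity check on the weights, the following sketch compares the HESSWGT= values with μi*(1-μi) computed from the predicted probabilities. It assumes that the REMISS data set is already available. The names CHK, P, W_CHECK, and DIFF are hypothetical. If the weights have the stated form, the largest absolute difference should be essentially zero.

```sas
/* Refit the model, saving both the Hessian weights and the predicted means */
ods select none;
proc genmod data=remiss;
   model remiss = li temp cell / dist=binomial scoring=50;
   output out=chk hesswgt=w pred=p;
run;
ods select all;

/* Recompute the logistic weights from the predicted probabilities */
data chk;
   set chk;
   w_check = p * (1 - p);
   diff    = abs(w - w_check);
run;

/* Largest discrepancy between the two versions of the weights */
proc means data=chk max;
   var diff;
run;
```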
To assess the condition of the logistic model's information matrix, a weighted regression is done in PROC REG using the HESSWGT= values as weights and including the collinearity options COLLIN and COLLINOINT. With the WEIGHT statement, the collinearity options in PROC REG assess the information matrix from the final iteration of PROC GENMOD. The ODS SELECT statement displays only the collinearity diagnostics since all other results from PROC REG should be ignored. Note that any response variable, including random values, could be used since the response values are not involved in assessing the information matrix.
ods select CollinDiag CollinDiagNoInt;
proc reg data=out;
   /* weights from HESSWGT= make PROC REG work with X'WX */
   weight w;
   model remiss = li temp cell / collin collinoint;
run; quit;
The large final condition index (315) indicates that collinearity exists among the weighted predictors. The variation proportions associated with this large condition index suggest that TEMP is collinear with the intercept. In the collinearity results adjusted for the intercept (from the COLLINOINT option), the small condition indices suggest that there is no other collinearity except with the intercept.
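Because the response values are not involved in assessing the information matrix, the same diagnostics should result from an arbitrary response. The following sketch illustrates this with a random response. It assumes the OUT data set created by the PROC GENMOD step above. The names OUT2 and ANYY are hypothetical.

```sas
/* Add an arbitrary random response to the data set that holds the weights */
data out2;
   set out;
   if _n_ = 1 then call streaminit(1);
   anyy = rand('normal');
run;

/* The collinearity diagnostics depend only on the weights and predictors */
ods select CollinDiag CollinDiagNoInt;
proc reg data=out2;
   weight w;
   model anyy = li temp cell / collin collinoint;
run; quit;
```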
In logistic models, separation (SAS Note 22599) can also cause large parameter estimates and standard errors. To determine whether separation is the issue, use PROC LOGISTIC to fit the model. By default, PROC LOGISTIC checks for separation and will display notes in the SAS® log and in the displayed results if separation is detected. For this model, PROC LOGISTIC does not detect separation, so the problem appears to be one of collinearity.
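A minimal sketch of this check follows. It assumes that the REMISS data set is available and that the value 1 of the response denotes the event of interest; the EVENT= setting is an assumption of this sketch and does not affect whether separation is detected.

```sas
/* PROC LOGISTIC checks for complete and quasi-complete separation by default */
proc logistic data=remiss;
   model remiss(event='1') = li temp cell;
run;
```

If separation were present, a note about it would be expected in the log and in the model convergence information of the results.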
The following PROC REG step assesses the collinearity of the raw predictors. Again, any response variable could be used and only the collinearity diagnostics in the results are relevant.
ods select CollinDiag CollinDiagNoInt;
proc reg data=remiss;
   /* no WEIGHT statement: assesses X'X for the raw predictors */
   model remiss = li temp cell / collin collinoint;
run; quit;
The COLLIN results again indicate collinearity between the raw TEMP predictor and the intercept. Large condition indices for both X'WX (315) and X'X (190) indicate collinearity among both the weighted and raw predictors. Assessments of both information matrices point to TEMP being collinear with the intercept. This similarity in the collinearity of the two matrices suggests that the collinearity in the raw predictors could be driving the collinearity in the weighted predictors.
One approach to this problem is to use a penalty-based selection method that can reduce the inflation of the parameters and simultaneously find the important predictors of the response. This is illustrated for these data in Example 3 of SAS Note 60240. However, the following analysis suggests another solution. It shows that the range of TEMP in these data is very restricted, varying only from 0.98 to 1.038, and that its standard deviation is much smaller than those of the other predictors. The tiny amount of variability in TEMP relative to its mean is behind its collinearity with the intercept.
proc means data=remiss;
run;
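The variability of each predictor relative to its mean can be summarized directly by the coefficient of variation. The following sketch, which assumes the REMISS data set, requests it with the CV keyword alongside the statistics discussed above. A very small coefficient of variation for a predictor means that its column is close to a constant multiple of the intercept column.

```sas
/* CV = 100 * (standard deviation / mean) for each predictor */
proc means data=remiss n mean std cv min max;
   var li temp cell;
run;
```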
Rescaling the predictors can remove the collinearity with the intercept. In these statements, PROC STANDARD creates the data set STD containing the rescaled predictors. The S=1 option scales the predictors to have standard deviations of 1; because the MEAN= option is not specified, their means are left unchanged.
proc standard data=remiss s=1 out=std;
   var li temp cell;
run;
When the logistic model is refit in PROC GENMOD using the rescaled predictors in the STD data set, the parameter estimates and standard errors no longer exhibit any inflation. The correlation between the intercept and CELL parameters is still noticeably large (-0.91), but the collinearity diagnostics for the rescaled data indicate no problem with ill-conditioning.
Repeating the assessment of X'WX using PROC REG on the rescaled data, you find that the condition index is now small (10.7), indicating no collinearity among the weighted predictors.
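The statements for the refit and for the weighted assessment are not shown in the note; a plausible sketch, mirroring the earlier steps, is given here. It assumes the STD data set created by the PROC STANDARD step. The output data set name OUTSTD is hypothetical.

```sas
/* Refit the logistic model on the rescaled predictors and save the weights */
ods select CorrB ParameterEstimates;
proc genmod data=std;
   model remiss = li temp cell / dist=binomial scoring=50 corrb;
   output out=outstd hesswgt=w;
run;

/* Assess X'WX for the rescaled predictors */
ods select CollinDiag CollinDiagNoInt;
proc reg data=outstd;
   weight w;
   model remiss = li temp cell / collin collinoint;
run; quit;
```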
The condition index of X'X is also small (3.8), indicating no collinearity among the rescaled raw predictors.
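The corresponding unweighted assessment can be sketched in the same way as for the original data, again assuming the STD data set created by the PROC STANDARD step.

```sas
/* Assess X'X for the rescaled predictors (no WEIGHT statement) */
ods select CollinDiag CollinDiagNoInt;
proc reg data=std;
   model remiss = li temp cell / collin collinoint;
run; quit;
```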
Product Family: | SAS System |
Product: | SAS/STAT |
System: | Microsoft Windows 2000 Server; Microsoft Windows 2000 Datacenter Server; Microsoft Windows 2000 Advanced Server; Microsoft Windows 95/98; OS/2; Microsoft Windows XP 64-bit Edition; Microsoft® Windows® for x64; Microsoft Windows Server 2003 Enterprise 64-bit Edition; Microsoft Windows Server 2003 Datacenter 64-bit Edition; Microsoft® Windows® for 64-Bit Itanium-based Systems; z/OS; OpenVMS VAX; Microsoft Windows 2000 Professional; Microsoft Windows NT Workstation; Microsoft Windows Server 2003 Datacenter Edition; Microsoft Windows Server 2003 Enterprise Edition; Microsoft Windows Server 2003 Standard Edition; Microsoft Windows XP Professional; Windows Millennium Edition (Me); Windows Vista; 64-bit Enabled AIX; 64-bit Enabled HP-UX; 64-bit Enabled Solaris; ABI+ for Intel Architecture; AIX; HP-UX; HP-UX IPF; IRIX; Linux; Linux for x64; Linux on Itanium; OpenVMS Alpha; OpenVMS on HP Integrity; Solaris; Solaris for x64; Tru64 UNIX |
Type: | Usage Note |
Priority: | |
Topic: | Analytics ==> Regression; SAS Reference ==> Procedures ==> PROBIT; SAS Reference ==> Procedures ==> REG; SAS Reference ==> Procedures ==> GENMOD; SAS Reference ==> Procedures ==> LOGISTIC |
Date Modified: | 2023-03-02 14:45:33 |
Date Created: | 2008-06-17 16:05:23 |