The GENMOD Procedure
Case Deletion Diagnostic Statistics
For ordinary generalized linear models, regression diagnostic statistics developed by Williams (1987) can be requested in an output data set or in the OBSTATS table by specifying the DIAGNOSTICS | INFLUENCE option in the MODEL statement. These diagnostics measure the influence of an individual observation on model fit, and generalize the one-step diagnostics developed by Pregibon (1981) for the logistic regression model for binary data.
Preisser and Qaqish (1996) further generalize regression diagnostics to apply to models for correlated data fit by generalized estimating equations (GEEs), where the influence of entire clusters of correlated observations is measured. These diagnostic statistics can be requested in an output data set or in the OBSTATS table if a model for correlated data is specified with a REPEATED statement.
The next two sections use the following notation:
$\hat{\beta}$ is the maximum likelihood estimate of the regression parameters $\beta$, or, in the case of correlated data, the solution of the GEEs.
$\hat{\beta}_{[i]}$ is the corresponding estimate evaluated with the $i$th observation deleted, or, in the case of correlated data, with the $i$th cluster deleted.
$p$ is the dimension of the regression parameter vector $\beta$.
$r_{Pi}$ is the standardized Pearson residual $r_{Pi} = (y_i - \mu_i)/\sqrt{v_i(1 - h_i)}$, where $v_i$ is the variance of the $i$th response and $h_i$ is the leverage defined in the section H | LEVERAGE.
$v_i$ is the variance of response $i$, $v_i = \phi V(\mu_i)/w_i$, where $V(\mu)$ is the variance function and $\phi$ is the dispersion parameter.
$w_i$ is the prior weight of the $i$th observation specified with the WEIGHT statement. If there is no WEIGHT statement, $w_i = 1$ for all $i$.
All unknown quantities are replaced by their estimated values in the following two sections.
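The following minimal NumPy sketch makes the notation concrete by computing $v_i$ and $r_{Pi}$. The function names are hypothetical. The leverages $h_i$ are supplied as inputs rather than computed, and the Poisson-type variance function $V(\mu) = \mu$ with $\phi = 1$ is only an illustrative assumption.

```python
import numpy as np

def response_variance(mu, phi, variance_fn, prior_weight=None):
    """v_i = phi * V(mu_i) / w_i, with w_i = 1 when no prior weights are given."""
    mu = np.asarray(mu, dtype=float)
    w = np.ones_like(mu) if prior_weight is None else np.asarray(prior_weight, dtype=float)
    return phi * variance_fn(mu) / w

def standardized_pearson_residual(y, mu, v, h):
    """r_Pi = (y_i - mu_i) / sqrt(v_i * (1 - h_i))."""
    y, mu, v, h = (np.asarray(a, dtype=float) for a in (y, mu, v, h))
    return (y - mu) / np.sqrt(v * (1.0 - h))

# Illustrative values: V(mu) = mu, phi = 1, and made-up leverages.
mu = np.array([2.0, 4.0])
v = response_variance(mu, phi=1.0, variance_fn=lambda m: m)
r_p = standardized_pearson_residual([3.0, 1.0], mu, v, [0.5, 0.25])
```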
The following statistics are available for generalized linear models.
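The DFBETA statistic for measuring the influence of the $i$th observation is defined as the one-step approximation to the difference between the MLE of the regression parameter vector and the MLE of the regression parameter vector without the $i$th observation. This one-step approximation assumes a Fisher scoring step and is given by
$$\mathrm{DFBETA}_i = \frac{y_i - \mu_i}{v_i\, g'(\mu_i)\,(1 - h_i)}\,(X'W_eX)^{-1} x_i \approx \hat{\beta} - \hat{\beta}_{[i]}$$
where $h_i$ is the leverage and $W_e$ is the diagonal matrix of Fisher scoring weights, both defined in the section H | LEVERAGE. In addition, $g$ is the link function and $x_i$ is the $i$th row of the design matrix $X$, written as a column vector.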
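Below is a minimal NumPy sketch of the DFBETA formula. The function name is hypothetical, and this is not how PROC GENMOD is implemented. As a check, it uses a normal linear model with identity link, $V(\mu) = 1$, and $\phi = 1$. In that case the log likelihood is quadratic in $\beta$, so a single Fisher scoring step lands exactly on $\hat{\beta}_{[i]}$, and DFBETA should match a refit without observation $i$.

```python
import numpy as np

def glm_dfbeta(X, y, mu, v, gprime):
    """One-step DFBETA_i = (X'W_eX)^{-1} x_i (y_i - mu_i) / (v_i g'(mu_i) (1 - h_i)).

    All inputs are evaluated at the full-data fit; row i of the result is DFBETA_i.
    """
    X = np.asarray(X, dtype=float)
    y, mu, v, gprime = (np.asarray(a, dtype=float) for a in (y, mu, v, gprime))
    w_e = 1.0 / (v * gprime**2)                           # Fisher scoring weights w_ei
    M_inv = np.linalg.inv(X.T @ (w_e[:, None] * X))       # (X'W_eX)^{-1}
    h = w_e * np.einsum("ij,jk,ik->i", X, M_inv, X)       # leverages h_i
    scale = (y - mu) / (v * gprime * (1.0 - h))
    return (X @ M_inv) * scale[:, None]                   # row i: (M^{-1} x_i)' * scale_i

# Normal linear model, identity link, V = 1, phi = 1 (illustrative data).
X = np.array([[1, 0], [1, 1], [1, 2], [1, 3], [1, 5]], dtype=float)
y = np.array([1.0, 2.0, 2.0, 4.0, 7.0])
beta = np.linalg.lstsq(X, y, rcond=None)[0]
ones = np.ones(len(y))
dfb = glm_dfbeta(X, y, X @ beta, ones, ones)
refit = np.array([beta - np.linalg.lstsq(np.delete(X, i, axis=0), np.delete(y, i), rcond=None)[0]
                  for i in range(len(y))])
```

The weight $1/(v_i\, g'(\mu_i)^2)$ used in the sketch equals $w_i/[\phi V(\mu_i) g'(\mu_i)^2]$, because $v_i = \phi V(\mu_i)/w_i$.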
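The standardized DFBETA statistic for assessing the influence of the $i$th observation on the $j$th regression parameter is defined as the DFBETA statistic for the $j$th parameter divided by its estimated standard deviation, where the standard deviation is estimated from all the data:
$$\mathrm{DFBETAS}_{ij} = \frac{\mathrm{DFBETA}_{ij}}{\hat{\sigma}_j}$$
where $\mathrm{DFBETA}_{ij}$ is the $j$th element of $\mathrm{DFBETA}_i$ and $\hat{\sigma}_j$ is the estimated standard deviation of $\hat{\beta}_j$.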
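A sketch of the standardization follows. It assumes $\hat{\sigma}_j$ is the square root of the $j$th diagonal element of an estimated covariance matrix of $\hat{\beta}$ from the full-data fit. The function name is hypothetical, and the numbers are made up for illustration.

```python
import numpy as np

def dfbetas(dfbeta, cov_beta):
    """DFBETAS_ij = DFBETA_ij / sigma_j, with sigma_j = sqrt(cov_beta[j, j]) from the full-data fit."""
    sigma = np.sqrt(np.diag(np.asarray(cov_beta, dtype=float)))
    return np.asarray(dfbeta, dtype=float) / sigma        # broadcasts across observations (rows)

# Illustrative inputs: two observations, two parameters.
dfb = np.array([[0.2, -0.4], [0.1, 0.8]])
cov = np.array([[0.04, 0.01], [0.01, 0.16]])
std = dfbetas(dfb, cov)
```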
In normal linear regression, the influence of observation $i$ can be measured by Cook’s distance (Cook and Weisberg 1982). A measure of the influence of observation $i$ for generalized linear models that is equivalent to Cook’s distance for normal linear regression is given by
$$C_i = \frac{r_{Pi}^2\, h_i}{p\,(1 - h_i)}$$
where $h_i$ is the leverage defined in the section H | LEVERAGE. This measure is the one-step approximation to $\frac{2}{p}\left[l(\hat{\beta}) - l(\hat{\beta}_{[i]})\right]$, where $l(\beta)$ is the log likelihood evaluated at $\beta$.
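The sketch below computes $C_i$. For a normal linear model with known unit variance, it compares the result with $\frac{2}{p}[l(\hat{\beta}) - l(\hat{\beta}_{[i]})]$ obtained by refitting. In that quadratic case the one-step approximation is exact. The data and function names are illustrative assumptions.

```python
import numpy as np

def glm_cooks_distance(r_p, h, p):
    """C_i = r_Pi^2 h_i / (p (1 - h_i))."""
    r_p, h = np.asarray(r_p, dtype=float), np.asarray(h, dtype=float)
    return r_p**2 * h / (p * (1.0 - h))

def loglik(b, X, y):
    """Normal log likelihood with unit variance, up to an additive constant."""
    return -0.5 * np.sum((y - X @ b) ** 2)

X = np.array([[1, 0], [1, 1], [1, 2], [1, 3], [1, 5]], dtype=float)
y = np.array([1.0, 2.0, 2.0, 4.0, 7.0])
n, p = X.shape
beta = np.linalg.lstsq(X, y, rcond=None)[0]
h = np.diag(X @ np.linalg.inv(X.T @ X) @ X.T)    # leverages with W_e = I
r_p = (y - X @ beta) / np.sqrt(1.0 - h)           # v_i = 1
C = glm_cooks_distance(r_p, h, p)

exact = []
for i in range(n):
    beta_i = np.linalg.lstsq(np.delete(X, i, axis=0), np.delete(y, i), rcond=None)[0]
    # Both log likelihoods use all the data; only the parameter value differs.
    exact.append(2.0 / p * (loglik(beta, X, y) - loglik(beta_i, X, y)))
```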
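The Fisher scoring, or expected, weight for observation $i$ is $w_{ei} = w_i / [\phi V(\mu_i) (g'(\mu_i))^2] = 1/[v_i (g'(\mu_i))^2]$. Let $W_e$ be the diagonal matrix with $w_{ei}$ as the $i$th diagonal element. The leverage $h_i$ of the $i$th observation is defined as the $i$th diagonal element of the hat matrix
$$H_e = W_e^{1/2} X (X'W_eX)^{-1} X' W_e^{1/2}$$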
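Here is a minimal sketch of the GLM leverage computation. The function name is hypothetical, and the fitted means are invented rather than taken from a real fit. Because $H_e$ is a rank-$p$ projection, the leverages should lie strictly between 0 and 1 here and sum to $p$.

```python
import numpy as np

def glm_leverage(X, w, phi, mu, variance_fn, gprime_fn):
    """h_i = ith diagonal element of H_e = W_e^{1/2} X (X'W_e X)^{-1} X' W_e^{1/2}."""
    X = np.asarray(X, dtype=float)
    mu = np.asarray(mu, dtype=float)
    w_e = np.asarray(w, dtype=float) / (phi * variance_fn(mu) * gprime_fn(mu) ** 2)
    Xw = np.sqrt(w_e)[:, None] * X                 # W_e^{1/2} X
    H = Xw @ np.linalg.solve(Xw.T @ Xw, Xw.T)      # hat matrix H_e
    return np.diag(H)

# Poisson with log link: V(mu) = mu and g'(mu) = 1/mu, so w_ei = mu_i when w_i = phi = 1.
X = np.column_stack([np.ones(4), [0.0, 1.0, 2.0, 3.0]])
mu = np.array([1.5, 2.0, 3.5, 6.0])                # illustrative fitted means
h = glm_leverage(X, np.ones(4), 1.0, mu, lambda m: m, lambda m: 1.0 / m)
```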
The diagnostic statistics in this section were developed by Preisser and Qaqish (1996). See the section Generalized Estimating Equations for further information and notation for generalized estimating equations (GEEs). The following additional notation is used in this section.
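Partition the design matrix $X$ and response vector $Y$ by cluster; that is, let $X = (X_1', \ldots, X_K')'$ and $Y = (Y_1', \ldots, Y_K')'$, corresponding to the $K$ clusters.
Let $n_i$ be the number of responses for cluster $i$, and denote by $N = \sum_{i=1}^{K} n_i$ the total number of observations. Denote by $A$ the $N \times N$ diagonal matrix with $\phi V(\mu_{ij})$ as the $ij$th diagonal element, and by $A_i$ its $n_i \times n_i$ diagonal block for cluster $i$. If there is a WEIGHT statement, the diagonal element of $A$ is $\phi V(\mu_{ij})/w_{ij}$, where $w_{ij}$ is the specified weight of the $j$th observation in the $i$th cluster. Let $\Delta$ be the $N \times N$ diagonal matrix with $\partial\mu_{ij}/\partial\eta_{ij}$ as diagonal elements, $i = 1, \ldots, K$, $j = 1, \ldots, n_i$. Let $\Delta_i$ be the $n_i \times n_i$ diagonal matrix corresponding to cluster $i$ with $\partial\mu_{ij}/\partial\eta_{ij}$ as the $j$th diagonal element.
Let $W$ be the $N \times N$ block diagonal weight matrix whose $i$th block, corresponding to the $i$th cluster, is the $n_i \times n_i$ matrix
$$W_i = \Delta_i A_i^{-1/2} R_i^{-1} A_i^{-1/2} \Delta_i$$
where $R_i$ is the working correlation matrix for cluster $i$.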
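The cluster weight matrix $W_i$ can be assembled as shown below, using the notation above. The names `gee_cluster_weight` and `exchangeable` are hypothetical helpers. With an independence working correlation, $W_i$ reduces to a diagonal matrix of GLM Fisher scoring weights. The example checks this for a Poisson cluster with log link, where $\partial\mu/\partial\eta = \mu$ and $A_i = \operatorname{diag}(\mu)$ when $\phi = w_{ij} = 1$.

```python
import numpy as np

def gee_cluster_weight(dmu_deta, a_diag, R):
    """W_i = Delta_i A_i^{-1/2} R_i^{-1} A_i^{-1/2} Delta_i for a single cluster.

    dmu_deta: diagonal of Delta_i (d mu_ij / d eta_ij)
    a_diag:   diagonal of A_i (phi * V(mu_ij) / w_ij)
    R:        working correlation matrix R_i (n_i x n_i)
    """
    d = np.asarray(dmu_deta, dtype=float) / np.sqrt(np.asarray(a_diag, dtype=float))
    return d[:, None] * np.linalg.inv(R) * d[None, :]    # diag(d) R^{-1} diag(d)

def exchangeable(n, alpha):
    """Exchangeable working correlation: 1 on the diagonal, alpha elsewhere."""
    return (1.0 - alpha) * np.eye(n) + alpha * np.ones((n, n))

mu = np.array([1.0, 2.5, 4.0])                        # illustrative fitted means
W_ind = gee_cluster_weight(mu, mu, np.eye(3))         # independence working correlation
W_exch = gee_cluster_weight(mu, mu, exchangeable(3, 0.3))
```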
Let
$$Q_i = X_i (X'WX)^{-1} X_i'$$
where $X_i$ is the $n_i \times p$ design matrix corresponding to cluster $i$.
Define the adjusted residual vector as
$$e = \Delta^{-1}(Y - \mu)$$
and $e_i = \Delta_i^{-1}(Y_i - \mu_i)$, the estimated residual for the $i$th cluster.
Let the subscript $[i]$ denote estimates evaluated without the $i$th cluster, let $[ij]$ denote estimates evaluated using all the data except the $j$th observation of the $i$th cluster, and let $[j]$ denote matrices corresponding to the $i$th cluster without the $j$th observation.
The following statistics are available for generalized estimating equation models.
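The leverage of cluster $i$ is contained in the matrix $H_i = X_i (X'WX)^{-1} X_i' W_i$ and is summarized by the trace of $H_i$:
$$h_i = \operatorname{trace}(H_i) = \operatorname{trace}\left(X_i (X'WX)^{-1} X_i' W_i\right)$$
The leverage of the $j$th observation in the $i$th cluster is the $j$th diagonal element of $H_i$.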
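The sketch below computes the cluster and observation leverages. It assumes a Gaussian response with identity link and $\phi = 1$, so that $\Delta_i = A_i = I$ and $W_i = R_i^{-1}$. The data, the fixed exchangeable correlation parameter, and the function name are invented for illustration. Summed over clusters, the traces equal $p$, because $\sum_i \operatorname{trace}(H_i) = \operatorname{trace}\{(X'WX)^{-1}\sum_i X_i'W_iX_i\}$.

```python
import numpy as np

def cluster_hat_matrices(X_blocks, W_blocks):
    """H_i = X_i (X'WX)^{-1} X_i' W_i for each cluster, where X'WX = sum_i X_i' W_i X_i."""
    M_inv = np.linalg.inv(sum(Xi.T @ Wi @ Xi for Xi, Wi in zip(X_blocks, W_blocks)))
    return [Xi @ M_inv @ Xi.T @ Wi for Xi, Wi in zip(X_blocks, W_blocks)]

# Gaussian response, identity link, phi = 1: W_i = R_i^{-1} with exchangeable R_i.
alpha = 0.4
R = (1.0 - alpha) * np.eye(3) + alpha * np.ones((3, 3))
times = ([0.0, 1.0, 2.0], [1.0, 3.0, 4.0], [2.0, 2.0, 5.0], [0.0, 4.0, 1.0])
X_blocks = [np.column_stack([np.ones(3), t]) for t in times]
W_blocks = [np.linalg.inv(R)] * len(X_blocks)
H = cluster_hat_matrices(X_blocks, W_blocks)
cluster_leverage = [np.trace(Hi) for Hi in H]      # h_i = trace(H_i)
obs_leverage = [np.diag(Hi) for Hi in H]          # jth diagonal element of H_i
```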
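The effect of deleting cluster $i$ on the estimated parameter vector is given by the following one-step approximation for $\hat{\beta} - \hat{\beta}_{[i]}$:
$$\mathrm{DFBETAC}_i = (X'WX)^{-1} X_i' W_i (I_{n_i} - H_i)^{-1} e_i$$
where $I_{n_i}$ is the $n_i \times n_i$ identity matrix.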
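The sketch below evaluates DFBETAC and checks it against an actual refit. It assumes a Gaussian response with identity link, $\phi = 1$, and an exchangeable working correlation whose parameter is held fixed. The GEE is then linear in $\beta$ (a generalized least squares problem), so the one-step approximation should be exact. The function names and data are hypothetical.

```python
import numpy as np

def gls(X_blocks, W_blocks, y_blocks):
    """Solve sum_i X_i' W_i (y_i - X_i b) = 0 with the W_i held fixed."""
    M = sum(X.T @ W @ X for X, W in zip(X_blocks, W_blocks))
    rhs = sum(X.T @ W @ y for X, W, y in zip(X_blocks, W_blocks, y_blocks))
    return np.linalg.solve(M, rhs)

def dfbetac(X_blocks, W_blocks, e_blocks):
    """One-step DFBETAC_i = (X'WX)^{-1} X_i' W_i (I - H_i)^{-1} e_i; row i is cluster i."""
    M_inv = np.linalg.inv(sum(X.T @ W @ X for X, W in zip(X_blocks, W_blocks)))
    rows = []
    for X, W, e in zip(X_blocks, W_blocks, e_blocks):
        H = X @ M_inv @ X.T @ W                                   # H_i = Q_i W_i
        rows.append(M_inv @ X.T @ W @ np.linalg.solve(np.eye(len(e)) - H, e))
    return np.array(rows)

def drop(seq, i):
    """Copy of the list without element i."""
    return seq[:i] + seq[i + 1:]

alpha = 0.4
R = (1.0 - alpha) * np.eye(3) + alpha * np.ones((3, 3))
times = ([0.0, 1.0, 2.0], [1.0, 3.0, 4.0], [2.0, 2.0, 5.0], [0.0, 4.0, 1.0])
X_blocks = [np.column_stack([np.ones(3), t]) for t in times]
y_blocks = [np.array(v) for v in ([1.0, 2.5, 2.9], [2.2, 4.1, 5.3], [2.8, 3.1, 6.0], [0.9, 5.2, 2.1])]
W_blocks = [np.linalg.inv(R)] * len(X_blocks)
beta = gls(X_blocks, W_blocks, y_blocks)
e_blocks = [y - X @ beta for X, y in zip(X_blocks, y_blocks)]   # Delta = I, so e_i = y_i - mu_i
approx = dfbetac(X_blocks, W_blocks, e_blocks)
exact = np.array([beta - gls(drop(X_blocks, i), drop(W_blocks, i), drop(y_blocks, i))
                  for i in range(len(X_blocks))])
```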
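The cluster deletion statistic DFBETAC can be standardized using the variances of $\hat{\beta}$ based on the complete data. The standardized one-step approximation for the change in $\hat{\beta}_k$ due to deletion of cluster $i$ is
$$\mathrm{DFBETACS}_{ik} = \frac{\mathrm{DFBETAC}_{ik}}{\hat{\sigma}_k}$$
where $\mathrm{DFBETAC}_{ik}$ is the $k$th element of $\mathrm{DFBETAC}_i$ and $\hat{\sigma}_k$ is the estimated standard deviation of $\hat{\beta}_k$ from the complete data.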
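Partition the matrices $W_i$ and $Q_i$ as
$$W_i = \begin{bmatrix} W_{[j]} & w_j \\ w_j' & w_{jj} \end{bmatrix}, \qquad Q_i = \begin{bmatrix} Q_{[j]} & q_j \\ q_j' & q_{jj} \end{bmatrix}$$
and let $X_i = \begin{bmatrix} X_{[j]} \\ x_{ij}' \end{bmatrix}$ and $e_i = \begin{bmatrix} e_{[j]} \\ e_{ij} \end{bmatrix}$. In this partition, the rows and columns of cluster $i$ are ordered so that observation $j$ comes last.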
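The effect of deleting the $j$th observation from the $i$th cluster is given by the following one-step approximation to $\hat{\beta} - \hat{\beta}_{[ij]}$:
$$\mathrm{DFBETAO}_{ij} = (X'WX)^{-1}\, \tilde{x}_{ij}\, \frac{\tilde{e}_{ij}}{1 - \tilde{h}_{ij}}$$
where $\tilde{x}_{ij} = w_{jj}\, x_{ij} + X_{[j]}' w_j$, $\tilde{e}_{ij} = e_{ij} + w_j' e_{[j]}/w_{jj}$, and $\tilde{h}_{ij} = w_{jj}\, q_{jj} + 2\, w_j' q_j + w_j' Q_{[j]} w_j / w_{jj}$. Note that $w_{jj}$, $\tilde{e}_{ij}$, and $\tilde{h}_{ij}$ are scalars.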
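The observation-level formula can be checked against refits. The sketch below assumes a Gaussian response with identity link, $\phi = 1$, and an exchangeable working correlation with a fixed parameter, so the GEE is linear in $\beta$ and the one-step approximation should be exact. Each refit removes observation $j$ from cluster $i$ and drops its row and column from $R_i$. The sketch uses boolean masks instead of physically reordering the cluster, which is equivalent to the partition above. Names and data are hypothetical.

```python
import numpy as np

def gls(X_blocks, W_blocks, y_blocks):
    """Solve sum_i X_i' W_i (y_i - X_i b) = 0 with the W_i held fixed."""
    M = sum(X.T @ W @ X for X, W in zip(X_blocks, W_blocks))
    rhs = sum(X.T @ W @ y for X, W, y in zip(X_blocks, W_blocks, y_blocks))
    return np.linalg.solve(M, rhs)

def dfbetao(X_blocks, W_blocks, e_blocks, i, j):
    """One-step DFBETAO_ij = (X'WX)^{-1} x~_ij e~_ij / (1 - h~_ij)."""
    M_inv = np.linalg.inv(sum(X.T @ W @ X for X, W in zip(X_blocks, W_blocks)))
    X, W, e = X_blocks[i], W_blocks[i], e_blocks[i]
    Q = X @ M_inv @ X.T                                    # Q_i
    rest = np.arange(len(e)) != j                          # the "[j]" part of cluster i
    w_jj, w_j = W[j, j], W[rest, j]
    x_tilde = w_jj * X[j] + X[rest].T @ w_j
    e_tilde = e[j] + w_j @ e[rest] / w_jj
    h_tilde = w_jj * Q[j, j] + 2.0 * w_j @ Q[rest, j] + w_j @ Q[np.ix_(rest, rest)] @ w_j / w_jj
    return M_inv @ x_tilde * e_tilde / (1.0 - h_tilde)

alpha = 0.4
R = (1.0 - alpha) * np.eye(3) + alpha * np.ones((3, 3))
times = ([0.0, 1.0, 2.0], [1.0, 3.0, 4.0], [2.0, 2.0, 5.0], [0.0, 4.0, 1.0])
X_blocks = [np.column_stack([np.ones(3), t]) for t in times]
y_blocks = [np.array(v) for v in ([1.0, 2.5, 2.9], [2.2, 4.1, 5.3], [2.8, 3.1, 6.0], [0.9, 5.2, 2.1])]
W_blocks = [np.linalg.inv(R)] * len(X_blocks)
beta = gls(X_blocks, W_blocks, y_blocks)
e_blocks = [y - X @ beta for X, y in zip(X_blocks, y_blocks)]

max_err = 0.0
for i in range(len(X_blocks)):
    for j in range(3):
        keep = np.arange(3) != j
        Xr, Wr, yr = list(X_blocks), list(W_blocks), list(y_blocks)
        Xr[i], yr[i] = X_blocks[i][keep], y_blocks[i][keep]
        Wr[i] = np.linalg.inv(R[np.ix_(keep, keep)])      # R_i without observation j
        exact = beta - gls(Xr, Wr, yr)
        approx = dfbetao(X_blocks, W_blocks, e_blocks, i, j)
        max_err = max(max_err, float(np.max(np.abs(approx - exact))))
```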
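The observation deletion statistic DFBETAO can be standardized using the variances of $\hat{\beta}$ based on the complete data. The standardized one-step approximation for the change in $\hat{\beta}_k$ due to deletion of observation $j$ in cluster $i$ is
$$\mathrm{DFBETAOS}_{ijk} = \frac{\mathrm{DFBETAO}_{ijk}}{\hat{\sigma}_k}$$
where $\mathrm{DFBETAO}_{ijk}$ is the $k$th element of $\mathrm{DFBETAO}_{ij}$ and $\hat{\sigma}_k$ is the estimated standard deviation of $\hat{\beta}_k$ from the complete data.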
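A measure of the standardized influence of a subset $m$ of observations on the overall fit is $\frac{1}{p}(\hat{\beta} - \hat{\beta}_{[m]})'(X'WX)(\hat{\beta} - \hat{\beta}_{[m]})$. For deletion of cluster $i$, this is approximated by
$$\mathrm{DCLS}_i = \frac{1}{p}\, e_i' (I_{n_i} - H_i)^{-1\prime}\, W_i Q_i W_i\, (I_{n_i} - H_i)^{-1} e_i$$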
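The sketch below computes DCLS and compares it with $\frac{1}{p}(\hat{\beta} - \hat{\beta}_{[i]})'(X'WX)(\hat{\beta} - \hat{\beta}_{[i]})$ from exact refits. It assumes a Gaussian response with identity link, $\phi = 1$, and an exchangeable working correlation with a fixed parameter. Under these assumptions the one-step approximation is exact, so the two should agree. Names and data are hypothetical.

```python
import numpy as np

def gls(X_blocks, W_blocks, y_blocks):
    """Solve sum_i X_i' W_i (y_i - X_i b) = 0 with the W_i held fixed."""
    M = sum(X.T @ W @ X for X, W in zip(X_blocks, W_blocks))
    rhs = sum(X.T @ W @ y for X, W, y in zip(X_blocks, W_blocks, y_blocks))
    return np.linalg.solve(M, rhs)

def dcls(X_blocks, W_blocks, e_blocks):
    """DCLS_i = (1/p) e_i' (I - H_i)^{-1}' W_i Q_i W_i (I - H_i)^{-1} e_i for every cluster."""
    p = X_blocks[0].shape[1]
    M_inv = np.linalg.inv(sum(X.T @ W @ X for X, W in zip(X_blocks, W_blocks)))
    out = []
    for X, W, e in zip(X_blocks, W_blocks, e_blocks):
        Q = X @ M_inv @ X.T                                # Q_i
        u = np.linalg.solve(np.eye(len(e)) - Q @ W, e)     # (I - H_i)^{-1} e_i
        out.append(u @ W @ Q @ W @ u / p)
    return np.array(out)

alpha = 0.4
R = (1.0 - alpha) * np.eye(3) + alpha * np.ones((3, 3))
times = ([0.0, 1.0, 2.0], [1.0, 3.0, 4.0], [2.0, 2.0, 5.0], [0.0, 4.0, 1.0])
X_blocks = [np.column_stack([np.ones(3), t]) for t in times]
y_blocks = [np.array(v) for v in ([1.0, 2.5, 2.9], [2.2, 4.1, 5.3], [2.8, 3.1, 6.0], [0.9, 5.2, 2.1])]
W_blocks = [np.linalg.inv(R)] * len(X_blocks)
beta = gls(X_blocks, W_blocks, y_blocks)
e_blocks = [y - X @ beta for X, y in zip(X_blocks, y_blocks)]

p = X_blocks[0].shape[1]
M = sum(X.T @ W @ X for X, W in zip(X_blocks, W_blocks))
exact = []
for i in range(len(X_blocks)):
    keep = [k for k in range(len(X_blocks)) if k != i]
    d = beta - gls([X_blocks[k] for k in keep], [W_blocks[k] for k in keep], [y_blocks[k] for k in keep])
    exact.append(d @ M @ d / p)
approx = dcls(X_blocks, W_blocks, e_blocks)
```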
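The measure of overall fit in the section DCLS | CLUSTERCOOKD | CLUSTERCOOKSD for the deletion of the $j$th observation in the $i$th cluster is approximated by
$$\mathrm{DOBS}_{ij} = \frac{1}{p}\, \frac{\tilde{x}_{ij}'(X'WX)^{-1}\tilde{x}_{ij}\; \tilde{e}_{ij}^2}{(1 - \tilde{h}_{ij})^2} = \frac{w_{jj}\, \tilde{h}_{ij}\, \tilde{e}_{ij}^2}{p\,(1 - \tilde{h}_{ij})^2}$$
where $\tilde{x}_{ij}$, $\tilde{e}_{ij}$, and $\tilde{h}_{ij}$ are defined in the section DFBETAO. In the case of the independence working correlation, this is equal to the measure for ordinary generalized linear models defined in the section DOBS | COOKD | COOKSD.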
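The sketch below evaluates DOBS from the right-hand form above. It assumes a Gaussian response with identity link, $\phi = 1$, and an exchangeable working correlation with a fixed parameter. For each observation, it compares DOBS with $\frac{1}{p}(\hat{\beta} - \hat{\beta}_{[ij]})'(X'WX)(\hat{\beta} - \hat{\beta}_{[ij]})$, where $\hat{\beta}_{[ij]}$ is obtained by refitting without observation $j$ of cluster $i$. Under these assumptions the two agree exactly. Names and data are hypothetical.

```python
import numpy as np

def gls(X_blocks, W_blocks, y_blocks):
    """Solve sum_i X_i' W_i (y_i - X_i b) = 0 with the W_i held fixed."""
    M = sum(X.T @ W @ X for X, W in zip(X_blocks, W_blocks))
    rhs = sum(X.T @ W @ y for X, W, y in zip(X_blocks, W_blocks, y_blocks))
    return np.linalg.solve(M, rhs)

def dobs(X_blocks, W_blocks, e_blocks, i, j):
    """DOBS_ij = w_jj h~_ij e~_ij^2 / (p (1 - h~_ij)^2)."""
    p = X_blocks[0].shape[1]
    M_inv = np.linalg.inv(sum(X.T @ W @ X for X, W in zip(X_blocks, W_blocks)))
    X, W, e = X_blocks[i], W_blocks[i], e_blocks[i]
    Q = X @ M_inv @ X.T                                    # Q_i
    rest = np.arange(len(e)) != j                          # the "[j]" part of cluster i
    w_jj, w_j = W[j, j], W[rest, j]
    e_tilde = e[j] + w_j @ e[rest] / w_jj
    h_tilde = w_jj * Q[j, j] + 2.0 * w_j @ Q[rest, j] + w_j @ Q[np.ix_(rest, rest)] @ w_j / w_jj
    return w_jj * h_tilde * e_tilde**2 / (p * (1.0 - h_tilde) ** 2)

alpha = 0.4
R = (1.0 - alpha) * np.eye(3) + alpha * np.ones((3, 3))
times = ([0.0, 1.0, 2.0], [1.0, 3.0, 4.0], [2.0, 2.0, 5.0], [0.0, 4.0, 1.0])
X_blocks = [np.column_stack([np.ones(3), t]) for t in times]
y_blocks = [np.array(v) for v in ([1.0, 2.5, 2.9], [2.2, 4.1, 5.3], [2.8, 3.1, 6.0], [0.9, 5.2, 2.1])]
W_blocks = [np.linalg.inv(R)] * len(X_blocks)
beta = gls(X_blocks, W_blocks, y_blocks)
e_blocks = [y - X @ beta for X, y in zip(X_blocks, y_blocks)]
p = X_blocks[0].shape[1]
M = sum(X.T @ W @ X for X, W in zip(X_blocks, W_blocks))

max_err = 0.0
for i in range(len(X_blocks)):
    for j in range(3):
        keep = np.arange(3) != j
        Xr, Wr, yr = list(X_blocks), list(W_blocks), list(y_blocks)
        Xr[i], yr[i] = X_blocks[i][keep], y_blocks[i][keep]
        Wr[i] = np.linalg.inv(R[np.ix_(keep, keep)])      # R_i without observation j
        d = beta - gls(Xr, Wr, yr)
        max_err = max(max_err, abs(dobs(X_blocks, W_blocks, e_blocks, i, j) - d @ M @ d / p))
```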
A studentized distance measure of the type defined in the section DCLS | CLUSTERCOOKD | CLUSTERCOOKSD of the influence of the $i$th cluster is given by
$$\mathrm{MCLS}_i = \frac{1}{p}\, e_i' (I_{n_i} - H_i)^{-1\prime}\, W_i Q_i W_i\, e_i$$
Copyright © 2009 by SAS Institute Inc., Cary, NC, USA. All rights reserved.