Sample 25005: Compute a diagnostic similar to Cook's D for generalized linear models
NOTE: Beginning in SAS 9.2, this statistic is available both for independent-observation data and clustered data. See the COOKD= and CLUSTERCOOKD= options in the OUTPUT statement in PROC GENMOD.
- PURPOSE:
Illustrate computing a diagnostic similar to Cook's D statistic for
generalized linear models. This diagnostic gives a measure of
the influence of each observation on the model fit.
- REQUIREMENTS:
Base SAS and SAS/STAT Software, Version 7 or later.
- DETAILS:
Ideally, the influence of an observation on the model fit would be
determined by fitting the model with and without the observation
and computing twice the difference in the resulting log
likelihoods, divided by the number of parameters in the model.
However, this would require the model to be fit as many times as
there are observations. The statistic computed here is an
approximation that is considerably less expensive to compute.
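In symbols, the two quantities described above can be written as follows. The notation is an assumption introduced here for illustration and is not taken from the SAS documentation: $\ell$ is the log likelihood, $\hat\beta_{(i)}$ is the estimate obtained with observation $i$ deleted, $k$ is the total number of parameters including the intercept, $r_i$ is the standardized Pearson residual (STDRESCHI), $w_i$ is the Hessian weight (HESSWGT), and $s_i$ is the standard error of the linear predictor (STDXBETA).

```latex
% Refit-based measure: one additional model fit per observation
D_i^{\mathrm{refit}} = \frac{2\,\bigl[\ell(\hat\beta) - \ell(\hat\beta_{(i)})\bigr]}{k}

% One-fit approximation computed in this sample
h_i = w_i\, s_i^{2}, \qquad
D_i \approx \frac{h_i\, r_i^{2}}{k\,(1 - h_i)}
```

In the example data, $k = 7$: five parameters for DEPT, one for SEX, and the intercept.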
The results of running the following SAS code are shown in the
RESULTS section.
data berkeley;
   input dept sex yes no total @@;
   prop=yes/total;
   cards;
1 1 512 331 843 1 2 89 19 108
2 1 353 207 560 2 2 17 8 25
3 1 120 205 325 3 2 202 391 593
4 1 138 279 417 4 2 131 244 375
5 1 53 138 191 5 2 94 299 393
6 1 22 351 373 6 2 24 317 341
;

proc genmod data=berkeley;
   class dept sex;
   model yes/total=dept sex / dist=binomial link=logit;
   output out=obsout stdxbeta=std hesswgt=hesswgt
          stdreschi=stdreschi p=pred;
run;
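
data cookd;
   set obsout;
   obs=_N_;
   /* p is the number of parameters excluding the intercept:
      5 for DEPT, 1 for SEX */
   p=6;
   /* h is the leverage: the Hessian weight times the squared
      standard error of the linear predictor */
   h=Std*Hesswgt*Std;
   /* Divide by p+1, the total number of parameters
      including the intercept */
   cookd=h*Stdreschi**2/((p+1)*(1-h));
run;

proc plot data=cookd;
   plot cookd*obs;
run;

proc print data=cookd noobs;
   var dept sex prop pred cookd;
run;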

/********************************************************************
Note that an approximate Cook's D can also be obtained using PROC
LOGISTIC for binary response models. The C= option in the OUTPUT
statement outputs a statistic that is the Cook's D approximation
times the number of parameters.
********************************************************************/
proc logistic data=berkeley;
   class dept sex / param=glm;
   model yes/total=dept sex;
   output out=lout p=pred c=c;
run;

data lout;
   set lout;
   /* Divide by the total number of parameters in the model */
   cookd=c/7;
run;

proc print data=lout noobs;
   var dept sex prop pred cookd;
run;
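
As the NOTE at the top of this sample states, SAS 9.2 and later can output this statistic directly from PROC GENMOD with the COOKD= option in the OUTPUT statement. The following is a minimal sketch, assuming the BERKELEY data set created above. The output data set name GMOUT and the variable name COOKD_GM are arbitrary choices made for this illustration.

```sas
proc genmod data=berkeley;
   class dept sex;
   model yes/total=dept sex / dist=binomial link=logit;
   /* COOKD= is available beginning in SAS 9.2 */
   output out=gmout p=pred cookd=cookd_gm;
run;

proc print data=gmout noobs;
   var dept sex prop pred cookd_gm;
run;
```

Listing COOKD_GM next to the hand-computed COOKD is one way to check the DATA step calculation on a release that supports the option.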
- RESULTS:
The following output from the PRINT procedure gives the values of
the diagnostic in the variable COOKD:
DEPT  SEX     PROP     PRED    COOKD
   1    1  0.60735  0.62914  24.6544
   1    2  0.82407  0.65402   0.8009
   2    1  0.63036  0.63142   0.2322
   2    2  0.68000  0.65622   0.0008
   3    1  0.36923  0.33493   0.6589
   3    2  0.34064  0.35944   1.7236
   4    1  0.33094  0.32815   0.0117
   4    2  0.34933  0.35243   0.0105
   5    1  0.27749  0.23813   0.2929
   5    2  0.23919  0.25831   1.1085
   6    1  0.05898  0.06131   0.0123
   6    2  0.07038  0.06784   0.0124
| Type: | Sample |
| Topic: | SAS Reference ==> Procedures ==> GENMOD; SAS Reference ==> Procedures ==> LOGISTIC |
| Date Modified: | 2005-04-08 03:01:49 |
| Date Created: | 2005-01-13 15:03:22 |
Operating System and Release Information
| SAS System | SAS/STAT | All | 8 TS M0 | n/a |