/****************************************************************/
/*          S A S   S A M P L E   L I B R A R Y                 */
/*                                                              */
/*    NAME: LOGIEX10                                            */
/*   TITLE: Example 10 for PROC LOGISTIC                        */
/* PRODUCT: STAT                                                */
/*  SYSTEM: ALL                                                 */
/*    KEYS: logistic regression analysis,                       */
/*          binomial response data,                             */
/*   PROCS: LOGISTIC                                            */
/*    DATA:                                                     */
/*                                                              */
/* SUPPORT: Bob Derr                                            */
/*     REF: SAS/STAT User's Guide, PROC LOGISTIC chapter        */
/*    MISC:                                                     */
/*                                                              */
/****************************************************************/

/*****************************************************************
Example 10. Overdispersion
*****************************************************************/

/*
In a seed germination test, seeds of two cultivars were planted in pots under
two soil conditions. The following SAS statements create the data set SEEDS,
which records, for each pot, the number of seeds planted and the number that
germinated under each combination of cultivar and soil condition. Variable N
represents the number of seeds planted in a pot, and R represents the number
that germinated. CULT and SOIL are indicator variables, representing the
cultivar and soil condition, respectively.

PROC LOGISTIC is first used to fit a logit model to the data, with CULT, SOIL,
and CULT*SOIL (the CULT x SOIL interaction) as explanatory variables. The
option SCALE=NONE is specified to display the deviance and Pearson
goodness-of-fit statistics without applying any correction for overdispersion.

The results from the first LOGISTIC run suggest that the data are
overdispersed. Without an adjustment for the overdispersion, the standard
errors are likely to be underestimated, which makes the Wald tests too
liberal. PROC LOGISTIC provides three SCALE= options to accommodate
overdispersion (DEVIANCE, PEARSON, and WILLIAMS). Because the pots contain
unequal numbers of seeds, SCALE=WILLIAMS is preferred, and it is specified in
the second LOGISTIC call. The Williams model estimates a scale parameter by
equating the value of Pearson's chi-square for the full model to its
approximate expected value. The full model considered here is the factorial
model with cultivar and soil condition as factors.

The estimate of the Williams scale parameter is 0.075941; it appears in the
formula for the WEIGHT variable at the beginning of the printed output.
Because neither CULT nor CULT*SOIL is statistically significant (p=.5289 and
p=.9275, respectively), a reduced model containing only the soil condition
factor is then fitted in the final LOGISTIC run. Here, the observations are
weighted by 1/(1+0.075941*(N-1)); this is requested by supplying the scale
estimate in the SCALE=WILLIAMS option as shown.
*/

title 'Example 10. Overdispersion';

data seeds;
   input pot n r cult soil;
   datalines;
 1 16     8      0       0
 2 51    26      0       0
 3 45    23      0       0
 4 39    10      0       0
 5 36     9      0       0
 6 81    23      1       0
 7 30    10      1       0
 8 39    17      1       0
 9 28     8      1       0
10 62    23      1       0
11 51    32      0       1
12 72    55      0       1
13 41    22      0       1
14 12     3      0       1
15 13    10      0       1
16 79    46      1       1
17 30    15      1       1
18 51    32      1       1
19 74    53      1       1
20 56    12      1       1
;
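
/*
Illustrative addition, not part of the original example: a minimal sketch
that computes the observed germination proportion R/N for each pot and
summarizes it by cultivar and soil condition, which can help in seeing the
pot-to-pot variability before any model is fitted. The data set name
SEEDS_PROP and the variable name PROP are hypothetical names introduced
here for illustration.
*/

data seeds_prop;
   set seeds;
   prop = r / n;   /* observed proportion of seeds germinated in this pot */
run;

proc means data=seeds_prop n mean min max;
   class cult soil;
   var prop;
   title 'Observed Germination Proportions by CULT and SOIL (Sketch)';
run;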


proc logistic data=seeds;
   model r/n=cult soil cult*soil / scale=none;
   title 'Full Model With SCALE=NONE';
run;

proc logistic data=seeds;
   model r/n=cult soil cult*soil / scale=williams;
   title 'Full Model With SCALE=WILLIAMS';
run;
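
/*
Illustrative addition, for comparison only: SCALE=PEARSON is another of the
SCALE= options for overdispersion. It estimates a single dispersion
parameter as the Pearson chi-square divided by its degrees of freedom and
rescales the covariance matrix accordingly, rather than reweighting the
observations as the Williams method does. This run is a sketch and is not
part of the original analysis.
*/

proc logistic data=seeds;
   model r/n=cult soil cult*soil / scale=pearson;
   title 'Full Model With SCALE=PEARSON (Sketch)';
run;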

proc logistic data=seeds;
   model r/n=soil / scale=williams(0.075941);
   title 'Reduced Model With SCALE=WILLIAMS(0.075941)';
run;
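
/*
Illustrative addition, not part of the original example: a minimal sketch
that makes the Williams weights explicit. Each pot is given the weight
1/(1+0.075941*(N-1)) in a DATA step, and the reduced model is fitted with a
WEIGHT statement instead of SCALE=WILLIAMS(0.075941). Assuming the WEIGHT
statement applies these weights the same way the Williams method does, the
parameter estimates should agree with the preceding run. The data set name
SEEDS_W and the variable name W are hypothetical names introduced here.
*/

data seeds_w;
   set seeds;
   w = 1 / (1 + 0.075941*(n - 1));   /* Williams weight for this pot */
run;

proc logistic data=seeds_w;
   model r/n=soil;
   weight w;
   title 'Reduced Model With Explicit Williams Weights (Sketch)';
run;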