The GEE Procedure

Example 43.4 GEE for Binary Data with Logit Link Function

Because the respiratory data in Example 43.1 are binary, you can use the alternating logistic regression (ALR) method, which models within-cluster association by using log odds ratios instead of working correlations. This example fits a "fully parameterized cluster" model for the log odds ratio: there is one log odds ratio parameter for each unique pair of responses within a cluster, and all clusters are parameterized identically. The following statements fit the same regression model for the mean as in Example 43.1 but replace the working correlation with a regression model for the log odds ratios. The LOGOR=FULLCLUST option in the REPEATED statement specifies the fully parameterized log odds ratio model.

proc gee data=Resp descend;
   class ID Treatment Center Sex Baseline;
   model Outcome=Treatment Center Sex Age Baseline / dist=bin;
   repeated  subject=ID(Center) / logor=fullclust;
run;
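
As a sketch of what the fully parameterized cluster model assumes, the log odds ratio for each pair of responses can be written as a linear function of the association parameters. The symbols $\gamma_{ijk}$, $Y_{ij}$, and $\boldsymbol{\alpha}$ below are notation chosen here for illustration (the section Specifying Log Odds Ratio Models gives the formal definition), and the count of six parameters assumes four responses per cluster, as in the respiratory data.

```latex
% Log odds ratio between responses j and k of cluster i
\gamma_{ijk} = \log \mathrm{OR}(Y_{ij}, Y_{ik})
  = \log \frac{P(Y_{ij}=1, Y_{ik}=1)\, P(Y_{ij}=0, Y_{ik}=0)}
              {P(Y_{ij}=1, Y_{ik}=0)\, P(Y_{ij}=0, Y_{ik}=1)}
  = \mathbf{z}_{ijk}'\,\boldsymbol{\alpha}

% Fully parameterized clusters: one parameter per unique pair (j,k), j < k,
% shared by all clusters, so with four responses per cluster
\binom{4}{2} = 6 \quad \text{parameters } \alpha_1, \ldots, \alpha_6
```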

The results of fitting the model are displayed in Output 43.4.1.

Output 43.4.1: Results of ALR Model Fitting

The GEE Procedure

Parameter Estimates for Response Model
with Empirical Standard Error Estimates

Parameter     Estimate  Standard Error  95% Confidence Limits       Z  Pr > |Z|
Intercept       1.6001          0.5128     0.5950     2.6052    3.12    0.0018
Treatment A     1.2611          0.3406     0.5934     1.9287    3.70    0.0002
Treatment P     0.0000          0.0000     0.0000     0.0000       .         .
Center 1       -0.6287          0.3486    -1.3119     0.0545   -1.80    0.0713
Center 2        0.0000          0.0000     0.0000     0.0000       .         .
Sex F           0.1024          0.4362    -0.7526     0.9575    0.23    0.8144
Sex M           0.0000          0.0000     0.0000     0.0000       .         .
Age            -0.0162          0.0125    -0.0407     0.0084   -1.29    0.1977
Baseline 0     -1.8980          0.3404    -2.5652    -1.2308   -5.58    <.0001
Baseline 1      0.0000          0.0000     0.0000     0.0000       .         .
Alpha1          1.6109          0.4892     0.6522     2.5696    3.29    0.0010
Alpha2          1.0771          0.4834     0.1297     2.0246    2.23    0.0259
Alpha3          1.5875          0.4735     0.6594     2.5155    3.35    0.0008
Alpha4          2.1224          0.5022     1.1381     3.1068    4.23    <.0001
Alpha5          1.8818          0.4686     0.9634     2.8001    4.02    <.0001
Alpha6          2.1046          0.4949     1.1347     3.0745    4.25    <.0001



The parameters Alpha1 through Alpha6 estimate the log odds ratio for each unique within-cluster pair. The correspondence between the log odds ratio parameters and within-cluster pairs is displayed in Output 43.4.2.

Output 43.4.2: Log Odds Ratio Parameters

Log Odds Ratio Parameter Information

Parameter  Group
Alpha1     (1, 2)
Alpha2     (1, 3)
Alpha3     (1, 4)
Alpha4     (2, 3)
Alpha5     (2, 4)
Alpha6     (3, 4)
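
Because the Alpha parameters are on the log scale, exponentiating an estimate and its confidence limits gives the corresponding odds ratio and interval; for example, exp(1.6109) is about 5.0 for the pair (1, 2). The following DATA step is a minimal sketch of that conversion. The data set name AlphaOR and its variable names are hypothetical, and the values are copied by hand from Output 43.4.1 rather than captured from the procedure.

```sas
/* Hypothetical helper data set: values typed in from Output 43.4.1 */
data AlphaOR;
   input Parameter $ Estimate Lower Upper;
   OddsRatio = exp(Estimate);   /* odds ratio for the within-cluster pair */
   ORLower   = exp(Lower);      /* lower 95% limit on the odds ratio scale */
   ORUpper   = exp(Upper);      /* upper 95% limit on the odds ratio scale */
   datalines;
Alpha1 1.6109 0.6522 2.5696
Alpha2 1.0771 0.1297 2.0246
Alpha3 1.5875 0.6594 2.5155
Alpha4 2.1224 1.1381 3.1068
Alpha5 1.8818 0.9634 2.8001
Alpha6 2.1046 1.1347 3.0745
;

proc print data=AlphaOR noobs;
   format OddsRatio ORLower ORUpper 8.2;
run;
```

All six lower confidence limits in Output 43.4.1 are positive, so every interval on the odds ratio scale lies above 1.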



Model goodness-of-fit criteria are shown in Output 43.4.3.

Output 43.4.3: ALR Model Fit Criteria

GEE Fit Criteria
QIC 511.8589
QICu 499.6516



The QIC for the ALR model shown in Output 43.4.3 is 511.86, whereas the QIC for the unstructured working correlation model shown in Output 43.1.3 is 512.34. Because smaller QIC values indicate a better fit, the ALR model fits slightly better, although the difference between the two values is small.

You can fit the same model by fully specifying the $\mathbf{z}$ matrix; for the definition of the $\mathbf{z}$ matrix, see the section Specifying Log Odds Ratio Models. The following statements create a data set that contains the full $\mathbf{z}$ matrix:

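data zin;
   keep id center z1-z6 y1 y2;
   array zin(6) z1-z6;
   set resp;
   /* BY-group processing requires Resp to be sorted by Center and ID */
   by center id;
   /* Write the six pair rows once per cluster, at its first observation */
   if first.id then do;
      t = 0;                     /* index of the current pair (1 to 6) */
      do m = 1 to 4;
         do n = m+1 to 4;
            do j = 1 to 6;       /* reset the indicator row to zeros */
               zin(j) = 0;
            end;
            y1 = m;              /* first response in the pair */
            y2 = n;              /* second response in the pair */
            t + 1;               /* sum statement: advance the pair index */
            zin(t) = 1;          /* select the parameter for pair (m, n) */
            output;
         end;
      end;
   end;
run;

proc print data=zin (obs=12);
run;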

Output 43.4.4 displays the full $\mathbf{z}$ matrix for the first two clusters. The $\mathbf{z}$ matrix is identical for all clusters in this example.

Output 43.4.4: Full $\mathbf{z}$ Matrix Data Set

Obs  z1  z2  z3  z4  z5  z6  Center  ID  y1  y2
  1   1   0   0   0   0   0       1   1   1   2
  2   0   1   0   0   0   0       1   1   1   3
  3   0   0   1   0   0   0       1   1   1   4
  4   0   0   0   1   0   0       1   1   2   3
  5   0   0   0   0   1   0       1   1   2   4
  6   0   0   0   0   0   1       1   1   3   4
  7   1   0   0   0   0   0       1   2   1   2
  8   0   1   0   0   0   0       1   2   1   3
  9   0   0   1   0   0   0       1   2   1   4
 10   0   0   0   1   0   0       1   2   2   3
 11   0   0   0   0   1   0       1   2   2   4
 12   0   0   0   0   0   1       1   2   3   4
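
Read as a matrix, the six rows for one cluster in Output 43.4.4 form a $6 \times 6$ identity matrix, so each pair's log odds ratio is exactly one association parameter. The notation $\gamma_{jk}$ for the log odds ratio between responses $j$ and $k$ is assumed here for illustration; the cluster index is dropped because every cluster has the same rows.

```latex
\begin{pmatrix}
  \gamma_{12} \\ \gamma_{13} \\ \gamma_{14} \\ \gamma_{23} \\ \gamma_{24} \\ \gamma_{34}
\end{pmatrix}
=
\begin{pmatrix}
  1 & 0 & 0 & 0 & 0 & 0 \\
  0 & 1 & 0 & 0 & 0 & 0 \\
  0 & 0 & 1 & 0 & 0 & 0 \\
  0 & 0 & 0 & 1 & 0 & 0 \\
  0 & 0 & 0 & 0 & 1 & 0 \\
  0 & 0 & 0 & 0 & 0 & 1
\end{pmatrix}
\begin{pmatrix}
  \alpha_1 \\ \alpha_2 \\ \alpha_3 \\ \alpha_4 \\ \alpha_5 \\ \alpha_6
\end{pmatrix}
```

This is the same pair-to-parameter correspondence that Output 43.4.2 lists for the LOGOR=FULLCLUST model.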



The following statements fit the model for fully parameterized clusters by fully specifying the $\mathbf{z}$ matrix. The results are identical to those shown previously.

proc gee data=Resp descend;
   class ID Treatment Center Sex Baseline;
   model Outcome=Treatment Center Sex Age Baseline / dist=bin;
   repeated  subject=ID(Center) / logor=zfull
                                  zdata=zin
                                  zrow =(z1-z6)
                                  ypair=(y1 y2);
run;