The LOGISTIC Procedure

Example 72.12 Exact Conditional Logistic Regression

The following data, from Hand et al. (1994), contain the results of a study of 49 anxious or depressed children. The Diagnosis variable indicates whether the child was anxious or depressed when the study began, the Friendships variable indicates whether the child has good friendships, the Total variable represents the total number of children in the study who exhibit the specified values for Diagnosis and Friendships, and the Recovered variable represents the number of children whose mothers believe that their child has recovered at the end of the study.

data one;
   length Diagnosis $ 9;
   input Diagnosis $ Friendships $ Recovered Total @@;
   datalines;
Anxious   Poor 0 0    Anxious   Good 13 21
Depressed Poor 0 8    Depressed Good 15 20
;

Notice that no children in the study are both anxious and have poor friendships and that no children who are depressed and have poor friendships have recovered. The following statements fit an unconditional logistic regression model to these data.

proc logistic data=one;
   class Diagnosis Friendships / param=ref;
   model Recovered/Total = Diagnosis Friendships;
run;

Because the data set has quasi-complete separation, the unconditional logistic regression results are not reliable, and PROC LOGISTIC reports the separation in the Model Convergence Status table shown in Output 72.12.1.

Output 72.12.1: Unconditional Logistic Regression Results

The LOGISTIC Procedure

Model Convergence Status
Quasi-complete separation of data points detected.
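The separation can be seen directly in the cell counts. The following Python sketch is not part of the SAS example and is included only to illustrate the arithmetic; the names cells and outcome_counts are assumptions of this sketch. It tabulates the outcomes by Friendships level:

```python
# Cell counts copied from the DATA step: (Diagnosis, Friendships, Recovered, Total)
cells = [
    ("Anxious", "Poor", 0, 0),
    ("Anxious", "Good", 13, 21),
    ("Depressed", "Poor", 0, 8),
    ("Depressed", "Good", 15, 20),
]

def outcome_counts(level):
    """Return (recovered, not recovered) summed over cells with this Friendships level."""
    recovered = sum(r for _, f, r, _ in cells if f == level)
    total = sum(t for _, f, _, t in cells if f == level)
    return recovered, total - recovered

poor = outcome_counts("Poor")   # every child with poor friendships is a nonevent
good = outcome_counts("Good")   # both outcomes occur among children with good friendships

# Friendships=Poor never occurs with recovery, while Friendships=Good occurs
# with both outcomes: the pattern behind the quasi-complete separation message.
quasi_separated = poor[0] == 0 and min(good) > 0
print(poor, good, quasi_separated)
```

Because no child with poor friendships recovered, the unconditional likelihood keeps increasing as the Friendships coefficient grows, so a finite maximum likelihood estimate for that coefficient does not exist.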



The sparseness of the data and the separability of the data set make this a good candidate for an exact logistic regression. In the following code, the EXACTONLY option suppresses the unconditional logistic regression results, the EXACT statement requests an exact analysis of the two covariates, the OUTDIST= option outputs the exact distribution into a SAS data set, the JOINT option computes a joint test for the significance of the two covariates in the model, and the ESTIMATE option produces parameter estimates for the two covariates.

proc logistic data=one exactonly;
   class Diagnosis Friendships / param=ref;
   model Recovered/Total = Diagnosis Friendships;
   exact Diagnosis Friendships
       / outdist=dist joint estimate;
run;
proc print data=dist(obs=10);
run;
proc print data=dist(firstobs=162 obs=175);
run;
proc print data=dist(firstobs=176 obs=184);
run;

Tests for the joint significance of the Diagnosis and Friendships covariates are labeled "Joint" in Output 72.12.2. Both of these tests reject the null hypothesis that all the parameters are identically zero.

Output 72.12.2: Exact Tests

The LOGISTIC Procedure
 
Exact Conditional Analysis

Exact Conditional Tests
                                            p-Value
Effect        Test          Statistic    Exact      Mid
Joint         Score           13.1905   0.0008   0.0008
              Probability    0.000081   0.0007   0.0007
Diagnosis     Score            0.7915   0.5055   0.4159
              Probability      0.1791   0.5055   0.4159
Friendships   Score           12.4615   0.0004   0.0002
              Probability    0.000414   0.0004   0.0002



In Output 72.12.2, the joint probability statistic (0.000081) is the probability of the observed sufficient statistics (Friendships=28 and Diagnosis=13) in the joint exact distribution (part of which is displayed in Output 72.12.3). Note that the joint exact distribution has sufficient statistics displayed for both covariates; the marginal distributions have sufficient statistics displayed for only one covariate. The associated exact p-value (0.0007) in Output 72.12.2 is the sum of the probabilities of all sufficient statistics in the joint exact distribution that have probabilities less than or equal to 0.000081. The mid-p-value (0.0007) adjusts the exact p-value for the discreteness of the exact distribution by subtracting half the probability of the observed sufficient statistic (Hirji 2006). The score statistic (13.1905) is a weighted distance of the observed sufficient statistics from the mean sufficient statistics, and the score p-value (0.0008) is the sum of all probabilities in the Dist data set for sufficient statistics that are no closer to the mean.

Output 72.12.3: First 10 Observations in the Joint Exact Distribution

Obs DiagnosisAnxious FriendshipsGood Count Score Prob
1 0 20 1 48.0000 2.5608E-14
2 1 20 420 40.3905 1.0755E-11
3 1 21 168 40.6905 4.3022E-12
4 2 20 39900 33.5619 1.02177E-9
5 2 21 33600 33.4619 8.6044E-10
6 2 22 5880 34.7619 1.5058E-10
7 3 20 1516200 27.5143 3.88272E-8
8 3 21 2021600 27.0143 5.17696E-8
9 3 22 744800 27.9143 1.9073E-8
10 3 23 74480 30.2143 1.9073E-9
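The Prob column in Output 72.12.3 is the Count column divided by a normalizing constant. The following Python sketch (an illustration outside SAS) assumes that the constant is the number of ways to choose which 28 of the 49 children recovered; that assumption reproduces the displayed probabilities. The names total_tables and joint_counts are hypothetical.

```python
from math import comb

# Assumption of this sketch: the joint distribution is normalized by the number
# of ways to choose the 28 recovered children among all 49.
total_tables = comb(49, 28)

# (DiagnosisAnxious, FriendshipsGood) -> Count, copied from Output 72.12.3
joint_counts = {(0, 20): 1, (1, 20): 420, (1, 21): 168, (2, 20): 39900}

for stats, count in joint_counts.items():
    print(stats, count / total_tables)

# Count for the observed statistics (Diagnosis=13, Friendships=28),
# taken from the last row of Output 72.12.5
observed_prob = 3154908960 / total_tables
print(f"{observed_prob:.6f}")
```

The last line gives the joint probability statistic that is reported as 0.000081 in Output 72.12.2.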



For the univariate tests of the Diagnosis variable, PROC LOGISTIC extracts the part of the joint exact distribution for which Friendships=28; this is displayed in Output 72.12.4. PROC LOGISTIC then computes the probability and score tests in Output 72.12.2 in the same fashion as the joint tests. Neither test rejects the null hypothesis that the parameter is zero.

Output 72.12.4: Marginal Exact Distribution for Diagnosis

Obs DiagnosisAnxious FriendshipsGood Count Score Prob
162 8 . 203490 17.6871 0.00001
163 9 . 5878600 12.5487 0.00033
164 10 . 67016040 8.2899 0.00380
165 11 . 402096240 4.9108 0.02282
166 12 . 1424090850 2.4113 0.08082
167 13 . 3154908960 0.7915 0.17905
168 14 . 4507012800 0.0513 0.25579
169 15 . 4206545280 0.1907 0.23874
170 16 . 2563363530 1.2098 0.14548
171 17 . 1005240600 3.1086 0.05705
172 18 . 245725480 5.8870 0.01395
173 19 . 35271600 9.5450 0.00200
174 20 . 2645370 14.0827 0.00015
175 21 . 77520 19.5000 0.00000
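The Diagnosis rows of Output 72.12.2 can be recomputed by hand from the marginal distribution in Output 72.12.4. The following Python sketch is an illustration outside SAS, not the procedure's implementation. It applies the definitions given earlier for the exact, mid, and score p-values, and it assumes that the univariate score is the squared distance from the mean divided by the variance. The variable names are hypothetical.

```python
# DiagnosisAnxious -> (Count, Score), copied from Output 72.12.4
marginal = {
    8: (203490, 17.6871),
    9: (5878600, 12.5487),
    10: (67016040, 8.2899),
    11: (402096240, 4.9108),
    12: (1424090850, 2.4113),
    13: (3154908960, 0.7915),
    14: (4507012800, 0.0513),
    15: (4206545280, 0.1907),
    16: (2563363530, 1.2098),
    17: (1005240600, 3.1086),
    18: (245725480, 5.8870),
    19: (35271600, 9.5450),
    20: (2645370, 14.0827),
    21: (77520, 19.5000),
}
observed = 13  # observed sufficient statistic for Diagnosis

total = sum(count for count, _ in marginal.values())
obs_count, obs_score = marginal[observed]

# Probability test: add up every outcome that is no more likely than the observed one
prob_p = sum(c for c, _ in marginal.values() if c <= obs_count) / total
# Mid-p: subtract half the probability of the observed outcome
mid_p = prob_p - 0.5 * obs_count / total
# Score test: add up every outcome whose score is at least the observed score
score_p = sum(c for c, s in marginal.values() if s >= obs_score) / total

# Score statistic (assumed form): squared distance from the mean over the variance
mean = sum(t * c for t, (c, _) in marginal.items()) / total
var = sum((t - mean) ** 2 * c for t, (c, _) in marginal.items()) / total
score = (observed - mean) ** 2 / var

print(f"{prob_p:.4f} {mid_p:.4f} {score_p:.4f} {score:.4f}")
```

The four printed values correspond to the exact p-value, the mid-p-value, the score p-value, and the score statistic for Diagnosis in Output 72.12.2.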



For the univariate tests of the Friendships variable, PROC LOGISTIC extracts the part of the exact distribution for which Diagnosis=13; this is displayed in Output 72.12.5. Both tests in Output 72.12.2 reject the null hypothesis that the parameter is zero.

Output 72.12.5: Marginal Exact Distribution for Friendships

Obs DiagnosisAnxious FriendshipsGood Count Score Prob
176 . 20 15774544800 9.3600 0.00207
177 . 21 205069082400 4.9985 0.02692
178 . 22 956989051200 1.9938 0.12560
179 . 23 2.1053759E12 0.3462 0.27633
180 . 24 2.3924726E12 0.0554 0.31401
181 . 25 1.4354836E12 1.1215 0.18841
182 . 26 441687254400 3.5446 0.05797
183 . 27 63098179200 7.3246 0.00828
184 . 28 3154908960 12.4615 0.00041



The parameter estimates are displayed in Output 72.12.6. Similar to the univariate tests, the parameter estimates are derived from the marginal exact distributions.

The Diagnosis parameter estimate (-0.5981) is computed by an iterative search for the parameter that maximizes the univariate conditional probability density function, as described in the section "Inference for a Single Parameter." The Diagnosis parameter is not significantly different from zero: its two-sided p-value is 0.5737, and its 95% confidence limits (-2.1970, 0.9103) include zero.

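Because the observed sufficient statistic for the Friendships parameter is on the edge of its distribution (28 is the largest value in Output 72.12.5), the conditional maximum likelihood estimate does not exist. The Friendships parameter estimate is instead the median unbiased estimate: the value for which the univariate conditional probability of the observed statistic is equal to 0.5. In a similar fashion, the confidence limits are the parameter values at which a tail probability of the marginal exact distribution equals a target level. For Diagnosis, each of the two tail probabilities is set to 0.025. For Friendships, the upper limit is infinite because no larger value of the sufficient statistic is possible, and the reported lower limit (1.4948) corresponds to an upper-tail probability of approximately 0.05 when it is evaluated with the counts in Output 72.12.5.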

Output 72.12.6: Exact Parameter Estimates

Exact Parameter Estimates
Parameter             Estimate    Standard Error   95% Confidence Limits   Two-sided p-Value
Diagnosis Anxious     -0.5981     0.6760           -2.1970    0.9103       0.5737
Friendships Good       3.2612 *   .                 1.4948    Infinity     0.0008

Note: * indicates a median unbiased estimate.
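The estimates in Output 72.12.6 can be approximated from the Count columns of Outputs 72.12.4 and 72.12.5. The following Python sketch is an illustration outside SAS and is not the search that PROC LOGISTIC performs. It assumes that the conditional probability of a sufficient statistic t is proportional to Count(t) times exp(t * beta), and it uses a plain bisection in place of the procedure's iterative search. The helper names pmf, solve, upper_tail, and lower_tail are hypothetical.

```python
import math

# Count columns copied from Outputs 72.12.4 and 72.12.5; three Friendships counts
# appear there in rounded scientific notation and are used as displayed.
diag = {8: 203490, 9: 5878600, 10: 67016040, 11: 402096240, 12: 1424090850,
        13: 3154908960, 14: 4507012800, 15: 4206545280, 16: 2563363530,
        17: 1005240600, 18: 245725480, 19: 35271600, 20: 2645370, 21: 77520}
friend = {20: 15774544800, 21: 205069082400, 22: 956989051200,
          23: 2.1053759e12, 24: 2.3924726e12, 25: 1.4354836e12,
          26: 441687254400, 27: 63098179200, 28: 3154908960}

def pmf(dist, beta, t_obs):
    """Assumed form: f(t | beta) proportional to Count(t) * exp(t * beta).
    Exponents are centered at t_obs for numerical stability."""
    w = {t: c * math.exp((t - t_obs) * beta) for t, c in dist.items()}
    total = sum(w.values())
    return {t: v / total for t, v in w.items()}

def solve(func, lo=-10.0, hi=10.0):
    """Bisection for the root of a function that increases in beta."""
    for _ in range(200):
        mid = (lo + hi) / 2
        lo, hi = (mid, hi) if func(mid) < 0 else (lo, mid)
    return (lo + hi) / 2

def upper_tail(dist, beta, t_obs):
    """P(T >= t_obs | beta)."""
    return sum(p for t, p in pmf(dist, beta, t_obs).items() if t >= t_obs)

def lower_tail(dist, beta, t_obs):
    """P(T <= t_obs | beta)."""
    return sum(p for t, p in pmf(dist, beta, t_obs).items() if t <= t_obs)

# Diagnosis (observed 13, interior): the conditional likelihood peaks where E[T] = 13
diag_mle = solve(lambda b: sum(t * p for t, p in pmf(diag, b, 13).items()) - 13)
diag_lcl = solve(lambda b: upper_tail(diag, b, 13) - 0.025)
diag_ucl = solve(lambda b: 0.025 - lower_tail(diag, b, 13))

# Friendships (observed 28, the maximum): median unbiased estimate solves f(28) = 0.5
friend_mue = solve(lambda b: pmf(friend, b, 28)[28] - 0.5)
# Only the upper tail has a finite solution; in this sketch a tail probability
# of 0.05 (not 0.025) is what reproduces the reported lower limit.
friend_lcl = solve(lambda b: upper_tail(friend, b, 28) - 0.05)

print(round(diag_mle, 4), round(diag_lcl, 4), round(diag_ucl, 4))
print(round(friend_mue, 4), round(friend_lcl, 4))
```

The five values should agree with the estimates and confidence limits in Output 72.12.6 to about three decimal places. In this sketch the Friendships lower limit matches only when the whole 0.05 is placed in the single finite tail, which suggests that the one-sided case is handled differently from the interior case.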




To conclude, you find that the Friendships variable is a significant effect for explaining how mothers perceive the recovery of their children.

You can also use the exact parameter estimates to compute predicted probabilities for any data. This facility is not built into PROC LOGISTIC for exact logistic regression, because exact methods can be very expensive and the computations can fail. But this example is well behaved, so you can use the following statements to score the data:

proc logistic data=one exactonly outest=est;
   class Diagnosis Friendships / param=ref;
   model Recovered/Total = Diagnosis Friendships;
   exact Intercept Diagnosis Friendships / estimate;
run;
proc means data=est noprint;
   output out=out;
run;
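/* Keep only the MEAN row, which holds one value per parameter */
data out;
   set out;
   if _STAT_='MEAN';
   drop _TYPE_;
run;
/* Label the row as parameter estimates in a TYPE=EST data set */
data est(type=est);
   set out;
   _TYPE_='PARMS';
run;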

You specify the INTERCEPT keyword in the EXACT statement to compute an exact estimate for the intercept in addition to the other parameters. The parameter estimates are stored in the OUTEST= data set. Because there are both maximum-likelihood and median-unbiased estimates, the PROC MEANS statement accumulates the estimates into one observation, and then a TYPE=EST data set is formed.

The following program uses the scoring facility for unconditional logistic regression to score the original data set by using the exact parameter estimates:

proc logistic data=one inest=est;
   class Diagnosis Friendships / param=ref;
   model Recovered/Total = Diagnosis Friendships / maxiter=0;
   score out=score;
run;
proc print data=score;
   var Diagnosis Friendships P_Event;
run;

Output 72.12.7 shows that good friendships correspond to high recovery probabilities.

Output 72.12.7: Data Scored by Using Exact Parameter Estimates

Obs Diagnosis Friendships P_Event
1 Anxious Poor 0.04741
2 Anxious Good 0.56484
3 Depressed Poor 0.08300
4 Depressed Good 0.70243
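The scored probabilities are the inverse logit of the linear predictor that is formed from the exact estimates. The following Python sketch (an illustration outside SAS) recomputes them. The intercept estimate is not displayed in this example, so the sketch backs it out from the Depressed/Poor row, where both design variables are zero under reference coding. That derivation and the function name predicted are assumptions of this sketch.

```python
import math

# Exact estimates from Output 72.12.6 (reference levels: Depressed and Poor)
b_anxious = -0.5981
b_good = 3.2612

# Assumption: the intercept is recovered from the Depressed/Poor row of
# Output 72.12.7, because the example does not display the intercept estimate.
p_ref = 0.08300
intercept = math.log(p_ref / (1 - p_ref))

def predicted(diagnosis, friendships):
    """Inverse logit of the linear predictor under reference-cell coding."""
    eta = intercept
    eta += b_anxious if diagnosis == "Anxious" else 0.0
    eta += b_good if friendships == "Good" else 0.0
    return 1.0 / (1.0 + math.exp(-eta))

for d in ("Anxious", "Depressed"):
    for f in ("Poor", "Good"):
        print(d, f, round(predicted(d, f), 5))
```

Up to rounding in the last displayed digit, the printed values should match the P_Event column in Output 72.12.7.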