The following data, from Hand et al. (1994), contain the results of a study of 49 anxious or depressed children. The Diagnosis variable indicates whether the child was anxious or depressed when the study began, the Friendships variable indicates whether the child has good friendships, the Total variable represents the total number of children in the study who exhibit the specified values for Diagnosis and Friendships, and the Recovered variable represents the number of children whose mothers believe that their child has recovered at the end of the study.
data one;
   length Diagnosis $ 9;
   input Diagnosis $ Friendships $ Recovered Total @@;
   datalines;
Anxious   Poor  0  0   Anxious   Good 13 21
Depressed Poor  0  8   Depressed Good 15 20
;
Notice that no children in the study are both anxious and have poor friendships and that no children who are depressed and have poor friendships have recovered. The following statements fit an unconditional logistic regression model to these data.
proc logistic data=one;
   class Diagnosis Friendships / param=ref;
   model Recovered/Total = Diagnosis Friendships;
run;
The results are displayed in Output 72.12.1. Because the data set has quasi-complete separation, the unconditional logistic regression results are not reliable.
Output 72.12.1: Unconditional Logistic Regression Results
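To see where the separation comes from, it can help to list each Diagnosis-by-Friendships cell with its recovery rate. The following is a minimal sketch; the data set name Cells and the variables NotRecovered and RecoveryRate are illustrative names that do not appear in the original example.

```sas
/* Sketch: list every cell of the design with its recovery rate. */
/* Cells, NotRecovered, and RecoveryRate are hypothetical names. */
data cells;
   set one;
   NotRecovered = Total - Recovered;
   /* Leave the rate missing for the empty Anxious/Poor cell */
   if Total > 0 then RecoveryRate = Recovered / Total;
run;

proc print data=cells noobs;
   var Diagnosis Friendships Recovered NotRecovered Total RecoveryRate;
run;
```

In this listing every child with poor friendships is a nonrecovery, whereas children with good friendships appear in both outcome groups. That pattern is what is usually meant by quasi-complete separation on the Friendships variable.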
The sparseness of the data and the separability of the data set make this a good candidate for an exact logistic regression. In the following code, the EXACTONLY option suppresses the unconditional logistic regression results, the EXACT statement requests an exact analysis of the two covariates, the OUTDIST= option outputs the exact distribution into a SAS data set, the JOINT option computes a joint test for the significance of the two covariates in the model, and the ESTIMATE option produces parameter estimates for the two covariates.
proc logistic data=one exactonly;
   class Diagnosis Friendships / param=ref;
   model Recovered/Total = Diagnosis Friendships;
   exact Diagnosis Friendships / outdist=dist joint estimate;
run;

proc print data=dist(obs=10);
run;

proc print data=dist(firstobs=162 obs=175);
run;

proc print data=dist(firstobs=176 obs=184);
run;
Tests for the joint significance of the Diagnosis and Friendships covariates are labeled "Joint" in Output 72.12.2. Both of these tests reject the null hypothesis that all the parameters are identically zero.
Output 72.12.2: Exact Tests
Exact Conditional Tests

Effect | Test | Statistic | Exact p-Value | Mid p-Value |
---|---|---|---|---|
Joint | Score | 13.1905 | 0.0008 | 0.0008 |
Joint | Probability | 0.000081 | 0.0007 | 0.0007 |
Diagnosis | Score | 0.7915 | 0.5055 | 0.4159 |
Diagnosis | Probability | 0.1791 | 0.5055 | 0.4159 |
Friendships | Score | 12.4615 | 0.0004 | 0.0002 |
Friendships | Probability | 0.000414 | 0.0004 | 0.0002 |
In Output 72.12.2, the joint probability statistic (0.000081) is the probability of the observed sufficient statistics (Friendships=28 and Diagnosis=13) in the joint exact distribution (part of which is displayed in Output 72.12.3). Note that the joint exact distribution has sufficient statistics displayed for both covariates; the marginal distributions have sufficient statistics displayed for only one covariate. The associated exact p-value (0.0007) in Output 72.12.2 is the sum of the probabilities of all sufficient statistics in the joint exact distribution that have probabilities less than or equal to 0.000081. The mid-p-value (0.0007) adjusts the exact p-value for the discreteness of the exact distribution by subtracting half the probability of the observed sufficient statistic (Hirji 2006). The score statistic (13.1905) is a weighted distance of the observed sufficient statistics from the mean sufficient statistics, and the score p-value (0.0008) is the sum of all probabilities in the Dist data set for sufficient statistics that are no closer to the mean.
Output 72.12.3: First 10 Observations in the Joint Exact Distribution
Obs | DiagnosisAnxious | FriendshipsGood | Count | Score | Prob |
---|---|---|---|---|---|
1 | 0 | 20 | 1 | 48.0000 | 2.5608E-14 |
2 | 1 | 20 | 420 | 40.3905 | 1.0755E-11 |
3 | 1 | 21 | 168 | 40.6905 | 4.3022E-12 |
4 | 2 | 20 | 39900 | 33.5619 | 1.02177E-9 |
5 | 2 | 21 | 33600 | 33.4619 | 8.6044E-10 |
6 | 2 | 22 | 5880 | 34.7619 | 1.5058E-10 |
7 | 3 | 20 | 1516200 | 27.5143 | 3.88272E-8 |
8 | 3 | 21 | 2021600 | 27.0143 | 5.17696E-8 |
9 | 3 | 22 | 744800 | 27.9143 | 1.9073E-8 |
10 | 3 | 23 | 74480 | 30.2143 | 1.9073E-9 |
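The joint p-value definitions given for Output 72.12.2 can be sketched directly against the Dist data set. The following statements are an illustration, not the procedure's internal algorithm. They assume that the joint rows of Dist are exactly those with both DiagnosisAnxious and FriendshipsGood nonmissing, and the table name Joint and the column names ExactP, MidP, and ScoreP are hypothetical.

```sas
/* Sketch: recompute the joint exact tests from the OUTDIST= data set. */
/* Assumes DIST was created by the preceding PROC LOGISTIC step.       */
proc sql;
   /* Keep only the joint distribution (both statistics nonmissing) */
   create table joint as
      select DiagnosisAnxious, FriendshipsGood, Score, Prob
      from dist
      where DiagnosisAnxious is not missing
        and FriendshipsGood is not missing;

   /* Probability test: sum Prob over points no more likely than the */
   /* observed point (13, 28); mid-p subtracts half the observed Prob */
   select sum(Prob) as ExactP format=8.4,
          sum(Prob) - 0.5 * max(case when DiagnosisAnxious = 13
                                      and FriendshipsGood = 28
                                     then Prob else 0 end)
             as MidP format=8.4
   from joint
   where Prob <= (select Prob from joint
                  where DiagnosisAnxious = 13 and FriendshipsGood = 28);

   /* Score test: sum Prob over points at least as far from the mean */
   select sum(Prob) as ScoreP format=8.4
   from joint
   where Score >= (select Score from joint
                   where DiagnosisAnxious = 13 and FriendshipsGood = 28);
quit;
```

Because the comparisons are made on floating-point values without a tolerance, points that tie with the observed point could be handled differently than in PROC LOGISTIC, so small discrepancies from Output 72.12.2 are possible.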
For the univariate tests of the Diagnosis variable, PROC LOGISTIC extracts the part of the joint exact distribution for which Friendships=28; this is displayed in Output 72.12.4. PROC LOGISTIC then computes the probability and score tests in Output 72.12.2 in the same fashion as the joint tests. Neither test rejects the null hypothesis that the parameter is zero.
Output 72.12.4: Marginal Exact Distribution for Diagnosis
Obs | DiagnosisAnxious | FriendshipsGood | Count | Score | Prob |
---|---|---|---|---|---|
162 | 8 | . | 203490 | 17.6871 | 0.00001 |
163 | 9 | . | 5878600 | 12.5487 | 0.00033 |
164 | 10 | . | 67016040 | 8.2899 | 0.00380 |
165 | 11 | . | 402096240 | 4.9108 | 0.02282 |
166 | 12 | . | 1424090850 | 2.4113 | 0.08082 |
167 | 13 | . | 3154908960 | 0.7915 | 0.17905 |
168 | 14 | . | 4507012800 | 0.0513 | 0.25579 |
169 | 15 | . | 4206545280 | 0.1907 | 0.23874 |
170 | 16 | . | 2563363530 | 1.2098 | 0.14548 |
171 | 17 | . | 1005240600 | 3.1086 | 0.05705 |
172 | 18 | . | 245725480 | 5.8870 | 0.01395 |
173 | 19 | . | 35271600 | 9.5450 | 0.00200 |
174 | 20 | . | 2645370 | 14.0827 | 0.00015 |
175 | 21 | . | 77520 | 19.5000 | 0.00000 |
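As a check, the Diagnosis p-values in Output 72.12.2 can be reproduced by hand from the rounded Prob column of Output 72.12.4. The observed point is DiagnosisAnxious=13, with probability 0.17905; only the points 14 and 15 are more probable, so the sum runs over 8 through 13 and 16 through 21. The symbols $p_{\text{exact}}$ and $p_{\text{mid}}$ are notation introduced here for illustration.

$$
\begin{aligned}
p_{\text{exact}} &= \underbrace{(0.00001 + 0.00033 + 0.00380 + 0.02282 + 0.08082 + 0.17905)}_{8,\dots,13 \;=\; 0.28683} \\
&\quad + \underbrace{(0.14548 + 0.05705 + 0.01395 + 0.00200 + 0.00015 + 0.00000)}_{16,\dots,21 \;=\; 0.21863} \\
&= 0.50546 \approx 0.5055, \\
p_{\text{mid}} &= 0.50546 - \tfrac{1}{2}(0.17905) \approx 0.4159 .
\end{aligned}
$$

The same twelve points are also the ones whose Score is at least 0.7915, which is why the score test and the probability test report identical p-values for Diagnosis.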
For the univariate tests of the Friendships variable, PROC LOGISTIC extracts the part of the exact distribution for which Diagnosis=13; this is displayed in Output 72.12.5. Both tests in Output 72.12.2 reject the null hypothesis that the parameter is zero.
Output 72.12.5: Marginal Exact Distribution for Friendships
Obs | DiagnosisAnxious | FriendshipsGood | Count | Score | Prob |
---|---|---|---|---|---|
176 | . | 20 | 15774544800 | 9.3600 | 0.00207 |
177 | . | 21 | 205069082400 | 4.9985 | 0.02692 |
178 | . | 22 | 956989051200 | 1.9938 | 0.12560 |
179 | . | 23 | 2.1053759E12 | 0.3462 | 0.27633 |
180 | . | 24 | 2.3924726E12 | 0.0554 | 0.31401 |
181 | . | 25 | 1.4354836E12 | 1.1215 | 0.18841 |
182 | . | 26 | 441687254400 | 3.5446 | 0.05797 |
183 | . | 27 | 63098179200 | 7.3246 | 0.00828 |
184 | . | 28 | 3154908960 | 12.4615 | 0.00041 |
The parameter estimates are displayed in Output 72.12.6. Similar to the univariate tests, the parameter estimates are derived from the marginal exact distributions.

The Diagnosis parameter estimate (–0.5981) is computed by an iterative search for the parameter that maximizes the univariate conditional probability density function, as described in the section Inference for a Single Parameter. The Diagnosis parameter is not significantly different from zero.

Because the observed sufficient statistic for the Friendships parameter (28) is on the edge of its distribution, a maximum likelihood estimate is not available. Instead, the Friendships parameter estimate is the median unbiased estimate: the value for which the univariate conditional probability density function, evaluated at the observed sufficient statistic, is equal to 0.5. In a similar fashion, the confidence limits for both the Diagnosis and Friendships parameters are created by finding values to make the two tail probabilities equal to 0.025.
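The estimation rules above can be written compactly. The notation here is an assumption made for illustration and is not taken from this example: $t$ is the observed sufficient statistic, $C(u)$ is the Count column of the relevant marginal exact distribution, and $\beta$ is the parameter of interest. Under those assumptions, the univariate conditional probability density function has the form

$$
f_{\beta}(t) \;=\; \frac{C(t)\, e^{t\beta}}{\sum_{u} C(u)\, e^{u\beta}} .
$$

The three quantities described in the text then correspond to

$$
\begin{aligned}
\text{maximum likelihood estimate:}\quad & \hat{\beta} = \arg\max_{\beta} f_{\beta}(t), \\
\text{median unbiased estimate ($t$ on the edge):}\quad & f_{\hat{\beta}}(t) = 0.5, \\
\text{confidence limits:}\quad & \sum_{u \ge t} f_{\beta_L}(u) = 0.025, \qquad \sum_{u \le t} f_{\beta_U}(u) = 0.025 .
\end{aligned}
$$

For Diagnosis, $t = 13$ and $C(u)$ comes from Output 72.12.4; for Friendships, $t = 28$ and $C(u)$ comes from Output 72.12.5.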
Output 72.12.6: Exact Parameter Estimates
To conclude, you find that the Friendships variable is a significant effect for explaining how mothers perceive the recovery of their children.
You can also use the exact parameter estimates to compute predicted probabilities for any data. This facility is not built into PROC LOGISTIC for exact logistic regression, because exact methods can be very expensive and the computations can fail. But this example is well behaved, so you can use the following statements to score the data:
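/* Fit the exact model, including an exact estimate of the intercept */
proc logistic data=one exactonly outest=est;
   class Diagnosis Friendships / param=ref;
   model Recovered/Total = Diagnosis Friendships;
   exact Intercept Diagnosis Friendships / estimate;
run;

/* Collapse the OUTEST= observations into summary statistics */
proc means data=est noprint;
   output out=out;
run;

/* Keep only the MEAN observation */
data out;
   set out;
   if _STAT_='MEAN';
   drop _TYPE_;
run;

/* Rebuild a TYPE=EST data set that the INEST= option can read */
data est(type=est);
   set out;
   _TYPE_='PARMS';
run;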
You specify the INTERCEPT keyword in the EXACT statement to compute an exact estimate for the intercept in addition to the other parameters. The parameter estimates are stored in the OUTEST= data set. Because there are both maximum-likelihood and median-unbiased estimates, the PROC MEANS statement accumulates the estimates into one observation, and then a TYPE=EST data set is formed.
The following program uses the scoring facility for unconditional logistic regression to score the original data set by using the exact parameter estimates:
proc logistic data=one inest=est;
   class Diagnosis Friendships / param=ref;
   model Recovered/Total = Diagnosis Friendships / maxiter=0;
   score out=score;
run;

proc print data=score;
   var Diagnosis Friendships P_Event;
run;
Output 72.12.7 shows that good friendships correspond to high recovery probabilities.
Output 72.12.7: Data Scored by Using Exact Parameter Estimates
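The scored probabilities can also be cross-checked by applying the inverse logit to the exact estimates by hand. The following sketch assumes that the Est data set built earlier holds a single observation whose parameter variables are named Intercept, DiagnosisAnxious, and FriendshipsGood; those names are inferred from the reference coding and the Dist data set, and are not confirmed by the example. The data set Manual and the variables eta and P_Manual are hypothetical.

```sas
/* Sketch: compute predicted probabilities directly from the estimates.  */
/* Variable names in EST are assumed; check them with PROC CONTENTS.     */
data manual;
   /* Read the single row of estimates once; its values are retained */
   if _n_ = 1 then set est(keep=Intercept DiagnosisAnxious FriendshipsGood);
   set one;
   /* PARAM=REF with the default reference levels (Depressed, Poor): */
   /* each dummy is 1 for the nonreference level and 0 otherwise     */
   eta = Intercept
       + DiagnosisAnxious * (Diagnosis = 'Anxious')
       + FriendshipsGood  * (Friendships = 'Good');
   P_Manual = 1 / (1 + exp(-eta));
run;

proc print data=manual noobs;
   var Diagnosis Friendships P_Manual;
run;
```

If the assumptions hold, P_Manual should agree with P_Event in the Score data set, which is a quick way to confirm that the INEST= data set was constructed as intended.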