Several common statistics that are defined for 2×2 tables, and which are not provided explicitly by PROC FREQ, are discussed below. Point estimates for many of these statistics (sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV), false positive probability, and false negative probability) are row or column percentages of the 2×2 table provided by PROC FREQ. Confidence intervals and tests can be obtained by using PROC FREQ on subtables or by using a modeling procedure to estimate the statistics. Note that PROC LOGISTIC computes sensitivity and specificity, but only for tables of predicted and observed classifications that result from fitting a logistic model.
The following hypothetical data assume subjects were observed to exhibit the response (such as a disease) or not. Subjects also tested either positive (Test=1) or negative (Test=0) on a prognostic test for the response. Results from all subjects can be summarized in a 2×2 table.
These statements read in the cell counts of the table and use PROC FREQ to display the table. PROC SORT orders the row and column variables so that 1 appears before 0. The ORDER=DATA option in PROC FREQ orders the table according to the order found in the sorted data set. As a result, the 1 levels appear before the 0 levels, putting Test=1, Response=1 in the upper-left (1,1) cell of the table.
      data FatComp;
         input Test Response Count;
         datalines;
      0 0 6
      0 1 2
      1 0 4
      1 1 11
      ;
      proc sort data=FatComp;
         by descending Test descending Response;
         run;
      proc freq data=FatComp order=data;
         weight Count;
         tables Test*Response;
         run;
Following are the results from PROC FREQ, with sensitivity, specificity, positive predictive value, negative predictive value, false positive probability, and false negative probability indicated by matching colors.
[PROC FREQ output: crosstabulation of Test by Response, with the cells for each statistic color-coded]

The statistics are defined as follows assuming that the table is arranged as shown with Response levels as the columns and Test levels as the rows and with Test=1, Response=1 as the 1,1 cell:
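As a hand check on the point estimates, the following Python sketch computes each statistic directly from the four cell counts. It assumes the usual definitions of these statistics as conditional proportions. The function and key names are illustrative and are not part of any SAS procedure. The (Col) and (Row) variants follow the labels used in the TITLE statements later in this note.

```python
# Cell counts from the FatComp data (rows = Test, columns = Response).
n11, n12 = 11, 4   # Test=1: Response=1, Response=0
n21, n22 = 2, 6    # Test=0: Response=1, Response=0


def point_estimates(n11, n12, n21, n22):
    """Return the 2x2-table statistics as plain proportions."""
    total = n11 + n12 + n21 + n22
    return {
        "sensitivity": n11 / (n11 + n21),           # Pr(Test=1 | Response=1)
        "specificity": n22 / (n12 + n22),           # Pr(Test=0 | Response=0)
        "ppv": n11 / (n11 + n12),                   # Pr(Response=1 | Test=1)
        "npv": n22 / (n21 + n22),                   # Pr(Response=0 | Test=0)
        "false_positive_col": n12 / (n12 + n22),    # Pr(Test=1 | Response=0)
        "false_positive_row": n12 / (n11 + n12),    # Pr(Response=0 | Test=1)
        "false_negative_col": n21 / (n11 + n21),    # Pr(Test=0 | Response=1)
        "false_negative_row": n21 / (n21 + n22),    # Pr(Response=1 | Test=0)
        "accuracy": (n11 + n22) / total,            # Pr(Test agrees with Response)
    }


stats = point_estimates(n11, n12, n21, n22)
for name, value in stats.items():
    print(f"{name:20s} {value:.4f}")
```

The sensitivity (0.8462) and accuracy (0.7391) printed here match the values quoted elsewhere in this note.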
Confidence intervals and tests (asymptotic and exact) for these statistics can be computed using PROC FREQ by selecting the proper row or column from the original table. The BINOMIAL option in the TABLES statement provides asymptotic and exact confidence intervals and an asymptotic test that the proportion equals 0.5 (by default). The BINOMIAL option in the EXACT statement provides all of this plus an exact test of the proportion. You can test against a null value other than 0.5 by specifying P=value in parentheses after the BINOMIAL option. For example, BINOMIAL(P=0.75) tests against the null value of 0.75.
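To illustrate what the two kinds of interval represent, here is a minimal Python sketch for the sensitivity (11 of 13). It assumes that the asymptotic interval is the Wald interval truncated to [0, 1] and that the exact interval is the Clopper-Pearson interval. The helper names are illustrative only.

```python
from math import comb, sqrt
from statistics import NormalDist


def binom_cdf(k, n, p):
    """Pr(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def wald_ci(x, n, alpha=0.05):
    """Asymptotic (Wald) interval, truncated to [0, 1]."""
    p = x / n
    half = NormalDist().inv_cdf(1 - alpha / 2) * sqrt(p * (1 - p) / n)
    return max(0.0, p - half), min(1.0, p + half)


def _bisect(f, lo=0.0, hi=1.0, tol=1e-10):
    """Root of an increasing function f with f(lo) < 0 < f(hi)."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if f(mid) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def exact_ci(x, n, alpha=0.05):
    """Clopper-Pearson interval found by inverting the binomial tails."""
    lower = 0.0 if x == 0 else _bisect(
        lambda p: (1 - binom_cdf(x - 1, n, p)) - alpha / 2)
    upper = 1.0 if x == n else _bisect(
        lambda p: alpha / 2 - binom_cdf(x, n, p))
    return lower, upper


wald = wald_ci(11, 13)
exact = exact_ci(11, 13)
print("Wald :", wald)
print("Exact:", exact)
```

Under these assumptions the results agree, to two decimals, with the intervals (0.65, 1) and (0.55, 0.98) reported for sensitivity.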
The following statements estimate and test each of the statistics as indicated in the TITLE statements. The WHERE statement is used to select the proper row or column for the statistic in each case. The use of LEVEL= in the BINOMIAL option selects the level of TEST or RESPONSE whose probability is estimated.
      title 'Sensitivity';
      proc freq data=FatComp;
         where Response=1;
         weight Count;
         tables Test / binomial(level="1");
         exact binomial;
         run;
      title 'Specificity';
      proc freq data=FatComp;
         where Response=0;
         weight Count;
         tables Test / binomial(level="0");
         exact binomial;
         run;
      title 'Positive predictive value';
      proc freq data=FatComp;
         where Test=1;
         weight Count;
         tables Response / binomial(level="1");
         exact binomial;
         run;
      title 'Negative predictive value';
      proc freq data=FatComp;
         where Test=0;
         weight Count;
         tables Response / binomial(level="0");
         exact binomial;
         run;
      title 'False Positive Probability (Col)';
      proc freq data=FatComp;
         where Response=0;
         weight Count;
         tables Test / binomial(level="1");
         exact binomial;
         run;
      title 'False Positive Probability (Row)';
      proc freq data=FatComp;
         where Test=1;
         weight Count;
         tables Response / binomial(level="0");
         exact binomial;
         run;
      title 'False Negative Probability (Col)';
      proc freq data=FatComp;
         where Response=1;
         weight Count;
         tables Test / binomial(level="0");
         exact binomial;
         run;
      title 'False Negative Probability (Row)';
      proc freq data=FatComp;
         where Test=0;
         weight Count;
         tables Response / binomial(level="1");
         exact binomial;
         run;
Following are the results for sensitivity. Note that the estimate, 0.8462, is the same as shown above. An asymptotic confidence interval (0.65, 1) and an exact confidence interval (0.55, 0.98) for sensitivity are given. Also provided are asymptotic and exact one- and two-sided tests of the null hypothesis that sensitivity = 0.5.
[PROC FREQ output: binomial proportion, confidence limits, and tests for Test=1 among Response=1; Sample Size = 13]
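The exact test of sensitivity = 0.5 can be reproduced by summing binomial probabilities. The Python sketch below computes the upper-tail probability of observing 11 or more positives among 13 subjects. The two-sided value is formed by doubling the one-sided value, which is an assumption about how the exact two-sided p-value is defined. The function name is illustrative.

```python
from math import comb


def upper_tail(x, n, p0):
    """Exact Pr(X >= x) for X ~ Binomial(n, p0)."""
    return sum(comb(n, i) * p0**i * (1 - p0) ** (n - i) for i in range(x, n + 1))


# Sensitivity: 11 positives among the 13 subjects with Response=1.
one_sided = upper_tail(11, 13, 0.5)
# Assumption: the two-sided p-value is twice the one-sided value, capped at 1.
two_sided = min(1.0, 2 * one_sided)

print(f"one-sided exact p = {one_sided:.4f}")
print(f"two-sided exact p = {two_sided:.4f}")
```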

The accuracy can be computed by creating a binary variable (ACC) indicating whether test and response agree in each observation. As above, the BINOMIAL option in the TABLES and EXACT statements can be used to obtain asymptotic and exact tests and confidence intervals.
      data acc;
         set FatComp;
         if (test and response) or (not test and not response) then acc=1;
         else acc=0;
         run;
      proc freq;
         weight count;
         tables acc / binomial(level="1");
         exact binomial;
         run;
The accuracy is again found to be 0.7391 with a confidence interval of (0.56, 0.92). Asymptotic and exact tests of the null hypothesis that accuracy = 0.5 are similar and significant.

The lift values can be estimated in PROC GENMOD by fitting a log-linked binomial model with an offset to the data. This models the log of the positive response probabilities in the Test levels. By using the log of the overall probability of positive response as the offset, the log of the lift is modeled. The LSMEANS statement with the ILINK and CL options estimates the lift and provides a confidence interval and a test that the lift equals one.
      data lift;
         set FatComp;
         off=log(13/23);
         run;
      proc genmod data=lift descending;
         freq count;
         class test;
         model response=test / dist=binomial link=log offset=off;
         lsmeans test / ilink cl;
         run;
In the results from the LSMEANS statement, the Estimate column contains the log lift estimates. The lift estimates appear in the Mean column and the confidence limits are in the Lower Mean and Upper Mean columns. The p-value for the test that the lift equals one is in the Pr > |z| column.
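The lift point estimates can be checked by hand. The Python sketch below assumes that lift is the positive response probability within a Test level divided by the overall positive response probability, which is what the offset of log(13/23) implies. The variable names are illustrative.

```python
from math import log

# (events, total) for Response=1 within each Test level, from the FatComp data.
counts = {1: (11, 15), 0: (2, 8)}

overall = sum(e for e, _ in counts.values()) / sum(t for _, t in counts.values())
offset = log(overall)  # same quantity as off=log(13/23) in the DATA step

# Lift = Pr(Response=1 | Test level) / Pr(Response=1)
lift = {level: (e / t) / overall for level, (e, t) in counts.items()}
log_lift = {level: log(value) for level, value in lift.items()}

for level in (1, 0):
    print(f"Test={level}: lift = {lift[level]:.4f}, log lift = {log_lift[level]:.4f}")
```

Under this definition, the values in the Mean column should be about 1.2974 for Test=1 and 0.4423 for Test=0.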

The likelihood ratios, LR+ and LR-, can be easily computed from the sensitivity and specificity as described above. Since they can also be seen as nonlinear functions (ratios) of model parameters, they can be computed using the NLEstimate macro, which provides a confidence interval for each. PROC GENMOD is used to fit this linear probability model with TEST as the response and RESPONSE as a categorical predictor:
Pr(TEST=1) = β_{0}RESPONSE_{0} + β_{1}RESPONSE_{1} ,
where RESPONSE_{0} equals 1 if RESPONSE=0, and equals 0 otherwise, and RESPONSE_{1} equals 1 if RESPONSE=1, and equals 0 otherwise. Under this model, β_{1} is the sensitivity and β_{0} is 1 - specificity. When fitting the model in PROC GENMOD, include the STORE statement to save the model. Create a data set with an observation for each function to be estimated. See the description of the NLEstimate macro for details. Note that since two parameters are estimated, 21 = 23 - 2 degrees of freedom are specified.
      proc genmod data=FatComp descending;
         freq count;
         class response test;
         model test = response / dist=binomial link=identity noint;
         store genfit;
         run;
      data fd;
         length label f $32767;
         infile datalines delimiter=',';
         input label f;
         datalines;
      LR+, b_p2/b_p1
      LR-, (1-b_p2)/(1-b_p1)
      ;
      %NLEstimate(instore=genfit, fdata=fd, df=21)
The point estimates of LR+ and LR- agree with the computations above (2.1154 and 0.2564 respectively). The 95% confidence interval for LR+ is (0.3339, 3.8968) and for LR- is (-0.1168, 0.6296). Because these are symmetric intervals around the estimate, a limit can fall below zero even though a likelihood ratio cannot be negative.
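The likelihood ratio point estimates follow directly from the cell counts. This short Python sketch uses the same functions of sensitivity and specificity that are passed to the NLEstimate macro. The variable names are illustrative.

```python
# Cell counts from the FatComp data.
tp, fp = 11, 4   # Test=1: Response=1, Response=0
fn, tn = 2, 6    # Test=0: Response=1, Response=0

sensitivity = tp / (tp + fn)   # beta_1 in the linear probability model
specificity = tn / (tn + fp)   # 1 - beta_0 in the linear probability model

lr_pos = sensitivity / (1 - specificity)   # b_p2/b_p1
lr_neg = (1 - sensitivity) / specificity   # (1-b_p2)/(1-b_p1)

print(f"LR+ = {lr_pos:.4f}")
print(f"LR- = {lr_neg:.4f}")
```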

Computation of the attributable risk and population attributable risk (PAR) requires a data set of event counts and total counts for each population. In the above table, the Test levels are the populations and Response=1 is the event of interest. The TestCnts data set below contains the event counts (Count) and total counts (Total) for each Test population. Note that the population representing presence of the risk factor (Test=1) appears first. PROC STDRATE estimates the two risks by specifying the METHOD=MH(AF) and STAT=RISK options. In the POPULATION statement, the Test variable is identified as the GROUP= variable indicating the populations. The GROUP(EXPOSED="1")=Test option specifies that the Test=1 group is the exposed group. The event and total count variables are specified in the EVENT= and TOTAL= options.
      data TestCnts;
         input Test Count Total;
         datalines;
      1 11 15
      0 2 8
      ;
      proc stdrate data=TestCnts method=mh(af) stat=risk;
         population group(exposed="1")=Test event=Count total=Total;
         run;
The final table from PROC STDRATE presents the two risk estimates and their confidence intervals.
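For orientation, the point estimates can be sketched in Python. This assumes that attributable risk here means the attributable fraction among the exposed, (R1 - R0)/R1, and that population attributable risk means (R - R0)/R, where R1, R0, and R are the risks in the exposed group, the unexposed group, and the combined population. These definitions are assumptions and should be checked against the STDRATE documentation. The variable names are illustrative.

```python
# Event and total counts per Test population, as in the TestCnts data set.
exposed_events, exposed_total = 11, 15      # Test=1
unexposed_events, unexposed_total = 2, 8    # Test=0

risk_exposed = exposed_events / exposed_total
risk_unexposed = unexposed_events / unexposed_total
risk_overall = (exposed_events + unexposed_events) / (exposed_total + unexposed_total)

# Assumed definitions (see the lead-in): fractions of risk attributable to exposure.
attributable_fraction = (risk_exposed - risk_unexposed) / risk_exposed
population_attributable_fraction = (risk_overall - risk_unexposed) / risk_overall

print(f"attributable fraction (exposed) = {attributable_fraction:.4f}")
print(f"population attributable fraction = {population_attributable_fraction:.4f}")
```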

See also the example titled "Computing Attributable Fraction Estimates" in the STDRATE documentation and this note which discusses adjusting the estimates for covariates.
The number needed to treat (NNT) can be estimated in various ways. Since NNT is equal to the reciprocal of the risk difference, one way is to obtain the risk difference estimate and standard error from PROC FREQ and then use the delta method to obtain a standard error and confidence limits for NNT. Alternatively, a modeling approach^{Note} could be used by fitting a logistic model and estimating the appropriate nonlinear function of the logistic model parameters.
The PROC FREQ approach is shown below. Begin by obtaining the risk difference and its standard error from PROC FREQ. Since the table is arranged so that Test=1, Response=1 appears in the upper-left (1,1) cell of the table, the Column 1 risk difference is needed. The following ODS OUTPUT statement saves the Column 1 risk difference in a data set.
      proc freq data=FatComp order=data;
         weight Count;
         tables Test*Response / riskdiff;
         ods output RiskdiffCol1=rd;
         run;
Note that the positive response probability for those positive on the prognostic test (TEST=1) is 0.7333, and is 0.25 for those negative on the test (TEST=0). The risk difference is then 0.7333 - 0.25 = 0.4833.

The following statements compute the estimate of the NNT and use the delta method to provide a 100(1-α)% confidence interval.
      data nnt;
         set rd;
         where Row="Difference";
         alpha=.05;
         NNT=1/risk;
         NNT_SE=ase/risk**2;   *by delta method;
         LCL=nnt-probit(1-alpha/2)*nnt_se;
         UCL=nnt+probit(1-alpha/2)*nnt_se;
         run;
      proc print;
         id table;
         var NNT NNT_SE LCL UCL;
         run;
The results show that a little over two subjects (2.0690) need to be treated, on average, to obtain one more positive response. A 95% large sample confidence interval for the NNT is (0.4666, 3.6713).
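The delta-method arithmetic can be reproduced outside SAS. The Python sketch below assumes that the asymptotic standard error of the risk difference is the square root of the sum of the two binomial variances, p(1-p)/n in each row. It then applies the same formulas as the DATA step. The variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

# Response=1 counts and row totals for Test=1 and Test=0.
x1, n1 = 11, 15
x0, n0 = 2, 8

p1, p0 = x1 / n1, x0 / n0
risk_diff = p1 - p0
# Assumed ASE of the risk difference: independent binomial proportions.
ase = sqrt(p1 * (1 - p1) / n1 + p0 * (1 - p0) / n0)

alpha = 0.05
z = NormalDist().inv_cdf(1 - alpha / 2)   # same role as probit(1-alpha/2)

nnt = 1 / risk_diff
nnt_se = ase / risk_diff**2               # delta method: |d(1/r)/dr| * ASE
lcl = nnt - z * nnt_se
ucl = nnt + z * nnt_se

print(f"NNT = {nnt:.4f}, SE = {nnt_se:.4f}, 95% CI = ({lcl:.4f}, {ucl:.4f})")
```

Under these assumptions the sketch reproduces the reported estimate and interval to within rounding.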

__________
Note: A modeling approach to estimating NNT can be taken using PROC LOGISTIC and the NLEstimate macro or by directly fitting the model in PROC NLMIXED and using its ESTIMATE statement. Both of these methods are further discussed and illustrated in this note.
The following statements fit a logistic model to the FatComp data and store the fitted model in an item store named Log. Similar to the example in this note, the risk at each Test level is written in terms of the model parameters and the reciprocal of the difference is specified in the f= option of the NLEstimate macro for estimation. The parameters are referred to using names as described in the documentation for the macro.
      proc logistic data=FatComp;
         freq Count;
         model Response(event="1")=Test;
         store Log;
         run;
      %NLEstimate(instore=Log, label=NNT, f=1/(logistic(B_p1+B_p2)-logistic(B_p1)), df=21)
The same result can be obtained by fitting the model in PROC NLMIXED and estimating the same function of model parameters in the ESTIMATE statement. The PROC NLMIXED step below uses the TestCnts form of the FatComp data (see above) that is aggregated at the Test level. With this form of the data, the Counts are distributed as binomial.
      proc nlmixed data=TestCnts;
         p=logistic(Intercept + b1*(Test=1));
         model Count ~ binomial(Total,p);
         estimate "NNT" 1/(logistic(Intercept+b1) - logistic(Intercept)) df=21;
         run;
In both of the above examples, 21 degrees of freedom are used to form the confidence interval for the risk difference since there are 23 observations and 2 parameters are estimated in the model. This differs somewhat from the PROC FREQ approach used above which forms a large sample confidence interval.
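The point estimate from the modeling approach can be verified without fitting anything. With a single binary predictor the logistic model is saturated, so its fitted probabilities equal the observed proportions and the maximum likelihood estimates have a closed form. The Python sketch below uses that fact. It covers the point estimate only, not the t-based confidence limits, and the names are illustrative.

```python
from math import exp, log


def logit(p):
    return log(p / (1 - p))


def logistic(x):
    return 1 / (1 + exp(-x))


# Observed positive-response proportions at each Test level.
p_test1 = 11 / 15
p_test0 = 2 / 8

# Closed-form MLEs for the saturated model logit(p) = Intercept + b1*(Test=1).
intercept = logit(p_test0)
b1 = logit(p_test1) - logit(p_test0)

# Same function of the parameters as in the f= option and the ESTIMATE statement.
nnt = 1 / (logistic(intercept + b1) - logistic(intercept))

print(f"Intercept = {intercept:.4f}, b1 = {b1:.4f}, NNT = {nnt:.4f}")
```

The resulting NNT equals the reciprocal of the observed risk difference, so it matches the 2.0690 from the PROC FREQ approach.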
Product Family: SAS System
Product: SAS/STAT
System: All
SAS Release Reported: n/a
Type: Usage Note
Priority: low
Topic: FREQ; Exact Methods; Categorical Data Analysis; Descriptive Statistics; STDRATE; GENMOD; Macro
Date Modified: 2019-05-07 09:58:43
Date Created: 2004-09-23 15:44:42