When fitting a model that includes the interaction of two predictors, it is often of interest to estimate the difference in the differences of means. For example, for a model containing two binary predictors, A and B, each with levels 1 and 0, and their interaction,
μ_{AB} = λ + αA + βB + γAB
you might want to estimate the difference in the effect of A at the two levels of B. This is the difference of the A means at level 1 of B minus the difference of the A means at level 0 of B. Writing this in terms of the above model parameters:
(μ_{11} − μ_{01}) − (μ_{10} − μ_{00}) = [(λ + α + β + γ) − (λ + β)] − [(λ + α) − (λ)] = γ
You can see why this is called a "difference in differences" estimate. Note that the interaction parameter is the difference in difference estimate. But this is only the case when the model is an ordinary regression model, such as fit by PROC REG or PROC GLM, or equivalently a generalized linear model with identity link function, such as fit by PROC GENMOD or PROC GLIMMIX. For such a model, the fitting procedure directly provides the difference in differences of means estimate and a test of its significance in the Parameter Estimates table. It can also be produced using the ESTIMATE statement or the LSMESTIMATE statement. The LSMEANS statement shows the four mean estimates. With the LSMESTIMATE statement you can estimate a contrast of the means by specifying contrast coefficients using the order of the means presented by the LSMEANS statement.
proc genmod;
   class a b / ref=first;
   model y = a b a*b;
   /* 1 -1 -1 1 on the four A*B cells gives the difference in differences */
   estimate "Diff in Diff" a*b 1 -1 -1 1;
   lsmeans a*b;
   lsmestimate a*b "Diff in Diff" 1 -1 -1 1;
run;
The interaction parameter, and hence the estimated difference in differences of means, is about −21 and is significantly different from zero (p<0.0001). The same estimate is provided by the ESTIMATE and LSMESTIMATE statements. The means at each of the four AB combinations are given by the LSMEANS statement. Using those values, you can see that −21 ≅ (50−40)−(69−38). A confidence interval for the difference in differences of means is (−24.3, −17.7). Adding the CL option to the LSMEANS and LSMESTIMATE statements adds confidence intervals to the results from those statements.
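The identity between the interaction parameter and the difference in differences can be checked by hand outside SAS. The following is a minimal Python sketch, not part of the SAS analysis. It uses the four rounded means quoted above, and the assignment of each mean to an (A, B) cell is an assumption inferred from the order of the arithmetic.

```python
# Rounded cell means quoted in the text; keys are (A, B).
# Which mean belongs to which cell is an assumption made for illustration.
means = {(1, 1): 50.0, (0, 1): 40.0, (1, 0): 69.0, (0, 0): 38.0}

def saturated_params(mu):
    """Solve mu_AB = lam + alpha*A + beta*B + gamma*A*B for the four parameters."""
    lam = mu[(0, 0)]
    alpha = mu[(1, 0)] - lam
    beta = mu[(0, 1)] - lam
    gamma = mu[(1, 1)] - lam - alpha - beta
    return lam, alpha, beta, gamma

lam, alpha, beta, gamma = saturated_params(means)

# Difference in differences computed directly from the cell means.
did = (means[(1, 1)] - means[(0, 1)]) - (means[(1, 0)] - means[(0, 0)])

print(gamma, did)  # -21.0 -21.0
```

Because the model with both main effects and the interaction is saturated for a 2x2 layout, the parameters are determined exactly by the four means, which is why the two numbers agree.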

Suppose that instead of an ordinary regression model, you are interested in estimating the difference in differences of means (probabilities) for a binary logistic model
LogOdds_{AB} = λ + αA + βB + γAB
In this model, or indeed with any generalized linear model, the interaction is still a difference in differences estimate, not of the means but of the means transformed by the link function. In the case of a logistic model, it estimates the difference in differences of log odds (logits). The difference in differences of means requires that each of the four parts of the estimator be a mean rather than a log odds, and that requires applying the inverse of the link function to each of the four parts. The LOGISTIC function in SAS® is the inverse of the logit link function, logistic(x) = 1/(1+e^{−x}), so the difference in differences of means is
[logistic(λ + α + β + γ) − logistic(λ + β)] − [logistic(λ + α) − logistic(λ)]
Because this is not a linear combination of the model parameters or of the LSmeans, you cannot use the ESTIMATE or LSMESTIMATE statements to estimate the difference in differences of means, though you can use them, just as above, to estimate the difference in differences of log odds. The ILINK option in the LSMEANS statement below applies the inverse link function to the four individual log odds estimates, resulting in a Mean column showing the mean (probability) estimates. The ILINK option only applies the inverse link to the entire estimate, so it cannot be used when differences (in LSMEANS) or contrasts (in LSMESTIMATE) are requested. In the following statements, the difference in differences estimate on the log odds scale appears as the interaction parameter estimate as well as from the ESTIMATE and LSMESTIMATE statements. Note that the LSMEANS and LSMESTIMATE statements require that CLASS variables use the GLM parameterization (PARAM=GLM).
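proc logistic;
   class a b / param=glm ref=first;
   model y(event="1") = a b a*b;
   estimate "Diff in Diff" a*b 1 -1 -1 1;
   /* E displays the LSmeans coefficients; ILINK adds the probabilities */
   lsmeans a*b / e ilink;
   /* save the LSmeans coefficients for later use by the NLMeans macro */
   ods output coef=coeffs;
   lsmestimate a*b "Diff in Diff LogOdds" 1 -1 -1 1;
   /* save the fitted model for the NLMeans and NLEstimate macros */
   store log;
run;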
The difference in differences estimate of log odds is −1.1113 ≅ [0.12−(−0.16)]−[0.95−(−0.45)].
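To see why the two scales give different answers, the rounded log odds 0.12, −0.16, 0.95, and −0.45 can be pushed through the inverse logit by hand. The following Python sketch is only an arithmetic illustration outside SAS. The `logistic` helper is written here to mirror the SAS LOGISTIC function, and the assignment of each log odds value to an (A, B) cell is an assumption.

```python
import math

def logistic(x):
    """Inverse of the logit link: maps log odds to a probability."""
    return 1.0 / (1.0 + math.exp(-x))

# Rounded log odds for the four cells; keys are (A, B).
# The cell assignment is assumed for illustration.
eta = {(1, 1): 0.12, (0, 1): -0.16, (1, 0): 0.95, (0, 0): -0.45}

# Difference in differences on the log odds (link) scale.
did_logodds = (eta[(1, 1)] - eta[(0, 1)]) - (eta[(1, 0)] - eta[(0, 0)])

# Apply the inverse link to each cell first, then take the same contrast.
p = {cell: logistic(value) for cell, value in eta.items()}
did_prob = (p[(1, 1)] - p[(0, 1)]) - (p[(1, 0)] - p[(0, 0)])

print(round(did_logodds, 2), round(did_prob, 2))  # -1.12 -0.26
```

The inverse link has to be applied to each of the four parts before differencing. Applying it to the log odds contrast itself would not give a difference of probabilities.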

The estimator of the difference in differences of means shown above for this logistic model is a nonlinear combination of model parameters. Such an estimate can be computed using the NLMeans macro, the NLEstimate macro, or by fitting the model in PROC NLMIXED and using its ESTIMATE statement. The mean estimates are also considered predictive margins. The difference in difference is a contrast of these margins which can be estimated by the Margins macro. In addition to using these to estimate the difference in difference of means, they can also be used to estimate pairwise differences of means as illustrated in this note.
Using the Margins macro
The Margins macro can estimate and test the predictive margins (means) of the A,B combinations. It first fits the model that you specify. In the macro call shown below, the response=, roptions=, class=, model=, and dist= options reproduce the logistic model fit above in PROC LOGISTIC. Estimates of the A,B margins are requested by margins=a b. The macro fits the model using the standard order of the A,B combinations and displays the margins in that order: A0B0, A0B1, A1B0, A1B1. Confidence intervals for the margins and the contrast row estimates are requested by options=cl.
The DATA step preceding the Margins call creates a data set that specifies the desired contrasts of the A,B margins. This data set is specified in contrasts= and must contain the LABEL and F character variables. The first contrast, labeled A1-A0, defines a two-row contrast matrix. Each row defines an A1−A0 difference: first at B=1, then at B=0. The second contrast defines the difference in difference, (A1B1−A0B1)−(A1B0−A0B0).
data c;
   length label f $32767;
   infile datalines delimiter='|';
   input label f;
   datalines;
A1-A0                 | 0 -1 0 1, -1 0 1 0
Diff in Diff of Means | 1 -1 -1 1
;
%Margins(data     = a,
         response = y,
         roptions = event='1',
         class    = a b,
         model    = a|b,
         dist     = binomial,
         margins  = a b,
         contrasts= c,
         options  = cl)
The Predictive Margins table reproduces the mean estimates shown above from the LSMEANS/ILINK statement in PROC LOGISTIC. The Contrasts table provides, for each of the two contrasts, a joint chi-square test and an estimate and test for each row in the contrast. The A1−A0 difference at B=1 is 0.07 and is not significant (p=0.3210). The A1−A0 difference at B=0 is 0.33 and is significant (p<0.0001). The joint test of the A difference with 2 degrees of freedom is also significant (p<0.0001). Finally, the estimated difference in difference is −0.26 and is significant (p=0.0072).
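How the contrast rows act on the margins can be sketched with a few lines of arithmetic. The Python below is an illustration outside SAS, not the macro's implementation. It assumes the margins are in the order A0B0, A0B1, A1B0, A1B1 and uses the rounded probabilities 0.39, 0.46, 0.72, and 0.53 for those cells. That assignment is an assumption, and `apply_contrast` is a hypothetical helper name.

```python
# Rounded margins, assumed to be in the order A0B0, A0B1, A1B0, A1B1.
margins = [0.39, 0.46, 0.72, 0.53]

# Each contrast is a list of rows; each row holds one coefficient per margin.
contrasts = {
    "A1-A0": [[0, -1, 0, 1],      # A1 minus A0 at B=1
              [-1, 0, 1, 0]],     # A1 minus A0 at B=0
    "Diff in Diff of Means": [[1, -1, -1, 1]],
}

def apply_contrast(rows, values):
    """Return one linear combination of the values per contrast row."""
    return [sum(c * v for c, v in zip(row, values)) for row in rows]

results = {label: apply_contrast(rows, margins) for label, rows in contrasts.items()}
for label, estimates in results.items():
    print(label, [round(x, 2) for x in estimates])
# A1-A0 [0.07, 0.33]
# Diff in Diff of Means [-0.26]
```

The difference in difference row is the first A1−A0 row minus the second, which is why its coefficients are 1 −1 −1 1 in this ordering.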
Using the NLMeans macro
While the difference in difference contrast of means cannot be estimated by the ESTIMATE statement as noted above, it can also be estimated using the NLMeans macro. To use the macro, you need to supply the saved model from the STORE statement and a data set of coefficients that define the individual LSmeans. This coefficients data set is made available by the E option in the LSMEANS statement and is saved by the ODS OUTPUT statement shown above. Finally, create a data set containing the desired contrast of means (or contrasts, if you want to estimate several). The data set must contain variables named SET and K1, K2, ... , Kn, where n is the number of means estimated by the LSMEANS statement. In this case, n=4. The SET variable is primarily used when multiple sets of means are estimated by the SLICE statement or by multiple LSMEANS, SLICE, or ESTIMATE statements. Since there is only a single LSMEANS statement and therefore only a single set of LSmeans, SET=1.
These statements then create the data set containing the difference in difference contrasts and call the NLMeans macro. Note that you need to specify the link function used in the fitted model which is the logit function for a logistic model.
data difdif;
   input k1-k4;
   set=1;
   datalines;
1 -1 -1 1
;
%NLMeans(instore=log, coef=coeffs, link=logit, contrasts=difdif,
         title=Difference in Difference of Means)
The estimated difference in differences of means (probabilities) is −0.26 ≅ (0.53−0.46)−(0.72−0.39) with large-sample confidence interval (−0.45, −0.07).
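One common way to obtain a large-sample standard error and interval for a nonlinear function of estimates is the delta method. The Python sketch below illustrates that general technique only. Whether the macro computes its interval in exactly this way is an assumption, the covariance matrix is hypothetical (this note does not report one), and `did_delta_method` is a name invented for the example.

```python
import math

def logistic(x):
    """Inverse of the logit link."""
    return 1.0 / (1.0 + math.exp(-x))

def did_delta_method(eta, cov, k=(1, -1, -1, 1), z=1.96):
    """Estimate sum_i k_i * logistic(eta_i) with a delta-method Wald interval.

    eta: four log odds estimates; cov: their 4x4 covariance matrix.
    """
    p = [logistic(e) for e in eta]
    est = sum(ki * pi for ki, pi in zip(k, p))
    # Gradient with respect to each log odds: k_i * p_i * (1 - p_i).
    g = [ki * pi * (1.0 - pi) for ki, pi in zip(k, p)]
    n = len(eta)
    var = sum(g[i] * cov[i][j] * g[j] for i in range(n) for j in range(n))
    se = math.sqrt(var)
    return est, se, (est - z * se, est + z * se)

# Rounded log odds, assumed order A1B1, A0B1, A1B0, A0B0.
eta = [0.12, -0.16, 0.95, -0.45]
# HYPOTHETICAL covariance: independent cells, each log odds with variance 0.04.
cov = [[0.04 if i == j else 0.0 for j in range(4)] for i in range(4)]

est, se, ci = did_delta_method(eta, cov)
print(round(est, 2), ci[0] < est < ci[1])
```

With the actual covariance matrix of the four LSmeans in place of the hypothetical one, the same function would give a Wald interval on the probability scale.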

Using the NLEstimate macro
Next, the NLEstimate macro is used. The model saved using the STORE statement is again used in the macro. The expression for the difference in differences of means estimate can be directly specified in the f= option. The parameter names that the macro uses are B_px, where x=1, 2, 3,... in the order shown in the Parameter Estimates table. The Intercept is B_p1. The A parameter is B_p2. B_p4 is the B parameter, and the interaction parameter is B_p6.
%NLEstimate(instore=log, label=Diff in Diff Means,
            f=( logistic(b_p1+b_p2+b_p4+b_p6) - logistic(b_p1+b_p4) ) -
              ( logistic(b_p1+b_p2) - logistic(b_p1) ),
            title=Difference in Difference of Means)
The results are the same as from the NLMeans macro above.

The Margins, NLMeans, or NLEstimate macro can similarly be used to estimate the difference in differences of means for generalized linear models that use other link functions. For example, the log link is commonly used for modeling count responses in Poisson and negative binomial models. It is also typical in gamma models for positive, continuous responses. The difference in differences of means estimate for log-linked models is done as for the logistic model above, but specifying the log link in the NLMeans macro. With the NLEstimate macro, the EXP function is used in the f= expression rather than the LOGISTIC function since exponentiation is the inverse of the log.
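The same expression with a different inverse link can be sketched generically. The Python below is an illustration outside SAS. The parameter values are hypothetical, chosen with a zero interaction on the link scale, and `did_of_means` is a name invented for the example.

```python
import math

def did_of_means(lam, alpha, beta, gamma, inverse_link):
    """Difference in differences of means for mu_AB = inverse_link(linear predictor)."""
    def mu(a, b):
        return inverse_link(lam + alpha * a + beta * b + gamma * a * b)
    return (mu(1, 1) - mu(0, 1)) - (mu(1, 0) - mu(0, 0))

# HYPOTHETICAL parameter values with no interaction on the link scale.
params = dict(lam=1.0, alpha=0.5, beta=0.3, gamma=0.0)

did_identity = did_of_means(inverse_link=lambda x: x, **params)  # identity link
did_log = did_of_means(inverse_link=math.exp, **params)          # log link

print(round(did_identity, 4), round(did_log, 4))
```

With the identity link the result equals γ (zero here up to floating-point rounding), while with the log link the difference in differences of means is not zero even though γ is. This is why the inverse link must be applied to each of the four parts before differencing.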
Type:  Usage Note 
Date Modified:  2018-10-15 10:11:03
Date Created:  2018-02-09 15:20:07