We typically think of a predictor variable, X, causing a response variable, Y. But some or all of the effect of X might result from an intermediary variable, M, that is said to mediate the effect of X on Y. By fitting appropriate models and making certain causal assumptions (Kenny, 2016), it is possible to measure the direct effect of X and the indirect effect of X through M.
Beginning in SAS® 9.4 TS1M5, causal mediation analysis is available in PROC CAUSALMED in SAS/STAT® software. In some limited situations, linear structural equation modeling via PROC CALIS can also be used as discussed and illustrated in an example in the CAUSALMED documentation, but that approach does not extend to more general situations.
Mediation is an area of active and growing research. Many issues, such as covariates, multiple mediators, moderation, latent variables, and bootstrapping the indirect effect (for p-values and confidence intervals), complicate the analysis beyond the simple situations illustrated here. Detailed discussion of mediation analysis in simple and complex situations, including sufficient assumptions, can be found in the references provided below and in the additional references they cite.
Key steps in mediation analysis include a model of the mediator as a function of the predictor (the M←X model) and a model of the response as a function of both the mediator and the predictor (the Y←MX model). If the effect of X in the first model and the effect of M in the second model are both significant, then there is evidence of a nonzero indirect effect (Kenny, 2016). When the response and mediator are both continuous and approximately normal, ordinary regression can be used for these models. When either the response, Y, or the mediator, M, is binary, the corresponding model is typically a logistic model. In either case, the final estimates of direct and indirect effects of the predictor of interest are standardized estimates from these models.
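For the case in which Y and M are both continuous, the two models and the effects derived from them can be sketched as follows. The coefficient names a, b, c, and c′ match the names used in the code later in this note. The intercepts i and error terms e are notation assumed here for illustration.

```latex
\begin{align*}
M &= i_1 + aX + e_1        && \text{(M $\leftarrow$ X model)} \\
Y &= i_2 + c'X + bM + e_2  && \text{(Y $\leftarrow$ MX model)} \\
Y &= i_3 + cX + e_3        && \text{(Y $\leftarrow$ X model, mediator ignored)} \\[4pt]
\text{indirect effect} &= ab, \qquad
\text{direct effect} = c', \qquad
\text{total effect} = c' + ab
\end{align*}
```

When all models are fit by ordinary regression, c equals c′ + ab. When logistic models are involved, the two versions of the total effect agree only approximately.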
Two examples are presented below. In the first, all three variables are continuous. In the second, Y is binary, X is treated as continuous, and there are two mediators: one continuous and one binary.
This example (Kenny, 2016) reproduces the results from a mediation analysis on housing data. The response variable Stable_Housing is the days housed. The predictor is Treatment. The variable Hous_Conts (housing contacts) is the mediator. Of interest is whether the effect of Treatment is mediated by housing contacts. The mediation analysis is done below using PROC CAUSALMED, and is reproduced by fitting the models separately and computing the estimates. Note that PROC CAUSALMED offers bootstrapped standard errors and other details not easily reproduced by the second approach.
Since Kenny presents results using standardized estimates, PROC STANDARD is used to standardize the data prior to analysis. The standardized data are saved in data set StdHousing. If results using unstandardized estimates are desired, the unstandardized data can be used. The following statements standardize the variables used in the analysis.
proc standard data=housing out=StdHousing mean=0 std=1;
   var stable_housing treatment hous_conts;
run;
The following statements perform the mediation analysis using PROC CAUSALMED. The MODEL statement specifies the response as a function of both the Treatment predictor and the mediator (the Y←MX model). The MEDIATOR statement specifies the mediator as a function of the Treatment predictor (the M←X model). The BOOTSTRAP statement optionally requests computation of bootstrapped standard errors and confidence limits.
proc causalmed data=StdHousing;
   model Stable_Housing = Treatment Hous_Conts;   /* Y<-MX model */
   mediator Hous_Conts = Treatment;               /* M<-X model  */
   bootstrap;
run;
The estimated direct effect of Treatment is 0.15 and the indirect effect mediated by housing contacts is 0.097. Large-sample (Wald) and bootstrap confidence intervals are provided, as well as p-values. Notice that the indirect effect is significant (p=0.0254). About 39% of the total effect of treatment is mediated by housing contacts.
The same analysis can be done by fitting the models described above and using the resulting parameter estimates to compute the effects.
The first model (the Y←X model) uses Treatment alone to predict the response. The ODS OUTPUT statement saves the Treatment parameter estimate in data set COMPC. This estimate is the total effect of X ignoring mediation.
proc reg data=StdHousing plots=none;
   model Stable_Housing = Treatment;
   ods output parameterestimates=compc(keep=variable estimate
                                       where=(variable="treatment"));
run;
quit;
Next, the M←X model is fitted. The parameter estimate for Treatment and its standard error are saved using an ODS OUTPUT statement.
proc reg data=StdHousing plots=none;
   model Hous_Conts = Treatment;
   ods output parameterestimates=compa(keep=variable estimate stderr
                                       where=(variable="treatment"));
run;
quit;
The final model (the Y←MX model) fits the response as a function of the mediator and predictor. As before, the estimates and standard errors are saved.
proc reg data=StdHousing plots=none;
   model Stable_Housing = Treatment Hous_Conts;
   ods output parameterestimates=compbcp(keep=variable estimate stderr
                                         where=(variable in ("treatment","hous_conts")));
run;
quit;
The significance of the predictor in the M←X model and of the mediator in the Y←MX model suggests that the indirect effect is nonzero. The Sobel test can also be used to test the indirect effect when both models are ordinary regression models. However, the Sobel test is known to be very conservative. The following statements produce the Sobel test.
data sobel;
   merge compa(drop=variable rename=(estimate=a stderr=sa))
         compbcp(where=(variable="hous_conts") rename=(estimate=b stderr=sb));
   sobel=a*b/sqrt(a**2*sb**2 + b**2*sa**2);
   /* two-sided p-value; ABS keeps it valid when the statistic is negative */
   p=2*(1-probnorm(abs(sobel)));
run;
proc print data=sobel label noobs;
   var sobel p;
   format p pvalue6.;
   label sobel="Sobel Test" p="Pr>|Z|";
run;
The significant result (p=0.0269) indicates a nonzero indirect effect. Note that this closely agrees with the test of the indirect effect provided by PROC CAUSALMED.
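The statistic computed in the DATA step above can be written as follows. Here a and b are the estimates from the M←X and Y←MX models, and s_a and s_b are their standard errors (SA and SB in the code). Φ denotes the standard normal distribution function, which is what PROBNORM evaluates. The two-sided p-value is based on the absolute value of the statistic.

```latex
\[
z = \frac{ab}{\sqrt{a^{2}s_b^{2} + b^{2}s_a^{2}}},
\qquad
p = 2\,\bigl(1 - \Phi(|z|)\bigr)
\]
```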
The following steps combine all of the estimates into one data set and compute the indirect, direct, and total effects of Treatment. The proportion of the total effect represented by the indirect effect, and the ratio of indirect to direct effect are also computed.
proc transpose data=compbcp out=compbcp2;
   var estimate;
   id variable;
run;
data parts;
   merge compc(rename=(estimate=c))
         compa(rename=(estimate=a))
         compbcp2(rename=(treatment=cprime hous_conts=b));
   indirect=a*b;
   directeff=cprime;
   totaleff=indirect+directeff;
   propmed=indirect/totaleff;
   ratioItoD=indirect/directeff;
   Predictor="Treatment";
   label indirect="Indirect effect of treatment mediated by housing contacts"
         directeff="Direct effect of treatment"
         totaleff="Total effect (direct+indirect) of treatment"
         c="Total effect (ignoring mediator) of treatment"
         propmed="Proportion of total effect mediated"
         ratioItoD="Ratio of indirect to direct";
run;
proc print data=parts label;
   id Predictor;
   var indirect directeff totaleff c propmed ratioItoD;
run;
The results closely match those from PROC CAUSALMED. Note that when all models are fit using ordinary regression as they are here, the two estimates of the total effect of treatment are the same. However, bootstrapped standard errors and confidence intervals are not easily obtained by this approach.
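As a rough check, the derived quantities can be recomputed by hand from the rounded effects reported earlier (direct effect 0.15, indirect effect 0.097). Because the inputs are rounded, the printed output can differ in the later digits.

```latex
\begin{align*}
\text{total effect} &= c' + ab \approx 0.15 + 0.097 = 0.247 \\
\text{proportion mediated} &= \frac{ab}{c' + ab} \approx \frac{0.097}{0.247} \approx 0.39 \\
\text{ratio of indirect to direct} &= \frac{ab}{c'} \approx \frac{0.097}{0.15} \approx 0.65
\end{align*}
```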
This example (UCLA) reproduces the results from an analysis of data from 200 high school students. The data include demographic variables, scores on tests for reading, writing, math, and science, and a binary indicator of enrollment in an honors program. Honors enrollment (honors=1) is the response, Y, and SES is treated as the continuous predictor of interest, X. A binary variable, hiread, is created to indicate high (1) vs. low (0) reading score and is mediator M1. The science score, M2, is a second mediator that is continuous.
Since two mediators are involved, the method employed by PROC CAUSALMED cannot be used. The following provides an analysis by a different method.
data a;
   set hsbdemo;
   hiread=(read >= 50);
run;
These statements fit the logistic Y←X model that predicts honors using only SES, ignoring mediation. The SES parameter is saved by the OUTEST= option and named C. Predicted logits are also saved and their variance is computed in PROC MEANS. This, and subsequent variance estimates, will be needed to standardize the parameters.
proc logistic data=a outest=yx(rename=(ses=c));
   model honors(event="1") = ses;
   output out=outyx xbeta=pyx;
run;
proc means data=outyx noprint;
   var pyx;
   output out=vyx(keep=vy1) var=vy1;
run;
Next, each mediator is modeled as a function of SES. The M1←X model for the binary mediator, hiread, is modeled first using logistic regression. Again, the SES parameter is saved and the variance of the logits is computed and saved.
proc logistic data=a outest=m1x(rename=(ses=ah));
   model hiread(event="1") = ses;
   output out=outm1x xbeta=pm1x;
run;
proc means data=outm1x noprint;
   var pm1x;
   output out=vm1x(keep=vm11) var=vm11;
run;
The M2←X model for the continuous mediator, science, is modeled next and the standardized estimate for SES is saved.
proc reg data=a plots=none;
   model science = ses / stb;
   ods output parameterestimates=compas(keep=variable standardizedest
                                        where=(variable="SES"));
run;
quit;
These statements fit the joint Y←M1M2X model. The parameter estimates and variance of the logits are saved.
proc logistic data=a outest=ym12x(rename=(ses=cprime hiread=bh science=bs));
   model honors(event="1") = hiread science ses;
   output out=outym12x xbeta=pym12x;
run;
proc means data=outym12x noprint;
   var pym12x;
   output out=vym12x(keep=vypp1) var=vypp1;
run;
As in the previous example, the significance of the predictor in both the M1←X and M2←X models, and of the mediators in the Y←M1M2X model, suggests that the indirect effects are nonzero.
In addition to the variances computed above, the standard deviations of the mediators and the predictor are needed. These are computed and saved in the following step.
proc means data=a noprint;
   var hiread science ses;
   output out=sd(drop=_type_) stddev=sdh sdsc sdses;
run;
These steps compute the standardized estimates which are needed to obtain the direct, indirect, and total effects of SES on the response.
data parts;
   merge m1x yx ym12x sd vm1x compas vyx vym12x;
   /* SDs of the latent responses: variance of the logits plus pi**2/3 */
   sdyprime=sqrt(vy1 + (constant("pi")**2)/3);
   sdm1prime=sqrt(vm11 + (constant("pi")**2)/3);
   sdyprime2=sqrt(vypp1 + (constant("pi")**2)/3);
   /* standardized coefficients */
   compah=ah*sdses/sdm1prime;
   compas=standardizedest;
   compbh=bh*sdh/sdyprime2;
   compbs=bs*sdsc/sdyprime2;
   compc=c*sdses/sdyprime;
   compcprime=cprime*sdses/sdyprime2;
   /* indirect, direct, and total effects */
   indirecth=compah*compbh;
   indirects=compas*compbs;
   totindirect=indirecth+indirects;
   totaleff=totindirect+compcprime;
   propmed=totindirect/totaleff;
   ratioItoD=totindirect/compcprime;
   Predictor="SES";
   label indirecth="Indirect effect of SES mediated by HIREAD"
         indirects="Indirect effect of SES mediated by SCIENCE"
         totindirect="Total indirect (mediated) effect of SES"
         compcprime="Direct effect of SES"
         totaleff="Total effect (direct+indirect) of SES"
         compc="Total effect (ignoring mediators) of SES"
         propmed="Proportion of total effect mediated"
         ratioItoD="Ratio of indirect to direct";
run;
proc print data=parts label;
   id Predictor;
   var indirecth indirects totindirect compcprime totaleff compc propmed ratioItoD;
run;
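The standardization performed in the DATA step above can be summarized as follows. For each logistic model, the standard deviation of the underlying latent response is taken as the square root of the variance of the predicted logits plus π²/3, the variance of the standard logistic distribution. The symbols are notation assumed here for illustration: η̂ denotes predicted logits, s denotes a standard deviation, and a star marks a standardized coefficient. The subscripts h and s refer to the hiread and science mediators, as in the code.

```latex
\begin{align*}
s_{Y'} &= \sqrt{\operatorname{Var}(\hat\eta_{Y \leftarrow X}) + \pi^{2}/3}, &
s_{M_1'} &= \sqrt{\operatorname{Var}(\hat\eta_{M_1 \leftarrow X}) + \pi^{2}/3}, &
s_{Y''} &= \sqrt{\operatorname{Var}(\hat\eta_{Y \leftarrow M_1 M_2 X}) + \pi^{2}/3} \\[4pt]
a_h^{*} &= a_h\,\frac{s_X}{s_{M_1'}}, &
b_h^{*} &= b_h\,\frac{s_{M_1}}{s_{Y''}}, &
b_s^{*} &= b_s\,\frac{s_{M_2}}{s_{Y''}} \\[4pt]
c^{*} &= c\,\frac{s_X}{s_{Y'}}, &
c'^{*} &= c'\,\frac{s_X}{s_{Y''}}, &
\text{total indirect} &= a_h^{*}b_h^{*} + a_s^{*}b_s^{*}
\end{align*}
```

The standardized coefficient for the continuous mediator, a_s*, is not rescaled in this step because it is taken directly from the STB option in PROC REG.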
The direct effect of SES on the response is found to be 0.076, while the indirect effect through the mediators is 0.197, meaning that 72% of the total effect of SES is mediated. Unlike the case above in which only ordinary regression is needed for all models, when logistic models are involved, the total effects involving vs. ignoring the mediators are not exactly the same.
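The proportion mediated can be checked by hand from the rounded effects reported above (direct effect 0.076, total indirect effect 0.197). Because the inputs are rounded, the printed output can differ in the later digits.

```latex
\begin{align*}
\text{total effect} &\approx 0.076 + 0.197 = 0.273 \\
\text{proportion mediated} &\approx \frac{0.197}{0.273} \approx 0.72 \\
\text{ratio of indirect to direct} &\approx \frac{0.197}{0.076} \approx 2.59
\end{align*}
```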
References and resources
Kenny, D.A., "Mediation" website (as of Sep2016). Provides a more extensive description of mediation and its assumptions, and gives many additional references.
Herr, N.R., "Mediation with Dichotomous Outcomes" website (as of Sep2016).
Kenny, D.A., Kashy, D.A., and Bolger, N. (1998), "Data analysis in social psychology" in Gilbert, D., Fiske, S., and Lindzey, G. (Eds.), The handbook of social psychology, Vol. 1, 4th ed., pp. 233-265. Boston, MA: McGraw-Hill.
UCLA: Statistical Consulting Group, Institute for Digital Research and Education, "How can I perform mediation with binary variables?" website (as of Sep2016).
Hayes, Papers and SAS macro on mediation analysis.
Type: Usage Note
Topic: Analytics ==> Regression; SAS Reference ==> Procedures ==> CAUSALMED
Date Modified: 2019-05-07 15:48:56
Date Created: 2016-10-04 13:31:48