In addition to the following example, several more examples of using the Margins macro can be found in these notes:
- EXAMPLE 1: Predictive margins in a binary logistic model
- The following statements estimate and test predictive margins for the Neuralgia data set presented in the example titled "Logistic Modeling with Categorical Predictors" in the LOGISTIC documentation (SAS Note 22930). Treatment predictive margins are computed for males and females. Since the Margins macro requires a numeric response variable, the NoPain variable is created with value 1 when Pain='No' and 0 otherwise. The probability of no pain is modeled as a result of roptions=event='1'. Confidence intervals are requested by options=cl.
data Neur;
set Neuralgia;
NoPain=(pain='No');
run;
%Margins(data = Neur,
class = Treatment Sex,
response = NoPain,
roptions = event='1',
dist = binomial,
model = Treatment Sex Treatment*Sex Age Duration,
margins = Treatment,
at = Sex,
options = cl)
The predictive margins are presented in the following tables. For example, the estimated probability of no pain for females in Treatment A is 0.82, and 0.62 for males. Notice that Treatment P produces a noticeably smaller probability than Treatments A or B, and females have higher probabilities in all Treatments.
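As a rough illustration of what a predictive margin is, the sketch below reproduces the idea by hand, assuming the usual definition: each observation is scored with Treatment and Sex set to a given combination while Age and Duration keep their observed values, and the predicted probabilities are then averaged. This is not the macro's internal code. The item store name NeurModel and the data set names Expanded and Scored are hypothetical, and the sketch yields point estimates only, without standard errors or confidence limits.

```sas
/* Fit the same logistic model and save it in an item store */
proc logistic data=Neur;
   class Treatment Sex / param=glm;
   model NoPain(event='1') = Treatment Sex Treatment*Sex Age Duration;
   store work.NeurModel;            /* hypothetical item store name */
run;

/* Copy every observation once per Treatment-by-Sex combination,     */
/* overwriting Treatment and Sex but keeping observed Age, Duration  */
data Expanded;
   set Neur;
   do Treatment = 'A', 'B', 'P';
      do Sex = 'F', 'M';
         output;
      end;
   end;
run;

/* Score the expanded data; ILINK returns predicted probabilities */
proc plm restore=work.NeurModel;
   score data=Expanded out=Scored predicted=Phat / ilink;
run;

/* Average the predicted probabilities within each combination */
proc means data=Scored mean;
   class Sex Treatment;
   var Phat;
run;
```

If the definition assumed here matches what the macro does, the means of Phat should agree with the margin estimates in the macro's tables.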
Adding the atmeans option fixes all other predictors in the model at their mean values. With this option, all predictors in the model are fixed either by margins= or atmeans. The nomodel option prevents displaying the fitted model again.
%Margins(data = Neur,
class = Treatment Sex,
response = NoPain,
roptions = event='1',
dist = binomial,
model = Treatment Sex Treatment*Sex Age Duration,
margins = Treatment,
at = Sex,
options = cl nomodel atmeans)
Following are the predictive margins at the means of Age and Duration. The mean values are shown preceding the margins table. The margins at the means are roughly similar to the previous values.
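The reported means can be checked independently with a short PROC MEANS step. This is only a sanity check, and it assumes that Neur has no missing values among the model variables. If the macro computes its means from only the observations used to fit the model, the two sets of values could otherwise differ.

```sas
/* Means of the continuous predictors that atmeans fixes */
proc means data=Neur mean;
   var Age Duration;
run;
```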
In the analysis with options=atmeans, all predictors are fixed. In this case, the same results can be obtained using the LSMEANS statement in PROC LOGISTIC as shown below.
proc logistic data=Neur;
class Treatment Sex / param=glm;
model NoPain(event='1') = Treatment Sex Treatment*Sex Age Duration;
lsmeans Treatment*Sex / ilink cl at means;
run;
The Mean column displays the predictive margins, which match those in the previous analysis from the Margins macro.
Differences among the treatment predictive margins, computed within the sexes, can be estimated and tested by adding diff=all. This can be done with or without fixing the other model predictors at their means with the atmeans option.
%Margins(data = Neur,
class = Treatment Sex,
response = NoPain,
roptions = event='1',
dist = binomial,
model = Treatment Sex Treatment*Sex Age Duration,
margins = Treatment,
at = Sex,
diff = all,
options = cl nomodel)
The results show that both the A and B Treatments produce a significantly higher probability of no pain than Treatment P, in both sexes. Treatments A and B do not differ significantly.
- EXAMPLE 2: Marginal effects in a binary logistic model
- Using the same data as the previous example, the following estimates the marginal effect for Sex at the means of Treatment, Age and Duration. Since Sex is a binary CLASS variable, its marginal effect is computed as the difference in predictive margins. Since Treatment is also a CLASS variable, its mean is represented in the model by using the overall observed proportions of the Treatments in its dummy variables.
%Margins(data = Neur,
class = Treatment Sex,
response = NoPain,
roptions = event='1',
dist = binomial,
model = Treatment Sex Treatment*Sex Age Duration,
margins = Sex,
diff = all,
options = cl nomodel atmeans)
The Treatment, Age, and Duration means used in the computations are shown first. Since the Treatments appear equally in the data, the proportions are balanced in the same way as if balanced=Treatment had been specified. Next are the predictive margins for Sex, followed by the marginal effect for Sex, computed as the difference in predictive margins. The marginal effect of Sex, 0.41, is significant (p=0.0095) indicating that the probability for Females is significantly larger than for Males. Note that the direction of the difference is indicated by the index values shown in the Difference column ("1-2"). If the Male-Female difference is desired, specify options=reverse.
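To obtain the Male-Female difference mentioned above, the same call can be repeated with reverse added to options=. The sketch below changes nothing else from the preceding call.

```sas
/* Same analysis as above; reverse flips the direction of the difference */
%Margins(data     = Neur,
         class    = Treatment Sex,
         response = NoPain,
         roptions = event='1',
         dist     = binomial,
         model    = Treatment Sex Treatment*Sex Age Duration,
         margins  = Sex,
         diff     = all,
         options  = cl nomodel atmeans reverse)
```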
The following estimates the average marginal effect of the continuous predictor, Age. The model is expanded to allow for the Age effect to vary with Treatment by including the Age*Treatment interaction. Note that the vertical bar ("|") between variables is equivalent to specifying both main effects and the interaction. The marginal effect computation uses the observed values of the predictors rather than their means since the atmeans option is omitted.
%Margins(data = Neur,
class = Treatment Sex,
response = NoPain,
roptions = event='1',
dist = binomial,
model = Treatment|Sex Treatment|Age Duration,
effect = Age,
options = cl nomodel)
The estimated average marginal effect, -0.036, indicates that increasing Age significantly decreases the probability of no pain (p=0.0011).
The marginal effect of Age can be compared between the Treatments by specifying a contrast. The following DATA step creates the appropriate data set. The data set must always include a LABEL variable containing labels for the contrasts, and a variable F, which contains the contrast coefficients. Both variables must be character variables. The rows of the specified contrast request the three pairwise differences among the marginal effects for Age in the three Treatments. Estimates of the marginal effect of Age at each Treatment are requested with at=Treatment. The noprintbyat option arranges the three marginal effects in a single table instead of separate tables labeled by the Treatment.
data C;
length label f $32767;
infile datalines delimiter='|';
input label f;
datalines;
Treatment | 1 -1 0, 1 0 -1, 0 1 -1
;
%Margins(data = Neur,
class = Treatment Sex,
response = NoPain,
roptions = event='1',
dist = binomial,
model = Treatment|Sex Treatment|Age Duration,
effect = Age,
at = Treatment,
contrasts= C,
options = cl nomodel noprintbyat)
The marginal effect estimates indicate that Age significantly decreases the probability of no pain in Treatments A and B, but not in Treatment P. Further, the three pairwise comparisons show that the effect of Age differs significantly only between Treatments A and P (p=0.0429).
However, a note in the log indicates that the joint test for the contrast rows could not be provided, resulting in a missing statistic and p-value in the first row of the table. Because Treatment has three levels and therefore only two degrees of freedom, only two comparisons are independent. Consequently, in order to conduct the joint test, the contrast should contain only two of the three pairwise comparisons. Keeping only the first two of the three contrast rows in data set C produces the following table, which includes the joint test and shows no overall difference.
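A reduced contrast data set for the joint test might look like the following sketch. The data set name C2 is hypothetical; everything else mirrors the earlier DATA step and macro call, with the third contrast row removed.

```sas
/* Contrast with only two independent pairwise comparisons (A-B, A-P) */
data C2;
   length label f $32767;
   infile datalines delimiter='|';
   input label f;
   datalines;
Treatment | 1 -1 0, 1 0 -1
;

%Margins(data     = Neur,
         class    = Treatment Sex,
         response = NoPain,
         roptions = event='1',
         dist     = binomial,
         model    = Treatment|Sex Treatment|Age Duration,
         effect   = Age,
         at       = Treatment,
         contrasts= C2,
         options  = cl nomodel noprintbyat)
```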
The same pairwise comparison results can be obtained without the need for contrasts= by specifying margins=Treatment and diff=all.
%Margins(data = Neur,
class = Treatment Sex,
response = NoPain,
roptions = event='1',
dist = binomial,
model = Treatment|Sex Treatment|Age Duration,
effect = Age,
margins = Treatment,
diff = all,
options = cl nomodel)
With this approach, the Treatment predictive margins and their differences are also given.
- EXAMPLE 3: Marginal effects in a Poisson model
- This example uses data presented and analyzed by McCullagh and Nelder (1989). The data contain counts of the number of damage incidents (Y) occurring to individual ships over their total months of service (MONTHS). A Poisson model is used to model the incidence rate (Y/MONTHS). Predictors are the type of ship (TYPE), period of operation (PERIOD), and year of construction (YEAR). In order to model the incidence rate, the log of the months of service (LOGMONTHS) is used as an offset in the model.
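If the ship data set does not already contain the offset variable, it could be created along the following lines. This is a minimal sketch that assumes the data set is named ship and holds the months of service in a variable named MONTHS, as described above. Ships with zero or missing months are left with a missing LOGMONTHS.

```sas
data ship;
   set ship;
   /* Guard against zero or missing months, for which the log is undefined */
   if months > 0 then logmonths = log(months);
run;
```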
The following macro call models the incidence rate and estimates and tests the marginal effect of the continuous YEAR variable on the rate. The first levels of the categorical TYPE and PERIOD predictors are specified as reference levels in the model with classgref=first. LOGMONTHS is specified as the offset. effect=year and options=rate cl together request estimation of the rate marginal effect of YEAR along with a confidence interval. If rate is not specified, the macro estimates the marginal effect of YEAR on the mean incidence count rather than the incidence rate.
%Margins(data = ship,
class = type period,
classgref= first,
response = y,
offset = logmonths,
model = type year period,
dist = poisson,
effect = year,
options = rate cl)
The estimated marginal effect of YEAR on the incidence rate is 0.00068 and differs significantly from zero (p=0.0024).
- EXAMPLE 4: Margins and marginal effects in a GEE model
- This example uses the data in the Generalized Estimating Equations (GEE) example in the Getting Started section of the GENMOD documentation (SAS Note 22930). This call of the Margins macro estimates a logistic GEE model for the probability of Wheezing. Then it estimates and tests the predictive margins for the City levels and the average marginal effect of Smoke in each City. An estimate of the marginal effect of City, computed as the difference in its predictive margins, is provided by diff=all. The difference in the Smoke marginal effects is also provided.
%Margins(data = Six,
class = Case City,
response = Wheeze, roptions = event='1',
model = City Age Smoke,
dist = binomial,
geesubject = Case, geecorr = exch,
margins = City,
effect = Smoke,
diff = all,
options = cl)
The results show that the estimated probabilities of wheezing in the two cities are both near 0.3 and the difference, the marginal effect of City, is not significant (p=0.8593). The average marginal effect of Smoke is also shown to not differ significantly (p=0.8721) between the cities and, in fact, the Smoke effect on the probability of wheezing is not significant in either city.
- EXAMPLE 5: Marginal effect in a log-linked gamma model
- The following example appears in the NLMeans macro documentation (SAS Note 62362) where that macro is used to estimate the difference in mean failure times for two manufacturers. This can also be considered the marginal effect of the binary manufacturer variable, MFG, and can be estimated using the Margins macro.
The following macro call estimates the marginal effect as the difference in predictive margins for MFG. The model specified by class=, response=, dist=, and model= is a log-linked gamma model. The predictive margins for the manufacturers and their difference are requested by margins= and diff=all.
%Margins(data = lifdat,
class = mfg,
response = lifetime,
dist = gamma,
model = mfg,
margins = mfg,
diff = all,
options = cl)
These results duplicate those from the LSMEANS statement and the NLMeans macro in the NLMeans macro documentation. Each manufacturer's mean failure time is significantly different from zero (p<0.0001). The estimated marginal effect for MFG is 9.23 and is not significant (p=0.8986).
- EXAMPLE 6: Relative risk
- The Margins macro can estimate differences or other linear combinations of predictive margins or marginal effects by specifying diff= or contrasts=. To estimate other functions, the NLEST macro (SAS Note 58775) can be used. The NLEST macro can estimate and test linear and nonlinear combinations of model parameters given estimates and their covariance matrix.
To do this with margins or marginal effects, specify options=covout to create a separate data set containing the covariance matrix of the margins (_CovMarg) or marginal effects (_CovMeff). Estimates of the margins or marginal effects are automatically saved in data set _Margins or _MEffect. Specify the estimates in inest= and their covariance matrix in incovb= in the NLEST macro. Then specify the desired function of those estimates in f=. In this function, the estimates are referred to using the names b_p1, b_p2, b_p3, and so on in the order presented by the macro. The result can optionally be labelled by specifying label=. Other options are available and are described in the NLEST macro documentation (SAS Note 58775).
The following example estimates the relative risk of the predictive margins of Sex levels estimated using a logistic model on the binary NoPain variable. The predictive margins are estimated event probabilities. The relative risk is a ratio of these probability estimates. While the Margins macro can estimate the margins and their difference, as done below with diff=, it cannot directly estimate the ratio of margins. The following Margins macro call creates the _Margins data set containing the estimates of the margins, and options=covout creates the _CovMarg data set containing their covariance matrix. These are then specified in the NLEST macro. The function in f= specifies the ratio of the Sex='M' margin, referred to as b_p2 since it is the second estimate, to the Sex='F' margin, referred to as b_p1. To test that this ratio is equal to 1 rather than equal to 0 as is the default, null=1 is also specified. A label is specified in label=.
%Margins(data = Neur,
class = Treatment Sex,
response = NoPain,
roptions = event='1',
dist = binomial,
model = Treatment Sex Treatment*Sex Age Duration,
margins = Sex,
diff = all,
options = covout cl nomodel)
%nlest(inest=_Margins, incovb=_CovMarg, f=b_p2/b_p1, null=1, label=Rel. Risk)
The estimated Sex margins and difference are provided by the Margins macro. The NLEST macro confirms the names assigned to the margins followed by the estimated relative risk (0.6540), which differs significantly from 1 (p=0.0022). A 95% confidence interval is also provided.
- EXAMPLE 7: BY group processing using RunBY
- While the Margins macro does not support BY processing directly, the general purpose RunBY macro (SAS Note 66249) can be used to run the macro on BY groups in the data. The following uses the Neur data set in the examples above. In the statements below, a DATA step is used to subset the Neur data set to each BY group in turn. This is done with a WHERE statement that specifies the special macro variables, _BYx and _LVLx, which are used by the RunBY macro to process each BY group. The BYlabel macro variable is also used to label the displayed results with the BY group definition. Since the Margins macro writes its own titles, a FOOTNOTE statement is used instead of a TITLE statement to provide the label.
%macro code();
data subset; set Neur; where &_BY1=&_LVL1; run;
footnote "Above for &BYlabel";
%margins(data=subset, class=treatment, response=NoPain, roptions=event='1',
dist=binomial, model=treatment age duration, margins=treatment, options=nomodel)
footnote;
%mend;
%RunBY(data=Neur, by=sex)
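For comparison, the same per-group analysis can be sketched without RunBY by looping over the BY levels in a small macro. The macro name BySex is hypothetical, and the level list F M is hard-coded on the assumption that Sex takes only those two values, as in the examples above. RunBY remains the more general tool because it discovers the levels itself.

```sas
%macro BySex;                        /* hypothetical helper macro */
   %local i lvl;
   %do i = 1 %to 2;
      %let lvl = %scan(F M, &i);     /* assumed levels of Sex */
      /* Subset to the current level of Sex */
      data subset; set Neur; where Sex = "&lvl"; run;
      footnote "Above for Sex=&lvl";
      %margins(data=subset, class=treatment, response=NoPain, roptions=event='1',
               dist=binomial, model=treatment age duration, margins=treatment,
               options=nomodel)
      footnote;
   %end;
%mend BySex;
%BySex
```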
- EXAMPLE 8: Multiple margins= or effect= variables using RunBY
- The Margins macro estimates margins for the levels of one variable or the combinations of levels of multiple variables. It does not estimate margins for the levels of each variable separately if multiple variables are specified in margins=. Similarly, only one variable can be specified in effect=. To estimate margins or marginal effects separately for multiple variables, you can use the general purpose RunBY macro (SAS Note 66249) to run the Margins macro repeatedly for each variable in a list.
To illustrate, the statements below run the Margins macro for each of two variables, Job and Reason, to estimate predictive margins separately for each. The DATA step creates data set MargVars with a variable named MargVar containing these two variable names. The appropriate Margins macro call is placed in the special macro, CODE, which the RunBY macro runs once for each level of the by= variable in the data= data set. By specifying data=MargVars and by=MargVar, RunBY runs the code in the CODE macro for each of the two variables. In each run, each variable name in turn is stored in the special macro variable _LVL1, so this macro variable is specified in margins=. To avoid repeatedly displaying the fitted model, options=nomodel is also specified. Since the variable name specified in margins= does not need to be quoted, lvlquote=no is specified in RunBY. To run Margins on the variables in the order specified in the MargVars data set, order=data is also included.
data MargVars;
length MargVar $20;
input MargVar $ @@;
datalines;
job reason
;
%macro code();
%margins(data=sampsio.hmeq, response=bad, roptions=event='1', dist=binomial,
class=job reason,
model=job reason Delinq Derog, margins=&_LVL1,
options=cl nomodel)
%mend;
%RunBY(data=MargVars, by=MargVar, lvlquote=no, order=data)
If you want to create a data set containing all of the margins from both variables rather than have them displayed, the following variation can be used. All displayed results from the Margins macro are suppressed by options=noprint. Since the _Margins data set created by the Margins macro does not contain the name of the margins= variable, that is added by the DATA step prior to the PROC APPEND step, which accumulates the _Margins data sets into data set AllMargins. To avoid problems caused by varying numbers of levels and differing names of the variable containing the level names, the DROP= and RENAME= options are used in the APPEND step.
%macro code();
%margins(data=sampsio.hmeq, response=bad, roptions=event='1', dist=binomial,
class=job reason,
model=job reason Delinq Derog, margins=&_LVL1,
options=cl noprint)
data _Margins; set _Margins; length MargVar $20; MargVar="&_LVL1"; run;
proc append base=AllMargins data=_Margins(drop=cov: rename=(&_LVL1=Level)); run;
%mend;
%RunBY(data=MargVars, by=MargVar, lvlquote=no, order=data)
proc print data=AllMargins label; id margvar level;
var estimate stderrpm lower upper chisq pr;
title "Predictive Margins";
run;
The following is the AllMargins data set displayed by PROC PRINT.
The same approaches as above can be used for multiple marginal effects variables. In this case, the DROP= and RENAME= options in the APPEND step are not necessary.
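A sketch of the marginal effects variation follows, using the continuous predictors Delinq and Derog from the model above. The data set name EffVars and variable name EffVar are hypothetical; the structure otherwise copies the margins= example, with &_LVL1 moved to effect=.

```sas
/* List of variables for which marginal effects are wanted */
data EffVars;
   length EffVar $20;
   input EffVar $ @@;
   datalines;
Delinq Derog
;

%macro code();
   %margins(data=sampsio.hmeq, response=bad, roptions=event='1', dist=binomial,
            class=job reason,
            model=job reason Delinq Derog, effect=&_LVL1,
            options=cl nomodel)
%mend;
%RunBY(data=EffVars, by=EffVar, lvlquote=no, order=data)
```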