Sample 63038: Predictive margins and average marginal effects



Contents: Purpose / History / Requirements / Usage / Details / Limitations / Missing Values / See Also
PURPOSE:
The Margins macro fits the specified generalized linear or GEE model and estimates predictive margins and/or average marginal effects for variables in the model. Differences and contrasts of predictive margins and average marginal effects with confidence limits are also available. Margins and effects can be estimated at specified values of other model variables or at computed values such as means or medians.
HISTORY:
The version of the Margins macro that you are using is displayed when you specify any value as the first macro argument. For example:
    %margins(v)

The Margins macro always attempts to check for a later version of itself. If it is unable to do this (such as if there is no active internet connection available), the macro will issue the following message:

   NOTE: Unable to check for newer version of Margins macro.

The computations performed by the macro are not affected by the appearance of this message. However, this check can be avoided by specifying nochk as the first macro argument. This can be useful if your machine has no connection to the internet.

Version   Update Notes
2.02      In options=, added nonobs.
2.0       Removed SAS/IML® requirement. Added diff= and classgref=. In options=, added rate, nogencheck, and covout. nochk can be specified as the first (version) parameter. Removed freq= (see weight=).
1.08      Fixed bug causing errors from within= option.
1.07      Fixed bugs: mislabeled estimates, errors, or bad margin estimate if level does not exist or is dropped due to all responses missing.
1.06      Fixed problem with truncated parameter names.
1.05      Fixed problems with formatted CLASS variables.
1.03      Add within=.
1.02      Provide estimates and tests for each row in contrasts.
1.01      Allow effect= variable to be in other options.
1.0       Initial coding
REQUIREMENTS:
SAS/STAT®
USAGE:
Follow the instructions on the Downloads tab of this sample to save the Margins macro definition. Replace the text within quotation marks in the following statement with the location of the Margins macro definition file on your system. In your SAS® program or in the SAS editor window, specify this statement to define the Margins macro and make it available for use:
   %inc "<location of your file containing the Margins macro>";

Following this statement, you can call the Margins macro.

The Margins macro both fits the model and estimates the requested predictive margins and/or marginal effects. The macro cannot be used to compute margins or marginal effects for previously fitted models unless the data are available and the model can be specified in the macro to be refitted.

To estimate predictive margins for the levels of a variable (or combinations of levels of multiple variables) in the model, specify the options needed to fit the desired model and specify the variable(s) in margins=. To estimate the marginal effect of a continuous variable in the model, specify the options needed to fit the desired model and specify the variable in effect=.

See the Results tab for examples.
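As a minimal sketch of the two kinds of requests described above, the following program simulates a small data set and calls the macro once with margins= and once with effect=. The data set name demo, its variables (y, trt, age), and the simulated coefficients are hypothetical and exist only for illustration. The sketch assumes the Margins macro has already been defined with the %inc statement.

```sas
/* Hypothetical data: binary response y, two-level treatment trt, continuous age */
data demo;
  call streaminit(2718);
  length trt $1;
  do id = 1 to 400;
    if rand('bernoulli', 0.5) then trt = 'A';
    else trt = 'B';
    age = round(30 + 40*rand('uniform'));
    p = 1 / (1 + exp(-(-3 + 0.05*age + 0.6*(trt = 'A'))));
    y = rand('bernoulli', p);
    output;
  end;
  drop p;
run;

/* Predictive margins for each level of trt, with confidence limits */
%margins(data=demo, response=y, roptions=event='1',
         class=trt, model=trt age, dist=binomial,
         margins=trt, options=cl)

/* Average marginal effect of the continuous predictor age */
%margins(data=demo, response=y, roptions=event='1',
         class=trt, model=trt age, dist=binomial,
         effect=age, options=cl)
```

Each call refits the model, so the two requests could also be combined in one call by specifying both margins=trt and effect=age, in which case the marginal effect of age is estimated within each level of trt.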

The following parameters are required when using the Margins macro.

data=data-set-name
Specify the name of the input data set to model.
response=variable
Specify the name of the response variable to be modeled. The specified variable must be numeric. Events/trials syntax for aggregated data is not supported.
model=model-effects
Specify the model to be fit by PROC GENMOD. This is the list of model effects that would appear following the equal sign (=) in the MODEL statement of PROC GENMOD. Nested effects are not supported. The fitted model is saved in an item store named _Fit. This item store can be used in the RESTORE= option in PROC PLM to do further estimation or plotting using the specified model.
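A short sketch of reusing the saved item store follows. It assumes that a call to the Margins macro has already completed successfully in the current session, so that the item store _Fit exists in the default (WORK) library. The SHOW statement is only one possible use of the restored model.

```sas
/* Assumes a prior %margins call created the item store _Fit */
proc plm restore=_Fit;
  show parameters;   /* display the parameter estimates of the stored model */
run;
```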

The following parameters are optional. Typically, margins= or effect= (or both) are specified. If margins= and effect= are both omitted, the overall margin(s) is estimated.

margins=variable(s)
Margins are estimated for all levels or combinations of levels of the specified variable(s) in the data= data set or, if specified, in the margindata= data set. Margins are not provided for each of multiple variables separately, but this can be done as shown in Example 8 in the Results tab. The specified variable(s) must be in the data= data set and must also be specified in model=. The levels or combinations of levels defining the margins can be reduced using the margindata= data set or margwhere=. If margins= and effect= are both omitted, the overall margin is estimated, optionally with variables fixed by at=, if specified, or computed only over a subset of observations if within= is specified. See Details below.
effect=variable
Average marginal effects are estimated for the specified continuous variable. Only one variable can be specified and it must not be specified in class= or offset=. Marginal effects for multiple variables can be estimated as shown in Example 8 in the Results tab. If margins= and/or at= are also specified, the average marginal effects for the variable are estimated within each combination of levels of the margins= and/or at= variables. Marginal effects for categorical variables can be obtained as differences of predictive margins. Specify the class= variable in margins= instead of effect= and request differences with diff=.
margindata=data-set-name
Specify a data set containing the margins= variables for defining the margins to be computed. If not specified, the data= data set is used. The levels or combinations of levels defining the margins can be reduced using the margwhere= option. See Details below.
margwhere=where-condition
The specified where-condition subsets the data= or margindata= data set before determining the levels or combination of levels of the margins= variables for which predictive margins will be computed. See Details below.
roptions=response-variable-options
Specify any options for the response variable as described in the MODEL statement of PROC GENMOD. For the available response variable options, see the GENMOD documentation (SAS Note 22930).
class=variable(s)
Specify a list of any categorical predictors to be placed in a CLASS statement. Individual variable options (such as REF=) are not supported.
classgref=FIRST|LAST
For all variables in class=, specifies whether the reference levels are the first or last levels in their sorted order. Default: LAST.
dist=distribution-name
Specify the response distribution. Valid distribution names are Normal, Binomial, Poisson, Negbin (for negative binomial), Gamma, Geometric, IGaussian (for inverse Gaussian), or Tweedie. Default: Normal.
link=link-function-name
Specify the link function. Valid link function names are Identity, Log, Logit, Probit, CLL, or Power(p), where p is a numeric power value. The default link function is the canonical link for the specified distribution as shown in the description of the DIST= option in the GENMOD documentation (SAS Note 22930).
offset=variable
Specify the offset variable if needed. Typically used for Poisson or negative binomial models when modeling a rate, in which case the offset variable should be the log of the rate denominator. See options=rate. This variable should not be specified in class=, model=, margins=, at=, or effect=.
modelopts=model-options
Specify any options to appear in the PROC GENMOD MODEL statement other than DIST=, LINK=, OFFSET=, or SINGULAR=.
at=variable(s)
The margins requested in margins= or marginal effects requested in effect= will be estimated at each level or combination of levels of the specified variable(s) in the data= data set or, if specified, the atdata= data set. The specified variable(s) must be in the data= data set and must also be specified in model=. The levels or combinations of levels at which margins or effects will be computed can be reduced using the atdata= data set or atwhere=. See Details below.
atdata=data-set-name
Specify a data set containing the at= variables at which the requested margins or marginal effects will be computed. If not specified, the data= data set is used. The levels or combinations of levels at which margins or effects will be computed can be reduced using the atwhere= option. See Details below.
atwhere=where-condition
The specified where-condition subsets the data= or atdata= data set before determining the levels or combination of levels of the at= variables at which predictive margins or marginal effects will be computed. The where-condition can involve variables not in the model. See also within=. See Details below.
within=where-condition
After fitting the model, margins and marginal effects are computed by averaging only over the observations meeting the specified condition. The where-condition can involve variables not in model=. If quotation marks appear in the where-condition, use single quotation marks ('), not double quotation marks ("). Unlike margins= and at=, within= does not fix any variables in the model. It also does not affect the model fit, which is always fit on the complete data= data set (minus observations with missing values - see Missing Values below). Any statistics options (options=atmeans, mean=, and others) used to fix variables are also computed on the complete data= data set. See Details below.
diff=ALL|SEQ|number
Estimate and test differences among margins and/or marginal effects. If diff=all, then all pairwise differences are computed. Sequential differences (1-2, 2-3, 3-4, ...) are requested by diff=seq. All differences with a control are computed by diff=number, where number is the index number of the margin or marginal effect considered to be the control. If at= is specified, differencing among margins is done within each unique combination of the at= variable levels. For marginal effects, differencing is done within each unique combination of the combined margins= and at= variable levels. For differencing across the combinations, use contrasts= or specify the at= variables in margins= rather than in at= and specify diff=. See options=reverse.
contrasts=data-set-name
Specify a data set containing labels and contrast coefficients defining contrasts of predictive margins and/or average marginal effects to be estimated and tested. Note that coefficients should be given for all estimates, not just those within a combination of at= variable values. The data set must contain two character variables, LABEL and F. Each observation of the data set defines one contrast, which can be a multi-row contrast. LABEL contains the labels that will identify the contrasts in the results. F contains the coefficients defining each contrast. If the contrast has multiple rows, use commas to separate the sets of coefficients in the rows. In each row there should be as many coefficients as there are margins (or marginal effects) across any at= levels using their order as presented by the macro.
geesubject=variable or model-effect
Specifies the effect that defines correlated clusters of observations in GEE models when fitting the model in PROC GENMOD. Required when fitting a GEE model. See the description of the SUBJECT= option in the REPEATED statement in the GENMOD documentation (SAS Note 22930).
geewithin=variable
Optionally specifies the order of measurements within correlated clusters of observations in GEE models when fitting the model in PROC GENMOD. See the description of the WITHIN= option in the REPEATED statement in the GENMOD documentation (SAS Note 22930).
geecorr=structure-name
Specifies the correlation structure when fitting a GEE model in PROC GENMOD. For valid structure names, see the description of the TYPE= option in the REPEATED statement in the GENMOD documentation (SAS Note 22930). Default: IND (the independence structure).
weight=variable
Specifies a weight variable used when fitting the model in PROC GENMOD. See the description of the WEIGHT statement in the GENMOD documentation (SAS Note 22930). Noninteger values are not truncated. Weights affect the estimation of the model parameters used in the computation of predictive margins and average marginal effects. Note that because weights are not frequencies, they do not affect sample size. Consequently, weights are used in the computation of weighted statistics specified in mean=, median=, q1=, and q3= for continuous predictors, but are not used in frequencies (proportions) computed for categorical (class=) variables specified in mean= or balanced=. For aggregated data with a frequency variable, f, the results obtained by specifying weight=f are the same as the results from the equivalent disaggregated, individual-level data without weight=, as long as no categorical variables are specified in mean= or balanced=. If categorical variables are specified in mean= and proportions equivalent to disaggregated data are desired, then use a DATA step to create the equivalent disaggregated data, specify the resulting data set in data=, and omit weight=.
mean=variable(s)
median=variable(s)
q1=variable(s)
q3=variable(s)
Use these statistic options to fix model variables at computed values when estimating margins or marginal effects. The specified variables should not appear in margins= or at= but must appear in model=. Only numeric variables not specified in class= should appear in median=, q1=, or q3=. Variables in mean= can be specified in class= or not. For a class= variable specified in mean=, the observed proportions are used as values of the dummy variables that represent the variable in the model. Weighted statistics are computed for specified continuous variables when weight= is specified. However, proportions computed for categorical variables specified in mean= are not affected by weights. See weight=.
balanced=variable(s)
The specified variables must also be specified in model= and class= and not in margins= or at=. For a specified variable with k levels, the values of the dummy variables representing it in the model are all fixed at 1/k when computing predictive margins. This is not modified by weight= if specified. See weight=.
alpha=value
Specify the alpha level for confidence intervals with confidence level 1-alpha. Value must be between 0 and 1. Default: 0.05.
singular=value
Specify a singularity criterion for use in PROC GENMOD. Value must be between 0 and 1. See the description of the SINGULAR= option in the MODEL statement in the GENMOD documentation (SAS Note 22930).
options=list-of-options
Specify desired options separated by spaces. Valid options are:
atmeans
Compute predictive margins at the means of all other model variables except for those specified in at=. For marginal effects, all variables other than those in margins= or at= are fixed at their means. The mean=, median=, q1=, q3=, and balanced= options are ignored. For a class= variable, its overall observed proportions are used as values for the dummy variables that represent the variable in the model. If a weight= variable is specified, weighted means are computed for continuous variables. But for a categorical (class=) variable, weights do not contribute to observed proportions. See weight=.
cl
Provide confidence intervals for predictive margins, average marginal effects, and differences.
reverse
Reverse the direction of margin and effect differences. Ignored unless diff= is specified.
rate
After fitting the model and before computing predicted values, the offset specified in offset= is ignored by setting it equal to zero in all observations. For count models, this results in estimating rate margins rather than count margins.
covout
Saves the covariance matrix of margins (and marginal effects) in separate data sets as well as in the _Margins (and _MEffect) data sets. See Output data sets below. This is useful when using the NLMeans macro, NLEST macro, or doing other processing after completion.
desc
Adds the DESCENDING option in the PROC GENMOD statement to model the higher response level in binomial models. However, it is better to explicitly specify the response level to model using the EVENT= response variable option. For example, to model the probability that the response=1, specify roptions=event="1".
nomodel
Do not display the fitted model.
nonobs
Do not display the table showing numbers of observations read and used.
noprint
Suppress all displayed results. Note that results are always saved in data sets as described under Output data sets below.
noprintbyat
Do not display predictive margins, average marginal effects, and differences in separate tables defined by the at= variables, as is done by default. Instead, all margins are displayed in one table (similarly for marginal effects and differences) and the at= variable values are included in the table.
nogencheck
If specified, failure of the model to converge (such as when MAXITER=0 is used in modelopts=) does not halt macro execution.
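The offset= parameter and options=rate described above can be combined as in the following sketch of a Poisson rate model. The data set claims and its variables (nclaims, region, pyears) are hypothetical. The sketch assumes the Margins macro has already been defined with %inc.

```sas
/* Hypothetical input: count response nclaims, class variable region,
   and exposure pyears (the rate denominator) */
data claims2;
  set claims;
  logpy = log(pyears);   /* the offset is the log of the rate denominator */
run;

/* options=rate sets the offset to zero before prediction, so the margins
   are estimated rates per unit of exposure rather than expected counts */
%margins(data=claims2, response=nclaims, class=region, model=region,
         dist=poisson, offset=logpy, margins=region, options=rate cl)
```

Omitting rate from options= would instead average the predicted counts at each observation's own exposure.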
DETAILS:
Predictive margins are estimates of the response mean and are typically used when fixing some, but not all, predictors in the model at specified values. The marginal effect of a continuous predictor at an observation estimates the slope of the mean response curve at that observation's setting of the predictors. It is computed as the partial derivative of the mean with respect to the predictor. As such, it is the instantaneous rate of change of the response mean at that point. The average of the marginal effects over the observations (AME) is often used as a measure of the effect of the continuous predictor on the response mean. A similar measure is the marginal effect estimated at the mean of the other predictors (MEM). For small samples, the AME is considered the better measure. A measure of the effect of a categorical predictor on the response mean can similarly be obtained as the difference in predictive margins at two of its levels. This is often considered the "marginal effect" of a binary categorical predictor.
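The quantities in this paragraph can be written out as follows. This is the standard textbook formulation rather than a transcription of the macro's code. The notation is an assumption introduced here: g is the link function, \mu_i = g^{-1}(\eta_i) is the mean for observation i with linear predictor \eta_i = x_i^\top\beta, and x_{ij} is the continuous predictor of interest.

```latex
% Marginal effect at observation i and its average over the n observations
\[
\mathrm{ME}_i = \frac{\partial \mu_i}{\partial x_{ij}}
             = \frac{d\,g^{-1}(\eta_i)}{d\eta_i}\,
               \frac{\partial \eta_i}{\partial x_{ij}},
\qquad
\mathrm{AME} = \frac{1}{n}\sum_{i=1}^{n}\mathrm{ME}_i .
\]
% Special case: logit link, x_j entering only as a main effect with coefficient \beta_j
\[
\mathrm{ME}_i = \beta_j\,\mu_i\,(1-\mu_i),
\qquad
\mathrm{MEM} = \beta_j\,\bar\mu\,(1-\bar\mu),
\quad
\bar\mu = g^{-1}\!\left(\bar{x}^\top\beta\right).
\]
```

Because \mu(1-\mu) is nonlinear in the predictors, the AME and the MEM generally differ for a logit model even though both use the same coefficient \beta_j.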

The Margins macro estimates and tests predictive margins and marginal effects (AMEs and MEMs). Estimates and tests of differences of predictive margins and marginal effects are available with diff=, which can provide all pairwise differences, sequential differences, or differences with a specified control level. Tests of contrasts of predictive margins and marginal effects are available with contrasts=.
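The following sketch shows one way to request differences and to build a contrasts= data set. The data set demo3 and its variables (y, grp with three levels, age) are hypothetical. Separating coefficients within a row by blanks is an assumption, since the parameter description specifies only that commas separate rows. The coefficient order must match the order in which the macro presents the margins.

```sas
/* All pairwise differences among the three grp margins */
%margins(data=demo3, response=y, roptions=event='1',
         class=grp, model=grp age, dist=binomial,
         margins=grp, diff=all, options=cl)

/* Contrast data set: LABEL and F are character variables,
   one observation per contrast; commas separate contrast rows */
data con;
  length label $40 f $80;
  infile datalines delimiter='|';
  input label f;
  datalines;
Level 1 vs level 2|1 -1 0
Average of 1 and 2 vs 3|0.5 0.5 -1
Joint equality of all three|1 -1 0, 0 1 -1
;

%margins(data=demo3, response=y, roptions=event='1',
         class=grp, model=grp age, dist=binomial,
         margins=grp, contrasts=con, options=cl)
```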

Note that when all model predictors are fixed at specified values, the predictive margin equals the conditional predicted mean at that setting of the predictors. The MEM, with predictors fixed at their means, is an example of this. In this case the margin can be estimated in various ways such as with the PRED= option in the OUTPUT statement of the modeling procedure, or by using the appropriate coefficients in an ESTIMATE statement with the ILINK option. When the data are balanced and the margins= variable is not involved in interactions with other predictors, the predictive margin can also be obtained with the LSMEANS statement by including the ILINK option and possibly the AT and OM options. Also in this case, the marginal effect of a categorical predictor, computed as the difference in means, can be obtained using the NLMeans macro (SAS Note 62362). However, when at least one predictor is not fixed, the Margins macro is needed to compute predictive margins.

By default, a complete replicate of the data= data set is created for each combination of levels of the margins= and/or at= variables in the data= data set. Each replicate fixes the margins= and/or at= variables for all observations at one combination of levels. The predictive margin for each combination is computed as the average predicted value in that combination's replicate. Similarly, the average marginal effect is the average of the marginal effects computed for the observations in that combination's replicate. If within= is specified, the averaging is done only over the observations that meet the specified condition. The data set containing all replicates can become very large if the input data set is large, or if there is a large number of combinations, or both. Consequently, specifying a variable in margins= or at= that has a large number of levels is not recommended unless the number of levels is constrained using one or more of margindata=, margwhere=, atdata=, and atwhere=.
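The replicate-and-average idea can be imitated by hand for a single level, which may help when checking a point estimate. The sketch below is not the macro's own code. It assumes a prior call such as %margins(data=demo, response=y, roptions=event='1', class=trt, model=trt age, dist=binomial, margins=trt), so that the item store _Fit holds that model. The data set demo and its variables are hypothetical. It reproduces only the estimate, not the standard error.

```sas
/* Replicate of the data with trt fixed at 'A' in every observation */
data rep_a;
  set demo;
  trt = 'A';
run;

/* Score the replicate on the mean scale (ILINK applies the inverse link) */
proc plm restore=_Fit noprint;
  score data=rep_a out=pred_a predicted=phat / ilink;
run;

/* The average predicted value is the predictive margin for trt='A' */
proc means data=pred_a mean;
  var phat;
run;
```

Repeating these steps for each level shows why the stacked replicate data set grows with the number of level combinations.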

Note that the distinction between specifying some variables in at= as compared to adding them to the variable(s) in margins= is generally minimal, only amounting to a difference in the way the estimates are presented in the displayed results. The same is true if the margins= variables were instead added in the at= list. That is because data replicates for the same combinations of variable levels are created in these cases. However, this will not be true if some of the combinations do not actually occur in the data. In that case, using both margins= and at= can result in additional estimates that do not appear using only margins= or only at=.

You can use the margindata= and/or atdata= option to specify the levels for or at which predictive margins (and marginal effects, if requested) will be computed. This is particularly useful when one or more desired levels do not occur in the data= data set. The margwhere= and/or atwhere= option can be used to subset the data= data set (or the corresponding margindata= or atdata= data set, if specified) before determining the levels.
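For instance, margins at a few chosen ages can be requested with atdata= even if those exact ages never occur in the input data. In this sketch the data set demo and its variables (y, trt, age) are hypothetical, and the Margins macro is assumed to have been defined with %inc.

```sas
/* Values of the at= variable at which margins are wanted */
data agevals;
  do age = 40, 50, 60;
    output;
  end;
run;

/* Margins for each trt level, computed separately at age 40, 50, and 60 */
%margins(data=demo, response=y, roptions=event='1',
         class=trt, model=trt age, dist=binomial,
         margins=trt, at=age, atdata=agevals, options=cl)
```

Without atdata=, specifying at=age would create a replicate for every distinct age in the data, which is the situation the preceding discussion of replicate size warns against.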

Note that if neither margins= nor at= is specified, then no replication of the data= data set is done and the overall predictive margin or marginal effect is computed using only the data= data set. When margins= is not specified, the predictive margins are labeled as "Overall" margins.

In addition to fixing the values of any margins= or at= variables as described above, other predictors can be fixed at computed values using the statistics options (mean=, median=, q1=, and q3=) or balanced= or options=atmeans. Variables affected by these options are fixed at the computed statistic value in all observations in all replicates. When options=atmeans is specified, all other predictors are fixed at their means. Note that only mean= and balanced= can be used with variables specified in class=.
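The two calls below contrast fixing a single predictor with a statistics option against fixing all other predictors with options=atmeans. The data set demo and its variables (y, trt, age) are hypothetical, and the Margins macro is assumed to have been defined with %inc.

```sas
/* Margins for trt with age fixed at its median in every replicate */
%margins(data=demo, response=y, roptions=event='1',
         class=trt, model=trt age, dist=binomial,
         margins=trt, median=age, options=cl)

/* Margins for trt with all other predictors fixed at their means */
%margins(data=demo, response=y, roptions=event='1',
         class=trt, model=trt age, dist=binomial,
         margins=trt, options=atmeans cl)
```

In this two-predictor model every predictor ends up fixed in both calls, so each margin is a conditional predicted mean as discussed earlier in this section.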

The delta method is used to determine the standard errors for predictive margins and marginal effects. If options=cl is specified, large-sample (Wald) tests and confidence intervals are provided.
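In outline, the delta method for a predictive margin takes the following standard form. The notation is an assumption introduced here, and the documentation does not state whether the macro evaluates the derivatives analytically or numerically. Let \hat\beta be the parameter estimates with estimated covariance matrix \hat V, let g be the link function, and let x_i^{(k)} be the predictor vector of observation i in the replicate for combination k.

```latex
% Margin for combination k and its gradient with respect to the parameters
\[
\hat m_k = \frac{1}{n}\sum_{i=1}^{n} g^{-1}\!\big(x_i^{(k)\top}\hat\beta\big),
\qquad
J_k = \frac{\partial \hat m_k}{\partial \beta^\top}
    = \frac{1}{n}\sum_{i=1}^{n}
      \left.\frac{d\,g^{-1}(\eta)}{d\eta}\right|_{\eta = x_i^{(k)\top}\hat\beta}
      x_i^{(k)\top}.
\]
% Stacking the rows J_k into J gives the approximate covariance of all margins;
% a Wald statistic (here for H_0: m_k = 0) and 100(1-\alpha)% limits follow
\[
\widehat{\mathrm{Cov}}(\hat m) \approx J\,\hat V\,J^\top,
\qquad
z_k = \frac{\hat m_k}{\mathrm{se}(\hat m_k)},
\qquad
\hat m_k \pm z_{1-\alpha/2}\,\mathrm{se}(\hat m_k),
\]
% where se(\hat m_k) is the square root of the k-th diagonal element of J \hat V J^\top.
```

The same construction applies to average marginal effects, with the marginal effect function in place of g^{-1}.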

BY group processing

While the Margins macro does not directly support BY group processing, this capability can be provided by the RunBY macro, which can run the Margins macro repeatedly for each of the BY groups in your data. See the RunBY macro documentation (SAS Note 66249) for details about its use. Also see the example titled "BY group processing" on the Results tab above.

Output data sets

The following data sets containing results are available after successful completion of the macro:

If margins= is specified:

_Margins
contains the estimated margins and their covariance matrix, standard errors, tests, and confidence intervals if requested.
_CovMarg (if options=covout is specified)
contains the estimated covariance matrix of the margins.
_DiffsPM (if diff= is specified)
contains estimates and tests of differences of the predictive margins with standard errors and confidence intervals.
_ContrastsPM (if contrasts= is specified)
contains estimates and tests of the specified contrasts of predictive margins with standard errors and confidence intervals.

If effect= is specified:

_MEffect
contains the estimated average marginal effects and their covariance matrix, standard errors, tests, and confidence intervals if requested.
_CovMeff (if options=covout is specified)
contains the estimated covariance matrix of the average marginal effects.
_DiffsME (if diff= is specified)
contains estimates and tests of differences of the average marginal effects with standard errors and confidence intervals.
_ContrastME (if contrasts= is specified)
contains estimates and tests of the specified contrasts of average marginal effects with standard errors and confidence intervals.
LIMITATIONS and ERRORS:
If the macro terminates with an error listing valid values of options= when the specified options are correct or when options= was not specified, then download and use version 2.02 (or later) as discussed in the Usage section above.
 
The Margins macro can be used only with a subset of the models available in PROC GENMOD, the procedure used to fit the specified model. It cannot be used with the multinomial or zero-inflated models available in GENMOD. Models for survey data or survival data, models containing random effects, and models containing effects constructed by the EFFECT statement are likewise not supported.

Events/trials syntax, as used in several procedures, is not supported for the analysis of aggregated binomial data. Instead, modify the data so that each events/trials observation becomes two observations with a variable indicating the response level and a variable containing the observed count for that response level. Then specify the first variable in response= and the count variable in weight=.
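The restructuring described above can be sketched with a DATA step. The data set agg, its variables (x, events, trials), and the counts are hypothetical. Dropping zero-count rows is a choice made here rather than something the macro is documented to require.

```sas
/* Hypothetical aggregated data: events out of trials at each value of x */
data agg;
  input x events trials;
  datalines;
1 3 20
2 7 20
3 12 20
4 16 20
;

/* Two observations per events/trials row: response level and its count */
data long;
  set agg;
  y = 1; count = events;          if count > 0 then output;
  y = 0; count = trials - events; if count > 0 then output;
  drop events trials;
run;

/* The response indicator goes in response=, the count in weight= */
%margins(data=long, response=y, roptions=event='1',
         model=x, dist=binomial, weight=count,
         effect=x, options=cl)
```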

MISSING VALUES:
Observations with missing values in any of the model variables other than the response are omitted from the analysis. However, observations that are missing only the response are used. Although they cannot contribute to the model fit, predicted values can be computed for them and therefore they contribute to the computation of predictive margins and marginal effects. When any of mean=, median=, q1=, q3=, balanced=, or options=atmeans is specified, these observations also contribute to the computed statistics.
SEE ALSO:
The NLEST macro (SAS Note 58775) estimates and tests linear or nonlinear combinations of model parameters and can be used to estimate predictive margins when all predictors are fixed. It can also be used following the Margins macro to estimate and test functions of the margins or marginal effects not possible in the Margins macro such as relative risks or odds ratios of margins. To do this, specify options=covout in the Margins macro and then specify the _Margins or _MEffect data set in inest= and the _CovMarg or _CovMeff data set in incovb= in the NLEST macro. See the example in the Results tab.

The NLMeans macro (SAS Note 62362) can perform multiple comparisons among the levels of a model effect on the mean scale. It can be used to estimate differences of predictive margins when all predictors are fixed.

Estimates of marginal effects at the observation level are also available in the QLIM procedure in SAS/ETS® for the models that procedure fits. Use the MARGINAL option in the OUTPUT statement of PROC QLIM. Standard errors for observation marginal effects are not available. As of SAS® 9.4M6 (TS1M6), marginal effects computed in PROC QLIM are valid only for predictors that are not involved in higher-order model effects such as interactions.




These sample files and code examples are provided by SAS Institute Inc. "as is" without warranty of any kind, either express or implied, including but not limited to the implied warranties of merchantability and fitness for a particular purpose. Recipients acknowledge and agree that SAS Institute shall not be liable for any damages whatsoever arising out of their use of this material. In addition, SAS Institute will provide no support for the materials contained herein.