Sample 62362: Estimate and test differences, ratios, contrasts, or other functions of means in generalized linear models



Contents: Purpose / History / Requirements / Usage / Details / Limitations / See Also

 

NOTE: Beginning in SAS® 9.4M6 (TS1M6), a version of this macro is available in the SAS/STAT® Autocall library and does not need to be downloaded and defined before use. To access features in more recent versions of the macro (see History), download and run as described in Usage below.

PURPOSE:
Provide multiple comparisons (pairwise, sequential, or against a control) on the mean scale among one or more sets of estimates produced from a model using a nonidentity link function on the response mean. More complex contrasts of means, ratios of pairs of means, or other linear or nonlinear functions of means can also be estimated and tested.
HISTORY:
The version of the NLMEANS macro that you are using is displayed in the log when you specify anything as the first argument. For example:
    %nlmeans(v)

The NLMEANS macro always attempts to check for a later version of itself. If it is unable to do this (such as if there is no active internet connection available), the macro will issue the following message in the log:

   NOTE: Unable to check for newer version of the NLMEANS macro.

The computations performed by the macro are not affected by the appearance of this message. However, this check can be avoided by specifying nochk as the first macro argument. This can be useful if your machine has no connection to the internet.
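As a brief illustration of the point above, the version check can be skipped by passing nochk as the first (positional) argument, ahead of the usual keyword parameters. The item store name mymodel and the data set name coeffs are hypothetical placeholders, not names used by the macro.

```sas
/* Skip the check for a newer macro version.                  */
/* MYMODEL and COEFFS are hypothetical names for illustration. */
%nlmeans(nochk, instore=mymodel, coef=coeffs, link=logit)
```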

Version   Update Notes
3.0 Added f=, fset=, flabel=, and fdata=. Fixed behavior of options=joint with multinomial models. Enhanced flexibility with contrasts= and multinomial models. Requires version 2.1 or later of the NLEST macro.
2.0 nochk can be specified as the first (version) parameter. Requires version 2.1 or later of the NLEST macro.
1.4 Added where= and covdrop=. Requires version 1.9 or later of the NLEST macro. Version 1.4 of NLMEANS and version 1.9 of NLEST are available in the SAS/STAT Autocall Library beginning in SAS 9.4M8 (TS1M8).
1.3 Added print | noprint to options=. Added null=. Requires version 1.8 or later of the NLEST macro.
1.04, 1.1 Store sets of results in data sets EST1, EST2, ... . Append all sets into a single results file using options=append. Version 1.1 of NLMEANS and version 1.6 of NLEST are available in the SAS/STAT Autocall Library beginning in SAS 9.4M6 (TS1M6).
1.03 Fix for LABEL variable in contrasts= data set.
1.02 Minor fix to version printing.
1.01 A LABEL variable can optionally be included in the contrasts= data set.
1.0 Initial coding
REQUIREMENTS:
Base SAS®, SAS/STAT®, and the NLEST macro (SAS Note 58775) are required. See History above.
USAGE:
Follow the instructions on the Downloads tab of this sample to save the NLMEANS macro definition. Follow similar instructions to download and define the NLEST macro (SAS Note 58775). Replace the text within quotation marks in the following statements with the locations of the NLMEANS and NLEST macro definition files on your system. In your SAS program or in the SAS editor window, specify these statements to define both macros and make them available for use in your SAS session:
   %inc "<location of your file containing the NLEST macro>";
   %inc "<location of your file containing the NLMEANS macro>";

After defining both macros in your SAS session, fit your model and include one or more LSMEANS, SLICE, ESTIMATE, or LSMESTIMATE statements with the E option and save the model using the STORE statement. You can then call the NLMEANS macro to estimate and test functions of response means. See the Results tab for examples.
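The workflow just described might look like the following minimal sketch. It assumes a hypothetical data set MYDATA with a count response Y and a classification variable TRT, fit with a Poisson model and log link; the data set, variable, and item store names are illustrative assumptions only. See the Results tab for the actual worked examples.

```sas
/* Hypothetical data: MYDATA with count response Y and CLASS variable TRT */
proc genmod data=mydata;
   class trt;
   model y = trt / dist=poisson link=log;
   lsmeans trt / e ilink;      /* E displays the coefficient table        */
   ods output coef=coeffs;     /* save the coefficients in data set COEFFS */
   store out=pmod;             /* save the fitted model in item store PMOD */
run;

/* With no diff= or contrasts=, all pairwise differences of the */
/* TRT means are estimated (diff=all is the default)            */
%NLMeans(instore=pmod, coef=coeffs, link=log)
```

The link= value must match the link used when fitting the model (here, log).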

The following parameters are required when using the NLMEANS macro. The necessary model information is provided to the macro by specifying either the instore= parameter or both the inest= and incovb= parameters. If the modeling procedure provides a STORE statement for saving the fitted model, instore= is generally the better method for providing the model information.

instore=item-store
Specifies the fitted model that was saved using the STORE statement in the modeling procedure. The OUT= option in the STORE statement saves the model in a file known as an item store. This is the preferred method for providing the required model information. However, if the modeling procedure does not offer the STORE statement, then you might be able to use inest= and incovb= instead. See "Limitations" below.

... or both of the following ...

inest=data-set-name
Specifies the data set of parameter estimates saved using an ODS OUTPUT statement in the modeling procedure. The parameter estimates of the model should be stored in a variable named ESTIMATE in this data set. An error is issued if the ESTIMATE variable is not found. When inest= is specified, incovb= must also be specified. The parameter vector in the inest= data set must be compatible with the covariance matrix in the incovb= data set. See "Compatibility error when using inest= and incovb=" below.
incovb=data-set-name
Specifies the data set containing the variance-covariance matrix of model parameters saved using an ODS OUTPUT statement in the modeling procedure. Typically, an option such as COVB is required in the modeling procedure to make this matrix available for saving. When inest= is specified, incovb= must also be specified. The parameter vector in the inest= data set must be compatible with the covariance matrix in the incovb= data set. See "Compatibility error when using inest= and incovb=" below.

The following parameters are also required.

coef=data-set-name
Specifies the data set of estimate coefficients saved using an ODS OUTPUT statement in the modeling procedure. The required table(s) must first be produced by including the E option in the LSMEANS, SLICE, ESTIMATE, or LSMESTIMATE statement in the modeling procedure. Note that the resulting data set contains multiple sets of estimates (displayed in multiple tables), when there are multiple variables in the LSMEANS statement, the SLICEBY= option is used in the SLICE statement, or there are multiple LSMEANS, SLICE, ESTIMATE, or LSMESTIMATE statements. The estimate sets are indexed by the LMatrix variable in this data set. The table name is typically Coef and can be saved in a data set by specifying:
ods output Coef=data-set-name;
link=link-function
Specifies the link function used in the modeling procedure, typically in the LINK= option in the MODEL statement. Note that for models fit by PROC PROBIT, which specifies the link with the DIST= option, specify link=probit, normit, or cumprobit when DIST=NORMAL; link=logit or cumlogit when DIST=LOGISTIC; or link=cll or cumcll when DIST=EXTREME or GOMPERTZ. The ALOGIT link is not supported.

Estimation and testing of functions of response means, such as pairwise differences or ratios, or other linear or nonlinear functions of means, can be requested using any of the following. By default, diff=all. When contrasts= is specified, diff= is ignored. contrasts= is not available with multinomial models when options=joint is specified.

diff=type <type> ...
where type is all, seq, or number
To estimate and test all pairwise differences of means, specify diff=all. Sequential differences of means (μ₁−μ₂, μ₂−μ₃, ...) are provided when diff=seq. To obtain all differences with a control level, specify diff=number, where number is the position of the control level in the ordered list of levels shown in the results from the modeling procedure. For example, if level A is the control in a variable whose levels are shown in the order A, B, C, then specify diff=1. When there are multiple sets of estimates from the modeling procedure (presented in multiple tables) saved in the coef= data set, you can either specify a single type to apply to all estimate sets, or a list of types to apply a different type to each of the sets. If diff= is omitted, all pairwise differences are done for all sets (diff=all). If diff= and contrasts= are both specified, diff= is ignored. See the Examples in the Results tab.
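For illustration, two hedged sketches of diff= follow. Both assume a model already saved in a hypothetical item store MYMODEL (fit with a logit link) and coefficients saved in a hypothetical data set COEFFS; the second call further assumes that COEFFS holds exactly two estimate sets.

```sas
/* Compare every level with the first level in the displayed order (the control) */
%NLMeans(instore=mymodel, coef=coeffs, link=logit, diff=1)

/* Two estimate sets assumed: all pairwise differences in the first set, */
/* sequential differences in the second                                  */
%NLMeans(instore=mymodel, coef=coeffs, link=logit, diff=all seq)
```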
contrasts=data-set-name
Specifies an optional data set of coefficients defining contrasts among the estimates in each set of estimates. The specified data set must contain a variable named SET and k variables named K1, K2, ... , Kk, where k is the number of Row variables in the coef= data set. Optionally, a LABEL variable can be included to label each of the specified contrasts. For a given row in the data set, the value of SET indicates the set of estimates that the contrast in that row applies to. The SET value must match one of the values of the LMatrix variable in the coef= data set. The K variables correspond to the Row variables in the coef= data set, which define each estimate. Multinomial models are fit to f response functions such as logits. For ordinal models, k=mf, where m is the number of estimates in each response function. For nominal, link=glogit models, k=mf+m, where the additional set of m variables corresponds to the last response level. See Multinomial models in Details below. See the description of options=ratio for limitations on the contrast coefficients when mean ratios are desired. If contrasts= and diff= are both specified, diff= is ignored. contrasts= is not available with multinomial models when options=joint is specified. Use f= or fdata= instead. See Example 2 in the Results tab.
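A contrasts= data set of the form described above could be built as in the following sketch. It assumes a nonmultinomial model with a single estimate set (LMatrix value 1) containing three estimates, so k=3. The data set name CON, the label text, and the names MYMODEL and COEFFS are hypothetical.

```sas
/* One contrast in estimate set 1: mean 1 versus the average of means 2 and 3 */
data con;
   length label $32;
   input set k1 k2 k3 label $;
   datalines;
1 1 -0.5 -0.5 Mean1_vs_avg_of_2_and_3
;
run;

%NLMeans(instore=mymodel, coef=coeffs, link=logit, contrasts=con)
```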
f=expression
Requires version 3.0 or later of the NLMEANS macro and version 2.1 or later of the NLEST macro. Specifies a linear or nonlinear function of the available response means to be estimated and tested, where the means should be referred to as mu1, mu2, ... , muk where k is the number of estimates produced by the LSMEANS, SLICE, ESTIMATE, or LSMESTIMATE statements in the model fitting step. The ordering of the mean names corresponds to the order of the estimates in the output from the modeling procedure. The expression can involve mathematical functions (such as log(·), exp(·), and so on). An example of a valid expression is f=log((mu1/mu5)/(mu2/mu6)). See Examples 2 and 3 in the Results tab.
fset=number
Requires version 3.0 or later of the NLMEANS macro and version 2.1 or later of the NLEST macro. When there are multiple estimate sets in the coef= data set, fset= is required to indicate the set to which the function of means defined by f= should be applied. number should be an integer, 1, 2, ... , s, where s is the number of sets that appear in coef=. Set numbers are shown in the LMatrix variable in the coef= data set and number should match one of those values. Note that sets are numbered in the order in which they appear in the modeling procedure output and not necessarily in the order of the submitted statements. See Examples 2 and 3 in the Results tab.
flabel=label
Requires version 3.0 or later of the NLMEANS macro and version 2.1 or later of the NLEST macro. Optionally provides a label to be displayed with the estimate and test of the function defined in f=. label must not contain quotation marks (" or '), ampersands (&), commas, or parentheses. See Examples 2 and 3 in the Results tab.
fdata=data-set-name
Requires version 3.0 or later of the NLMEANS macro and version 2.1 or later of the NLEST macro. Specifies a data set containing one or more functions of means to be estimated. fdata= is most useful when you have more than one function to estimate. This data set must contain a character variable, F, and a numeric variable, SET. Optionally, a character variable, LABEL, can be included. In each observation of the data set, F contains an expression defining a function of response means. The expression should appear in the same way as described in f=. If included, LABEL contains a text string used to label the estimated function in the results and should not be enclosed in or contain quotation marks. If omitted, the expression in F is used as a label. The SET value should indicate the estimate set that the function in F applies to. See Example 2 in the Results tab.
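The following sketch shows one way f=, fset=, flabel=, and fdata= might be used. It assumes version 3.0 or later of NLMEANS, a log-link model saved in a hypothetical item store MYMODEL, and a hypothetical coefficient data set COEFFS whose estimate set 1 contains at least three estimates. The data set name FD and the label text are also illustrative.

```sas
/* A single function of means: the ratio of the first two means in set 1 */
%NLMeans(instore=mymodel, coef=coeffs, link=log,
         f=mu1/mu2, fset=1, flabel=Ratio of means 1 and 2)

/* Several functions at once, supplied in an fdata= data set.       */
/* F and LABEL are character; SET is numeric. A | delimiter is used */
/* so that labels and expressions can contain blanks.               */
data fd;
   length label $40 f $64;
   infile datalines delimiter='|';
   input label f set;
   datalines;
Ratio of means 1 and 2|mu1/mu2|1
Log ratio of means 1 and 3|log(mu1/mu3)|1
;
run;

%NLMeans(instore=mymodel, coef=coeffs, link=log, fdata=fd)
```

Note that the labels avoid quotation marks, ampersands, commas, and parentheses, as required for flabel=.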

The following parameters are optional:

where=condition
Requires version 1.4 or later of the NLMEANS macro and version 1.9 or later of the NLEST macro. Specifies an optional condition to subset the inest=, incovb=, and coef= data sets. Condition is a valid expression for the WHERE statement and is useful when the input data sets were created using a BY statement or, in survey analysis, when the DOMAIN statement was used. See Example 4 in the Results tab.
covdrop=variable(s)
Requires version 1.4 or later of the NLMEANS macro and version 1.9 or later of the NLEST macro. Specifies one or more variables to be dropped from the incovb= data set. This can be helpful when an error occurs indicating that the covariance matrix is incompatible with the parameter vector. That error is often caused by the presence of numeric variables in the incovb= data set that do not contain columns of the covariance matrix.
null=value
Requires version 1.3 or later of the NLMEANS macro and version 1.8 or later of the NLEST macro. Specifies value in the null hypothesis H0: f(μ)=value, where f(μ) is the function of means being tested and value is a numeric value. Scientific notation, such as 1E4, is not allowed. The default is null=0. See Example 1 in the Results tab.
df=value
Specifies the degrees of freedom to be used in the tests and confidence intervals computed for the estimated functions. value must be a positive number. Scientific notation, such as 1E4, is not allowed. If omitted, large-sample Wald statistics are given. The degrees of freedom for testing a linear combination of parameters in a linear model would typically be the number of observations used in fitting the model minus the number of parameters estimated in the model (essentially, the error degrees of freedom).
alpha=value
Specifies the alpha level to be used in computing confidence limits. If omitted, alpha=0.05.
title=title-text
Specifies a title for the table(s) of results. The title-text must not contain quotation marks (" or '), ampersands (&), commas, or parentheses. If omitted, title=Nonlinear Function Estimate. See the Examples in the Results tab.
options=<JOINT|NOJOINT> <NAMES|NONAMES> <REVERSE|NOREVERSE> <RATIO|NORATIO> <DIFINFNS|DIFALL> <APPEND|NOAPPEND> <PRINT|NOPRINT>
Use options= to enable or disable any of several binary options. If not specified, options=nojoint nonames noreverse noratio difinfns noappend print.
JOINT combines multiple sets of estimates into a single set before differencing or applying contrasts. See Example 3 in the Results tab and SAS Note 70221.
NAMES displays the names of the model parameters used by the NLEST macro.
REVERSE reverses the direction of differencing done by diff=. REVERSE is ignored when contrasts=, f=, or fdata= is specified.
RATIO requests estimation of ratios rather than differences of pairs of means. RATIO can be used with diff= or contrasts=, but with contrasts= each contrast must specify exactly one 1 coefficient to select the numerator, one -1 coefficient to select the denominator, and 0 coefficients otherwise. See Example 1 in the Results tab.
DIFALL can be used with multinomial models to apply the differencing type specified in diff= to means across all the response functions rather than within the response functions as done by the default DIFINFNS. This option is ignored for nonmultinomial models or when diff= is not used. See Example 2 in the Results tab.
APPEND merges results from all estimate sets into a single results data set named EST_ALL. NOAPPEND saves results sets in separate data sets (EST1, EST2, ...). The last results set is always saved in data set EST as well.
NOPRINT suppresses display of the results table (but does not suppress the table of parameter names if NAMES is also specified). Requires version 1.3 or later of the NLMEANS macro and version 1.8 or later of the NLEST macro.
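As a hedged illustration of how the optional parameters combine, the calls below assume a hypothetical item store MYMODEL and coefficient data set COEFFS from a logit-link model. The value df=45 is an arbitrary placeholder, not a recommendation.

```sas
/* Ratios of all pairs of means, each tested against H0: ratio = 1 */
%NLMeans(instore=mymodel, coef=coeffs, link=logit, options=ratio, null=1)

/* Differences using 45 degrees of freedom and 90% confidence limits; */
/* results are collected in EST_ALL and the results table is not shown */
%NLMeans(instore=mymodel, coef=coeffs, link=logit,
         df=45, alpha=0.1, options=append noprint)
```

Setting null=1 with options=ratio is natural because a ratio of 1 corresponds to equal means, whereas the default null=0 suits differences.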
DETAILS:
The NLMEANS macro can be used after fitting a generalized linear model (GLM) or a generalized estimating equations (GEE) model and estimating multiple linear combinations of model parameters, Lᵢβ, which become response means, μᵢ, when the inverse of the link function, g, is applied. That is, μᵢ = g⁻¹(Lᵢβ). Several SAS/STAT modeling procedures can be used, including GENMOD, GLIMMIX, LOGISTIC, PROBIT, SURVEYLOGISTIC, and GEE. Lᵢβ estimates are available using the LSMEANS or SLICE statement in the modeling procedure. The ESTIMATE and LSMESTIMATE statements can also be used, but only if they estimate one or more individual g(μᵢ) and not functions of two or more g(μᵢ) such as differences or other linear combinations. The ILINK option can be used in these statements to display estimates of the means, μᵢ, in the procedure output.

While the DIFF option in the LSMEANS and SLICE statements provides pairwise differences on the link scale, Lᵢβ−Lⱼβ, differences on the mean scale, μᵢ−μⱼ, are not available. Similarly, in an ESTIMATE or LSMESTIMATE statement that defines a difference, Lᵢβ−Lⱼβ, the ILINK option applies the inverse of the link function to the difference, g⁻¹(Lᵢβ−Lⱼβ), rather than computing the difference of the inverse-linked estimates, g⁻¹(Lᵢβ)−g⁻¹(Lⱼβ) = μᵢ−μⱼ. The same situation applies to estimating functions that are more complex than a simple difference. The NLMEANS macro is provided to make estimation of these functions available on the mean scale rather than the link scale. Note that the quantity g⁻¹(Lᵢβ−Lⱼβ) is generally only of interest in the case where the link function, g, is the log. In this case, the ILINK or EXP option estimates the ratio of means, μᵢ/μⱼ.
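To make the distinction concrete, the following short derivation writes the linear predictor as η (a symbol introduced here for convenience, with η_i = L_i β) and compares the two quantities for the logit and log links. It uses only the standard definitions of those links.

```latex
% Notation (assumed for this sketch): \eta_i = L_i\beta, \quad \mu_i = g^{-1}(\eta_i)

% Logit link: g(\mu) = \log\frac{\mu}{1-\mu}, \quad g^{-1}(\eta) = \frac{1}{1+e^{-\eta}}
\[
g^{-1}(\eta_i - \eta_j) = \frac{1}{1+e^{-(\eta_i-\eta_j)}}
\;\neq\;
\frac{1}{1+e^{-\eta_i}} - \frac{1}{1+e^{-\eta_j}} = \mu_i - \mu_j
\quad\text{in general.}
\]

% Log link: g(\mu) = \log\mu, \quad g^{-1}(\eta) = e^{\eta}
\[
g^{-1}(\eta_i - \eta_j) = e^{\eta_i - \eta_j} = \frac{e^{\eta_i}}{e^{\eta_j}} = \frac{\mu_i}{\mu_j}.
\]
```

So with the log link the inverse-linked difference is a ratio of means, while with the logit link it is neither the difference nor the ratio of the two means.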

Note that the NLMEANS macro is not needed for models that use the identity link. This includes models fit by the REG, GLM, MIXED, or ORTHOREG procedures and others. For these models, μᵢ=Lᵢβ, so differences or other functions of the Lᵢβ are equivalent functions of the μᵢ. Consequently, the results of the DIFF option in the LSMEANS or SLICE statement, or the results of an ESTIMATE or LSMESTIMATE statement that defines a function of the Lᵢβ, directly provide the same function of the μᵢ.

The NLMEANS macro can be used to provide estimates and tests of differences of means, μᵢ−μⱼ, ratios of means, μᵢ/μⱼ, or linear contrasts of means. Using f= or fdata=, even nonlinear functions of means can be estimated. Standard errors are obtained using the delta method. To use the macro, you supply the saved model and a data set containing the coefficients, Lᵢ, used by one or more LSMEANS, SLICE, ESTIMATE, or LSMESTIMATE statements. You also indicate the link function used in the model. The model is best saved using the STORE statement in the modeling procedure. The coefficients can be saved by including the E option in any LSMEANS, SLICE, ESTIMATE, or LSMESTIMATE statement(s) specified in the procedure, and by including an ODS OUTPUT statement to save the displayed table of coefficients in a data set. In most procedures, the name of the coefficients table is Coef, so the following statement saves it in a data set.
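For readers unfamiliar with the delta method mentioned above, the standard first-order form is sketched below. This is the textbook result, not a transcription of the macro's internals (the computations themselves are delegated to the NLEST macro). The symbols h, Σ, d, and c are introduced here for illustration.

```latex
% Function of means written as a function of the model parameters:
\[
h(\beta) = f(\mu_1,\dots,\mu_k), \qquad \mu_i = g^{-1}(L_i\beta).
\]

% Gradient by the chain rule, with \eta_i = L_i\beta:
\[
d = \frac{\partial h}{\partial \beta}
  = \sum_{i=1}^{k} \frac{\partial f}{\partial \mu_i}\,
    \frac{\partial g^{-1}(\eta_i)}{\partial \eta_i}\, L_i^{\top}.
\]

% Approximate variance and Wald statistic for H_0: h(\beta) = c,
% where \Sigma is the estimated covariance matrix of \hat\beta:
\[
\widehat{\operatorname{Var}}\bigl(h(\hat\beta)\bigr) \approx d^{\top}\,\Sigma\, d,
\qquad
z = \frac{h(\hat\beta) - c}{\sqrt{d^{\top}\Sigma\, d}}.
\]

% Example: difference of two means under the logit link,
% where \partial\mu/\partial\eta = \mu(1-\mu):
\[
d = \mu_i(1-\mu_i)\,L_i^{\top} - \mu_j(1-\mu_j)\,L_j^{\top}.
\]
```

Here c plays the role of the value given in null=, and the gradient d is evaluated at the parameter estimates.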

ods output coef=data-set-name;

See the list of macro parameters above for details about how to provide the saved model and coefficients to the macro, about how to request differences, ratios, contrasts, or other functions of means as well as other options.

The macro can process one or more sets of estimates. Multiple sets of estimates occur when the modeling procedure includes an LSMEANS statement with multiple variables, a SLICE statement with the SLICEBY= option, or multiple LSMEANS, SLICE, ESTIMATE, or LSMESTIMATE statements. The macro estimates the requested function(s) in each set and a table of results is displayed for each set of estimates. If you want to estimate function(s) of means defined across multiple sets, you can use options=joint to combine all of the separate sets into a single set.

Multinomial Models

For ordinal multinomial models (link=cumlogit, cumprobit, cumloglog, cumcloglog), the estimated means in any population are cumulative probabilities, Pr(Y=1), Pr(Y≤2), ... , Pr(Y≤l), where l is the number of cumulative response functions, which is one less than the number of levels in the response variable, Y. For nominal multinomial models (link=glogit), the estimated means in any population are individual level probabilities, Pr(Y=1), Pr(Y=2), ... , Pr(Y=l), where l is the number of response levels.

When diff= is specified in a multinomial model to provide pairwise comparisons in an estimate set (such as among levels of a variable in the LSMEANS statement), the macro by default estimates the requested comparisons within each of the l response functions or levels. This is the action with the default options=difinfns. For example, an ordinal cumulative logit model (link=cumlogit) on a three-level response has l = 2 cumulative logit response functions and predicts each of two cumulative probabilities (means). The requested comparisons are estimated separately for each cumulative mean. For a nominal multinomial model on a three-level response, by default the requested differences are estimated separately for each of the l = 3 individual response level probabilities (means). If you specify options=difall with diff=, the differencing method is applied across all k=ml probabilities, where m is the number of estimates in the estimate set. If you want to estimate differences or other functions defined across all the k means in an estimate set, you can use contrasts=, f=, or fdata=. See Example 2 in the Results tab.

When specifying a contrasts= data set, each contrast (row) of the data set can contain coefficients for all k=ml means. However, if you want the same contrasts to be applied separately to the m means within each response function or level, then you can specify just k=m coefficients and the macro will duplicate the provided coefficients in each of the l response functions or levels. Again, see Example 2 in the Results tab.
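The shorter form of the contrasts= data set described above, with only k=m coefficients, might be set up as in this sketch. It assumes an ordinal model (link=cumlogit) and a single estimate set of m=3 estimates, so each row supplies three coefficients and the macro repeats them within each response function. The names CON, ORDMOD, and COEFFS and the label text are hypothetical.

```sas
/* Two contrasts among m=3 means; only K1-K3 are given, so the macro */
/* applies the same coefficients within each response function       */
data con;
   length label $32;
   input set k1 k2 k3 label $;
   datalines;
1 1 -1 0 Level1_vs_Level2
1 0 1 -1 Level2_vs_Level3
;
run;

%NLMeans(instore=ordmod, coef=coeffs, link=cumlogit, contrasts=con)
```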

Output Data Sets

When the NLMEANS macro processes a single set of estimates (such as from a single LSMEANS statement), results are automatically saved in data set EST. When multiple sets of estimates are processed, the results from each set are saved by default in separate data sets named EST1, EST2, EST3, ... . Specify options=append to create a single data set named EST_ALL of all results from all sets. Be aware that if EST_ALL already exists, new results are appended to it. The last results set is also stored in data set EST.
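Because new results are appended to an existing EST_ALL, it can be helpful to delete any earlier copy before calling the macro. The sketch below assumes the results data sets are written to the WORK library (as a one-level name normally implies) and uses the hypothetical names MYMODEL and COEFFS.

```sas
/* Remove any EST_ALL left over from an earlier run; NOWARN avoids */
/* error processing if the data set does not exist                 */
proc datasets library=work nolist nowarn;
   delete est_all;
quit;

/* Collect the results from all estimate sets in EST_ALL */
%NLMeans(instore=mymodel, coef=coeffs, link=logit, options=append)

proc print data=est_all;
run;
```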

BY Group or Domain Processing

The NLMEANS macro does not directly support BY group processing (such as for the analysis of multiply imputed data) or processing of domains from a survey analysis. That is, it cannot process results from a modeling procedure that was run using a BY or DOMAIN statement. However, this capability can be provided by the RunBY macro, which can run the NLMEANS macro repeatedly for each of the BY groups or domains. Version 1.4 or later of the NLMEANS macro, version 1.9 or later of the NLEST macro, and version 1.1 or later of the RunBY macro are required. See the RunBY macro documentation (SAS Note 66249) for details about its use. Additionally, you can use where= to allow NLMEANS to process the results of one BY group or domain by specifying an appropriate condition to select that BY group or domain. See Example 4 in the Results tab.

 

Troubleshooting

Hessian Warning

Because the LSMEANS and SLICE statements require GLM-parameterized CLASS variables (PARAM=GLM in the CLASS statement), the NLEST macro (which is called by the NLMEANS macro) will typically display the following Warning message in the log. This Warning can be ignored when it is caused by the use of GLM parameterization.

WARNING: The final Hessian matrix is not positive definite, and therefore the estimated
         covariance matrix is not full rank and may be unreliable.  The variance of some
         parameter estimates is zero or some parameters are linearly related to other
         parameters.
    

Compatibility Error when Using inest= and incovb=

Specifying inest= and incovb= instead of instore= is generally not necessary. If used, modifications of those data sets or use of where= and/or covdrop= might be needed to correct incompatibility of the parameter vector and covariance matrix. In some cases, it might be necessary to use the NLEST macro directly rather than NLMEANS.

The incovb= data set should have the same number of observations (rows) and variables (columns) as the number of rows in the inest= data set in order to be compatible. Otherwise, an error message is issued that indicates the relevant numbers of rows and columns. If the incovb= data set contains numeric variables other than those containing the covariance matrix, they should be removed in order to avoid a compatibility error. This can be done either by preprocessing the data set to remove the extraneous variables or by specifying them in covdrop= (requires version 1.4 or later of the NLMEANS macro and version 1.9 or later of the NLEST macro).
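One way to check the dimensions described above before calling the macro is sketched here. It assumes the parameter estimates and covariance matrix were saved in hypothetical WORK data sets named PE and COV. The variable name RowNum in the final call is likewise only a hypothetical example of an extraneous numeric variable.

```sas
/* Compare the number of parameters with the dimensions of the covariance data set */
proc sql;
   select count(*) as n_parameters from pe;
   select count(*) as n_cov_rows from cov;
   /* List the numeric variables in COV; any that are not columns of the */
   /* covariance matrix are candidates for covdrop=                      */
   select name
      from dictionary.columns
      where libname='WORK' and memname='COV' and type='num';
quit;

/* If an extraneous numeric variable (here hypothetically RowNum) is found, drop it */
%NLMeans(inest=pe, incovb=cov, coef=coeffs, link=logit, covdrop=RowNum)
```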

Warnings Concerning AdditionalEstimates, df, and Probt

If a requested function of means results in a computational error, such as division by zero or taking the log of a negative value, the macro will issue Warning messages in the log indicating that 'AdditionalEstimates' was not created and that variables df and Probt were never referenced. No results are presented for the estimate set where this occurs even if other functions in the set are estimable.

LIMITATIONS:
The NLMEANS macro is not intended for use with survival models (LIFEREG, PHREG, SURVEYPHREG), models from PROC CATMOD, or from SAS/ETS® procedures. The macro cannot be used to compare means from zero-inflated models fit by PROC GENMOD.

Some modeling procedures cannot provide the necessary covariance matrix for some models. Some procedures either do not have a STORE statement (such as PROC FMM) or do not save the necessary model information (such as PROC COUNTREG). In such cases, use inest= and incovb= instead of instore=. When using inest= and incovb=, incompatibility of the parameter vector and covariance matrix can occur. See Compatibility error when using inest= and incovb= above.

For some models, such as those fit by the GENMOD or GLIMMIX procedures, use of the LSMEANS, SLICE, ESTIMATE, or LSMESTIMATE statements in the PLM procedure is recommended rather than using those statements in the modeling procedure. Using those statements in PLM requires saving the fitted model using the STORE statement in the modeling procedure.
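The PLM-based approach recommended above could be sketched as follows. The data set MYDATA, the variables Y and TRT, and the names PMOD and COEFFS are hypothetical. The sketch also assumes that the coefficient table produced by PROC PLM is named Coef, as the Details section says is the case in most procedures.

```sas
/* Fit and store the model without LSMEANS in the modeling procedure */
proc genmod data=mydata;
   class trt;
   model y = trt / dist=poisson link=log;
   store out=pmod;
run;

/* Request the estimates and save their coefficients in PROC PLM */
proc plm restore=pmod;
   lsmeans trt / e ilink;
   ods output coef=coeffs;
run;

%NLMeans(instore=pmod, coef=coeffs, link=log)
```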

Each coefficient column vector in an estimate set (identified by the LMatrix variable) appearing in the coef= data set should estimate an individual, link-transformed mean, g(μᵢ). The NLMEANS macro applies the inverse of the link function specified in link= to obtain the means, μᵢ, to be used in estimating the functions requested using the various macro parameters described above.

SEE ALSO:
The NLMEANS macro forms the expressions to be evaluated. It then calls the NLEST macro to do the computations. See the NLEST macro description (SAS Note 58775) for details. The Margins macro (SAS Note 63038) can also be used to estimate and test differences or contrasts of means. Unlike the NLMEANS macro, the Margins macro can be used when one or more predictors in the model is not held fixed.



These sample files and code examples are provided by SAS Institute Inc. "as is" without warranty of any kind, either express or implied, including but not limited to the implied warranties of merchantability and fitness for a particular purpose. Recipients acknowledge and agree that SAS Institute shall not be liable for any damages whatsoever arising out of their use of this material. In addition, SAS Institute will provide no support for the materials contained herein.