57798 - Estimating relative risks in a multinomial response model

Usage Note 57798: Estimating relative risks in a multinomial response model

There are two types of relative risks that might be of interest when modeling a multinomial response. You might want to compare two populations with respect to an individual response level probability (P(Y=i|X=j)/P(Y=i|X=k)), or you might want to compare response level probabilities in a given population (P(Y=i|X=j)/P(Y=k|X=j). Both situations are discussed below.

In the following examples, a generalized logit model is fit to the nominal, multinomial response. As discussed in this note, the generalized logit model can also be fit in other procedures such as GLIMMIX, HPGENSELECT, FMM, CATMOD, and SURVEYLOGISTIC. For repeated measures data with a multinomial response, use PROC SURVEYLOGISTIC with a CLUSTER statement. In LOGISTIC, and most other procedures, generalized logits are requested by the LINK=GLOGIT option in the MODEL statement.

Estimating the relative risk comparing two populations on the probability of one response level

When the response is binary, you can obtain relative risk estimates as discussed in this note. In the multinomial case, relative risk estimates are nonlinear functions of the parameters in a generalized logit model, which can be fit using PROC LOGISTIC. These nonlinear functions can be estimated using the NLMeans macro, the NLEST/NLEstimate macro, or by fitting the model and doing the estimation in PROC NLMIXED. PROC CATMOD can also be used by fitting a model to the log probabilities rather than logits.

The data below are presented in the example titled "Nominal Response Data: Generalized Logits Model" in the PROC LOGISTIC documentation comparing styles of instruction at several schools. There are three response levels (Styles) and three populations (Schools).

      data School;
         length Program $ 9;
         input School Program $ Style $ Count @@; 
         datalines; 
      1 regular   self 10  1 regular   team 17  1 regular   class 26
      1 afternoon self  5  1 afternoon team 12  1 afternoon class 50 
      2 regular   self 21  2 regular   team 17  2 regular   class 26
      2 afternoon self 16  2 afternoon team 12  2 afternoon class 36 
      3 regular   self 15  3 regular   team 15  3 regular   class 16
      3 afternoon self 12  3 afternoon team 12  3 afternoon class 20 
      ;

Using PROC LOGISTIC and the NLMeans macro

The following statements fit a generalized logit model to the multinomial response, Style. The ORDER=DATA response variable option orders the Styles as they first appear in the data so that the logits are log(p_self/p_class) and log(p_team/p_class). The LSMEANS statement provides estimates of the log odds for each School. The ILINK option adds estimates of the Style level probabilities for each School. The E option produces a table of coefficients of the linear combination of parameters that define the log odds for each School on each Style logit. The table is saved by the ODS OUTPUT statement for use later with the NLMeans macro. The STORE statement saves the fitted model for use with the NLMeans and NLEST/NLEstimate macros.

      proc logistic data=School;
        freq count;
        class school / param=glm;
        model style(order=data) = School / link=glogit;
        lsmeans School / e ilink;
        ods output coef=coeffs;
        store out=logmod;
        run;

Following are the model parameter estimates and estimated Style probabilities for each School. Note that estimates are only possible for two of the three Style levels due to the constraint that the probabilities must sum to one.

Analysis of Maximum Likelihood Estimates
Parameter		STYLE	DF	Estimate	Standard Error	Wald Chi-Square	Pr > ChiSq
Intercept		self	1	-0.2877	0.2546	1.2769	0.2585
Intercept		team	1	-0.2877	0.2546	1.2769	0.2585
SCHOOL	1	self	1	-1.3350	0.3803	12.3216	0.0004
SCHOOL	1	team	1	-0.6758	0.3353	4.0607	0.0439
SCHOOL	2	self	1	-0.2285	0.3286	0.4837	0.4867
SCHOOL	2	team	1	-0.4722	0.3397	1.9314	0.1646
SCHOOL	3	self	0	0	.	.	.
SCHOOL	3	team	0	0	.	.	.

SCHOOL Least Squares Means
STYLE	SCHOOL	Estimate	Standard Error	z Value	Pr > \|z\|	Mean	Standard Error of Mean
self	1	-1.6227	0.2825	-5.74	<.0001	0.1250	0.03019
self	2	-0.5162	0.2077	-2.48	0.0130	0.2891	0.04007
self	3	-0.2877	0.2546	-1.13	0.2585	0.3000	0.04830
team	1	-0.9634	0.2183	-4.41	<.0001	0.2417	0.03908
team	2	-0.7598	0.2250	-3.38	0.0007	0.2266	0.03700
team	3	-0.2877	0.2546	-1.13	0.2585	0.3000	0.04830

In spite of the constraint, the NLMeans macro can estimate and test the differences among the Schools on all three of the Style probabilities. To use the macro, you provide the saved model from the STORE statement, the saved table of coefficients from the LSMEANS / E statement, and the link function used in the model. By default, the NLMeans macro estimates and tests pairwise differences among the mean estimates. To request that the ratio be estimated rather than the difference, specify options=ratio. For a ratio, it is generally of interest to test the null hypothesis that the ratio equals 1 rather than 0 as is the default. To do that, specify null=1.

      %NLMeans(instore=logmod, coef=coeffs, link=glogit, options=ratio,
               null=1, title=School RRs on each Style)

The Label in each row identifies the relative risk estimate using the same order as seen in the Least Squares Means table above. The last set of three rows provide estimates for the third Style (class). For example, the label in the first row indicates that the estimate, 0.4324, is the relative risk comparing School 1 and School 2 on Style=self – that is, 0.1250/0.2891. It differs significantly from 1 (p<0.0001). If the reciprocal of this is desired, add the reverse option: options=ratio reverse. Similarly, the last estimate, 1.2109, is the relative risk comparing Schools 2 and 3 on Style=class.

School RRs on each Style

Label	Estimate	Standard Error	Wald Chi-Square	Pr > ChiSq	Alpha	Lower	Upper
1 /1 0 0 0 0 0 0 0	0.4324	0.1204	22.2131	<.0001	0.05	0.1964	0.6685
1 0 /1 0 0 0 0 0 0	0.4167	0.1209	23.2605	<.0001	0.05	0.1796	0.6537
0 1 /1 0 0 0 0 0 0	0.9635	0.2047	0.0317	0.8587	0.05	0.5623	1.3648
0 0 0 1 /1 0 0 0 0	1.0667	0.2451	0.0739	0.7857	0.05	0.5862	1.5471
0 0 0 1 0 /1 0 0 0	0.8056	0.1838	1.1189	0.2902	0.05	0.4453	1.1658
0 0 0 0 1 /1 0 0 0	0.7552	0.1732	1.9976	0.1575	0.05	0.4157	1.0947
0 0 0 0 0 0 1 /1 0	1.3075	0.1499	4.2095	0.0402	0.05	1.0137	1.6013
0 0 0 0 0 0 1 0 /1	1.5833	0.2321	6.3157	0.0120	0.05	1.1284	2.0383
0 0 0 0 0 0 0 1 /1	1.2109	0.1914	1.2146	0.2704	0.05	0.8358	1.5861

Using PROC LOGISTIC and the NLEST/NLEstimate macro

Relative risk estimates can also be produced using the NLEST/NLEstimate macro by writing the expressions for the estimates in terms of the model parameters.

The formula for the predicted probabilities of the individual response levels within a population is given in "Linear Predictor, Predicted Probability, and Confidence Limits" in the Details section of the LOGISTIC documentation. Using that formula, the ratio of individual probabilities can be written in terms of the model parameters. For example, the estimated probability of Style=self in School=1 is:

exp(Intercept_Self+School_1,Self) 1+exp(Intercept_Self+School_1,Self)+exp(Intercept_Team+School_1,Team)

A similar expression can be written for the estimated probability of Style=Self in School=3:

exp(Intercept_Self) 1+exp(Intercept_Self)+exp(Intercept_Team)

The relative risk is then the ratio of these two expressions. In order to use the NLEST macro, the expressions for the relative risks must be written using the names of the parameters. These names can be displayed by specifying shownames as the first argument. See the description of the NLEST macro for details.

      %nlest(shownames, instore=logmod)

The parameter names are displayed in the following table.

Name	Estimate
B_p1	-0.2877
B_p2	-0.2877
B_p3	-1.3350
B_p4	-0.6758
B_p5	-0.2285
B_p6	-0.4722

The expressions for each of the relative risks can then be written. When estimating several functions, it is useful to specify them in a data set with appropriate labels. This is done in the following statements and the resulting data set is specified in fdata= in the macro. As discussed above, to test that the ratios equal 1, specify null=1.

      data fd; 
         length label f $32767; 
         infile datalines delimiter='|';
         input label f; 
         datalines;
      P(self,S1)/P(self,S2)|(exp(b_p1+b_p3)/(1+(exp(b_p1+b_p3)+exp(b_p2+b_p4)))) / (exp(b_p1+b_p5)/(1+(exp(b_p1+b_p5)+exp(b_p2+b_p6))))
      P(self,S1)/P(self,S3)|(exp(b_p1+b_p3)/(1+(exp(b_p1+b_p3)+exp(b_p2+b_p4)))) / (exp(b_p1)/(1+(exp(b_p1)+exp(b_p2))))
      P(self,S2)/P(self,S3)|(exp(b_p1+b_p5)/(1+(exp(b_p1+b_p5)+exp(b_p2+b_p6)))) / (exp(b_p1)/(1+(exp(b_p1)+exp(b_p2))))
      P(team,S1)/P(team,S2)|(exp(b_p2+b_p4)/(1+(exp(b_p1+b_p3)+exp(b_p2+b_p4)))) / (exp(b_p2+b_p6)/(1+(exp(b_p1+b_p5)+exp(b_p2+b_p6))))
      P(team,S1)/P(team,S3)|(exp(b_p2+b_p4)/(1+(exp(b_p1+b_p3)+exp(b_p2+b_p4)))) / (exp(b_p2)/(1+(exp(b_p1)+exp(b_p2))))
      P(team,S2)/P(team,S3)|(exp(b_p2+b_p6)/(1+(exp(b_p1+b_p5)+exp(b_p2+b_p6)))) / (exp(b_p2)/(1+(exp(b_p1)+exp(b_p2))))
      P(class,S1)/P(class,S2)|(1/(1+(exp(b_p1+b_p3)+exp(b_p2+b_p4)))) / (1/(1+(exp(b_p1+b_p5)+exp(b_p2+b_p6))))
      P(class,S1)/P(class,S3)|(1/(1+(exp(b_p1+b_p3)+exp(b_p2+b_p4)))) / (1/(1+(exp(b_p1)+exp(b_p2))))
      P(class,S2)/P(class,S3)|(1/(1+(exp(b_p1+b_p5)+exp(b_p2+b_p6)))) / (1/(1+(exp(b_p1)+exp(b_p2))))
      ;
      %nlest(instore=logmod, fdata=fd, null=1, title=School RRs on each Style)

The macro produces the same results as from the NLMeans macro above.

Using PROC CATMOD

Another approach is to model the log of the probabilities of the first two Styles (self and team). This can be done in PROC CATMOD using weighted least squares estimation of the model. Maximum likelihood estimation is only available in CATMOD when modeling logits. In the statements below, the RESPONSE statement defines the log of the two Style probabilities. The PROB option prints a table of the observed probabilities. The PARAM=REF option uses reference parameterization, just like when the same option is used in the CLASS statement in GENMOD, LOGISTIC, and other procedures. You can then write CONTRAST statements to estimate each relative risk of interest. The labels indicate the relative risk that is estimated. The ALL_PARMS keyword in the CONTRAST statements allows you to specify a list of multipliers for all of the model parameter estimates in the order they appear in the "Analysis of Weighted Least Squares Estimates" table. The specified linear combinations estimate the log relative risks. For example, the first CONTRAST labeled P(self,S1)/P(self,S3) estimates the log relative risk of selecting Style=self in School 1 vs. School 3. The ESTIMATE=EXP option exponentiates the linear combinations resulting in relative risk estimates.

      proc catmod data=School order=data;
        weight Count;
        response log 1 0 0, 0 1 0;
        model Style = School / prob param=ref;
        contrast 'P(self,S1)/P(self,S3)' all_parms 0 0  1 0 0 0  / estimate=exp;
        contrast 'P(team,S1)/P(team,S3)' all_parms 0 0  0 1 0 0  / estimate=exp;
        contrast 'P(self,S2)/P(self,S3)' all_parms 0 0  0 0 1 0  / estimate=exp;
        contrast 'P(team,S2)/P(team,S3)' all_parms 0 0  0 0 0 1  / estimate=exp;
        contrast 'P(self,S1)/P(self,S2)' all_parms 0 0  1 0 -1 0 / estimate=exp;
        contrast 'P(team,S1)/P(team,S2)' all_parms 0 0  0 1 0 -1 / estimate=exp;
        run; quit;

From the "Analysis of Weighted Least Squares Estimates" table the fitted model is:

log(p_Self) = -1.2040 + -0.8755*I(School=1) + -0.0371*I(School=2)
log(p_Team) = -1.2040 + -0.2162*I(School=1) + -0.2808*I(School=2)

where I(condition) is the indicator function which returns 1 if the condition is true and returns 0 otherwise.

Population Profiles
Sample	School	Sample Size
1	1	120
2	2	128
3	3	90

Response Profiles
Response	Style
1	self
2	team
3	class

Response Probabilities
Sample	Response Number
Sample	1	2	3
1	0.12500	0.24167	0.63333
2	0.28906	0.22656	0.48438
3	0.30000	0.30000	0.40000

Analysis of Weighted Least Squares Estimates
Parameter		Function Number	Estimate	Standard Error	Chi- Square	Pr > ChiSq
Intercept		1	-1.2040	0.1610	55.91	<.0001
		2	-1.2040	0.1610	55.91	<.0001
School	1	1	-0.8755	0.2903	9.10	0.0026
	1	2	-0.2162	0.2282	0.90	0.3434
	2	1	-0.0371	0.2125	0.03	0.8612
	2	2	-0.2808	0.2293	1.50	0.2209

The following table of contrasts shows that the estimated relative risk for Style=self comparing School 1 vs. School 3 is 0.4167. This agrees with the observed probabilities in the "Response Probabilities" table (0.125/0.3 = 0.4167). The standard error and 95% confidence interval for the relative risks are also given. These results are essentially identical to those obtained from the NLEST macro above. The 95% confidence limits differ because of the use of different methods for forming the interval. CATMOD forms a large sample confidence interval around the log relative risk and then exponentiates the limits of that interval to produce the confidence limits for the relative risk. Notice that the interval is not symmetric about the relative risk estimate. The macros and NLMIXED use the delta method to produce a confidence interval for the relative risk which is symmetric about the estimate. See this note that discusses methods for obtaining confidence intervals for ratios.

Analysis of Contrasts of Weighted Least Squares Estimates
Contrast	Type	Row	Estimate	Standard Error	Chi-Square	Pr > ChiSq	Alpha	Confidence Limits
P(self,S1)/P(self,S3)	EXP	1	0.4167	0.1209	9.10	0.0026	0.05	0.2359	0.7360
P(team,S1)/P(team,S3)	EXP	1	0.8056	0.1838	0.90	0.3434	0.05	0.5151	1.2599
P(self,S2)/P(self,S3)	EXP	1	0.9635	0.2047	0.03	0.8612	0.05	0.6354	1.4612
P(team,S2)/P(team,S3)	EXP	1	0.7552	0.1732	1.50	0.2209	0.05	0.4818	1.1838
P(self,S1)/P(self,S2)	EXP	1	0.4324	0.1204	9.06	0.0026	0.05	0.2505	0.7464
P(team,S1)/P(team,S2)	EXP	1	1.0667	0.2451	0.08	0.7789	0.05	0.6798	1.6736

The relative risks for Style=class comparing Schools cannot be estimated with CONTRAST statements in this parameterization of the model. The easiest way to obtain these estimates is to refit the model using a different ordering of the response levels. In the above example, this can be done by removing the ORDER=DATA option. This will produce an equivalent model in which Style=class is the first level rather than Style=self.

Using PROC NLMIXED

These relative risk estimates can also be obtained in PROC NLMIXED using maximum likelihood estimation, though it is a little more difficult since the multinomial likelihood must be coded in the procedure. NLMIXED requires a numeric response, so variable Y is created from the Style variable. In the PROC NLMIXED statements that follow, a large value is specified in the df= option to provide large-sample tests and confidence intervals. The linear component of the model for the two probabilities is specified in the ep1= and ep2= assignment statements. The p3= assignment statement ensures that the probabilities properly sum to 1. The following statements define the multinomial log likelihood and specify that the response, Y, is distributed according to that log likelihood. The p= assignment statement assures that each probability is valid (between 0 and 1). Finally, the ESTIMATE statements provide estimates of each of the relative risks of interest. Notice that the relative risk estimates for Style=class are computed using the fact that p_class + p_self + p_team = 1.

    proc nlmixed data=School df=1e8;
        if Style="self" then y=1;
        else if Style="team" then y=2;
        else y=3;
        ep1=exp(int1 + b11*(School=1) + b12*(School=2));
        ep2=exp(int2 + b21*(School=1) + b22*(School=2));
        p3=1/(1+ep1+ep2);
        p1=p3*ep1;
        p2=p3*ep2;
        if      y=1 then p=p1;
        else if y=2 then p=p2;
        else             p=p3;
        p = (p>0 and p<=1)*p + (p<=0)*1e-8 + (p>1);
        loglik = Count*log(p);
        model y ~ general(loglik);

        estimate 'P(self,S1)/P(self,S2)'   (int1+b11)/(int1+b12);
        estimate 'P(self,S1)/P(self,S3)'   (int1+b11)/int1;
        estimate 'P(self,S2)/P(self,S3)'   (int1+b12)/int1;

        estimate 'P(team,S1)/P(team,S2)'   (int2+b21)/(int2+b22);
        estimate 'P(team,S1)/P(team,S3)'   (int2+b21)/int2;
        estimate 'P(team,S2)/P(team,S3)'   (int2+b22)/int2;

        estimate 'P(class,S1)/P(class,S2)' (1-(int1+b11)-(int2+b21))/(1-(int1+b12)-(int2+b22));
        estimate 'P(class,S1)/P(class,S3)' (1-(int1+b11)-(int2+b21))/(1-int1-int2);
        estimate 'P(class,S2)/P(class,S3)' (1-(int1+b12)-(int2+b22))/(1-int1-int2);

        run;

The results are the same as from the NLMeans and NLEST macros above.

Estimating the relative risk comparing two response level probabilities in one population

In the example above, suppose you want to estimate relative risks comparing the instruction styles at each school. For example, you might want to estimate the relative risk comparing the self and class styles in school 1. If you fit a generalized logit model to the data, then the logit response functions that are modeled are the log of two relative risks defined on the three response levels such as log(p_self/p_class) and log(p_team/p_class). Exponentiating the estimates of these logit response functions yields estimates of the relative risks.

The following statements again fit the generalized logit model to the data using PROC LOGISTIC. The ORDER=DATA response variable option orders the Styles as they first appear in the data so that the logits are log(p_self/p_class) and log(p_team/p_class). The LSMEANS statement provides estimates of the log odds for each School. The ILINK option adds estimates of the Style level probabilities for each School. The E option produces a table of coefficients of the linear combination of parameters that define the log odds for each School on each Style logit. The table is saved by the ODS OUTPUT statement for use later with the NLMeans macro. The STORE statement saves the fitted model for use with the NLMeans and NLEST macros. The XBETA= option in the OUTPUT statement computes the log relative risks using the model parameter estimates and saves the values in variable XB in data set LogRR. The standard errors of the log relative risks are also computed and saved in variable S. The OUT= data set is structured to have one observation per response level for each input observation. As a result, the LogRR data set has three times as many observations as the input data set, School.

      proc logistic data=School;
        freq count;
        class school / param=glm;
        model style(order=data) = school / link=glogit;
        lsmeans School / e ilink;
        ods output coef=coeffs;
        output out=LogRR xbeta=xb stdxbeta=s;
        store out=logmod;
        run;

The model parameter estimates and LSMEANS results are repeated below.

Analysis of Maximum Likelihood Estimates
Parameter		STYLE	DF	Estimate	Standard Error	Wald Chi-Square	Pr > ChiSq
Intercept		self	1	-0.2877	0.2546	1.2769	0.2585
Intercept		team	1	-0.2877	0.2546	1.2769	0.2585
SCHOOL	1	self	1	-1.3350	0.3803	12.3216	0.0004
SCHOOL	1	team	1	-0.6758	0.3353	4.0607	0.0439
SCHOOL	2	self	1	-0.2285	0.3286	0.4837	0.4867
SCHOOL	2	team	1	-0.4722	0.3397	1.9314	0.1646
SCHOOL	3	self	0	0	.	.	.
SCHOOL	3	team	0	0	.	.	.

SCHOOL Least Squares Means
STYLE	SCHOOL	Estimate	Standard Error	z Value	Pr > \|z\|	Mean	Standard Error of Mean
self	1	-1.6227	0.2825	-5.74	<.0001	0.1250	0.03019
self	2	-0.5162	0.2077	-2.48	0.0130	0.2891	0.04007
self	3	-0.2877	0.2546	-1.13	0.2585	0.3000	0.04830
team	1	-0.9634	0.2183	-4.41	<.0001	0.2417	0.03908
team	2	-0.7598	0.2250	-3.38	0.0007	0.2266	0.03700
team	3	-0.2877	0.2546	-1.13	0.2585	0.3000	0.04830

Using the NLMeans macro

The NLMeans macro requires the saved model from the STORE statement, the saved table of coefficients from the LSMEANS / E statement, and the link function used in the model. By default, the NLMeans macro estimates the pairwise differences of the School probabilities within each of Styles. Specifying options=ratio produces ratios rather than differences so that relative risks can be estimated. However, the desired relative risks are defined across, rather than within, the Styles. This can be accomplished by specifying the desired contrasts of the probabilities as rows in a data set which is then specified in the contrasts= parameter of the macro.

The contrasts= data set must contain variables named SET and K1, K2, ... , Kn. Optionally, a LABEL variable can be included to label each of the specified contrasts. The SET variable is primarily used when multiple sets of means are estimated by the SLICE statement or by multiple LSMEANS, SLICE, or ESTIMATE statements. Since there is only a single LSMEANS statement and therefore only a single set of LS-means, SET=1. For a generalized logit model, n is the number of estimates from the LSMEANS statement plus the number of estimates within a logit. There are six probability estimates provided by the LSMEANS statement above. These are the estimated probabilities for the three Schools on two of the three Styles. Probability estimates can also be obtained for the third Style, Style=class, making a total of nine probability estimates. So, in this case n=9.

In the following data set, the K variables are in the same order as shown in the LSMEANS table – K1-K3 correspond to the three Schools on Style=self, K4-K6 correspond to the three Schools on Style=team, and the additional three variables, K7-K9, correspond to the three Schools on Style=class. In the first row, the 1 value selects the School=1, Style=self probability for the relative risk numerator. The -1 value selects the School=1, Style=class for the denominator. With options=ratio, the result from this row is, as shown in the label, the relative risk comparing the team and class Styles within School 1. Each of the following rows define a relative risk comparing each pair of the Styles within each of the Schools.

      data cont;
        length label $40;
        infile datalines missover;
        input label k1-k9;
        set=1;
        datalines;
      P(self,S1)/P(class,S1)    1  0  0    0  0  0   -1  0  0
      P(team,S1)/P(class,S1)    0  0  0    1  0  0   -1  0  0
      P(self,S1)/P(team,S1)     1  0  0   -1  0  0    0  0  0
      P(self,S2)/P(class,S2)    0  1  0    0  0  0    0 -1  0
      P(team,S2)/P(class,S2)    0  0  0    0  1  0    0 -1  0
      P(self,S2)/P(team,S2)     0  1  0    0 -1  0    0  0  0
      P(self,S3)/P(class,S3)    0  0  1    0  0  0    0  0 -1
      P(team,S3)/P(class,S3)    0  0  0    0  0  1    0  0 -1
      P(self,S3)/P(team,S3)     0  0  1    0  0 -1    0  0  0
      ;
      %NLMeans(instore=logmod, coef=coeffs, link=glogit, options=ratio, 
               contrasts=cont, null=1, title=Style RRs in each School)

Following are the results from the NLMeans macro.

Style RRs in each School

Label	Estimate	Standard Error	Wald Chi-Square	Pr > ChiSq	Alpha	Lower	Upper
P(self,S1)/P(class,S1)	0.1974	0.05576	207.169	<.0001	0.05	0.08808	0.3067
P(team,S1)/P(class,S1)	0.3816	0.08329	55.134	<.0001	0.05	0.2183	0.5448
P(self,S1)/P(team,S1)	0.5173	0.1645	8.612	0.0033	0.05	0.1948	0.8397
P(self,S2)/P(class,S2)	0.5968	0.1240	10.579	0.0011	0.05	0.3538	0.8398
P(team,S2)/P(class,S2)	0.4677	0.1052	25.585	<.0001	0.05	0.2615	0.6740
P(self,S2)/P(team,S2)	1.2759	0.3164	0.760	0.3833	0.05	0.6557	1.8961
P(self,S3)/P(class,S3)	0.7500	0.1909	1.714	0.1904	0.05	0.3758	1.1242
P(team,S3)/P(class,S3)	0.7500	0.1909	1.714	0.1904	0.05	0.3758	1.1242
P(self,S3)/P(team,S3)	1.0000	0.2722	0.000	1.0000	0.05	0.4666	1.5334

Using the NLEST/NLEstimate macro

These statements create a data set with one observation for each relative risk to be estimated. The NLEST macro is then called to estimate the relative risks. See the example above and the description of the NLEST macro for details. Specify null=1 to test that each ratio equals 1.

      data fd; 
         length label f $32767; 
         infile datalines delimiter='|';
         input label f; 
         datalines;
      P(self,S1)/P(class,S1)| exp(b_p1+b_p3)
      P(team,S1)/P(class,S1)| exp(b_p2+b_p4)
      P(self,S1)/P(team,S1) | exp(b_p1+b_p3-b_p2-b_p4)
      P(self,S2)/P(class,S2)| exp(b_p1+b_p5)
      P(team,S2)/P(class,S2)| exp(b_p2+b_p6)
      P(self,S2)/P(team,S2) | exp(b_p1+b_p5-b_p2-b_p6)
      P(self,S3)/P(class,S3)| exp(b_p1)
      P(team,S3)/P(class,S3)| exp(b_p2)
      P(self,S3)/P(team,S3) | exp(b_p1-b_p2)
      ;
      %nlest(instore=logmod, fdata=fd, null=1)

The results are the same as from the NLMeans macro above.

Using the predicted log relative risks

Another approach is to form confidence intervals around the log relative risks estimates and exponentiate the limits. This is different from the way the limits are obtained by the NLMeans and NLEST macros and by PROC NLMIXED. In the following DATA step, the log relative risks are exponentiated to obtain the relative risk estimates comparing response level probabilities. Confidence limits for 95% large sample confidence intervals are also computed for each relative risk. The PRINT step displays the relative risk estimates and confidence intervals for each school. The WHERE statement is used to remove the repetition in the data set.

      data RR;
        set LogRR;
        RR = exp(xb);
        RR_LCL = exp(xb-probit(.975)*s);
        RR_UCL = exp(xb+probit(.975)*s);
        run;
      proc print data=RR;
        where program="regular" and style="self" and _level_ ne "class";
        id school _level_;
        var RR:;
        run;

The two relative risk estimates per school are labeled in the _LEVEL_ column using the Style that appears in the numerator of the relative risk. The class Style is in the denominator of each. These results essentially agree with those from the NLMeans and NLEST macro. Again, the 95% confidence limits differ somewhat for the reasons discussed in the previous section.

SCHOOL	_LEVEL_	RR	RR_LCL	RR_UCL
1	self	0.19737	0.11345	0.34338
1	team	0.38158	0.24877	0.58529
2	self	0.59677	0.39717	0.89668
2	team	0.46774	0.30096	0.72695
3	self	0.75000	0.45536	1.23528
3	team	0.75000	0.45536	1.23528

If the relative risk estimate for the third possible pair of levels, self and team, is desired, the easiest way is to refit the model using a different ordering of the response levels. In this example, this can be done by removing the ORDER=DATA response option. This will produce an equivalent model in which the second logit compares self and team.

Using PROC NLMIXED

The relative risks can also be estimated by fitting the generalized logit model in PROC NLMIXED. As in the example above, the linear model for the two logits is specified in the ep1= and ep2= assignment statements and the individual level probabilities are derived from the logits in the p1, p2, and p3 statements. The following statements define the multinomial log likelihood and specify that the numeric response, Y, is distributed according to that log likelihood. The p= assignment statement assures that each probability is valid (between 0 and 1). The ESTIMATE statements, for each school, provide estimates of the relative risks comparing each pair of response levels. Note that the relative risk comparing self and team can be obtained from the difference of the two logits since log(p_self/p_team) = log(p_self/p_class) - log(p_team/p_class).

      proc nlmixed data=School df=1e8;
        if Style="self" then y=1;
        else if Style="team" then y=2;
        else y=3;
        ep1=exp(int1 + b11*(School=1) + b12*(School=2));
        ep2=exp(int2 + b21*(School=1) + b22*(School=2));
        p3=1/(1+ep1+ep2);
        p1=p3*ep1;
        p2=p3*ep2;
        if      y=1 then p=p1;
        else if y=2 then p=p2;
        else             p=p3;
        p = (p>0 and p<=1)*p + (p<=0)*1e-8 + (p>1);
        loglik = Count*log(p);
        model y ~ general(loglik);
        
        estimate 'P(self,S1)/P(class,S1)' exp(int1+b11);
        estimate 'P(team,S1)/P(class,S1)' exp(int2+b21);
        estimate 'P(self,S1)/P(team,S1)'  exp(int1+b11-int2-b21);
        
        estimate 'P(self,S2)/P(class,S2)' exp(int1+b12);
        estimate 'P(team,S2)/P(class,S2)' exp(int2+b22);
        estimate 'P(self,S2)/P(team,S2)'  exp(int1+b12-int2-b22);
        
        estimate 'P(self,S3)/P(class,S3)' exp(int1);
        estimate 'P(team,S3)/P(class,S3)' exp(int2);
        estimate 'P(self,S3)/P(team,S3)'  exp(int1-int2);
        run;

The results are the same as those from the NLMeans and NLEST macros.

Operating System and Release Information

Product Family	Product	System	SAS Release
Product Family	Product	System	Reported	Fixed*
SAS System	SAS/STAT	z/OS
		z/OS 64-bit
		OpenVMS VAX
		Microsoft® Windows® for 64-Bit Itanium-based Systems
		Microsoft Windows Server 2003 Datacenter 64-bit Edition
		Microsoft Windows Server 2003 Enterprise 64-bit Edition
		Microsoft Windows XP 64-bit Edition
		Microsoft® Windows® for x64
		OS/2
		Microsoft Windows 8 Enterprise 32-bit
		Microsoft Windows 8 Enterprise x64
		Microsoft Windows 8 Pro 32-bit
		Microsoft Windows 8 Pro x64
		Microsoft Windows 8.1 Enterprise 32-bit
		Microsoft Windows 8.1 Enterprise x64
		Microsoft Windows 8.1 Pro
		Microsoft Windows 8.1 Pro 32-bit
		Microsoft Windows 10
		Microsoft Windows 95/98
		Microsoft Windows 2000 Advanced Server
		Microsoft Windows 2000 Datacenter Server
		Microsoft Windows 2000 Server
		Microsoft Windows 2000 Professional
		Microsoft Windows NT Workstation
		Microsoft Windows Server 2003 Datacenter Edition
		Microsoft Windows Server 2003 Enterprise Edition
		Microsoft Windows Server 2003 Standard Edition
		Microsoft Windows Server 2003 for x64
		Microsoft Windows Server 2008
		Microsoft Windows Server 2008 R2
		Microsoft Windows Server 2008 for x64
		Microsoft Windows Server 2012 Datacenter
		Microsoft Windows Server 2012 R2 Datacenter
		Microsoft Windows Server 2012 R2 Std
		Microsoft Windows Server 2012 Std
		Microsoft Windows XP Professional
		Windows 7 Enterprise 32 bit
		Windows 7 Enterprise x64
		Windows 7 Home Premium 32 bit
		Windows 7 Home Premium x64
		Windows 7 Professional 32 bit
		Windows 7 Professional x64
		Windows 7 Ultimate 32 bit
		Windows 7 Ultimate x64
		Windows Millennium Edition (Me)
		Windows Vista
		Windows Vista for x64
		64-bit Enabled AIX
		64-bit Enabled HP-UX
		64-bit Enabled Solaris
		ABI+ for Intel Architecture
		AIX
		HP-UX
		HP-UX IPF
		IRIX
		Linux
		Linux for x64
		Linux on Itanium
		OpenVMS Alpha
		OpenVMS on HP Integrity
		Solaris
		Solaris for x64
		Tru64 UNIX

* For software releases that are not yet generally available, the Fixed Release is the software release in which the problem is planned to be fixed.

Type:	Usage Note
Priority:
Topic:	Analytics ==> Categorical Data Analysis Analytics ==> Regression SAS Reference ==> Procedures ==> CATMOD SAS Reference ==> Procedures ==> LOGISTIC SAS Reference ==> Procedures ==> NLMIXED SAS Reference ==> Macro

Date Modified:	2021-04-28 13:33:20
Date Created:	2016-03-04 14:47:57

Support

Usage Note 57798: Estimating relative risks in a multinomial response model

Estimating the relative risk comparing two populations on the probability of one response level

Estimating the relative risk comparing two response level probabilities in one population

Operating System and Release Information