32609 - Tests, confidence intervals, and comparisons of multinomial variable probabilities

Usage Note 32609: Tests, confidence intervals, and comparisons of multinomial variable probabilities

Suppose each subject in a sample of many subjects is categorized into one level of a multinomial variable (either nominal or ordinal), and you want to test one or more hypotheses concerning the probabilities associated with the levels. You also want to get confidence intervals for probabilities of each of the levels.

This can be done by fitting a categorical response model without covariates to the variable, for example a generalized logit model fit with PROC LOGISTIC. Including an ESTIMATE statement in the procedure can provide the probabilities of all levels of the variable, along with confidence intervals, except for the last level. This is because there are only k-1 independent probabilities for a multinomial variable with k levels, so only k-1 logits can be modeled. The probability and confidence interval for the last level can be obtained using the NLMeans macro (version 3.1 or later is required).
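
The relationship between the k-1 modeled logits and the k probabilities can be illustrated with a small DATA step. This is only a sketch for k=3: the intercept values b1 and b2 are hypothetical, not estimates from any data in this note, and the last level is assumed to be the reference level of the generalized logits, which is the PROC LOGISTIC default.

   data _null_;
      /* Hypothetical generalized-logit intercepts (illustration only) */
      b1 = 0.2;
      b2 = -0.1;
      /* Inverse link: each probability is exp(logit) over a common denominator */
      denom = 1 + exp(b1) + exp(b2);
      p1 = exp(b1)/denom;
      p2 = exp(b2)/denom;
      /* The last level has no logit of its own; its probability is the remainder */
      p3 = 1 - p1 - p2;
      put p1= p2= p3=;
      run;

Because p3 is determined by p1 and p2, the model has only two free parameters for the three levels.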

The latest version of the NLMeans macro can be downloaded from the NLMeans macro documentation (SAS Note 62362). The NLMeans macro requires version 2.1 or later of the NLEST macro. The latest version of the NLEST macro can be downloaded from the NLEST macro documentation. After downloading each macro, use the %INCLUDE statement to make it available in your SAS® session. Follow the instructions in the Usage section of each macro's documentation.
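
For example, the macros might be made available as follows. The file locations and file names shown are hypothetical; substitute the locations where you saved the downloaded macro files.

   /* Hypothetical paths: replace with the actual locations of the downloaded files */
   %include "C:\mymacros\nlest.sas";
   %include "C:\mymacros\nlmeans.sas";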

The following uses the data in the example titled "Nominal Response Data: Generalized Logits Model" in the LOGISTIC documentation. These statements show the observed probabilities in the Style variable. The levels are presented in the order they were encountered in the data:

   proc freq data=School order=data; 
      table style; 
      weight count; 
      run;

The probabilities of the three Style levels, expressed as percents, appear in the Percent column of the PROC FREQ results.

These statements fit an intercepts-only generalized logit model. The ESTIMATE statement with the ILINK CL CATEGORY=JOINT options displays the probabilities, standard errors, and confidence intervals for the first two of the three Style levels. The E option in this statement as well as the ODS OUTPUT and STORE statements are needed to make available the fitted model and data set of estimating coefficients for later use in the NLMeans macro:

   proc logistic data=School;
      freq Count;
      model Style(order=data)= / link=glogit;
      estimate 'Pr(self) & Pr(team)' intercept 1/e ilink cl category=joint;
      store mod;
      ods output coef=c;
      run;

The probabilities (expressed as proportions), standard errors, and 95% confidence intervals for the Self and Team levels appear in the "Mean" columns of the Estimates table.

The same information for the last level, Class, can be provided by the NLMeans macro using the saved model and estimate coefficients. The three probabilities are referred to by the names mu1, mu2, and mu3. The f= option specifies what you want to estimate, in this case just mu3. Note that you can also specify linear or nonlinear functions of several means. In order for the macro to provide estimates and tests across the levels of a multinomial response variable, options=difall must also be specified. The flabel= and title= options provide a label and title in the displayed results:

   %nlmeans(instore=mod, coef=c, link=glogit, f=mu3, flabel=Pr(class), 
            options=difall, title=Style)

The macro displays the probability, standard error, and 95% confidence interval for the Class level. Note that a test that the probability equals zero is also provided. If it is of interest to test a hypothesis that the probability equals some other value, you can specify that value in the null= option and include it in the NLMeans macro call.
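
For instance, a call that tests the null hypothesis Pr(class)=0.25 might look like the following sketch. The value 0.25 is purely illustrative and is not taken from this note. The null= option name comes from the text above; check its exact behavior against the NLMeans macro documentation.

   /* null=0.25 is a hypothetical null value chosen only for illustration */
   %nlmeans(instore=mod, coef=c, link=glogit, f=mu3, flabel=Pr(class), 
            null=0.25, options=difall, title=Style)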

Pairwise comparisons among the probabilities can also be provided. This is the default action of the NLMeans macro:

   %nlmeans(instore=mod, coef=c, link=glogit, options=difall, 
            title=Style probabilities)

The Label column indicates the differences of the probabilities that are estimated. For example, the label 1 -1 0 means that the probability of the second level (Team) is subtracted from the probability of the first level (Self). If you prefer the opposite direction of the difference in each pair, add options=reverse in the NLMeans macro call.
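
A call requesting the reversed differences might be written as follows. This sketch assumes that multiple values of options= are listed separated by spaces, which should be confirmed against the NLMeans macro documentation. The title text is also only illustrative.

   /* Assumption: difall and reverse can be combined in options=, separated by a space */
   %nlmeans(instore=mod, coef=c, link=glogit, options=difall reverse, 
            title=Reversed Style probability differences)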

Ordinal Variable

When a k-level response variable is ordinal, categorical modeling procedures typically model an independent set of k-1 cumulative logits defined on the cumulative probabilities Pr(Y=1), Pr(Y=1 or 2), Pr(Y=1 or 2 or 3), ... , Pr(Y<k). Note that the probabilities of the individual levels can be obtained from these as differences of adjacent cumulative probabilities and the probability of the Y=k level is obtained as 1-Pr(Y<k).
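
The differencing described above can be sketched in a short DATA step. The cumulative probabilities used here are hypothetical values for a four-level response, chosen only to show the arithmetic; they are not results from the example that follows.

   data _null_;
      /* Hypothetical cumulative probabilities Pr(Y<=1), Pr(Y<=2), Pr(Y<=3) */
      array cum[3] _temporary_ (0.2 0.5 0.9);
      array p[4];
      /* The first level's probability is the first cumulative probability */
      p[1] = cum[1];
      /* Middle levels are differences of adjacent cumulative probabilities */
      do j = 2 to 3;
         p[j] = cum[j] - cum[j-1];
      end;
      /* The last level's probability is 1 minus the last cumulative probability */
      p[4] = 1 - cum[3];
      put p[*]=;
      run;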

Using the example titled "Ordinal Logistic Regression" in the LOGISTIC documentation, these statements fit the cumulative logit model and produce the cumulative probabilities of the nine-level response variable:

   proc logistic data=Cheese;
      freq freq;
      model y=;
      estimate 'y1-8' intercept 1/e ilink cl category=joint;
      ods output coef=c;
      store mod;
      run;

The cumulative probabilities appear in the Mean column of the Estimates table.

The probabilities of the individual response levels can be obtained from the ordinal response model by computing the difference in adjacent pairs of cumulative probabilities. All nine can be obtained in one run of the NLMeans macro by specifying a data set of the necessary functions (differences in this case) of the model estimates. The following DATA step defines a data set with a character variable F that defines each function of the estimates and a character variable LABEL that provides a label for each function. It is also necessary to have a numeric SET variable. Since the ESTIMATE statement creates a single set of estimates, SET=1:

   data fd; 
      set=1;
      length label f $32767; 
      infile datalines delimiter=',';
      input label f; 
      datalines;
   P1, mu1
   P2, mu2-mu1
   P3, mu3-mu2
   P4, mu4-mu3
   P5, mu5-mu4
   P6, mu6-mu5
   P7, mu7-mu6
   P8, mu8-mu7
   P9, 1-mu8
   ;
   %nlmeans(instore=mod, coef=c, link=clogit, fdata=fd, 
            options=difall, title=Response Probabilities)
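
When the response has many levels, the same rows could instead be generated in a loop rather than typed as data lines. The following is a sketch under that assumption; the data set name FD_LOOP and the variables J and K are hypothetical and not part of the original example.

   data fd_loop;
      set = 1;                     /* single set of estimates, as above */
      length label f $32767;
      k = 9;                       /* number of response levels */
      do j = 1 to k;
         label = cats('P', j);
         if j = 1 then f = 'mu1';                 /* first level: first cumulative probability */
         else if j = k then f = cats('1-mu', k-1); /* last level: 1 minus last cumulative probability */
         else f = cats('mu', j, '-mu', j-1);       /* middle levels: adjacent differences */
         output;
      end;
      drop j k;
      run;

The resulting data set has the same SET, LABEL, and F values as FD and could be passed to the macro with fdata=fd_loop.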

The results display the probability of each of the nine response levels.

Operating System and Release Information

Product Family	Product	System	SAS Release Reported	SAS Release Fixed*
SAS System	SAS/STAT	z/OS
		OpenVMS VAX
		Microsoft® Windows® for 64-Bit Itanium-based Systems
		Microsoft Windows Server 2003 Datacenter 64-bit Edition
		Microsoft Windows Server 2003 Enterprise 64-bit Edition
		Microsoft Windows XP 64-bit Edition
		Microsoft® Windows® for x64
		OS/2
		Microsoft Windows 95/98
		Microsoft Windows 2000 Advanced Server
		Microsoft Windows 2000 Datacenter Server
		Microsoft Windows 2000 Server
		Microsoft Windows 2000 Professional
		Microsoft Windows NT Workstation
		Microsoft Windows Server 2003 Datacenter Edition
		Microsoft Windows Server 2003 Enterprise Edition
		Microsoft Windows Server 2003 Standard Edition
		Microsoft Windows XP Professional
		Windows Millennium Edition (Me)
		Windows Vista
		64-bit Enabled AIX
		64-bit Enabled HP-UX
		64-bit Enabled Solaris
		ABI+ for Intel Architecture
		AIX
		HP-UX
		HP-UX IPF
		IRIX
		Linux
		Linux for x64
		Linux on Itanium
		OpenVMS Alpha
		OpenVMS on HP Integrity
		Solaris
		Solaris for x64
		Tru64 UNIX

* For software releases that are not yet generally available, the Fixed Release is the software release in which the problem is planned to be fixed.

Type:	Usage Note
Priority:
Topic:	SAS Reference ==> Procedures ==> LOGISTIC; Analytics ==> Categorical Data Analysis; Analytics ==> Matrix Programming; SAS Reference ==> Macro

Date Modified:	2025-08-22 13:23:40
Date Created:	2008-07-02 11:07:11
