Suppose each subject in a sample of many subjects is classified into one level of a multinomial variable, and you want to test one or more hypotheses about the probabilities of the levels. You may also want confidence intervals for the probability of each level.
This can be done in PROC CATMOD using the RESPONSE and CONTRAST statements. Also, Berry and Hurtado (1994) presented the MULTINOM SAS/IML module, which gives confidence intervals for linear combinations (such as pairwise differences) of these dependent proportions. A slight modification of MULTINOM also provides a test statistic and associated p-value for each linear combination. MULTINOM is particularly useful when some of the observed counts are zero. In that case PROC CATMOD is not suitable, because the model parameters are not well estimated.
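For orientation, the sketch below gives the large-sample covariance of the estimated proportions and the resulting interval and 1-df chi-square test for a linear combination. It assumes normal-approximation (Wald) methods; the notation (counts n_j, total n, coefficient vector c, normal quantile z) is introduced here for illustration and is not taken from the MULTINOM code. The negative off-diagonal covariances, -p_i p_j / n, are what make the proportions dependent.

```latex
\begin{gathered}
\hat p_j = \frac{n_j}{n}, \qquad
\operatorname{Cov}(\hat{\mathbf p}) = \frac{1}{n}\left(\operatorname{diag}(\mathbf p) - \mathbf p\,\mathbf p^{\top}\right) \\[4pt]
\operatorname{Var}(\mathbf c^{\top}\hat{\mathbf p})
  = \frac{1}{n}\left(\sum_j c_j^{2}\,p_j - \Big(\sum_j c_j\,p_j\Big)^{2}\right) \\[4pt]
\mathbf c^{\top}\hat{\mathbf p} \;\pm\; z_{1-\alpha/2}\sqrt{\widehat{\operatorname{Var}}(\mathbf c^{\top}\hat{\mathbf p})},
\qquad
X^{2} = \frac{(\mathbf c^{\top}\hat{\mathbf p})^{2}}{\widehat{\operatorname{Var}}(\mathbf c^{\top}\hat{\mathbf p})}
\end{gathered}
```

The statistic X^2 is compared with a chi-square distribution on 1 degree of freedom.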
Suppose you take a sample of 100 objects from a population in which each object is one of three possible types: 1, 2, or 3. You observe 10 objects of type 1, 18 of type 2, and 72 of type 3, and you want to test for differences among the probabilities of the three types. After defining the MULTINOM module (by submitting all of its code to SAS), the following statement calls MULTINOM and produces confidence intervals and hypothesis tests for each pairwise comparison of the types. The missing value (.) in the last argument requests all pairwise comparisons. You can also estimate and test particular linear combinations by specifying a matrix, with one linear combination per row, as the last argument. The third argument, 0.05, is the significance level, so each interval and test is conducted at the 95% confidence level.
run multinom( {10 18 72} , "I" , 0.05 , . );
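As a hand check of this call, here is the type 1 vs. type 2 comparison worked out under the Wald form (an assumption about how MULTINOM computes its statistics). With coefficient vector c = (1, -1, 0), the variance of the difference reduces to (p1 + p2 - (p1 - p2)^2)/n.

```latex
\begin{gathered}
\hat p_1 = 0.10,\quad \hat p_2 = 0.18,\quad n = 100 \\
\widehat{\operatorname{Var}}(\hat p_1 - \hat p_2)
  = \frac{\hat p_1 + \hat p_2 - (\hat p_1 - \hat p_2)^{2}}{n}
  = \frac{0.28 - 0.0064}{100} = 0.002736 \\
X^{2} = \frac{(-0.08)^{2}}{0.002736} \approx 2.339, \qquad
p = \Pr(\chi^{2}_{1} > 2.339) \approx 0.126 \\
-0.08 \pm 1.96 \times 0.0523 \;\approx\; (-0.183,\ 0.023)
\end{gathered}
```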
Setting the second argument to "S" rather than "I" applies the Bonferroni adjustment to the intervals and tests so that they have a simultaneous 95% confidence level:
run multinom( {10 18 72} , "S" , 0.05 , . );
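With k = 3 pairwise comparisons, the Bonferroni method runs each interval at level 1 - 0.05/3, which widens the normal critical value from 1.96 to about 2.394. Applied to the type 1 vs. type 2 difference under the same Wald assumption (variance (0.28 - 0.0064)/100 = 0.002736, standard error about 0.0523), this gives the interval below. It is an illustration of the adjustment, not a reproduction of MULTINOM's printed output; note that it covers 0.

```latex
\begin{gathered}
z_{1-\alpha/(2k)} = z_{1-0.05/6} = z_{0.99167} \approx 2.394 \\
-0.08 \pm 2.394 \times 0.0523 \;\approx\; (-0.205,\ 0.045)
\end{gathered}
```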
The results of this Bonferroni-adjusted analysis show that the probability of type 3 differs significantly from the probabilities of both type 1 and type 2 (p<0.0001), but the probabilities of types 1 and 2 are not significantly different (p=0.3785). A confidence interval is given for each difference of multinomial probabilities, and the overall (familywise) Type I error rate is controlled at 0.05.
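A quick check of the two type 3 comparisons under the same Wald assumption shows why both remain highly significant: their 1-df chi-square statistics are so large that the p-values stay far below 0.0001 even after multiplying by 3.

```latex
\begin{gathered}
\text{1 vs. 3:}\quad \hat p_1 - \hat p_3 = -0.62,\quad
\widehat{\operatorname{Var}} = \frac{0.82 - 0.3844}{100} = 0.004356,\quad
X^{2} = \frac{0.3844}{0.004356} \approx 88.2 \\
\text{2 vs. 3:}\quad \hat p_2 - \hat p_3 = -0.54,\quad
\widehat{\operatorname{Var}} = \frac{0.90 - 0.2916}{100} = 0.006084,\quad
X^{2} = \frac{0.2916}{0.006084} \approx 47.9
\end{gathered}
```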
Using PROC CATMOD
The analysis can also be done with PROC CATMOD as follows. First, create a data set containing the observed count for each level of the multinomial variable, Y.
data a;
   input y count;
   datalines;
1 10
2 18
3 72
;
In the CATMOD statements below, the RESPONSE statement defines two response functions: the pairwise differences p1-p2 and p1-p3. Only two can be defined because the response functions must be linearly independent; adding the third pairwise difference, p2-p3, would be redundant, since it equals (p1-p3) - (p1-p2). This last difference is obtained with the CONTRAST statement instead. Because the model contains only an intercept for each response function (no predictors are specified), the two intercepts estimate the two pairwise differences, and the second intercept minus the first is the p2-p3 difference. The ESTIMATE=PARM option provides a confidence interval for the p2-p3 difference, and the CLPARM option in the MODEL statement requests confidence intervals for the other two pairwise differences. The ODS OUTPUT statement writes the parameter estimates and contrast results, in particular the p-values, to data sets.
ods output estimates=pe contrasts=c;
proc catmod data=a;
   response 1 -1 0, 1 0 -1;
   weight count;
   model y= / clparm;
   contrast '2 vs 3' @1 intercept -1 @2 intercept 1 / estimate=parm;
run;
quit;
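The RESPONSE and CONTRAST statements above can be read as the following matrix algebra. Here beta_1 and beta_2 (notation introduced for this sketch) denote the intercepts of the first and second response functions, which the contrast addresses as @1 and @2.

```latex
\begin{gathered}
\mathbf F = A\,\mathbf p =
\begin{pmatrix} 1 & -1 & 0 \\ 1 & 0 & -1 \end{pmatrix}
\begin{pmatrix} p_1 \\ p_2 \\ p_3 \end{pmatrix}
= \begin{pmatrix} p_1 - p_2 \\ p_1 - p_3 \end{pmatrix}
= \begin{pmatrix} \beta_1 \\ \beta_2 \end{pmatrix} \\[4pt]
-\beta_1 + \beta_2 = -(p_1 - p_2) + (p_1 - p_3) = p_2 - p_3
\end{gathered}
```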
Notice that the chi-square statistics and p-values agree with those from the first MULTINOM analysis.
Bonferroni adjustment (or other types of adjustment) of the p-values can be done with PROC MULTTEST. First, the parameter and contrast data sets are combined and the raw p-values are placed in a variable named RAW_P, which PROC MULTTEST requires in order to adjust the p-values.
data pvals;
   set pe c;
   raw_p=ProbChiSq;
run;

proc multtest pdata=pvals bon;
run;
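The Bonferroni adjustment requested by the BON option multiplies each raw p-value by the number of tests, here k = 3 (two rows from PE and one from C), and caps the result at 1. For the type 1 vs. type 2 comparison, a raw p-value of about 0.12616 (from a Wald chi-square of about 2.339 on 1 df) gives the adjusted value:

```latex
\tilde p_i = \min\left(1,\ k\,p_i\right), \qquad
\tilde p_{1\,\text{vs}\,2} = \min(1,\ 3 \times 0.12616) \approx 0.3785
```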
Notice that the adjusted p-values agree with the Bonferroni adjusted p-values in the second MULTINOM analysis.
Tests and confidence intervals for the multinomial probabilities can be generated using the MULTINOM module by specifying an identity matrix as its final argument. For the example above, this call of MULTINOM provides the Bonferroni adjusted 95% confidence intervals:
run multinom( {10 18 72} , "S" , 0.05 , {1 0 0, 0 1 0, 0 0 1} );
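With an identity matrix, each row picks out a single probability, so the linear combination reduces to p_j and its variance to p_j(1 - p_j)/n. As an illustration under the Wald assumption (not a reproduction of MULTINOM's printed output), the Bonferroni-adjusted interval for type 3 with k = 3 intervals, using the critical value z_{1-0.05/6} of about 2.394, is:

```latex
\begin{gathered}
\widehat{\operatorname{Var}}(\hat p_3) = \frac{0.72 \times 0.28}{100} = 0.002016,
\qquad \widehat{\operatorname{SE}} \approx 0.0449 \\
0.72 \pm 2.394 \times 0.0449 \;\approx\; (0.613,\ 0.827)
\end{gathered}
```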
The same analysis can be done using PROC CATMOD by modeling the individual probabilities. The RESPONSE statement below selects the first two of the three response level probabilities. You cannot specify all three because they are not independent (they sum to one).
proc catmod data=a;
   response 1 0 0, 0 1 0;
   weight count;
   model y= / clparm;
run;
quit;
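Why all three probabilities cannot be response functions at once can be seen from the standard multinomial covariance matrix (notation introduced for this sketch). Because the proportions sum to one, that matrix times a vector of ones is zero, so it is singular, while CATMOD's weighted least squares fit requires the inverse of the response functions' covariance matrix.

```latex
\operatorname{Cov}(\hat{\mathbf p})\,\mathbf 1
  = \frac{1}{n}\left(\operatorname{diag}(\mathbf p)\,\mathbf 1 - \mathbf p\,(\mathbf p^{\top}\mathbf 1)\right)
  = \frac{1}{n}\left(\mathbf p - \mathbf p\right) = \mathbf 0,
\qquad \text{since } \mathbf p^{\top}\mathbf 1 = p_1 + p_2 + p_3 = 1
```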
Because only two of the three probabilities are independent, only two can be estimated at a time. The following RESPONSE statement picks out the third response level probability.
proc catmod data=a;
   response 0 0 1;
   weight count;
   model y= / clparm;
run;
quit;
When predictors define multiple populations, you need to specify the design matrix in PROC CATMOD. Here is an example with two populations (the two levels of predictor A). The rows of the design matrix are grouped by population: the first two rows are for A1 and the last two rows are for A2. Within a population, the rows correspond to the response functions defined in the RESPONSE statement. The same method extends to multiple predictors.
data a;
   input a y count;
   datalines;
1 1 10
1 2 18
1 3 72
2 1 10
2 2 20
2 3 70
;

proc catmod data=a;
   response 1 0 0, 0 1 0;
   weight count;
   population a;
   model y=(1 0 0 0,
            0 1 0 0,
            0 0 1 0,
            0 0 0 1)
           (1="A1,p1", 2="A1,p2", 3="A2,p1", 4="A2,p2") / clparm;
run;
quit;
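In matrix form, this model sets the four response functions equal to four parameters through an identity design matrix. Here p_ij (notation introduced for this sketch) is the probability of response level j in population A_i, and the observed proportions come from the counts above. Because the two populations are independent samples, the covariance matrix of the response functions is block diagonal, with one 2 x 2 multinomial block per population.

```latex
\begin{pmatrix} p_{11} \\ p_{12} \\ p_{21} \\ p_{22} \end{pmatrix}
= I_4
\begin{pmatrix} \beta_1 \\ \beta_2 \\ \beta_3 \\ \beta_4 \end{pmatrix},
\qquad
\hat p_{11} = 0.10,\ \hat p_{12} = 0.18,\ \hat p_{21} = 0.10,\ \hat p_{22} = 0.20
```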
As in the previous example, an additional step is needed to estimate the last probability.
proc catmod data=a;
   response 0 0 1;
   weight count;
   population a;
   model y=(1 0,
            0 1) (1="A1,p3", 2="A2,p3") / clparm;
run;
quit;
Berry, J.J. and Hurtado, G.I. (1994), "Comparing Non-Independent Proportions," Observations: The Technical Journal for SAS Software Users, 3(4), 21-27.
Product Family: SAS System
Product: SAS/STAT
Systems: z/OS, OpenVMS VAX, Microsoft® Windows® for 64-Bit Itanium-based Systems, Microsoft Windows Server 2003 Datacenter 64-bit Edition, Microsoft Windows Server 2003 Enterprise 64-bit Edition, Microsoft Windows XP 64-bit Edition, Microsoft® Windows® for x64, OS/2, Microsoft Windows 95/98, Microsoft Windows 2000 Advanced Server, Microsoft Windows 2000 Datacenter Server, Microsoft Windows 2000 Server, Microsoft Windows 2000 Professional, Microsoft Windows NT Workstation, Microsoft Windows Server 2003 Datacenter Edition, Microsoft Windows Server 2003 Enterprise Edition, Microsoft Windows Server 2003 Standard Edition, Microsoft Windows XP Professional, Windows Millennium Edition (Me), Windows Vista, 64-bit Enabled AIX, 64-bit Enabled HP-UX, 64-bit Enabled Solaris, ABI+ for Intel Architecture, AIX, HP-UX, HP-UX IPF, IRIX, Linux, Linux for x64, Linux on Itanium, OpenVMS Alpha, OpenVMS on HP Integrity, Solaris, Solaris for x64, Tru64 UNIX
Type: Usage Note
Topic: SAS Reference ==> Procedures ==> CATMOD; SAS Reference ==> Procedures ==> IML; SAS Reference ==> Procedures ==> MULTTEST; Analytics ==> Categorical Data Analysis; Analytics ==> Matrix Programming
Date Modified: 2008-07-02 14:19:43
Date Created: 2008-07-02 11:07:11