Many modeling procedures provide options in their CLASS statements (or in other statements) that allow you to specify reference levels for categorical predictor variables. The first section below shows how to specify the reference level in a procedure that offers the REF= option in its CLASS statement. Note that the REF= option for setting reference levels was added to the GLM, MIXED, GLIMMIX, and ORTHOREG procedures beginning in SAS 9.3 TS1M2. Also in that release, the REF= option was made available for use with the GLM parameterization in procedures where it had previously been available only with other parameterizations.
In releases prior to SAS 9.3 TS1M2, and in later releases of some procedures such as PROBIT, LIFEREG, and GAM, the REF= option in the CLASS statement is not available. These procedures always use the last level (after the levels are sorted) of a CLASS variable as the reference level. For these procedures, you can use either of the last two approaches below (the ORDER=DATA option or formatted values) to make your desired reference level the last level.
Some procedures offer several ways to parameterize (code) the multiple design variables that the CLASS statement creates to represent a categorical predictor in the model. All parameterizations produce equivalent models but impose different interpretations on the model parameters. See the section "Parameterization of Model Effects" in the Shared Concepts and Topics chapter of the SAS/STAT User's Guide. A separate usage note lists the procedures offering multiple parameterizations and shows how a parameterization can be selected.
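As a minimal sketch of selecting a parameterization, the following step assumes PROC LOGISTIC, whose CLASS statement accepts the PARAM= option, and the Heights data set that is created later in this note. PARAM=REF requests reference cell coding, and REF="F" makes Gender="F" the reference level. The choice of procedure and of Response=1 as the event are assumptions made only for illustration.

```sas
/* Sketch only: assumes the Heights data set created later in this note. */
/* PARAM=REF requests reference cell coding for the CLASS variables.     */
/* REF="F" makes Gender="F" the reference level.                         */
proc logistic data=Heights;
   class Gender(ref="F") / param=ref;
   model Response(event="1") = Gender Height;
run;
```

With reference cell coding, the Gender parameter estimates the difference between Gender="M" and the reference level Gender="F".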
To set the reference level of a categorical response variable (such as in a logistic regression model), see the separate usage note on that topic.
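For illustration only, and assuming PROC LOGISTIC and the Heights data set created later in this note, the response reference level can be set with the REF= response variable option in the MODEL statement. In this sketch, Response=0 is the reference level, so the probability of Response=1 is modeled.

```sas
/* Sketch only: assumes the Heights data set created later in this note. */
/* REF="0" in the response options makes Response=0 the reference level, */
/* so the model is fit for the probability that Response=1.              */
proc logistic data=Heights;
   class Gender(ref="F") / param=ref;
   model Response(ref="0") = Gender Height;
run;
```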
Suppose Gender, with levels "M" and "F", is a predictor in your model and you want "F" to be the reference level. In a procedure such as GLIMMIX (see the Note below) that provides the REF= option in the CLASS statement, you can explicitly set the reference level for this and any other CLASS predictor. In the CLASS statement below, the REF="F" option specifies that Gender="F" is to be the reference level. If you have additional variables in the CLASS statement, you can specify the REF= option in parentheses following each variable to set its reference level. For instance, suppose you have an additional numeric variable, Trt, with values 0 and 1, and you want Trt=0 to be the reference level. Note that quotes are used around REF= values whether the value is numeric or character, formatted or unformatted.
proc glimmix data=Heights;
   class Gender(ref="F") Trt(ref="0");
   model Response(event="0") = Gender Height Trt / dist=binary link=probit solution ddfm=none;
run;
If formats are used, specify the formatted value of the reference level in the REF= option. For example:
proc format;
   value $genfmt 'F' = 'Female'
                 'M' = 'Male';
run;

proc glimmix data=Heights;
   format Gender $genfmt.;
   class Gender (ref="Female");
   model Response(event="0") = Gender Height / dist=binary link=probit solution ddfm=none;
run;
If the error message "Invalid reference value" appears in the log, see the related usage note on this error for common causes. The most common cause is specifying the unformatted value when a format is associated with the variable.
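One quick way to see which values the REF= option expects is to display the formatted levels. The following sketch assumes the $genfmt format and the Heights data set from the example above. It uses PROC FREQ, which groups and labels the levels by their formatted values when a format is associated with the variable.

```sas
/* Sketch only: assumes the $genfmt format and the Heights data set      */
/* from the example above. The levels shown in the frequency table are   */
/* the formatted values, which are the values to specify in REF=.        */
proc freq data=Heights;
   format Gender $genfmt.;
   tables Gender;
run;
```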
Consider a CLASS variable, X, with values 0 and 1. By default, these values are arranged in ascending alphanumeric order which results in 1 being the last level, and therefore the reference level. However, if the data are arranged so the value 1 appears before the value 0 as you read down the data set, and if you specify the ORDER=DATA option in the PROC statement, then the levels of X will stay in the order encountered in the data set. Then 0 is the last level found and it becomes the reference level. One way to get the values of X in this order is to sort your data set by X using the DESCENDING option.
For example, in the following data set, the Gender variable has levels F and M. Since F occurs before M in ascending alphanumeric sorting, M will be the reference level by default.
data Heights;
   input Response Gender$ Height @@;
   datalines;
1 F 67  0 F 66  1 F 64  1 M 71  1 M 72  0 F 63
1 F 63  0 F 67  1 M 69  0 M 68  1 M 70  1 F 63
0 M 64  1 F 67  1 F 66  0 M 67  0 M 67  0 M 69
;

proc probit data=Heights;
   class Gender;
   model Response = Gender Height;
run;
The "Class Level Information" table shows that M is the last level of Gender.
In the "Analysis of Maximum Likelihood Parameter Estimates" table, M is the reference level since it is the last level shown and has its parameter estimate and degrees of freedom set to zero.
However, if you sort the data by Response and then by descending Gender, as in the PROC SORT step below, the first observation in the sorted data set (New) has Gender=M, so M is encountered before F. By specifying the ORDER=DATA option, this ordering is preserved and F becomes the reference level.
proc sort data=Heights out=New;
   by Response descending Gender;
run;

proc probit data=New order=data;
   class Gender;
   model Response = Gender Height;
run;
Now, F is the last level in the "Class Level Information" table, and the "Analysis of Maximum Likelihood Parameter Estimates" table shows that F is the reference level.
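To confirm the order in which the levels are encountered before fitting the model, a sketch such as the following can be used. It assumes the sorted data set New from the step above. The ORDER=DATA option in PROC FREQ lists the levels in the order in which they first appear in the data set, so the level listed last is the one that a procedure using the last level takes as the reference level under ORDER=DATA.

```sas
/* Sketch only: assumes the sorted data set New created above.           */
/* ORDER=DATA lists the Gender levels in the order in which they first   */
/* appear in the data set.                                               */
proc freq data=New order=data;
   tables Gender;
run;
```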
An alternative to reordering or sorting the data is to assign formatted values to the levels such that the last formatted value in ascending alphanumeric order is the desired reference level. Formatted values are used when you specify the ORDER=FORMATTED option in the PROC statement, though this is usually the default when a format exists for the variable.
In the following example, the Group variable indicates use of one of two types of pain reliever. Suppose you want Group=1 to be the reference level. By default, Group=2 would be the reference level since it is the last sorted value.
data Headache;
   input Minutes Group Censor @@;
   datalines;
11 1 0  12 1 0  19 1 0  19 1 0  19 1 0
19 1 0  21 1 0  20 1 0  21 1 0  21 1 0
20 1 0  21 1 0  20 1 0  21 1 0  25 1 0
27 1 0  30 1 0  21 1 1  24 1 1  14 2 0
16 2 0  16 2 0  21 2 0  21 2 0  23 2 0
23 2 0  23 2 0  23 2 0  25 2 1  23 2 0
24 2 0  24 2 0  26 2 1  32 2 1  30 2 1
30 2 0  32 2 1  20 2 1
;
By assigning the following format to the levels, Group=1 has the last formatted value ('Old') after sorting, because 'Improved' precedes 'Old' in ascending alphanumeric order. Group=1 therefore becomes the reference level when the ORDER=FORMATTED option is in effect.
proc format;
   value grpfmt 1 = 'Old'
                2 = 'Improved';
run;

proc lifereg data=Headache order=formatted;
   format Group grpfmt.;
   class Group;
   model Minutes*Censor(1)=Group;
run;
__________
Note: The REF= option for setting reference levels was added to the GLM, MIXED, GLIMMIX, and ORTHOREG procedures beginning in SAS 9.3 TS1M2. Also in that release, the REF= option was made available for use with the GLM parameterization in procedures where it had previously been available only with other parameterizations.
Type: Usage Note
Topic:
   Analytics ==> Analysis of Variance
   Analytics ==> Categorical Data Analysis
   Analytics ==> Mixed Models
   Analytics
   SAS Reference ==> Procedures ==> ANOVA
   SAS Reference ==> Procedures ==> CATMOD
   SAS Reference ==> Procedures ==> GAM
   SAS Reference ==> Procedures ==> GENMOD
   SAS Reference ==> Procedures ==> GLIMMIX
   SAS Reference ==> Procedures ==> GLM
   SAS Reference ==> Procedures ==> GLMMOD
   SAS Reference ==> Procedures ==> GLMPOWER
   SAS Reference ==> Procedures ==> GLMSELECT
   SAS Reference ==> Procedures ==> HPMIXED
   SAS Reference ==> Procedures ==> LIFEREG
   SAS Reference ==> Procedures ==> LOGISTIC
   SAS Reference ==> Procedures ==> MIXED
   SAS Reference ==> Procedures ==> PHREG
   SAS Reference ==> Procedures ==> PLS
   SAS Reference ==> Procedures ==> PROBIT
   SAS Reference ==> Procedures ==> QUANTREG
   SAS Reference ==> Procedures ==> ROBUSTREG
   SAS Reference ==> Procedures ==> NESTED
   SAS Reference ==> Procedures ==> SURVEYLOGISTIC
   SAS Reference ==> Procedures ==> SURVEYREG
   SAS Reference ==> Procedures ==> TRANSREG
   SAS Reference ==> Procedures ==> FMM
   SAS Reference ==> Procedures ==> ORTHOREG
   SAS Reference ==> Procedures ==> QUANTLIFE
   SAS Reference ==> Procedures ==> QUANTSELECT
   SAS Reference ==> Procedures ==> GAMPL
   SAS Reference ==> Procedures ==> HPFMM
   SAS Reference ==> Procedures ==> HPGENSELECT
   SAS Reference ==> Procedures ==> HPLOGISTIC
   SAS Reference ==> Procedures ==> HPPLS
   SAS Reference ==> Procedures ==> HPQUANTSELECT
   SAS Reference ==> Procedures ==> HPREG
   SAS Reference ==> Procedures ==> ICPHREG
   SAS Reference ==> Procedures ==> SURVEYPHREG
Date Modified: 2019-07-12 08:46:28
Date Created: 2009-09-07 11:11:11