Usage Note 52362: Effect (model) selection and grouping of variables to enter or leave the model together during selection

Effect selection methods are available in many modeling procedures, typically through a SELECTION= option or a SELECTION statement. An effect is a general term for a variable or a model term such as an interaction or nested term. Effect selection methods perform automated selection from a list of candidate model effects that you specify. Methods include forward, backward, stepwise, LASSO (with adaptive and group LASSO variants), and LAR. Selection methods are available in several procedures in SAS/STAT® (such as REG, GLMSELECT, LOGISTIC, and PHREG), SAS/ETS® (COUNTREG), and SAS® Viya® (such as REGSELECT, GENSELECT, and CNTSELECT). For details, see the documentation for these procedures.

Groups of variables representing categorical variables

One common need is to restrict the effect selection method so that a group of variables enters or leaves the model together. The most common of these situations is when you have categorical variables that are to be represented in the model by sets of coded design variables, often called indicator or dummy variables. With most procedures, this is easily accommodated by specifying any categorical variables in the CLASS statement of the procedure. The use of the CLASS statement is further discussed in this note. The CLASS statement creates the design variables for you and automatically treats all design variables for an effect as a group that enters or leaves the model together.

For example, in the following statements, A and B are categorical variables and X is continuous. The CLASS statement produces one set of design variables for A and another set for B. Because at least one CLASS variable is involved in each, the A*B and A*X effects are also represented by sets of design variables. All design variables for each of these effects enter or leave the model as a group. For instance, if the effect selection method determines that A should be added to the model, all of its design variables are added. At a later step, if the method determines that effect A*B should be removed, all of its design variables are removed.

      proc hpgenselect;
        class a b;
        model y = x a b a*b a*x;
        selection method=stepwise;
        run;

Arbitrary groups of variables

If you have a group of variables that are logically connected, you might want them to enter or leave the model together. For instance, in a health study you might want diastolic and systolic blood pressure variables to be a group, and running and resting pulse rates to be another group. A group can consist of any combination of continuous or categorical variables.

Such arbitrary groupings of effects can be defined in procedures in which the EFFECT statement is available (see the NOTE below). Many modeling procedures (such as GLMSELECT, LOGISTIC, PHREG, and others) offer both effect selection methods and the EFFECT statement. Other procedures support the EFFECT statement but do not provide effect selection.

You define a group by using the collection effect type in the EFFECT statement. You then specify the name of the group in the MODEL statement instead of the individual variable names. For example, the EFFECT statement below defines a collection named GROUP1 that groups categorical variable A with continuous variables X1 and X2. You can use additional EFFECT statements to define more groups as needed. During stepwise effect selection, variables A, X1, and X2 enter or leave the model together.

      proc logistic;
        class a;
        effect group1=collection(a x1 x2);
        model y = group1 x3 b / selection=stepwise;
        run;
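
As a further illustration, the health-study grouping described earlier might be written with two collection effects. This is only a sketch under stated assumptions: the data set name (HEALTH), the binary response (DISEASE), and the variable names (DIASTOLIC, SYSTOLIC, RUNPULSE, RESTPULSE, AGE) are hypothetical and do not come from this note.

```sas
/* Hypothetical data set and variable names, for illustration only */
proc logistic data=health;
  /* Blood pressure measurements enter or leave together */
  effect bp = collection(diastolic systolic);
  /* Running and resting pulse rates form a second group */
  effect pulse = collection(runpulse restpulse);
  /* AGE is an ordinary single-variable candidate effect */
  model disease(event='1') = bp pulse age / selection=stepwise;
run;
```

With these statements, the candidate effects considered at each step are BP, PULSE, and AGE, so neither blood pressure variable can be in the model without the other.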

__________

NOTE: In PROC REG, you can create groups of variables using braces ( { } ) in the MODEL statement. For example, the following MODEL statement creates two groups. One group consists of variables X1 and X2. The other group consists of variables X3, X4, and X5. During the stepwise variable selection procedure, X1 and X2 will enter or leave the model together. Similarly, variables X3, X4, and X5 will enter or leave as a group.

      model y = {x1 x2} {x3 x4 x5} / selection=stepwise; 
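
For context, that MODEL statement could sit in a complete PROC REG step like the minimal sketch below. The data set name MYDATA is a hypothetical placeholder.

```sas
/* MYDATA is a hypothetical data set containing y and x1-x5 */
proc reg data=mydata;
  /* Each pair of braces defines a group that is selected as a unit */
  model y = {x1 x2} {x3 x4 x5} / selection=stepwise;
run;
quit;  /* PROC REG is interactive; QUIT ends the procedure */
```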


Operating System and Release Information

Columns: Product Family, Product, System, SAS Release (Reported, Fixed*)

Product Family: SAS System
Product: SAS/STAT
System:
z/OS
Z64
OpenVMS VAX
Microsoft® Windows® for 64-Bit Itanium-based Systems
Microsoft Windows Server 2003 Datacenter 64-bit Edition
Microsoft Windows Server 2003 Enterprise 64-bit Edition
Microsoft Windows XP 64-bit Edition
Microsoft® Windows® for x64
OS/2
Microsoft Windows 8 Enterprise 32-bit
Microsoft Windows 8 Enterprise x64
Microsoft Windows 8 Pro 32-bit
Microsoft Windows 8 Pro x64
Microsoft Windows 8.1 Enterprise 32-bit
Microsoft Windows 8.1 Enterprise x64
Microsoft Windows 8.1 Pro
Microsoft Windows 8.1 Pro 32-bit
Microsoft Windows 95/98
Microsoft Windows 2000 Advanced Server
Microsoft Windows 2000 Datacenter Server
Microsoft Windows 2000 Server
Microsoft Windows 2000 Professional
Microsoft Windows NT Workstation
Microsoft Windows Server 2003 Datacenter Edition
Microsoft Windows Server 2003 Enterprise Edition
Microsoft Windows Server 2003 Standard Edition
Microsoft Windows Server 2003 for x64
Microsoft Windows Server 2008
Microsoft Windows Server 2008 R2
Microsoft Windows Server 2008 for x64
Microsoft Windows Server 2012 Datacenter
Microsoft Windows Server 2012 R2 Datacenter
Microsoft Windows Server 2012 R2 Std
Microsoft Windows Server 2012 Std
Microsoft Windows XP Professional
Windows 7 Enterprise 32 bit
Windows 7 Enterprise x64
Windows 7 Home Premium 32 bit
Windows 7 Home Premium x64
Windows 7 Professional 32 bit
Windows 7 Professional x64
Windows 7 Ultimate 32 bit
Windows 7 Ultimate x64
Windows Millennium Edition (Me)
Windows Vista
Windows Vista for x64
64-bit Enabled AIX
64-bit Enabled HP-UX
64-bit Enabled Solaris
ABI+ for Intel Architecture
AIX
HP-UX
HP-UX IPF
IRIX
Linux
Linux for x64
Linux on Itanium
OpenVMS Alpha
OpenVMS on HP Integrity
Solaris
Solaris for x64
Tru64 UNIX

Product Family: SAS System
Product: SAS/ETS
System:
z/OS
OpenVMS VAX
Microsoft® Windows® for 64-Bit Itanium-based Systems
Microsoft Windows Server 2003 Enterprise 64-bit Edition
Microsoft Windows XP 64-bit Edition
Microsoft® Windows® for x64
OS/2
Microsoft Windows 8 Enterprise 32-bit
Microsoft Windows 8 Enterprise x64
Microsoft Windows 8 Pro 32-bit
Microsoft Windows 8 Pro x64
Microsoft Windows 8.1 Enterprise 32-bit
Microsoft Windows 8.1 Enterprise x64
Microsoft Windows 8.1 Pro
Microsoft Windows 8.1 Pro 32-bit
Microsoft Windows 95/98
Microsoft Windows 2000 Advanced Server
Microsoft Windows 2000 Datacenter Server
Microsoft Windows NT Workstation
Microsoft Windows Server 2003 Datacenter Edition
Microsoft Windows Server 2003 Enterprise Edition
Microsoft Windows Server 2003 Standard Edition
Microsoft Windows Server 2003 for x64
Microsoft Windows Server 2008
Microsoft Windows Server 2008 R2
Microsoft Windows Server 2008 for x64
Microsoft Windows Server 2012 Datacenter
Microsoft Windows Server 2012 R2 Datacenter
Microsoft Windows Server 2012 R2 Std
Microsoft Windows Server 2012 Std
Microsoft Windows XP Professional
Windows 7 Enterprise 32 bit
Microsoft Windows Server 2003 Datacenter 64-bit Edition
Microsoft Windows 2000 Server
Microsoft Windows 2000 Professional
Windows 7 Enterprise x64
Windows 7 Home Premium 32 bit
Windows 7 Home Premium x64
Windows 7 Professional 32 bit
Windows 7 Professional x64
Windows 7 Ultimate 32 bit
Windows 7 Ultimate x64
Windows Millennium Edition (Me)
Windows Vista
Windows Vista for x64
64-bit Enabled AIX
64-bit Enabled HP-UX
64-bit Enabled Solaris
ABI+ for Intel Architecture
AIX
HP-UX
HP-UX IPF
IRIX
Linux
Linux for x64
Linux on Itanium
OpenVMS Alpha
OpenVMS on HP Integrity
Solaris
Solaris for x64
Tru64 UNIX
* For software releases that are not yet generally available, the Fixed Release is the software release in which the problem is planned to be fixed.