Usage Note 60240: Regularization, regression penalties, LASSO, ridging, and elastic net


Regularization methods can be applied to shrink model parameter estimates in situations of instability. As discussed by Agresti (2013), one such situation occurs when there is a large number of covariates, only a small subset of which is strongly associated with the response, and the sample size is not large. In this case, the maximum likelihood estimates can be inflated. Parameter estimates can also be inflated by collinearity among the predictors, by separation in the case of logistic regression, and for other reasons.

To shrink the maximum likelihood estimates of the model parameters toward zero, regularization methods add a penalty term of some form to the log likelihood function used in parameter estimation. The LASSO method adds an L1 norm penalty (the sum of the absolute values of the estimates). Adding an L2 norm penalty (the sum of the squared estimates) is known as ridging. The elastic net method uses the L1 and L2 penalties in combination.
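In symbols, with ℓ(β) denoting the log likelihood and λ1 and λ2 the tuning parameters discussed below, the penalized objective maximized in place of the log likelihood is (a restatement of the definitions above):

$$\ell_p(\beta) \;=\; \ell(\beta) \;-\; \lambda_1 \sum_j |\beta_j| \;-\; \lambda_2 \sum_j \beta_j^2$$

Setting λ2 = 0 gives the LASSO, setting λ1 = 0 gives ridging, and taking both positive gives the elastic net.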

The LASSO method is particularly useful as an efficient way to do model selection because it tends to shrink some estimates all the way to zero, producing a simpler model. Ridging shrinks all estimates, but generally not to zero. The drawback of shrinkage through penalization is that the estimates become biased and their variances shrink. As a result, tests and confidence intervals are not useful. However, the overall mean square error, which combines the variance and the squared bias, may be smaller after penalization. Another penalization method, due to Firth, has been shown to reduce bias in the estimates.
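For a single estimate, this tradeoff is the standard decomposition (stated here for clarity; it is implicit in the paragraph above):

$$\mathrm{MSE}(\hat{\beta}) \;=\; \mathrm{Var}(\hat{\beta}) \;+\; \mathrm{Bias}(\hat{\beta})^2$$

so a penalty that introduces some bias can still lower the mean square error if it reduces the variance enough.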

With the methods discussed here, a tuning parameter multiplies the penalty term, allowing control over the amount of penalization. Increasing the value of the tuning parameter increases the amount of shrinkage in the parameter estimates. As with other smoothing methods that use a tuning parameter, such as generalized additive modeling, a value for the parameter must be chosen. This is done by fitting a series of models over a range of values and selecting the value that minimizes the mean square error. Typically, separate validation data are used for this assessment, or cross validation is used if separate data are not available. Information criteria, such as AIC or SBC, can also be used. This note discusses only the use of PROC NLMIXED to fit one model with a particular tuning parameter value; a sketch of how a series of values might be evaluated follows this paragraph. Note that fitting a series of models in this way incurs the full cost of fitting each model by maximum likelihood estimation. The selection facilities in the GLMSELECT and HPGENSELECT procedures employ more efficient ways to carry out this process.
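As a minimal sketch (not part of the original note), the following macro loop fits a penalized model like the one in Example 1 below over a grid of tuning values. The macro name, the grid values, and the shortened predictor are illustrative; in practice, the selection criterion for each fit would be computed on validation data or by cross validation.

      %macro lassoGrid(values=0.01 0.05 0.1 0.5);
         %local i tune;
         %let i=1;
         %let tune=%scan(&values,&i,%str( ));
         %do %while(%length(&tune) gt 0);
            /* One penalized fit per tuning value; a shortened
               predictor is used only to keep the sketch brief */
            proc nlmixed data=Simdata maxiter=500;
               mu=int + bx1*x1 + bx2*x2 + bx12*x1*x2;
               LL=-(y-mu)**2 - &tune*sumabs(of b:);  /* L1 penalty with current value */
               model y ~ general(LL);
               ods output parameterestimates=pe&i;   /* estimates for this value */
               run;
            %let i=%eval(&i+1);
            %let tune=%scan(&values,&i,%str( ));
         %end;
      %mend lassoGrid;
      %lassoGrid()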

 

Various regression penalties are available in SAS® procedures. See the LASSO, elastic net, ridge regression, and Firth items in this note. The LASSO (and related methods) and the elastic net, as provided in PROC GLMSELECT for normal response models, are illustrated and discussed in more detail in Gunes (2015).

This note illustrates how these regularization methods can be implemented using PROC NLMIXED by directly specifying the log likelihood, penalty, and tuning parameter. The following statements modify the DATA step in Gunes (2015) to add a binary variable to the simulated data that it creates. The first example below uses the normally distributed variable, y, as a response. The next example uses the binary variable, b, as the response in a logistic model. Note that the true model involves only x1, x2, and parts of the c1 predictor. The true model is 2 + 5*x2 - 17*x1*x2 + 6*(c1=2) + 5*(c1=5).

      data Simdata;
         drop i j;
         array x{5} x1-x5;
         do i=1 to 1000;
            do j=1 to 5;
               x{j} = ranuni(1); /* Continuous predictors */
            end;
            c1 = int(1.5+ranuni(1)*7); /* Classification predictors */
            c2 = 1 + mod(i,3);
            yTrue = 2 + 5*x2 - 17*x1*x2 + 6*(c1=2) + 5*(c1=5); /* True mean */
            y = yTrue + 2*rannor(1); /* Normal response */
            p=logistic(yTrue);       /* Event probability for the binary response */
            b=rantbl(1,1-p,p)-1;     /* Binary response: 1 with probability p */
            output Simdata;
         end;
      run;

Example 1: Model selection using LASSO

The statements that follow specify the same main-effects and two-way-interactions model that Gunes (2015) fits in PROC GLMSELECT, but apply the LASSO method using PROC NLMIXED. The assignment statement that defines mu specifies the model effects. The statement that defines LL gives the function to be maximized: the necessary part of the normal log likelihood minus an L1 penalty multiplied by the tuning parameter L1Tune, set here to 0.1. Note that the names of all parameters in the model (other than the intercept) begin with "b," so the SUMABS(OF b:) function computes the sum of the absolute values of the model parameters. The MAXITER= option is increased beyond its default value to allow convergence to occur. The ODS OUTPUT statement saves the parameter estimates of the final model in a data set.

      proc nlmixed data=Simdata maxiter=500;
         mu=int 
            /* main effects */
            + bx1*x1 + bx2*x2 + bx3*x3 + bx4*x4 + bx5*x5 
            + bc11*(c1=1) + bc12*(c1=2) + bc13*(c1=3) + bc14*(c1=4) + bc15*(c1=5) + bc16*(c1=6) + bc17*(c1=7)
            + bc21*(c2=1) + bc22*(c2=2)
      
            /* interaction terms among x's */
            + bx12*x1*x2 + bx13*x1*x3 + bx14*x1*x4 + bx15*x1*x5 + bx23*x2*x3 + bx24*x2*x4 + bx25*x2*x5 + bx34*x3*x4 + bx35*x3*x5 + bx45*x4*x5 
      
            /* x*c1 interaction terms */
            + bc11x1*(c1=1)*x1 + bc12x1*(c1=2)*x1 + bc13x1*(c1=3)*x1 + bc14x1*(c1=4)*x1 + bc15x1*(c1=5)*x1 + bc16x1*(c1=6)*x1 + bc17x1*(c1=7)*x1
            + bc11x2*(c1=1)*x2 + bc12x2*(c1=2)*x2 + bc13x2*(c1=3)*x2 + bc14x2*(c1=4)*x2 + bc15x2*(c1=5)*x2 + bc16x2*(c1=6)*x2 + bc17x2*(c1=7)*x2
            + bc11x3*(c1=1)*x3 + bc12x3*(c1=2)*x3 + bc13x3*(c1=3)*x3 + bc14x3*(c1=4)*x3 + bc15x3*(c1=5)*x3 + bc16x3*(c1=6)*x3 + bc17x3*(c1=7)*x3
            + bc11x4*(c1=1)*x4 + bc12x4*(c1=2)*x4 + bc13x4*(c1=3)*x4 + bc14x4*(c1=4)*x4 + bc15x4*(c1=5)*x4 + bc16x4*(c1=6)*x4 + bc17x4*(c1=7)*x4
            + bc11x5*(c1=1)*x5 + bc12x5*(c1=2)*x5 + bc13x5*(c1=3)*x5 + bc14x5*(c1=4)*x5 + bc15x5*(c1=5)*x5 + bc16x5*(c1=6)*x5 + bc17x5*(c1=7)*x5
      
            /* x*c2 interaction terms */
            + bc21x1*(c2=1)*x1 + bc22x1*(c2=2)*x1
            + bc21x2*(c2=1)*x2 + bc22x2*(c2=2)*x2
            + bc21x3*(c2=1)*x3 + bc22x3*(c2=2)*x3
            + bc21x4*(c2=1)*x4 + bc22x4*(c2=2)*x4
            + bc21x5*(c2=1)*x5 + bc22x5*(c2=2)*x5
      
            /* c1*c2 interaction terms */
            + bc11c21*(c1=1)*(c2=1) + bc12c21*(c1=2)*(c2=1) + bc13c21*(c1=3)*(c2=1) + bc14c21*(c1=4)*(c2=1) + bc15c21*(c1=5)*(c2=1) + bc16c21*(c1=6)*(c2=1) + bc17c21*(c1=7)*(c2=1)
            + bc11c22*(c1=1)*(c2=2) + bc12c22*(c1=2)*(c2=2) + bc13c22*(c1=3)*(c2=2) + bc14c22*(c1=4)*(c2=2) + bc15c22*(c1=5)*(c2=2) + bc16c22*(c1=6)*(c2=2) + bc17c22*(c1=7)*(c2=2)
            ;
         L1Tune=0.1;
         LL=-(y-mu)**2               /* kernel of normal log likelihood */
            - L1Tune*sumabs(of b:);  /* L1 penalty */
         model y ~ general(ll);
         ods output parameterestimates=pe;
         run;

Since the LASSO method shrinks many parameter estimates to zero, interest is in seeing those parameters that remain nonzero in value. The following statements find the nonzero subset of the estimates and list them in decreasing order of their absolute values.

      data pe;
         set pe;
         absEst=abs(Estimate);
         run;
      proc sort data=pe;
         by descending absEst;
         run;
      proc print data=pe label; 
         where absest>1e-3; 
         id parameter;
         var Estimate;
         run;

Notice that the LASSO method in PROC NLMIXED retained all of the truly nonzero parameters in the final model. The only truly zero effect included is the x1 main effect, probably because x1 is involved in the nonzero x1*x2 interaction. Note also the shrinkage: the true coefficient of x1*x2 is -17, but its penalized estimate below is -9.33.

Parameter    Estimate
bx12          -9.3293
bc12           5.1899
bc15           4.4225
int            4.2060
bx1           -3.2336
bx2            0.7879

Example 2: LASSO in a logistic model

A logistic model with the same linear predictor can be fit to the binary response, b, simply by using the binomial log likelihood. A smaller tuning parameter value, 0.01, is used.

      proc nlmixed data=Simdata maxiter=500;
         xb=int 
            /* main effects */
            + bx1*x1 + bx2*x2 + bx3*x3 + bx4*x4 + bx5*x5 
            + bc11*(c1=1) + bc12*(c1=2) + bc13*(c1=3) + bc14*(c1=4) + bc15*(c1=5) + bc16*(c1=6) + bc17*(c1=7)
            + bc21*(c2=1) + bc22*(c2=2)
      
            /* interaction terms among x's */
            + bx12*x1*x2 + bx13*x1*x3 + bx14*x1*x4 + bx15*x1*x5 + bx23*x2*x3 + bx24*x2*x4 + bx25*x2*x5 + bx34*x3*x4 + bx35*x3*x5 + bx45*x4*x5 
      
            /* x*c1 interaction terms */
            + bc11x1*(c1=1)*x1 + bc12x1*(c1=2)*x1 + bc13x1*(c1=3)*x1 + bc14x1*(c1=4)*x1 + bc15x1*(c1=5)*x1 + bc16x1*(c1=6)*x1 + bc17x1*(c1=7)*x1
            + bc11x2*(c1=1)*x2 + bc12x2*(c1=2)*x2 + bc13x2*(c1=3)*x2 + bc14x2*(c1=4)*x2 + bc15x2*(c1=5)*x2 + bc16x2*(c1=6)*x2 + bc17x2*(c1=7)*x2
            + bc11x3*(c1=1)*x3 + bc12x3*(c1=2)*x3 + bc13x3*(c1=3)*x3 + bc14x3*(c1=4)*x3 + bc15x3*(c1=5)*x3 + bc16x3*(c1=6)*x3 + bc17x3*(c1=7)*x3
            + bc11x4*(c1=1)*x4 + bc12x4*(c1=2)*x4 + bc13x4*(c1=3)*x4 + bc14x4*(c1=4)*x4 + bc15x4*(c1=5)*x4 + bc16x4*(c1=6)*x4 + bc17x4*(c1=7)*x4
            + bc11x5*(c1=1)*x5 + bc12x5*(c1=2)*x5 + bc13x5*(c1=3)*x5 + bc14x5*(c1=4)*x5 + bc15x5*(c1=5)*x5 + bc16x5*(c1=6)*x5 + bc17x5*(c1=7)*x5
      
            /* x*c2 interaction terms */
            + bc21x1*(c2=1)*x1 + bc22x1*(c2=2)*x1
            + bc21x2*(c2=1)*x2 + bc22x2*(c2=2)*x2
            + bc21x3*(c2=1)*x3 + bc22x3*(c2=2)*x3
            + bc21x4*(c2=1)*x4 + bc22x4*(c2=2)*x4
            + bc21x5*(c2=1)*x5 + bc22x5*(c2=2)*x5
      
            /* c1*c2 interaction terms */
            + bc11c21*(c1=1)*(c2=1) + bc12c21*(c1=2)*(c2=1) + bc13c21*(c1=3)*(c2=1) + bc14c21*(c1=4)*(c2=1) + bc15c21*(c1=5)*(c2=1) + bc16c21*(c1=6)*(c2=1) + bc17c21*(c1=7)*(c2=1)
            + bc11c22*(c1=1)*(c2=2) + bc12c22*(c1=2)*(c2=2) + bc13c22*(c1=3)*(c2=2) + bc14c22*(c1=4)*(c2=2) + bc15c22*(c1=5)*(c2=2) + bc16c22*(c1=6)*(c2=2) + bc17c22*(c1=7)*(c2=2)
            ;
         L1Tune=0.01;
         mu=logistic(xb);
         LL=b*log(mu)+(1-b)*log(1-mu)  /* binomial log likelihood */
            - L1Tune*sumabs(of b:);    /* L1 penalty */
         model b ~ general(ll);
         ods output parameterestimates=pe;
         run;

These statements are used again to find the nonzero subset of the estimates and list them in decreasing order of their absolute values.

      data pe;
         set pe;
         absEst=abs(Estimate);
         run;
      proc sort data=pe;
         by descending absEst;
         run;
      proc print data=pe label; 
         where absest>1e-3; 
         id parameter;
         var Estimate;
         run;

The LASSO method produces a final model with all of the truly nonzero parameters except for the x2 main effect. PROC HPGENSELECT can also be used to apply the LASSO, as illustrated in the next example.

Parameter    Estimate
bx12          -4.5064
int            2.7440
bx1           -2.2653
bc12           2.2284
bc15           1.9215

Example 3: Collinearity in a logistic model

A separate note discusses the issue of collinearity in generalized linear models. In the example there, the Temp predictor has a very small range, making it collinear with the intercept in the logistic model. This instability causes some of the model parameter estimates to be inflated. These statements fit the logistic model using the original three predictors.

      proc logistic data=remiss;
        model Remiss(event="1") = Li Temp Cell;
        run;

Notice the severely inflated Intercept and Temp parameter estimates and standard errors.

Analysis of Maximum Likelihood Estimates
Parameter   DF   Estimate   Standard Error   Wald Chi-Square   Pr > ChiSq
Intercept    1    67.6339          56.8875            1.4135       0.2345
LI           1     3.8671           1.7783            4.7290       0.0297
TEMP         1   -82.0737          61.7124            1.7687       0.1835
CELL         1     9.6521           7.7511            1.5507       0.2130

The solution, discussed in that note, is to standardize the predictors. The following statements refit the model using the standardized predictors.

      proc standard data=remiss s=1 out=std; /* Scale predictors to standard deviation 1 */
        var Li Temp Cell; 
        run;
      proc logistic data=std;
        model Remiss(event="1") = Li Temp Cell;
        run;

The standardization has removed the instability and the parameters and their standard errors now appear to be well-estimated. Notice that only the Li predictor appears to be significantly associated with the response.

Analysis of Maximum Likelihood Estimates
Parameter   DF   Estimate   Standard Error   Wald Chi-Square   Pr > ChiSq
Intercept    1    -3.9917           2.2477            3.1540       0.0757
LI           1     1.8090           0.8319            4.7290       0.0297
TEMP         1    -1.2197           0.9171            1.7687       0.1835
CELL         1     1.8015           1.4467            1.5507       0.2130

The LASSO can reduce the inflation of the parameters and simultaneously find the important predictors of the response. These statements fit the logistic model and add an L1 penalty to the binomial log likelihood function.

      proc nlmixed data=remiss;
         xb=int + bLi*Li + bCell*Cell + bTemp*Temp;
         p=logistic(xb);
         L1Tune=0.02;
         LL=(remiss=1)*log(p)+(remiss=0)*log(1-p)  /* binomial log likelihood */
            - L1Tune*sumabs(of b:);                /* L1 penalty */
         model remiss ~ general(ll);
         ods output parameterestimates=pe;
         run;
      data pe;
         set pe;
         absEst=abs(Estimate);
         run;
      proc sort data=pe;
         by descending absEst;
         run;
      proc print data=pe label; 
         where absest>1e-3; 
         id parameter;
         var Estimate;
         run;

The results show that the Cell and Temp parameters shrank to zero, leaving Li as the only important predictor. This agrees with the results from the model fit to the standardized data.

Parameter    Estimate
int           -3.0533
bLi            2.2313

The LASSO can also be applied to the logistic model using PROC HPGENSELECT. This is done with the METHOD=LASSO option in the SELECTION statement. In the statements below, the AICC criterion is used to choose among models and to stop the LASSO process. Other criteria are available. See the HPGENSELECT documentation for details.

      proc hpgenselect data=remiss;
         model Remiss(event="1") = Li Temp Cell / dist=binary;
         selection method=lasso(choose=aicc stop=aicc);
         run;

The results are similar to those from PROC NLMIXED above, with only the Li predictor selected in the final model.

Parameter Estimates
Parameter   DF    Estimate
Intercept    1   -3.124353
LI           1    2.301934

A similar result can be obtained with the elastic net by replacing the LL= assignment in the PROC NLMIXED code above with the following statements. An L2 penalty, placed on the sum of the squared parameters via the USS function, and its own tuning parameter multiplier are added to the log likelihood.

         L2Tune=0.005;
         LL=(remiss=1)*log(p)+(remiss=0)*log(1-p)
            - L1Tune*sumabs(of b:)
            - L2Tune*uss(of b:);
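
For reference, the complete elastic net step, assembled from the pieces above with nothing new added, looks like this:

      proc nlmixed data=remiss;
         xb=int + bLi*Li + bCell*Cell + bTemp*Temp;
         p=logistic(xb);
         L1Tune=0.02;
         L2Tune=0.005;
         LL=(remiss=1)*log(p)+(remiss=0)*log(1-p)  /* binomial log likelihood */
            - L1Tune*sumabs(of b:)                 /* L1 penalty */
            - L2Tune*uss(of b:);                   /* L2 (ridge) penalty */
         model remiss ~ general(ll);
         ods output parameterestimates=pe;
         run;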

Notice that some additional shrinkage is provided by the ridging penalty.

Parameter    Estimate
int           -2.5345
bLi            1.7529

Example 4: Ridging a subset of parameters in a Poisson model

As discussed in a separate note, predictors that are collinear whenever the response count is greater than zero cause parameter inflation similar to that caused by separation in a logistic model. These statements construct a data set in which the response variable is a count for which Poisson regression is appropriate. The true model is μ = exp(β1x1 + β3x3), where β1 = β3 = 1. Another variable, x2, is identical to x1 whenever the response, y, is greater than zero.

      data p;
         call streaminit(57356);
         do i=1 to 100;
            x1=rand("uniform");
            x3=rand("uniform");
            mu=exp(x1+x3);               /* True Poisson mean */
            y=rand("poisson",mu);        /* Count response */
            x2=x1+(y=0)*rand("uniform"); /* x2 equals x1 whenever y>0 */
            output;
         end;
         run;

These statements attempt to fit a Poisson model to the data but result in warnings that convergence of the maximum likelihood estimation is questionable.

      proc genmod data=p;
         model y=x1 x2 x3 / dist=poisson itprint;
         run;

The ITPRINT option shows that some of the model parameters are highly unstable. The x1 and x2 parameters (labeled Prm2 and Prm3) continue to increase in absolute value with each iteration while the intercept and x3 parameters (labeled Prm1 and Prm4) stabilize relatively quickly.

Iteration History For Parameter Estimates
Iter Ridge LogLikelihood Prm1 Prm2 Prm3 Prm4
0 0 59.7861914 0.3686912 2.8349615 -1.988579 0.9084452
1 0 66.2789287 0.2046498 4.1237811 -3.258122 0.9528515
2 0 68.0371605 0.2015666 5.8759176 -5.019757 0.9553002
3 0 69.1158066 0.2126122 8.3710573 -7.528428 0.9530929
4 0 69.9580222 0.2214939 12.56063 -11.73143 0.9546535
5 0 70.581316 0.2254217 19.388698 -18.56926 0.9601877
6 0 70.9155092 0.2254955 28.224928 -27.40898 0.9644019
7 0 71.0605919 0.2253295 38.106453 -37.29131 0.9656718
8 0 71.1175969 0.2252932 48.434891 -47.61989 0.9659039
9 0 71.1390573 0.225288 58.924791 -58.10981 0.9659361
10 0 71.1470113 0.2252873 69.467573 -68.6526 0.96594
11 0 71.1499442 0.2252873 80.026963 -79.21199 0.9659405
12 0 71.1510239 0.2252872 90.59149 -89.77651 0.9659405
13 0 71.1514212 0.2252872 101.1576 -100.3426 0.9659405
14 0 71.1515674 0.2252872 111.72419 -110.9092 0.9659405
15 0 71.1516212 0.2252872 122.29093 -121.476 0.9659405
16 0 71.151641 0.2252872 132.85772 -132.0427 0.9659405
17 0 71.1516482 0.2252872 143.42452 -142.6095 0.9659405
18 0 71.1516509 0.2252872 153.99134 -153.1764 0.9659405
19 0 71.1516519 0.2252872 164.55813 -163.7432 0.9659405
20 0 71.1516523 0.2252872 175.12493 -174.31 0.9659405
21 0 71.1516524 0.2252872 185.6917 -184.8767 0.9659405
22 0 71.1516524 0.2252872 196.2585 -195.4435 0.9659405
23 0 71.1516525 0.2252872 206.824 -206.009 0.9659405
24 0.06 71.1516525 0.2252872 206.824 -206.009 0.9659405
 
Analysis Of Maximum Likelihood Parameter Estimates
Parameter   DF   Estimate   Standard Error   Wald 95% Confidence Limits   Wald Chi-Square   Pr > ChiSq
Intercept    1     0.2253           0.1783      -0.1241      0.5747                 1.60       0.2063
x1           1   206.8240           0.2003     206.4314    207.2166              1065868       <.0001
x2           0   -206.009           0.0000     -206.009    -206.009                    .            .
x3           1     0.9659           0.2077       0.5589      1.3730                21.63       <.0001
Scale        0     1.0000           0.0000       1.0000      1.0000

The following statements illustrate the use of ridging on a subset of the parameters by introducing an L2 penalty in the Poisson log likelihood. Only the x1 and x2 parameters are penalized by using a penalty that is the sum of their squared values. With this added penalty, the model converges properly.

      proc nlmixed data=p;
         L2Tune=0.04;
         mu=exp(int + b1*x1 + b2*x2 + b3*x3);
         LL=y*log(mu)-mu                /* kernel of Poisson log likelihood */
            - L2Tune*(b1**2 + b2**2);   /* L2 penalty */
         model y~general(ll);
         run;

Notice that ridging has drastically reduced the x1 and x2 parameter estimates (labeled b1 and b2). The final estimates are fairly close to the true model values. However, standard errors and tests are not reliable in such a penalized model.

Parameter Estimates
Parameter   Estimate   Standard Error    DF   t Value   Pr > |t|   95% Confidence Limits    Gradient
int           0.1049           0.1740   100      0.60     0.5482    -0.2405      0.4502    1.552E-7
b1            0.9979           0.2324   100      4.29     <.0001     0.5368      1.4589    7.959E-9
b2           -0.2942           0.2235   100     -1.32     0.1911    -0.7375      0.1492    1.073E-7
b3            1.1631           0.2094   100      5.55     <.0001     0.7475      1.5786    3.647E-8
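
If penalized predicted means are also of interest, a PREDICT statement can be added to the PROC NLMIXED step. The sketch below simply repeats the step above with that one addition; the output data set name, PoisPred, is illustrative.

      proc nlmixed data=p;
         L2Tune=0.04;
         mu=exp(int + b1*x1 + b2*x2 + b3*x3);
         LL=y*log(mu)-mu                /* kernel of Poisson log likelihood */
            - L2Tune*(b1**2 + b2**2);   /* L2 penalty on b1 and b2 only */
         model y~general(ll);
         predict mu out=PoisPred;       /* penalized predicted means */
         run;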

References

Agresti, A. (2013). Categorical Data Analysis, Third Edition. New York: John Wiley & Sons Inc.

Gunes, F. (2015). "Penalized Regression Methods for Linear Models in SAS/STAT®." Cary, NC: SAS Institute Inc.


