The GLMSELECT Procedure

External Cross Validation

The method of cross validation that is discussed in the previous section judges models by their performance with respect to ordinary least squares. An alternative to ordinary least squares is to use the penalized regression that is defined by the LASSO or elastic net method. This method is called external cross validation, and you can request it by specifying CHOOSE=CVEX, which is new in SAS/STAT 13.1. CHOOSE=CVEX applies only when SELECTION=LASSO or SELECTION=ELASTICNET.

To understand how $k$-fold external cross validation works, first recall how $k$-fold cross validation works, as shown in Figure 47.18. The first column in this figure illustrates dividing the training samples into four folds, the second column illustrates the same training samples with reduced numbers of variables at a given step, and the third column illustrates applying ordinary least squares to compute the CVPRESS statistic. For the SELECTION=LASSO and SELECTION=ELASTICNET options, the CVPRESS statistic that is computed by $k$-fold cross validation uses an ordinary least squares fit, and hence it does not directly depend on the coefficients obtained by the penalized least squares regression.

Figure 47.18: Applying $k$-fold Cross Validation to Computing the CVPRESS Statistic

If you want a statistic that is directly based on the coefficients obtained by a penalized least squares regression, you can specify CHOOSE=CVEX to use $k$-fold external cross validation. External cross validation directly applies the coefficients obtained by a penalized least squares regression to computing the predicted residual sum of squares. Figure 47.19 depicts $k$-fold external cross validation.

Figure 47.19: $k$-fold External Cross Validation

In $k$-fold external cross validation, the data are split into $k$ approximately equal-sized parts, as illustrated in the first column of Figure 47.19. One of these parts is held out for validation, and the model is fit on the remaining $k-1$ parts by the LASSO method or the elastic net method. The fitted model is used to compute the predicted residual sum of squares on the omitted part, and this process is repeated for each of the $k$ parts. More specifically, for the $i$th model fit ($i=1,2,\ldots,k$), let $\bX_{-i}$ denote the held-out part of the matrix of covariates and $\mb{y}_{-i}$ denote the corresponding response, and let $\bX_i$ denote the remaining $k-1$ parts of the matrix of covariates and $\mb{y}_i$ denote the corresponding response. The LASSO method is applied to $\bX_i$ and $\mb{y}_i$ to solve the optimization problem

\[  \mbox{minimize} ||\mb {y}_ i- \bX _ i \bbeta ||^2 + \lambda _1 \sum _{j=1}^{m} |\beta _ j |  \]
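To make the objective concrete, the following plain-Python sketch (illustrative only, not part of PROC GLMSELECT) evaluates the penalized criterion $||\mb{y}_i - \bX_i \bbeta||^2 + \lambda_1 \sum_j |\beta_j|$ for a candidate coefficient vector on small dense data:

```python
def lasso_objective(X, y, beta, lam1):
    """Evaluate ||y - X beta||^2 + lam1 * sum(|beta_j|).

    X is a list of observation rows, y the responses, beta the
    candidate coefficients, and lam1 the LASSO regularization
    parameter lambda_1.
    """
    rss = 0.0
    for row, yi in zip(X, y):
        pred = sum(xij * bj for xij, bj in zip(row, beta))
        rss += (yi - pred) ** 2
    return rss + lam1 * sum(abs(bj) for bj in beta)
```

With $\lambda_1 = 0$ the value reduces to the ordinary residual sum of squares; increasing $\lambda_1$ penalizes the size of the coefficients.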

Note that, as discussed in the section Elastic Net Selection (ELASTICNET), the elastic net method can be solved in the same way as LASSO, by augmenting the design matrix $\bX _ i$ and the response $\mb {y}_ i$.

Following the piecewise linear solution path of LASSO, the coefficients

\[  \bbeta ^{i1}, \bbeta ^{i2}, \ldots , \bbeta ^{ip},\ldots  \]

are computed, corresponding to the LASSO regularization parameter values

\[  \lambda _1^{i1}, \lambda _1^{i2}, \ldots , \lambda _1^{ip}, \ldots  \]

Based on the computed coefficients, the predicted residual sum of squares is computed on the held-out part $\bX _{-i}$ and $\mb {y}_{-i}$ as

\[  r^{ip} = ||\mb {y}_{-i}- \bX _{-i} \bbeta ^{ip}||^2  \]

The preceding process can be summarized as

\[  (\bX _ i, \mb {y}_ i) \rightarrow (\lambda _1^{ip}, \bbeta ^{ip}) \rightarrow (\lambda _1^{ip}, r^{ip}), i=1,2, \ldots , k, p=1,2, \ldots  \]
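The chain above can be sketched for a single model fit $i$. In this illustrative Python sketch, `fit_path` is a hypothetical stand-in for the LASSO or elastic net path solver (not a PROC GLMSELECT interface); it returns the knots $\lambda_1^{ip}$ and coefficients $\bbeta^{ip}$, from which the predicted residual sum of squares $r^{ip}$ is computed on the held-out part:

```python
def press_curve(fit_path, X_fit, y_fit, X_out, y_out):
    """For one fold: fit a solution path on (X_fit, y_fit), then compute
    the predicted residual sum of squares r^{ip} on the held-out part
    (X_out, y_out) at every knot lambda_1^{ip} on the path.

    fit_path is a user-supplied callable returning [(lam, beta), ...];
    it stands in for the LASSO / elastic net path computation.
    """
    curve = []
    for lam, beta in fit_path(X_fit, y_fit):
        r = sum((yi - sum(x * b for x, b in zip(row, beta))) ** 2
                for row, yi in zip(X_out, y_out))
        curve.append((lam, r))
    return curve
```

Calling `press_curve` once per fold yields the $k$ curves of $(\lambda_1^{ip}, r^{ip})$ pairs that Figure 47.19 depicts.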

For the illustration in Figure 47.19, the results $(\lambda _1^{ip}, r^{ip})$ correspond to four curves. The knots $\lambda _1^{i1}, \lambda _1^{i2}, \ldots , \lambda _1^{ip}, \ldots $ are usually different among different model fits $i=1,2, \ldots , k$. To merge the results of the $k$ model fits for computing the CVEXPRESS statistic, perform the following three steps:


1. Identify the distinct knots among all $\lambda_1^{ip}, i=1,2,\ldots,k, p=1,2,\ldots$

2. Interpolate to compute the predicted residual sum of squares of each model fit at all the identified knots. Because the LASSO solutions are piecewise linear in $\lambda_1$, the predicted residual sum of squares is a piecewise quadratic function of $\lambda_1$, so this interpolation can be done in closed form.

3. At each identified knot, add the predicted residual sums of squares of the $k$ model fits. The resulting sum is the estimate of the prediction error, which is denoted by the CVEXPRESS statistic.
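The three merging steps can be sketched as follows. This is an illustrative Python sketch, not PROC GLMSELECT's implementation; in particular, it uses simple linear interpolation between knots, whereas the procedure uses the piecewise linear LASSO solutions to evaluate the piecewise quadratic residual curve in closed form.

```python
def cvexpress_curve(fold_curves):
    """Merge k per-fold curves of (lambda_1, press) pairs into one
    CVEXPRESS curve.

    Step 1: collect the distinct knots across all folds.
    Step 2: interpolate each fold's press at every knot (linear here,
            as an approximation of the closed-form interpolation).
    Step 3: sum the k interpolated values at each knot.
    """
    knots = sorted({lam for curve in fold_curves for lam, _ in curve})

    def interp(curve, lam):
        pts = sorted(curve)  # sort the fold's knots by lambda_1
        for (l0, r0), (l1, r1) in zip(pts, pts[1:]):
            if l0 <= lam <= l1:
                t = 0.0 if l1 == l0 else (lam - l0) / (l1 - l0)
                return r0 + t * (r1 - r0)
        # outside the fold's range: clamp to the nearest endpoint
        return pts[0][1] if lam < pts[0][0] else pts[-1][1]

    return [(lam, sum(interp(c, lam) for c in fold_curves))
            for lam in knots]
```

The returned list of $(\lambda_1, \mbox{CVEXPRESS})$ pairs corresponds to the bottom curve in Figure 47.19.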

The bottom plot in Figure 47.19 illustrates the curve of the CVEXPRESS statistic as a function of $\lambda_1$ (L1). Note that computing the CVEXPRESS statistic for $k$-fold external cross validation requires fitting $k$ different LASSO or elastic net models, and so the work and memory requirements increase linearly with the number of cross validation folds.

In addition to characterizing the piecewise linear solutions of the coefficients $\bbeta ^{i1}, \bbeta ^{i2}, \ldots , \bbeta ^{ip},\ldots $ by the LASSO regularization parameters $\lambda _1^{i1}, \lambda _1^{i2}, \ldots , \lambda _1^{ip}, \ldots $, you can also characterize the solutions by the sum of the absolute values of the coefficients or the scaled regularization parameter. For a detailed discussion of the different options, see the L1CHOICE= option in the MODEL statement.

As in $k$-fold cross validation, you can use the CVMETHOD= option in the MODEL statement to specify the method for splitting the data into $k$ parts in $k$-fold external cross validation. CVMETHOD=BLOCK(k) requests that the $k$ parts be made of blocks of $\mbox{floor}(n/k)$ or $\mbox{floor}(n/k)+1$ successive observations, where $n$ is the number of observations. CVMETHOD=SPLIT(k) requests that the parts consist of observations $\{1,k+1,2k+1,3k+1,\ldots\}$, $\{2,k+2,2k+2,3k+2,\ldots\}$, . . . , $\{k,2k,3k,\ldots\}$. CVMETHOD=RANDOM(k) partitions the data into random subsets, each with approximately $\mbox{floor}(n/k)$ observations. Finally, you can use the formatted value of an input data set variable to define the parts by specifying CVMETHOD=variable. This last partitioning method is useful when you need to exercise extra control over how the data are partitioned, such as spreading important but rare observations across the various parts. By default, PROC GLMSELECT uses CVMETHOD=RANDOM(5) for external cross validation.
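The three automatic partitioning schemes can be mimicked in a few lines. This Python sketch is illustrative only (it uses 0-based observation indices, whereas the description above is 1-based, and its random seed handling is an assumption, not PROC GLMSELECT's):

```python
import random

def cv_parts(n, k, method="RANDOM", seed=0):
    """Assign n observations (indices 0..n-1) to k parts, mimicking CVMETHOD=.

    BLOCK : blocks of floor(n/k) or floor(n/k)+1 successive observations
    SPLIT : part j gets observations j, j+k, j+2k, ... (0-based here)
    RANDOM: a random partition with approximately equal part sizes
    """
    if method == "BLOCK":
        base, extra = divmod(n, k)
        parts, start = [], 0
        for j in range(k):
            size = base + (1 if j < extra else 0)
            parts.append(list(range(start, start + size)))
            start += size
        return parts
    if method == "SPLIT":
        return [list(range(j, n, k)) for j in range(k)]
    if method == "RANDOM":
        idx = list(range(n))
        random.Random(seed).shuffle(idx)
        return [sorted(idx[j::k]) for j in range(k)]
    raise ValueError("unknown method")
```

For example, with $n=10$ and $k=3$, BLOCK yields parts of sizes 4, 3, and 3 made of successive observations, while SPLIT interleaves every third observation into the same part.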

For the elastic net method, if the ridge regression parameter $\lambda_2$ is not specified by the L2= option and you use $k$-fold external cross validation for the CHOOSE= option, then a search for the optimal $\lambda_2$ is performed over an interval (see Figure 47.12 for an illustration), and $\lambda_2$ is set to the value that achieves the minimum CVEXPRESS statistic. You can use the L2SEARCH=, L2LOW=, L2HIGH=, and L2STEPS= options to control the search for $\lambda_2$ (L2).
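The search can be pictured as a one-dimensional grid search. In this sketch, `min_cvexpress` is a hypothetical user-supplied callable (not a PROC GLMSELECT interface) that maps a candidate $\lambda_2$ to the minimum CVEXPRESS statistic over the corresponding solution path:

```python
def search_l2(l2_values, min_cvexpress):
    """Pick the ridge parameter lambda_2 that attains the smallest
    CVEXPRESS statistic over a grid of candidate values.

    l2_values     : candidate lambda_2 values (the search grid)
    min_cvexpress : callable mapping a lambda_2 to the minimum
                    CVEXPRESS over the corresponding path fits
    """
    best = min(l2_values, key=min_cvexpress)
    return best, min_cvexpress(best)
```

The grid here plays the role of the interval controlled by the L2LOW=, L2HIGH=, and L2STEPS= options.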

Difference between Cross Validation and External Cross Validation

If you specify SELECTION=LASSO or SELECTION=ELASTICNET, the penalized model is fit only once, using the same training samples, in $k$-fold cross validation, whereas the penalized model is fit $k$ times, using different training samples, in $k$-fold external cross validation. External cross validation also requires identifying the knots that result from the $k$ different solution paths. The CVPRESS statistic that is computed in $k$-fold cross validation is based on ordinary least squares regression, whereas the CVEXPRESS statistic that is computed in $k$-fold external cross validation is based on the penalized regression.

Using External Cross Validation as the CHOOSE= Criterion

When you specify the CHOOSE=CVEX suboption of the SELECTION= option in the MODEL statement, the CVEXPRESS statistics are computed for the models at each step of the selection process. The model at the first step at which the smallest CVEXPRESS statistic is attained is selected.