The ADAPTIVEREG procedure fits multivariate adaptive regression splines as defined by Friedman (1991b). The method is a nonparametric regression technique that combines both regression splines and model selection methods. It does not assume parametric model forms and does not require specification of knot values for constructing regression spline terms. Instead, it constructs spline basis functions in an adaptive way by automatically selecting appropriate knot values for different variables and obtains reduced models by applying model selection techniques.
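The adaptive knot selection described above can be illustrated with a minimal sketch. It is not the procedure's implementation: it assumes a single predictor, uses the mirrored truncated-linear ("hinge") basis pair from Friedman's method, and searches interior data values for the knot that minimizes the residual sum of squares. The function names `hinge_pair`, `solve`, `fit_rss`, and `best_knot` are hypothetical and exist only for this illustration.

```python
def hinge_pair(x, knot):
    """Return the mirrored hinge values (x - knot)+ and (knot - x)+."""
    return max(0.0, x - knot), max(0.0, knot - x)


def solve(a, b):
    """Solve a small square linear system by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    coef = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * coef[c] for c in range(r + 1, n))
        coef[r] = (m[r][n] - s) / m[r][r]
    return coef


def fit_rss(xs, ys, knot):
    """Least-squares fit of y on [1, (x - knot)+, (knot - x)+]; return (rss, coef)."""
    rows = [[1.0, *hinge_pair(x, knot)] for x in xs]
    k = 3
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(k)]
    coef = solve(xtx, xty)  # normal equations
    rss = sum((y - sum(c * v for c, v in zip(coef, r))) ** 2
              for r, y in zip(rows, ys))
    return rss, coef


def best_knot(xs, ys):
    """Try each interior data value as a knot and keep the one with the smallest RSS."""
    candidates = sorted(set(xs))[1:-1]  # skip the extremes so both hinges are nonzero
    return min(candidates, key=lambda t: fit_rss(xs, ys, t)[0])


# Toy data (made up for the illustration): flat up to x = 4, then slope 3.
xs = list(range(11))
ys = [2.0 + 3.0 * max(0, x - 4) for x in xs]
print(best_knot(xs, ys))
```

The full method repeats a search of this kind over all predictors and over products with basis functions already in the model. The sketch shows only the one-variable, one-knot case.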
PROC ADAPTIVEREG supports models with classification variables (Friedman, 1991a) and offers options for improving modeling speed (Friedman, 1993). PROC ADAPTIVEREG also extends the method to data with response variables that are distributed in the exponential family as suggested in Buja et al. (1991). The procedure can take advantage of multicore processors to distribute the computation to multiple threads.
SAS/STAT software offers various tools for nonparametric regression, including the GAM, LOESS, and TPSPLINE procedures. Typical nonparametric regression methods involve a large number of parameters in order to capture nonlinear trends in data; thus the model space is much larger than it is in more restricted parametric models. The fitting algorithms for nonparametric regression models are usually more complicated than those for parametric regression models. Also, the sparsity of data in high dimensions often causes slow convergence or failure in many nonparametric regression methods. As the number of predictors increases, the model variance increases rapidly because of the sparsity. This phenomenon is referred to as the “curse of dimensionality” (Bellman, 1961).
Hence, the LOESS and TPSPLINE procedures are limited to problems in low dimensions. PROC GAM fits generalized additive models with the additivity assumption. By using the local scoring algorithm (Hastie and Tibshirani, 1990), PROC GAM can handle larger data sets than the other two procedures. However, the computation time for the local scoring algorithm to converge increases rapidly as data size grows, and convergence for nonnormal distributions is not guaranteed.
PROC ADAPTIVEREG uses the multivariate adaptive regression splines method, which is similar to the method used for recursive partitioning models (Breiman et al., 1984). It first creates an overfitted model with the fast-update algorithm (Friedman, 1991b) and then prunes it back with the backward selection technique.
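To make the pruning step concrete, the sketch below compares an overfitted model with a pruned one by using a generalized cross validation (GCV) score, the kind of lack-of-fit criterion that Friedman (1991b) uses for backward selection. The function name `gcv`, the parameter `df_per_basis`, and the exact form of the effective-degrees-of-freedom penalty are assumptions made for illustration. They are not taken from the text above and might differ from what the procedure computes.

```python
def gcv(rss, n, num_bases, df_per_basis=2.0):
    """GCV-style score: mean squared residual inflated by a complexity penalty.

    Assumed penalty: each basis function beyond the intercept costs extra
    degrees of freedom, giving an effective size of
    num_bases + df_per_basis * (num_bases - 1) / 2.
    """
    effective = num_bases + df_per_basis * (num_bases - 1) / 2.0
    if effective >= n:
        raise ValueError("model is too large for the number of observations")
    return (rss / n) / (1.0 - effective / n) ** 2


# Made-up numbers: the large model fits the training data better (smaller RSS),
# but the pruned model has the lower penalized score.
overfitted = gcv(rss=20.0, n=50, num_bases=11)
pruned = gcv(rss=24.0, n=50, num_bases=5)
print(pruned < overfitted)
```

Backward selection in this style removes one basis function at a time and keeps the candidate model whose penalized score is smallest.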
The main features of the ADAPTIVEREG procedure are as follows:
supports classification variables with ordering options
enables you to force effects in the final model or restrict variables in linear forms
supports options for fast forward selection
supports data with response variables that are distributed in the exponential family
supports partitioning of data into training, validation, and testing roles
provides leave-one-out and k-fold cross validation
produces a graphical representation of the selection process, model fit, functional components, and fit diagnostics
produces an output data set that contains predicted values and residuals
produces an output data set that contains the design matrix of formed basis functions
supports multiple SCORE statements