SAS/STAT Software

Nonparametric Regression

Nonparametric regression relaxes the usual assumption of linearity and enables you to uncover relationships between the independent variables and the dependent variable that might otherwise be missed.

The SAS/STAT nonparametric regression procedures include the following:

ADAPTIVEREG Procedure


The ADAPTIVEREG procedure fits multivariate adaptive regression splines, a nonparametric regression technique that combines regression splines with model selection methods. It does not assume a parametric model form and does not require you to specify knot values for the regression spline terms. Instead, it constructs spline basis functions adaptively by automatically selecting appropriate knot values for each variable, and it obtains reduced models by applying model selection techniques. The procedure enables you to do the following:

  • specify classification variables with ordering options
  • partition your data into training, validation, and testing roles
  • specify the distribution family used in the model
  • specify the link function in the model
  • specify an offset
  • specify the maximum number of basis functions that can be used in the final model
  • specify the maximum interaction levels for effects that could potentially enter the model
  • specify the incremental penalty for increasing the number of variables in the model
  • specify the effects to be included in the final model
  • request an additive model for which only main effects are included in the fitted model
  • specify the parameter that controls the number of knots considered for each variable
  • force effects into the final model or restrict variables to linear forms
  • specify options for fast forward selection
  • perform leave-one-out and k-fold cross validation
  • produce a graphical representation of the selection process, model fit, functional components, and fit diagnostics
  • create an output data set that contains predicted values and residuals
  • create an output data set that contains the design matrix of formed basis functions
  • specify multiple SCORE statements, which create new SAS data sets that contain predicted values and residuals
  • perform BY group processing to obtain separate analyses on grouped observations
  • automatically produce graphs by using ODS Graphics
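
As a rough illustration of the adaptive knot selection described above, the following Python sketch builds a mirrored pair of hinge (truncated linear) basis functions for a single variable and picks the candidate knot that minimizes the residual sum of squares. It is a toy one-step search under simplifying assumptions (one variable, one hinge pair, ordinary least squares), not PROC ADAPTIVEREG's algorithm or syntax; the names `hinge_pair`, `fit_with_knot`, and `best_knot` are hypothetical.

```python
def hinge_pair(x, knot):
    """Mirrored hinge (truncated linear) basis functions at one knot."""
    return max(0.0, x - knot), max(0.0, knot - x)


def fit_with_knot(xs, ys, knot):
    """Least squares fit of y on an intercept plus one hinge pair.

    Returns (intercept, coef_plus, coef_minus, rss).
    """
    n = len(xs)
    plus, minus = zip(*(hinge_pair(x, knot) for x in xs))
    mp, mm, my = sum(plus) / n, sum(minus) / n, sum(ys) / n
    # Centre every column so the intercept drops out of the 2x2 normal equations.
    a = [v - mp for v in plus]
    b = [v - mm for v in minus]
    yc = [v - my for v in ys]
    saa = sum(v * v for v in a)
    sbb = sum(v * v for v in b)
    sab = sum(u * v for u, v in zip(a, b))
    say = sum(u * v for u, v in zip(a, yc))
    sby = sum(u * v for u, v in zip(b, yc))
    # det is zero when the hinge columns are collinear,
    # for example when the knot lies outside the range of the data.
    det = saa * sbb - sab * sab
    c_plus = (say * sbb - sby * sab) / det
    c_minus = (sby * saa - say * sab) / det
    intercept = my - c_plus * mp - c_minus * mm
    rss = sum(
        (y - intercept - c_plus * p - c_minus * m) ** 2
        for y, p, m in zip(ys, plus, minus)
    )
    return intercept, c_plus, c_minus, rss


def best_knot(xs, ys, candidates):
    """Return the candidate knot whose hinge-pair fit has the smallest RSS."""
    return min(candidates, key=lambda k: fit_with_knot(xs, ys, k)[3])


# Toy data with a kink at x = 4 (slope -1 to the left, slope 2 to the right).
xs = [0.5 * i for i in range(21)]
ys = [3.0 + 2.0 * max(0.0, x - 4.0) + 1.0 * max(0.0, 4.0 - x) for x in xs]
knot = best_knot(xs, ys, candidates=range(1, 10))
```

Because the toy data were generated with a kink at 4, the search recovers that knot. A full multivariate adaptive regression splines fit repeats this kind of search over variables and existing basis functions and then prunes terms, which this sketch does not attempt.
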
For further details, see ADAPTIVEREG Procedure

GAM Procedure


The GAM procedure fits generalized additive models as defined by Hastie and Tibshirani (1990). This procedure provides powerful tools for nonparametric regression and smoothing. The generalized additive models fit by the GAM procedure combine an additivity assumption (Stone 1985), which enables relatively many nonparametric relationships to be explored simultaneously, with the distributional flexibility of generalized linear models (Nelder and Wedderburn 1972). The following are highlights of the procedure's features:

  • permits the following smoothing effects:
    • smoothing spline (SPLINE)
    • local regression (LOESS)
    • bivariate thin-plate smoothing spline (SPLINE2)
  • supports the following distribution families for the response variable:
    • Gaussian (continuous response variables)
    • binomial (binary response variables)
    • Poisson (nonnegative discrete response variables)
    • gamma (positive continuous response variables)
    • inverse Gaussian (positive continuous response variables)
  • supports the use of multidimensional data
  • fits both generalized semiparametric additive models and generalized additive models
  • enables you to choose a particular model by specifying the model degrees of freedom or smoothing parameter
  • performs BY group processing, which enables you to obtain separate analyses on grouped observations
  • scores new data sets
  • creates an output data set that contains diagnostic measures
  • creates a SAS data set that corresponds to any output table
  • automatically creates graphs by using ODS Graphics
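
To make the additivity assumption concrete, the following Python sketch fits a two-term additive model by backfitting, the classical fitting scheme for additive models described by Hastie and Tibshirani. It makes several assumptions that are not taken from this page: a Gaussian response, a crude k-nearest-neighbour running mean standing in for the spline and loess smoothers listed above, and hypothetical names (`knn_mean_smoother`, `backfit`). It is not a description of how PROC GAM is implemented.

```python
import math


def knn_mean_smoother(x, r, k=5):
    """Smooth values r against x with the mean of the k nearest neighbours in x."""
    order = range(len(x))
    fitted = []
    for xi in x:
        nearest = sorted(order, key=lambda j: abs(x[j] - xi))[:k]
        fitted.append(sum(r[j] for j in nearest) / k)
    return fitted


def backfit(columns, y, n_iter=20, k=5):
    """Fit y = alpha + sum_j f_j(x_j) by backfitting.

    Returns (alpha, [f_1, ..., f_p]) with each f_j evaluated at the data points.
    """
    n, p = len(y), len(columns)
    alpha = sum(y) / n
    f = [[0.0] * n for _ in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            # Partial residuals: remove the intercept and every other component.
            partial = [
                y[i] - alpha - sum(f[m][i] for m in range(p) if m != j)
                for i in range(n)
            ]
            smooth = knn_mean_smoother(columns[j], partial, k)
            centre = sum(smooth) / n
            f[j] = [s - centre for s in smooth]  # centre each f_j for identifiability
    return alpha, f


# Deterministic toy data: y depends on x1 through a sine and on x2 through a parabola.
n = 60
x1 = [i / 10 for i in range(n)]
x2 = [((7 * i) % n) / 10 for i in range(n)]
y = [math.sin(a) + 0.5 * (b - 3.0) ** 2 for a, b in zip(x1, x2)]

alpha, f = backfit([x1, x2], y)
fitted = [alpha + f[0][i] + f[1][i] for i in range(n)]
rss = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
tss = sum((yi - alpha) ** 2 for yi in y)
```

Each component is re-estimated from the residuals left by the others, so the two nonlinear relationships are explored simultaneously without ever fitting a two-dimensional surface. On this toy data the residual sum of squares `rss` should be a small fraction of the total sum of squares `tss`.
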
For further details, see GAM Procedure

GAMPL Procedure


The GAMPL procedure is a high-performance procedure that fits generalized additive models that are based on low-rank regression splines. This procedure provides powerful tools for nonparametric regression and smoothing.

Generalized additive models are extensions of generalized linear models. They relax the linearity assumption in generalized linear models by allowing spline terms in order to characterize nonlinear dependency structures. Each spline term is constructed by the thin-plate regression spline technique. A roughness penalty is applied to each spline term by a smoothing parameter that controls the balance between goodness of fit and the roughness of the spline curve. PROC GAMPL fits models for standard distributions in the exponential family, such as normal, Poisson, and gamma distributions.

PROC GAMPL runs in either single-machine mode or distributed mode. The following are highlights of the GAMPL procedure's features:

  • estimates the regression parameters of a generalized additive model that has fixed smoothing parameters by using penalized likelihood estimation
  • estimates the smoothing parameters of a generalized additive model by using either the performance iteration method or the outer iteration method
  • estimates the regression parameters of a generalized linear model by using maximum likelihood techniques
  • tests the total contribution of each spline term based on the Wald statistic
  • provides model-building syntax in the CLASS statement and effect-based parametric effects in the MODEL statement, which are used in other SAS/STAT analytic procedures (in particular, the GLM, LOGISTIC, GLIMMIX, and MIXED procedures)
  • provides response-variable options
  • enables you to construct a spline term by using multiple variables
  • provides control options for constructing a spline term, such as fixed degrees of freedom, initial smoothing parameter, fixed smoothing parameter, smoothing parameter search range, and user-supplied knot values
  • provides multiple link functions for any distribution
  • provides a WEIGHT statement for weighted analysis
  • provides a FREQ statement for grouped analysis
  • provides an OUTPUT statement to produce a data set that has predicted values and other observationwise statistics
  • produces graphs by using ODS Graphics
  • enables you to run in distributed mode on a cluster of machines that distribute the data and the computations
  • enables you to run in single-machine mode on the server where SAS is installed
  • exploits all the available cores and concurrent threads, regardless of execution mode
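
The trade-off that a smoothing parameter controls between goodness of fit and roughness can be sketched in a few lines of Python. The example below is a simplification, not PROC GAMPL's method: it assumes a Gaussian response, so penalized likelihood reduces to penalized least squares, and it swaps the thin-plate regression spline penalty for a discrete second-difference penalty (a Whittaker-style smoother). The names `solve`, `roughness`, and `penalized_fit` are hypothetical.

```python
import math


def solve(a, b):
    """Solve a small dense linear system by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def roughness(f):
    """Sum of squared second differences, a discrete stand-in for a curvature penalty."""
    return sum((f[i] - 2.0 * f[i + 1] + f[i + 2]) ** 2 for i in range(len(f) - 2))


def penalized_fit(y, lam):
    """Minimise sum((y - f)^2) + lam * roughness(f) over f, one value per observation."""
    n = len(y)
    # Setting the gradient to zero gives the linear system (I + lam * D'D) f = y.
    a = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    stencil = (1.0, -2.0, 1.0)  # one row of the second-difference matrix D
    for k in range(n - 2):
        for i in range(3):
            for j in range(3):
                a[k + i][k + j] += lam * stencil[i] * stencil[j]
    return solve(a, y)


def rss(y, f):
    """Residual sum of squares between observations y and fitted values f."""
    return sum((yi - fi) ** 2 for yi, fi in zip(y, f))


# A smooth signal plus a deterministic zig-zag that plays the role of noise.
y = [math.sin(i / 3.0) + 0.3 * (-1) ** i for i in range(30)]
f_light = penalized_fit(y, 0.1)    # small smoothing parameter: follows the data
f_heavy = penalized_fit(y, 100.0)  # large smoothing parameter: much smoother curve
```

Raising `lam` lowers `roughness(f)` and raises `rss(y, f)`, which is the balance that the smoothing parameter of each spline term governs. Choosing the parameter automatically, as the performance iteration and outer iteration methods do, is outside the scope of this sketch.
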
For further details, see GAMPL Procedure

LOESS Procedure


The LOESS procedure implements a nonparametric method for estimating regression surfaces. PROC LOESS allows great flexibility because no assumptions about the parametric form of the regression surface are needed. The following are highlights of the LOESS procedure's features:

  • supports the use of multidimensional data
  • supports multiple dependent variables
  • supports both direct fitting and interpolated fitting, where the interpolation uses kd trees
  • performs statistical inference
  • performs automatic smoothing parameter selection
  • performs iterative reweighting to provide robust fitting when there are outliers in the data
  • scores external data sets
  • performs BY group processing, which enables you to obtain separate analyses on grouped observations
  • performs weighted estimation
  • creates a SAS data set that contains the predicted values and other requested statistics
  • creates a SAS data set that corresponds to any output table
  • automatically creates graphs by using ODS Graphics
For further details, see LOESS Procedure

TPSPLINE Procedure


The TPSPLINE procedure uses the penalized least squares method to fit a nonparametric regression model. It computes thin-plate smoothing splines to approximate smooth multivariate functions observed with noise. The TPSPLINE procedure allows great flexibility in the possible form of the regression surface. In particular, PROC TPSPLINE makes no assumptions of a parametric form for the model. The following are highlights of the TPSPLINE procedure's features:

  • supports the use of multidimensional data
  • supports multiple SCORE statements
  • fits both semiparametric models and nonparametric models
  • provides options for handling large data sets
  • supports multiple dependent variables
  • enables you to choose a particular model by specifying the model degrees of freedom or smoothing parameter
  • performs BY group processing, which enables you to obtain separate analyses on grouped observations
  • creates a SAS data set that corresponds to any output table
  • automatically creates graphs by using ODS Graphics
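
For intuition about penalized least squares with thin-plate splines, the Python sketch below solves the standard textbook linear system for a two-dimensional thin-plate smoothing spline: a radial term built from r² log r plus a linear polynomial, with the smoothing parameter added to the diagonal. It assumes two predictors, a dense solver that only suits small data sets, and a smoothing parameter `lam` that is defined only up to a constant scaling. The names `tps_kernel`, `fit_tps`, and `predict_tps` are hypothetical; this is not PROC TPSPLINE's implementation.

```python
import math


def solve(a, b):
    """Solve a small dense linear system by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def tps_kernel(r):
    """Radial basis r^2 * log(r) for 2-D thin-plate splines (taken as 0 at r = 0)."""
    return r * r * math.log(r) if r > 0.0 else 0.0


def fit_tps(points, z, lam):
    """Solve (K + lam*I) c + T d = z subject to T'c = 0; returns (c, d)."""
    n = len(points)
    size = n + 3
    a = [[0.0] * size for _ in range(size)]
    for i, (ui, vi) in enumerate(points):
        for j, (uj, vj) in enumerate(points):
            a[i][j] = tps_kernel(math.hypot(ui - uj, vi - vj))
        a[i][i] += lam  # the smoothing parameter enters on the diagonal
        for k, t in enumerate((1.0, ui, vi)):  # linear polynomial part T
            a[i][n + k] = t
            a[n + k][i] = t
    sol = solve(a, list(z) + [0.0, 0.0, 0.0])
    return sol[:n], sol[n:]


def predict_tps(points, c, d, x):
    """Evaluate the fitted spline at x = (u, v)."""
    radial = sum(
        ci * tps_kernel(math.hypot(x[0] - u, x[1] - v))
        for ci, (u, v) in zip(c, points)
    )
    return d[0] + d[1] * x[0] + d[2] * x[1] + radial


# Toy data on a 3 x 3 grid, lying exactly on the plane z = 1 + 2u - v.
pts = [(0.5 * i, 0.5 * j) for i in range(3) for j in range(3)]
z = [1.0 + 2.0 * u - v for u, v in pts]
c, d = fit_tps(pts, z, lam=0.1)
```

A plane has zero bending penalty, so for data that lie exactly on a plane the fit returns that plane whatever the value of `lam`: the radial coefficients `c` vanish and `d` holds the plane's coefficients. For noisy data, a larger `lam` pulls the surface toward such a plane and a smaller `lam` lets it bend toward the observations.
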
For further details, see TPSPLINE Procedure