SAS Institute. The Power to Know

Nonparametric Modeling

SAS/STAT software now provides procedures for nonparametric density estimation and nonparametric regression, two of the new directions in which statistical software is being developed for Version 8. These procedures are preliminary steps toward comprehensive support for modern nonparametric methods within the SAS System. It is anticipated that the coverage described here will expand to include a variety of other important methods. Some of the techniques provided by the new procedures are also being implemented as functions in SAS/IML software and with interactive graphics in SAS/INSIGHT software.

Nonparametric Density Estimation: The KDE Procedure

The KDE procedure computes nonparametric estimates of univariate and bivariate probability density functions using the method of kernel density estimation. The procedure saves the density estimate in a SAS data set for subsequent plotting or analysis. An important issue in the application of kernel density estimation is the choice of bandwidth. The procedure provides several methods for automatic bandwidth selection, including the rule of thumb proposed by Silverman (1986) and the more recent Sheather-Jones plug-in (SJPI) method. In the bivariate case, the procedure enables you to adjust the two bandwidths and also computes contours of the estimated density function.
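To make the role of the bandwidth concrete, the following is a minimal sketch of a univariate Gaussian kernel density estimate in Python, not SAS code and not the KDE procedure's implementation. It assumes the textbook form of Silverman's rule of thumb, h = 0.9 min(sd, IQR/1.34) n^(-1/5); the exact formula and bandwidth parameterization used by the procedure may differ. The function names and the sample values are hypothetical.

```python
import math
import statistics


def silverman_bandwidth(data):
    """Textbook rule of thumb: h = 0.9 * min(sd, IQR / 1.34) * n ** (-1/5)."""
    n = len(data)
    sd = statistics.stdev(data)
    q1, _, q3 = statistics.quantiles(data, n=4)
    return 0.9 * min(sd, (q3 - q1) / 1.34) * n ** (-0.2)


def gaussian_kde(data, bandwidth):
    """Return a function that evaluates the Gaussian kernel density estimate."""
    scale = 1.0 / (len(data) * bandwidth * math.sqrt(2.0 * math.pi))

    def density(x):
        # Sum one Gaussian bump per observation, each with width `bandwidth`.
        return scale * sum(
            math.exp(-0.5 * ((x - xi) / bandwidth) ** 2) for xi in data
        )

    return density


# Hypothetical sample, used only for illustration.
sample = [2.1, 2.4, 2.9, 3.3, 3.8, 4.0, 4.4, 5.1, 5.6, 6.2]
h = silverman_bandwidth(sample)
density = gaussian_kde(sample, h)

# Evaluate the estimate on a grid, as one would before plotting it.
grid = [1.0 + 0.5 * i for i in range(13)]
for x in grid:
    print(f"{x:4.1f}  {density(x):.4f}")
```

A smaller bandwidth gives a rougher estimate that follows individual observations; a larger one smooths them away. Automatic selectors such as the rule of thumb above are a starting point rather than a final answer.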

Nonparametric Regression: The LOESS and TPSPLINE Procedures

The LOESS and TPSPLINE procedures complement the methods provided in standard SAS regression procedures such as the GLM, REG, and NLIN procedures. The standard procedures can handle most situations in which you can specify a parametric regression model and the model is known up to a finite number of parameters. However, when you have no prior knowledge about the model, or you know that the data cannot be represented by a model with a finite number of parameters, you can use nonparametric regression to explore the data.

The LOESS procedure implements a nonparametric method for estimating local regression surfaces. The method allows great flexibility because it requires no assumptions about the parametric form of the regression surface. The LOESS procedure supports multidimensional data, multiple dependent variables, and both direct fitting and interpolated fitting using k-d trees. You can also use the LOESS procedure to perform statistical inference, provided that the error distribution satisfies some basic assumptions, for example that it is normal with mean 0. By using iterative reweighting, the LOESS procedure can also provide statistical inference when the error distribution is symmetric but not necessarily normal, and it can perform robust fitting in the presence of outliers in the data.
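The idea of local regression with iterative reweighting can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the LOESS procedure's algorithm: it handles one regressor only, uses direct fitting (no k-d tree interpolation), and assumes the commonly described choices of local linear fits, tricube neighborhood weights, and bisquare robustness weights scaled by six times the median absolute residual. All function names and data are hypothetical.

```python
import math
import statistics


def tricube(u):
    """Neighborhood weight: (1 - |u|^3)^3 inside the window, 0 outside."""
    u = abs(u)
    return (1.0 - u ** 3) ** 3 if u < 1.0 else 0.0


def bisquare(u):
    """Robustness weight: (1 - u^2)^2 for |u| < 1, 0 otherwise."""
    u = abs(u)
    return (1.0 - u ** 2) ** 2 if u < 1.0 else 0.0


def fit_at(x0, xs, ys, span, robust_w):
    """Weighted local linear fit evaluated at x0."""
    n = len(xs)
    q = min(n, max(3, math.ceil(span * n)))  # points in the local window
    radius = sorted(abs(x - x0) for x in xs)[q - 1]
    kernel = [
        tricube((x - x0) / radius) if radius > 0 else float(x == x0)
        for x in xs
    ]
    w = [k * r for k, r in zip(kernel, robust_w)]
    if sum(w) == 0.0:
        w = kernel  # every neighbor was down-weighted to zero
    sw = sum(w)
    mx = sum(wi * x for wi, x in zip(w, xs)) / sw
    my = sum(wi * y for wi, y in zip(w, ys)) / sw
    sxx = sum(wi * (x - mx) ** 2 for wi, x in zip(w, xs))
    sxy = sum(wi * (x - mx) * (y - my) for wi, x, y in zip(w, xs, ys))
    slope = sxy / sxx if sxx > 1e-12 else 0.0
    return my + slope * (x0 - mx)


def loess(xs, ys, span=0.5, robust_iters=0):
    """Fitted values at each x; robust_iters > 0 enables reweighting."""
    robust_w = [1.0] * len(xs)
    fitted = [fit_at(x, xs, ys, span, robust_w) for x in xs]
    for _ in range(robust_iters):
        resid = [y - f for y, f in zip(ys, fitted)]
        s = statistics.median(abs(r) for r in resid)
        if s == 0.0:
            break
        # Points with large residuals receive little or no weight next pass.
        robust_w = [bisquare(r / (6.0 * s)) for r in resid]
        fitted = [fit_at(x, xs, ys, span, robust_w) for x in xs]
    return fitted


# Hypothetical data: a straight line with small alternating noise
# and one gross outlier at index 10.
xs = [float(i) for i in range(20)]
truth = [2.0 * x + 1.0 for x in xs]
ys = [t + (0.5 if i % 2 == 0 else -0.5) for i, t in enumerate(truth)]
ys[10] += 50.0

plain = loess(xs, ys, span=0.5)
robust = loess(xs, ys, span=0.5, robust_iters=2)
print("error at outlier, plain fit: ", abs(plain[10] - truth[10]))
print("error at outlier, robust fit:", abs(robust[10] - truth[10]))
```

In this sketch the unweighted fit is pulled toward the outlier, whereas the reweighted fit largely ignores it, which is the behavior that the robust fitting option is meant to provide.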

The TPSPLINE procedure uses a penalized least squares method to estimate multivariate regression surfaces with thin-plate smoothing splines. The TPSPLINE procedure allows great flexibility in the form of the regression surface and requires no assumptions of a parametric form for the model. You can use the TPSPLINE procedure to fit either a nonparametric model or a semiparametric model. The generalized cross validation (GCV) function is used to select the smoothing parameter.
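For orientation, the penalized least squares criterion and the GCV function can be written in the notation commonly used in the smoothing spline literature. The symbols below (f, z_i, beta, lambda, J_m, A(lambda)) are introduced here for illustration and are not taken from this page; the procedure's documentation should be consulted for the exact definitions it uses.

```latex
% Semiparametric model: parametric part z_i' beta, smooth part f
y_i = z_i^{\top}\beta + f(x_i) + \varepsilon_i , \qquad i = 1, \dots, n

% Penalized least squares criterion minimized over beta and f
S_{\lambda}(\beta, f) = \frac{1}{n} \sum_{i=1}^{n}
    \bigl( y_i - z_i^{\top}\beta - f(x_i) \bigr)^{2} + \lambda \, J_m(f)

% Roughness penalty for two regressors and second derivatives (d = 2, m = 2)
J_2(f) = \iint \left[
    \left( \frac{\partial^{2} f}{\partial x_1^{2}} \right)^{2}
  + 2 \left( \frac{\partial^{2} f}{\partial x_1 \, \partial x_2} \right)^{2}
  + \left( \frac{\partial^{2} f}{\partial x_2^{2}} \right)^{2}
\right] dx_1 \, dx_2

% GCV function, where \hat{y} = A(\lambda) y defines the hat matrix
V(\lambda) = \frac{ \frac{1}{n} \, \lVert ( I - A(\lambda) ) \, y \rVert^{2} }
                  { \left[ \frac{1}{n} \operatorname{tr}( I - A(\lambda) ) \right]^{2} }
```

Dropping the term in beta gives the purely nonparametric model. A large lambda penalizes curvature heavily and yields a smoother surface; the smoothing parameter is chosen as the value of lambda that minimizes V(lambda).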

Documentation

For detailed documentation on the nonparametric modeling procedures, refer to the chapters "The KDE Procedure," "The LOESS Procedure," and "The TPSPLINE Procedure" in the SAS/STAT User's Guide, Version 8. In addition, refer to the SUGI 24 paper "An Introduction to PROC LOESS for Local Regression" by Robert Cohen.

