The TPSPLINE Procedure

PROC TPSPLINE Statement

  • PROC TPSPLINE <options>;

The PROC TPSPLINE statement invokes the TPSPLINE procedure. Table 103.1 summarizes the options available in the PROC TPSPLINE statement.

Table 103.1: PROC TPSPLINE Statement Options

Option    Description
DATA=     Specifies the SAS data set to be read
PLOTS     Controls the plots that are produced through ODS Graphics


You can specify the following options:

DATA=SAS-data-set

specifies the SAS data set to be read by PROC TPSPLINE. The default value is the most recently created data set.

PLOTS <(global-plot-options)> <= plot-request<(options)>>
PLOTS <(global-plot-options)> <= (plot-request<(options)> <…plot-request<(options)>>)>

controls the plots that are produced through ODS Graphics. When you specify only one plot request, you can omit the parentheses around the plot request. Here are some examples:

plots=none
plots=residuals(smooth)
plots(unpack)=diagnostics
plots(only)=(fit residualHistogram)

ODS Graphics must be enabled before plots can be requested. For example:

ods graphics on;

proc tpspline;
   model y = (x);
run;

ods graphics off;

For more information about enabling and disabling ODS Graphics, see the section Enabling and Disabling ODS Graphics in Chapter 21: Statistical Graphics Using ODS.

If ODS Graphics is enabled but you do not specify the PLOTS= option, then PROC TPSPLINE produces a default set of plots. The following table lists the plots that are produced.

Table 103.2: Graphs Produced

Plot                Conditional on
ContourFitPanel     LAMBDA= or LOGNLAMBDA= option specified in the MODEL statement
ContourFit          Model with two predictors
CriterionPlot       Multiple values for the smoothing parameter
DiagnosticsPanel    Unconditional
ResidualBySmooth    LAMBDA= or LOGNLAMBDA= option specified in the MODEL statement
ResidualPanel       Unconditional
FitPanel            LAMBDA= or LOGNLAMBDA= option specified in the MODEL statement
FitPlot             Model with one predictor
ScorePlot           One or more SCORE statements and a model with one predictor


For models with multiple dependent variables, separate plots are produced for each dependent variable. For models in which multiple smoothing parameters are specified with the LAMBDA= or LOGNLAMBDA= option in the MODEL statement, the plots are produced for the selected model only.
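As a rough illustration of how these conditions interact, the following sketch fits a two-predictor model with a list of smoothing values and leaves the PLOTS= option unspecified. The data set work.surface and its variables x1, x2, and y are invented for this example, and the LOGNLAMBDA= list is an arbitrary choice.

```sas
/* Hypothetical data: work.surface, x1, x2, and y are illustration-only names */
data work.surface;
   call streaminit(1);
   do i = 1 to 200;
      x1 = rand('uniform');
      x2 = rand('uniform');
      y  = sin(3 * x1) + cos(3 * x2) + rand('normal', 0, 0.2);
      output;
   end;
   drop i;
run;

ods graphics on;

/* No PLOTS= option, so the default set described in Table 103.2 applies */
proc tpspline data=work.surface;
   model y = (x1 x2) / lognlambda=(-4 to -2 by 0.5);
run;

ods graphics off;
```

Going by Table 103.2, a model like this one would be expected to meet the conditions for the contour plots and the criterion plot, whereas FitPlot applies only to models with one predictor.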

Global Plot Options

The global-plot-options apply to all relevant plots generated by the TPSPLINE procedure, unless they are overridden by a specific-plot-option. The following global-plot-options are supported by the TPSPLINE procedure:

ONLY

suppresses the default plots. Only the plots specifically requested are produced.

UNPACK

suppresses paneling. By default, multiple plots can appear in some output panels. Specify UNPACK to get each plot individually. You can specify PLOTS(UNPACK) to unpack the default plots. You can also specify UNPACK as a suboption with the CONTOURFITPANEL, DIAGNOSTICS, FITPANEL, RESIDUALS, and RESIDUALBYSMOOTH options.
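The two global options can be sketched as follows. The data set work.curve and its variables x and y are hypothetical names created only for this example.

```sas
/* Hypothetical data: work.curve, x, and y are illustration-only names */
data work.curve;
   call streaminit(1);
   do i = 1 to 100;
      x = i / 100;
      y = sin(6 * x) + rand('normal', 0, 0.2);
      output;
   end;
   drop i;
run;

ods graphics on;

/* ONLY: produce just the two requested plots, not the default set */
proc tpspline data=work.curve plots(only)=(fit residualHistogram);
   model y = (x);
run;

/* UNPACK: keep the default plots but display each one individually */
proc tpspline data=work.curve plots(unpack);
   model y = (x);
run;

ods graphics off;
```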

Plot Requests

You can specify the following specific plot-requests and controls for them:

ALL

produces all plots appropriate for the particular analysis. You can specify other options with ALL; for example, to request that all plots be produced and that only the residual plots be unpacked, specify PLOTS=(ALL RESIDUALS(UNPACK)).

CONTOURFIT <(OBS=contour-options)>

produces a contour plot of the fitted surface overlaid with a scatter plot of the data for models with two predictors. You can use the following contour-options to control how the observations are displayed:

GRADIENT

displays observations as circles colored by the observed response. The same color gradient is used to display the fitted surface and the observations. Observations where the predicted response is close to the observed response have colors similar to the surface; the greater the contrast between the color of an observation and the surface, the larger the residual is at that point. OBS=GRADIENT is the default if you do not specify any contour-options.

NONE

suppresses the observations.

OUTLINE

displays observations as circles with a border but with a completely transparent fill.

OUTLINEGRADIENT

is the same as OBS=GRADIENT except that a border is shown around each observation. This option is useful for identifying the location of observations where the residuals are small, because at these points the color of the observations and the color of the surface are indistinguishable.
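A minimal sketch of choosing a contour-option follows. The data set work.surface and the variables x1, x2, and y are assumptions made for illustration; any data set with two predictors would serve.

```sas
/* Hypothetical data: work.surface, x1, x2, and y are illustration-only names */
data work.surface;
   call streaminit(1);
   do i = 1 to 200;
      x1 = rand('uniform');
      x2 = rand('uniform');
      y  = sin(3 * x1) + cos(3 * x2) + rand('normal', 0, 0.2);
      output;
   end;
   drop i;
run;

ods graphics on;

/* Draw a border around each gradient-colored observation */
proc tpspline data=work.surface plots(only)=contourfit(obs=outlinegradient);
   model y = (x1 x2);
run;

ods graphics off;
```

Replacing OBS=OUTLINEGRADIENT with OBS=NONE in this sketch would show the fitted surface without the observations.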

CONTOURFITPANEL <(options)>

produces panels of contour plots overlaid with a scatter plot of the data for each smoothing parameter specified in the LAMBDA= or LOGNLAMBDA= option in the MODEL statement, for models with two predictors. If you do not specify the LAMBDA= or LOGNLAMBDA= option or if the model does not have two predictors, then this plot is not produced. Each panel contains at most six plots, and multiple panels are used when there are more than six smoothing parameters in the LAMBDA= or LOGNLAMBDA= option. The following options are available:

OBS=contour-options

specifies how the observations are displayed. See contour-options for the CONTOURFIT option for details.

UNPACK

suppresses paneling.

CRITERIONPLOT | CRITERION <(NOPATH)>

displays a scatter plot of the value of the GCV criterion versus the smoothing parameter value for all smoothing parameter values examined in the selection process. This plot is not produced when you specify one smoothing parameter with either the LAMBDA0= or LOGNLAMBDA0= option in the MODEL statement. When you supply a list of values for the smoothing parameter with the LAMBDA= or LOGNLAMBDA= option and PROC TPSPLINE obtains the optimal smoothing parameter by minimizing the GCV criterion, the plot contains the supplied values and the optimal smoothing parameter in addition to the values examined during the optimization process. In this case, you can use the NOPATH suboption to suppress the display of the optimization path in the plot.
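The NOPATH case described above might be requested as in the following sketch. The data set work.curve, its variables, and the particular LOGNLAMBDA= list are assumptions chosen for illustration.

```sas
/* Hypothetical data: work.curve, x, and y are illustration-only names */
data work.curve;
   call streaminit(1);
   do i = 1 to 100;
      x = i / 100;
      y = sin(6 * x) + rand('normal', 0, 0.2);
      output;
   end;
   drop i;
run;

ods graphics on;

/* Supply a list of smoothing values and hide the optimization path */
proc tpspline data=work.curve plots(only)=criterion(nopath);
   model y = (x) / lognlambda=(-6 to 0 by 1);
run;

ods graphics off;
```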

DIAGNOSTICSPANEL | DIAGNOSTICS <(UNPACK)>

produces a summary panel of fit diagnostics that consists of the following:

  • residuals versus the predicted values

  • a histogram of the residuals

  • a normal quantile plot of the residuals

  • a "Residual-Fit" (RF) plot that consists of side-by-side quantile plots of the centered fit and the residuals

  • response values versus the predicted values

You can request the five plots in this panel as individual plots by specifying the UNPACK option. You can also request individual plots in the panel by name without having to unpack the panel. The fit diagnostics panel is produced by default whenever ODS Graphics is enabled.
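The two ways of obtaining individual diagnostic plots can be sketched as follows, assuming a hypothetical data set work.curve with variables x and y.

```sas
/* Hypothetical data: work.curve, x, and y are illustration-only names */
data work.curve;
   call streaminit(1);
   do i = 1 to 100;
      x = i / 100;
      y = sin(6 * x) + rand('normal', 0, 0.2);
      output;
   end;
   drop i;
run;

ods graphics on;

/* Unpack the diagnostics panel into its five component plots */
proc tpspline data=work.curve plots(only)=diagnostics(unpack);
   model y = (x);
run;

/* Request two of the component plots by name instead */
proc tpspline data=work.curve plots(only)=(qqplot rfplot);
   model y = (x);
run;

ods graphics off;
```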

FITPANEL <(options)>

produces panels of plots that show the fitted TPSPLINE curve overlaid on a scatter plot of the input data for each smoothing parameter specified in the LAMBDA= or LOGNLAMBDA= option in the MODEL statement. If you do not specify the LAMBDA= or LOGNLAMBDA= option or if the model has more than one predictor, then this plot is not produced. Each panel contains at most six plots, and multiple panels are used when there are more than six smoothing parameters in the LAMBDA= or LOGNLAMBDA= option. The following options are available:

CLM

includes a confidence band at the significance level specified in the ALPHA= option in the MODEL statement in each plot in the panels.

UNPACK

suppresses paneling.

FITPLOT | FIT <(CLM)>

produces a scatter plot of the input data with the fitted TPSPLINE curve overlaid for models with a single predictor. If the CLM option is specified, then a confidence band at the significance level specified in the ALPHA= option in the MODEL statement is included in the plot.
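As a sketch of how the CLM suboption and the ALPHA= option work together, the following step requests a fit plot with a confidence band. The data set work.curve, its variables, and the value ALPHA=0.1 are assumptions chosen for illustration.

```sas
/* Hypothetical data: work.curve, x, and y are illustration-only names */
data work.curve;
   call streaminit(1);
   do i = 1 to 100;
      x = i / 100;
      y = sin(6 * x) + rand('normal', 0, 0.2);
      output;
   end;
   drop i;
run;

ods graphics on;

/* ALPHA=0.1 sets the significance level used for the band drawn by CLM */
proc tpspline data=work.curve plots(only)=fit(clm);
   model y = (x) / alpha=0.1;
run;

ods graphics off;
```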

NONE

suppresses all plots.

OBSERVEDBYPREDICTED

produces a scatter plot of the dependent variable values by the predicted values.

QQPLOT | QQ

produces a normal quantile plot of the residuals.

RESIDUALBYSMOOTH <(SMOOTH)>

produces, for each predictor, panels of plots that show the residuals of the TPSPLINE fit versus the predictor for each smoothing parameter specified in the LAMBDA= or LOGNLAMBDA= option in the MODEL statement. If you do not specify the LAMBDA= or LOGNLAMBDA= option, then this plot is not produced. Each panel contains at most six plots, and multiple panels are used when there are more than six smoothing parameters in the LAMBDA= or LOGNLAMBDA= option.

The SMOOTH option displays a nonparametric fit line in each plot in the panel. The type of nonparametric fit and the options used are controlled by the underlying template for this plot. In the standard template that is provided, the nonparametric smooth is a loess fit that corresponds to the default options of PROC LOESS, except that the PRESEARCH suboption in the SELECT statement is always used. Note that the loess fit shown in each residual plot is computed independently of the smoothing spline fit that is used to obtain the residuals.

RESIDUALBYPREDICTED

produces a scatter plot of the residuals by the predicted values.

RESIDUALHISTOGRAM

produces a histogram of the residuals.

RESIDUALPANEL | RESIDUALS <(options)>

produces panels of the residuals versus the predictors in the model. Each panel contains at most six plots, and multiple panels are used when there are more than six predictors in the model.

The following options are available:

SMOOTH

displays a nonparametric fit line in each plot in the panel. The type of nonparametric fit and the options used are controlled by the underlying template for this plot. In the standard template that is provided, the nonparametric smooth is a loess fit that corresponds to the default options of PROC LOESS, except that the PRESEARCH suboption in the SELECT statement is always used. Note that the loess fit shown in each residual plot is computed independently of the smoothing spline fit that is used to obtain the residuals.

UNPACK

suppresses paneling.
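A minimal sketch of a residual panel request follows. The data set work.surface and the variables x1, x2, and y are hypothetical names created only for this example.

```sas
/* Hypothetical data: work.surface, x1, x2, and y are illustration-only names */
data work.surface;
   call streaminit(1);
   do i = 1 to 200;
      x1 = rand('uniform');
      x2 = rand('uniform');
      y  = sin(3 * x1) + cos(3 * x2) + rand('normal', 0, 0.2);
      output;
   end;
   drop i;
run;

ods graphics on;

/* Residuals versus each predictor, with a loess smooth overlaid */
proc tpspline data=work.surface plots(only)=residuals(smooth);
   model y = (x1 x2);
run;

ods graphics off;
```

A visible trend in the loess line for a predictor may suggest structure that the spline fit has not captured, which is why the smooth is computed separately from the spline fit.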

RFPLOT | RF

produces a "Residual-Fit" (RF) plot that consists of side-by-side quantile plots of the centered fit and the residuals. This plot "shows how much variation in the data is explained by the fit and how much remains in the residuals" (Cleveland, 1993).

SCOREPLOT | SCORE

produces a scatter plot of the scored values at the score points for each SCORE statement. SCORE plots are not produced for models with more than one predictor.
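The following sketch pairs a SCORE statement with the SCOREPLOT request. The data sets work.curve, work.grid, and work.scored, along with the variables x and y, are hypothetical names introduced for illustration.

```sas
/* Hypothetical training data: work.curve, x, and y are illustration-only names */
data work.curve;
   call streaminit(1);
   do i = 1 to 100;
      x = i / 100;
      y = sin(6 * x) + rand('normal', 0, 0.2);
      output;
   end;
   drop i;
run;

/* Hypothetical score points on an evenly spaced grid */
data work.grid;
   do x = 0 to 1 by 0.05;
      output;
   end;
run;

ods graphics on;

/* One predictor and one SCORE statement, as the score plot requires */
proc tpspline data=work.curve plots(only)=scoreplot;
   model y = (x);
   score data=work.grid out=work.scored;
run;

ods graphics off;
```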