Example Program and Statement Details

Example Graph

The following graph was generated by the Example Program:
[Figure: Example PBSpline Plot]

Example Program

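proc template;
  define statgraph pbsplineplot;
    begingraph;
      entrytitle "Spline Fit";
      layout overlay;
        /* raw data, drawn semi-transparent so that the fit stands out */
        scatterplot x=weight y=mpg_highway /
          datatransparency=.7;
        /* penalized B-spline fit; NAME= lets the legend refer to it */
        pbsplineplot x=weight y=mpg_highway / name="fitline"
          alpha=.05 legendlabel="Spline Fit";
        discretelegend "fitline";
      endlayout;
    endgraph;
  end;
run;

/* render the template with the SASHELP.CARS data */
proc sgrender data=sashelp.cars template=pbsplineplot;
run;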

Statement Summary

The PBSPLINEPLOT statement supports only models with one independent variable and one dependent variable. For more information about the fitting methodology, see the TRANSREG procedure in the SAS/STAT user’s guide.
In addition to the penalized B-spline curve, the PBSPLINEPLOT statement can compute confidence limits for the fitted line. To display the confidence limits:
  1. Use the CLI= or CLM= option to declare a name for the confidence limits.
  2. Use a MODELBAND statement to refer to this name. The MODELBAND statement draws a confidence band from this information. See MODELBAND Statement for information about how to control the appearance of the confidence band.
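The two steps above can be sketched as a variation of the Example Program. This is a minimal illustration, not a definitive recipe: the template name pbsplineband and the name "clm" are arbitrary choices made here. The MODELBAND statement is placed first so that the band is drawn behind the markers and the curve.

```sas
proc template;
  define statgraph pbsplineband;   /* hypothetical template name */
    begingraph;
      entrytitle "Spline Fit with Confidence Band";
      layout overlay;
        /* step 2: draw the band that is identified by the name "clm" */
        modelband "clm";
        scatterplot x=weight y=mpg_highway / datatransparency=.7;
        /* step 1: CLM= declares the name for the confidence limits */
        pbsplineplot x=weight y=mpg_highway / name="fitline"
          clm="clm" alpha=.05 legendlabel="Spline Fit";
        discretelegend "fitline";
      endlayout;
    endgraph;
  end;
run;

proc sgrender data=sashelp.cars template=pbsplineband;
run;
```

Replacing CLM= with CLI= in the same position would request limits for individual predicted values instead of limits for the mean.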

Required Arguments

X=numeric-column | expression
specifies the column for the X values.
Y=numeric-column | expression
specifies the column for the Y values.

PBSPLINE Regression Options

ALPHA= positive-number
specifies the significance level that is used to compute the confidence limits.
Default: .05
Range: 0 - 1
The confidence level is 100(1 - ALPHA)%. For example, ALPHA=.05 represents a 95% confidence level.
CLI= "name"
produces confidence limits for individual predicted values for each observation.
Default: no default
Interaction: name is a unique name within the template that is case sensitive and cannot contain spaces. It must be assigned in order for the confidence limits to be computed. To display confidence limits, you must use this name as the required argument of a MODELBAND statement. See the example in the section Example Program and Statement Details.
CLM= "name"
produces confidence limits for a mean predicted value for each observation.
Default: no default
Interaction: name is a unique name within the template that is case sensitive and cannot contain spaces. It must be assigned in order for the confidence limits to be computed. To display confidence limits, you must use this name as the required argument of a MODELBAND statement. See the example in the section Example Program and Statement Details.
DEGREE= non-negative-integer
specifies the degree of the B-spline.
Default: 3
FREQ= numeric-column
specifies a variable in the input data set that represents the frequency of occurrence of the current observation. The data set is treated as if each observation appeared n times, where n is the value of the FREQ variable for the observation. Noninteger values of the FREQ variable are truncated to the largest integer less than the FREQ value. The observation is used in the analysis only if the value of the FREQ variable is greater than or equal to 1.
Default: no default
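As a rough illustration of the FREQ= option, the following sketch adds an invented frequency column to a copy of SASHELP.CARS and passes it to the statement. The column name COUNT, the data set WORK.CARS_FREQ, and the template name pbsplinefreq are assumptions made only for this example.

```sas
/* hypothetical data: COUNT is an invented frequency column */
data work.cars_freq;
  set sashelp.cars;
  count = 1 + mod(_n_, 3);   /* cycles through the values 1, 2, and 3 */
run;

proc template;
  define statgraph pbsplinefreq;   /* hypothetical template name */
    begingraph;
      entrytitle "Spline Fit with FREQ=";
      layout overlay;
        scatterplot x=weight y=mpg_highway / datatransparency=.7;
        /* each observation is treated as if it appeared COUNT times */
        pbsplineplot x=weight y=mpg_highway / freq=count;
      endlayout;
    endgraph;
  end;
run;

proc sgrender data=work.cars_freq template=pbsplinefreq;
run;
```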
MAXPOINTS= positive-integer
specifies the maximum number of predicted points generated for the spline curve as well as any confidence limits.
Default: 201
NKNOTS= non-negative-integer
specifies the number of evenly spaced internal knots.
Default: 100
By default, a large number of knots (100) is specified, which allows for an extreme lack of smoothness in the results. However, the final function is typically much smoother due to the penalty. See the section and example on “Penalized B-Splines” in PROC TRANSREG in the SAS/STAT user’s guide. When SMOOTH=0 is specified, you should typically ask for many fewer knots than the default, since there is no penalty for lack of smoothness. For example, ten or fewer knots is usually enough to follow the functional form found in most data. See “Using Splines and Knots” and “Specifying the Number of Knots” in PROC TRANSREG.
SMOOTH=AUTO | non-negative-number
specifies the smoothing parameter, which controls the weight of the lack-of-smoothness penalty in the fit.
Default: AUTO
With SMOOTH=AUTO, a regression parameter is automatically selected that minimizes a lack-of-smoothness penalty.
You can specify SMOOTH=0 to get an ordinary B-spline fit.
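To compare the default penalized fit with an ordinary B-spline fit, a sketch along the following lines overlays both curves. The template name pbsplineknots, the choice of five knots, and the legend labels are illustrative assumptions; the knot count simply follows the guidance to request far fewer knots when SMOOTH=0.

```sas
proc template;
  define statgraph pbsplineknots;   /* hypothetical template name */
    begingraph;
      entrytitle "Penalized and Unpenalized B-Spline Fits";
      layout overlay;
        scatterplot x=weight y=mpg_highway / datatransparency=.7;
        /* defaults: SMOOTH=AUTO with NKNOTS=100 */
        pbsplineplot x=weight y=mpg_highway / name="pen"
          legendlabel="SMOOTH=AUTO, NKNOTS=100";
        /* no penalty, so far fewer knots are requested */
        pbsplineplot x=weight y=mpg_highway / name="unpen"
          smooth=0 nknots=5 lineattrs=(pattern=dash)
          legendlabel="SMOOTH=0, NKNOTS=5";
        discretelegend "pen" "unpen";
      endlayout;
    endgraph;
  end;
run;

proc sgrender data=sashelp.cars template=pbsplineknots;
run;
```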
WEIGHT= numeric-column
specifies a variable in the input data set that contains values to be used as a priori weights for a penalized B-spline fit. If an observation’s weight is zero, negative, or missing, the observation is deleted from the analysis.
Default: no default
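The rule that zero-weight observations are deleted from the analysis can be illustrated with a sketch like the one below. The weight column WT and the names WORK.CARS_WT and pbsplineweight are invented for this example, and the code assumes that the Type column of SASHELP.CARS contains the value Hybrid.

```sas
/* hypothetical data: WT is an invented a priori weight column */
data work.cars_wt;
  set sashelp.cars;
  wt = (type ne 'Hybrid');   /* 1 for most cars, 0 for hybrids */
run;

proc template;
  define statgraph pbsplineweight;   /* hypothetical template name */
    begingraph;
      entrytitle "Spline Fit with WEIGHT=";
      layout overlay;
        /* all observations are still drawn as markers */
        scatterplot x=weight y=mpg_highway / datatransparency=.7;
        /* observations with WT=0 do not contribute to the fit */
        pbsplineplot x=weight y=mpg_highway / weight=wt;
      endlayout;
    endgraph;
  end;
run;

proc sgrender data=work.cars_wt template=pbsplineweight;
run;
```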

Options

Statement Option        Description
CURVELABEL=             Specifies the label of the regression curve.
CURVELABELATTRS=        Specifies the color and font attributes of the regression curve label.
CURVELABELLOCATION=     Specifies the location of the regression curve label relative to the plot area.
CURVELABELPOSITION=     Specifies the position of the regression curve label relative to the curve line.
DATATRANSPARENCY=       Specifies the degree of the transparency of the regression curve.
GROUP=                  Creates a distinct set of lines from just the observations that correspond to each unique group value of the specified column.
INCLUDEMISSINGGROUP=    Specifies whether missing values of the group variable are included in the plot.
INDEX=                  Specifies indices for mapping line attributes (color and pattern) to one of the GraphData1 - GraphDataN style elements.
LEGENDLABEL=            Specifies a label for a legend.
LINEATTRS=              Specifies the line properties of the regression curve.
NAME=                   Assigns a name to a plot statement for reference in other template statements.
PRIMARY=                Specifies that the data columns for this plot be used for determining default axis features.
TIPFORMAT=              Specifies display formats for information defined by roles.
TIPLABEL=               Specifies display labels for information defined by roles.
XAXIS=                  Specifies whether data are mapped to the primary X (bottom) axis or the secondary X2 (top) axis.
YAXIS=                  Specifies whether data are mapped to the primary Y (left) axis or the secondary Y2 (right) axis.

CURVELABEL="string"
specifies a label for the regression curve.
Default: no curve label is displayed
Interaction: If the GROUP= option is specified, this option is ignored.
The font and color attributes for the label are specified by the CURVELABELATTRS= option.
CURVELABELATTRS=style-element | style-element (text-options) | (text-options)
specifies the color and font attributes of the regression curve labels. See General Syntax for Attribute Options for the syntax on using a style-element and Text Options for available text-options.
Default: The GraphValueText style element.
Interaction: For this option to take effect, the CURVELABEL= option must also be specified.
Interaction: If the GROUP= option is specified, this option is ignored.
CURVELABELLOCATION=INSIDE | OUTSIDE
specifies the location of the regression curve label relative to the plot area.
Default: INSIDE
INSIDE
inside the plot area
OUTSIDE
outside the plot area
Restriction: OUTSIDE cannot be used when the PBSPLINEPLOT is used in multicell layouts such as LATTICE, DATAPANEL, or DATALATTICE, where axes might be external to the grid.
Interaction: For this option to take effect, the CURVELABEL= option must also be specified.
Interaction: This option is used in conjunction with the CURVELABELPOSITION= option to determine where the curve labels appear. For more information, see Location and Position of Curve Labels.
CURVELABELPOSITION=AUTO | MAX | MIN | START | END
specifies the position of the regression curve label relative to the curve line.
Default: AUTO when CURVELABELLOCATION=OUTSIDE. END when CURVELABELLOCATION=INSIDE.
AUTO
Only used when CURVELABELLOCATION=OUTSIDE. The regression curve label is positioned automatically near the curve boundary along unused axes whenever possible (typically Y2 and X2) to avoid collision with tick values.
MAX
Forces the regression curve label to appear near maximum curve values (typically, upper right).
MIN
Forces the regression curve label to appear near minimum curve values (typically, lower left).
START
Only used when CURVELABELLOCATION=INSIDE. Forces the regression curve label to appear near the beginning of the curve. Particularly useful when the curve line has a spiral shape.
END
Only used when CURVELABELLOCATION=INSIDE. Forces the regression curve label to appear near the end of the curve. Particularly useful when the curve line has a spiral shape.
Interaction: For this option to take effect, the CURVELABEL= option must also be specified.
Interaction: The AUTO setting is ignored if CURVELABELLOCATION=INSIDE is specified. The START and END settings are ignored if CURVELABELLOCATION=OUTSIDE is specified.
Interaction: This option is used in conjunction with the CURVELABELLOCATION= option to determine where the regression curve label appears. For more information, see Location and Position of Curve Labels.
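The curve label options described above might be combined as in the following sketch, which labels the curve outside the plot area near its maximum values. The template name pbsplinelabel and the bold label text are illustrative assumptions.

```sas
proc template;
  define statgraph pbsplinelabel;   /* hypothetical template name */
    begingraph;
      entrytitle "Spline Fit with a Curve Label";
      layout overlay;
        scatterplot x=weight y=mpg_highway / datatransparency=.7;
        /* CURVELABEL= is required for the other label options to apply */
        pbsplineplot x=weight y=mpg_highway /
          curvelabel="Spline Fit"
          curvelabelattrs=(weight=bold)
          curvelabellocation=outside
          curvelabelposition=max;
      endlayout;
    endgraph;
  end;
run;

proc sgrender data=sashelp.cars template=pbsplinelabel;
run;
```

Because GROUP= is not specified here, the CURVELABEL= and CURVELABELATTRS= settings are not ignored.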
DATATRANSPARENCY=number
specifies the degree of transparency of the regression curve.
Default: 0
Range: 0 (opaque) to 1 (entirely transparent)
GROUP=column | discrete-attr-var | expression
creates a distinct set of curves from just the observations that correspond to each unique group value of the specified column.
discrete-attr-var
specifies a discrete attribute variable that is defined in a DISCRETEATTRVAR statement.
Restriction: A discrete attribute variable specification must be a direct reference to the attribute variable. It cannot be set by a dynamic variable.
Default: Each distinct group value might be represented in the graph by a different combination of color and line pattern. Line colors vary according to the GraphData1:ContrastColor - GraphDataN:ContrastColor style references, and line patterns vary according to the GraphData1:LineStyle - GraphDataN:LineStyle style references.
Restriction: The input data must be sorted by the GROUP= column.
Interaction: The group values are mapped in the order of the data, unless the INDEX= option is used to alter the default sequence of line colors and line patterns.
Interaction: The INCLUDEMISSINGGROUP option controls whether missing group values are considered a distinct group value.
Tip: The LINEATTRS= option can be used to override the representations that are used to identify the groups. For example, LINEATTRS=(PATTERN=SOLID) can be used to assign the same pattern to all of the spline curves, letting the line color distinguish group values. Likewise, LINEATTRS=(COLOR=BLACK) can be used to assign the same color to all of the curves, letting the line pattern distinguish group values.
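A grouped fit, including the sorting restriction and the LINEATTRS= tip, could look roughly like the following. The names WORK.CARS_SORTED and pbsplinegroup are assumptions made for this sketch; ORIGIN is the grouping column taken from SASHELP.CARS.

```sas
/* the input data must be sorted by the GROUP= column */
proc sort data=sashelp.cars out=work.cars_sorted;
  by origin;
run;

proc template;
  define statgraph pbsplinegroup;   /* hypothetical template name */
    begingraph;
      entrytitle "Spline Fit by Origin";
      layout overlay;
        scatterplot x=weight y=mpg_highway / group=origin
          datatransparency=.7;
        /* one curve per group; a solid pattern leaves color to tell groups apart */
        pbsplineplot x=weight y=mpg_highway / name="fits"
          group=origin lineattrs=(pattern=solid);
        discretelegend "fits";
      endlayout;
    endgraph;
  end;
run;

proc sgrender data=work.cars_sorted template=pbsplinegroup;
run;
```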
INCLUDEMISSINGGROUP=boolean
specifies whether missing values of the group variable are included in the plot.
Default: TRUE
Interaction: For this option to take effect, the GROUP= option must also be specified.
Tip: Unless a discrete attribute map is in effect or the INDEX= option is used, the attributes of the missing group value are determined by the GraphMissing style element except when the MISSING= system option is used to specify a non-default missing character or when a user-defined format is applied to the missing group value. In those cases, the attributes of the missing group value are determined by a GraphData1–GraphDataN style element.
INDEX=numeric-column | expression
specifies indices for mapping line attributes (color and pattern) to one of the GraphData1 - GraphDataN style elements.
Default: no default
Restriction: If the value of the numeric-column is missing or is less than 1, the observation is not used in the analysis. If the value is not an integer, only the integer portion is used.
Interaction: For this option to take effect, the GROUP= option must also be specified.
Interaction: All of the indexes for a specific group value must be the same. Otherwise, the results are unpredictable.
Interaction: If this option is not used, then the group values are mapped in the order of the data.
Interaction: The index values are 1-based indices. For the style elements in GraphData1 - GraphDataN, if the index value is greater than N, then a modulo operation remaps that index value to a number less than N to determine which style element to use.
Discussion: Indexing can be used to collapse the number of groups that are represented in a graph. For more information, see Remapping Groups for Grouped Data.
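As a sketch of how an index column might be prepared, the following example assigns the same index to two of the three ORIGIN values so that their curves share one style element. The column IDX, the particular mapping, and the names WORK.CARS_IDX and pbsplineindex are all hypothetical.

```sas
/* hypothetical mapping: USA uses GraphData2; Asia and Europe share GraphData1 */
data work.cars_idx;
  set sashelp.cars;
  if origin = 'USA' then idx = 2;
  else idx = 1;
run;

/* GROUP= still requires data sorted by the group column */
proc sort data=work.cars_idx;
  by origin;
run;

proc template;
  define statgraph pbsplineindex;   /* hypothetical template name */
    begingraph;
      layout overlay;
        /* every observation in a group carries the same index value */
        pbsplineplot x=weight y=mpg_highway / name="fits"
          group=origin index=idx;
        discretelegend "fits";
      endlayout;
    endgraph;
  end;
run;

proc sgrender data=work.cars_idx template=pbsplineindex;
run;
```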
LEGENDLABEL= "string"
specifies a label for the legend item that is associated with this plot.
Default: The string specified on the NAME= option.
Restriction: This option applies only to an associated DISCRETELEGEND statement.
Interaction: If the GROUP= option is specified, this option is ignored.
LINEATTRS=style-element | style-element (line-options) | (line-options)
specifies the line attributes of the regression curve. See General Syntax for Attribute Options for the syntax on using a style-element and Line Options for available line-options.
Default: The GraphFit style element.
NAME="string"
assigns a name to a plot statement for reference in other template statements.
Default: no default
Restriction: The string is case sensitive, cannot contain spaces, and must define a unique name within the template.
Interaction: The string is used as the default legend label if the LEGENDLABEL= option is not used.
The specified name is used primarily in legend statements to coordinate the use of colors and line patterns between the graph and the legend.
PRIMARY=boolean
specifies that the data columns for this plot and the plot type be used for determining default axis features.
Default: FALSE
Restriction: This option is ignored if the plot is placed under a GRIDDED or LATTICE layout block.
Details: This option is needed only when two or more plots within an overlay-type layout contribute to a common axis. For more information, see When Plots Share Data and a Common Axis.
TIPFORMAT=(role-format-list)
specifies display formats for tip columns.
Default: The column format of the variable assigned to the role or BEST6. if no format is assigned to a numeric column.
(role-format-list)
a list of role-name = format pairs separated by blanks.
Example: TIPFORMAT=(Y=6.2)
Requirement: To generate tooltips, you must include an ODS GRAPHICS ON statement that has the IMAGEMAP option specified, and write the graphs to the ODS HTML destination.
The columns assigned to the X, Y, and GROUP (if assigned) roles are automatically included in the tooltip information.
TIPLABEL=(role-label-list)
specifies display labels for tip columns.
Default: The column label or column name of the variable assigned to the role.
(role-label-list)
a list of role-name = "string" pairs separated by blanks.
Example: TIPLABEL=(Y="Spline Regression")
Requirement: To generate tooltips, you must include an ODS GRAPHICS ON statement that has the IMAGEMAP option specified, and write the graphs to the ODS HTML destination.
The columns assigned to the X, Y, and GROUP (if assigned) roles are automatically included in the tooltip information.
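The tooltip requirement and the two tip options can be put together roughly as follows. This sketch assumes that the ODS HTML destination is opened with its default settings; the template name pbsplinetips is invented for the example.

```sas
/* tooltips require the IMAGEMAP option and the ODS HTML destination */
ods graphics on / imagemap;
ods html;   /* assumption: default HTML destination settings */

proc template;
  define statgraph pbsplinetips;   /* hypothetical template name */
    begingraph;
      entrytitle "Spline Fit with Tooltips";
      layout overlay;
        scatterplot x=weight y=mpg_highway / datatransparency=.7;
        /* format and relabel the Y role in the tooltip */
        pbsplineplot x=weight y=mpg_highway /
          tipformat=(y=6.2)
          tiplabel=(y="Spline Regression");
      endlayout;
    endgraph;
  end;
run;

proc sgrender data=sashelp.cars template=pbsplinetips;
run;

ods html close;
```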
XAXIS=X | X2
specifies whether data are mapped to the primary X (bottom) axis or to the secondary X2 (top) axis.
Default: X
Interaction: The overall plot specification and the layout type determine the axis display. For more information, see How Axis Features Are Determined.
YAXIS=Y | Y2
specifies whether data are mapped to the primary Y (left) axis or to the secondary Y2 (right) axis.
Default: Y
Interaction: The overall plot specification and the layout type determine the axis display. For more information, see How Axis Features Are Determined.
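To illustrate the axis mapping options, the following sketch fits two curves against the same X column and sends the second one to the Y2 axis. The choice of HORSEPOWER as the second response, the legend labels, and the template name pbsplineaxes are assumptions made for illustration.

```sas
proc template;
  define statgraph pbsplineaxes;   /* hypothetical template name */
    begingraph;
      entrytitle "Spline Fits on the Y and Y2 Axes";
      layout overlay;
        /* mapped to the primary Y (left) axis by default */
        pbsplineplot x=weight y=mpg_highway / name="hwy"
          legendlabel="MPG Highway (Y axis)";
        /* mapped to the secondary Y2 (right) axis */
        pbsplineplot x=weight y=horsepower / name="hp"
          yaxis=y2 lineattrs=(pattern=dash)
          legendlabel="Horsepower (Y2 axis)";
        discretelegend "hwy" "hp";
      endlayout;
    endgraph;
  end;
run;

proc sgrender data=sashelp.cars template=pbsplineaxes;
run;
```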