Example Program and Statement Details

Example Graph

The following graph was generated by the Example Program:
[Figure: Example Regression Plot]

Example Program

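proc template;
  define statgraph regressionplot;
    begingraph;
      entrytitle "Regression Fit Plot";
      layout overlay;
        /* Raw data points, mostly transparent so the fit line stands out */
        scatterplot x=weight y=mpg_highway /
          datatransparency=.7;
        /* Linear fit (default DEGREE=1); NAME= lets the legend refer to it */
        regressionplot x=weight y=mpg_highway /
          name="fitline"
          alpha=.05 legendlabel="Regression Fit";
        discretelegend "fitline";
      endlayout;
    endgraph;
  end;
run;

/* Render the template with the SASHELP.CARS sample data */
proc sgrender data=sashelp.cars template=regressionplot;
run;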

Statement Summary

The REGRESSIONPLOT statement supports only models with one independent variable and one dependent variable. For more information about the fitting methodology, see the TRANSREG procedure in the SAS/STAT user’s guide.
In addition to the regression line, the REGRESSIONPLOT statement can compute confidence limits for the fitted line. To display the confidence limits:
  1. Use the CLI= or CLM= regression option (or both) to declare a name for each set of confidence limits.
  2. Use MODELBAND statements that refer to those names to draw a confidence band for each set of limits.
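The two steps above can be sketched as follows. This is a minimal illustration, not the only arrangement: the template name regressionbands and the band names "cli_band" and "clm_band" are made up for the example, and the MODELBAND display settings are just one reasonable choice.

```sas
proc template;
  define statgraph regressionbands;
    begingraph;
      entrytitle "Regression Fit with Confidence Bands";
      layout overlay;
        /* Draw the bands first so they sit behind the points and the fit line */
        modelband "cli_band" / display=(outline)
          name="pred" legendlabel="95% Prediction Limits";
        modelband "clm_band" / display=(fill) datatransparency=.5
          name="conf" legendlabel="95% Confidence Limits";
        scatterplot x=weight y=mpg_highway / datatransparency=.7;
        /* CLI= and CLM= declare the names that the MODELBAND statements use */
        regressionplot x=weight y=mpg_highway /
          name="fitline" legendlabel="Regression Fit"
          alpha=.05 cli="cli_band" clm="clm_band";
        discretelegend "fitline" "conf" "pred";
      endlayout;
    endgraph;
  end;
run;

proc sgrender data=sashelp.cars template=regressionbands;
run;
```

The band names are case sensitive, so the string given to each MODELBAND statement must match the CLI= or CLM= value exactly.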

Required Arguments

X=numeric-column | expression
specifies the column for the X values.
Y=numeric-column | expression
specifies the column for the Y values.

Regression Options

ALPHA= positive-number
specifies the significance level for the confidence limits, which are computed at the 100(1 − ALPHA)% confidence level.
Default: .05
Range: 0–1
ALPHA=.05 represents a 95% confidence level.
CLI= "name"
produces confidence limits for individual predicted values for each observation. The confidence level is set by the ALPHA= option.
Default: no default
Interaction: name is a unique name within the template that is case sensitive and cannot contain spaces. It must be assigned in order for the confidence limits to be computed. To display confidence limits, you must use this name as the required argument of a MODELBAND statement. See the example in Example Program and Statement Details.
CLM= "name"
produces confidence limits for a mean predicted value for each observation. The confidence level is set by the ALPHA= option.
Default: no default
Interaction: name is a unique name within the template that is case sensitive and cannot contain spaces. It must be assigned in order for the confidence limits to be computed. To display confidence limits, you must use this name as the required argument of a MODELBAND statement. See the example in Example Program and Statement Details.
DEGREE= non-negative-integer
specifies the degree of the polynomial.
Default: 1
DEGREE=1 produces a linear fit, DEGREE=2 produces a quadratic fit, DEGREE=3 produces a cubic fit, and so on.
The value of the DEGREE= d option corresponds to either of the following PROC TRANSREG specifications for the independent variable: SPLINE(X / DEGREE=d) or PBSPLINE(X / DEGREE=d LAMBDA=0).
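As a rough illustration of DEGREE=, the sketch below overlays a linear fit and a quadratic fit on the same data. The template name regressiondegree and the legend labels are made up for the example; LINEATTRS= is used only so that the two lines can be told apart.

```sas
proc template;
  define statgraph regressiondegree;
    begingraph;
      entrytitle "Linear and Quadratic Fits";
      layout overlay;
        scatterplot x=weight y=mpg_highway / datatransparency=.7;
        /* DEGREE= is omitted here, so the default linear fit (DEGREE=1) is used */
        regressionplot x=weight y=mpg_highway /
          name="linear" legendlabel="Linear (DEGREE=1)";
        /* DEGREE=2 requests a quadratic fit of the same variables */
        regressionplot x=weight y=mpg_highway / degree=2
          name="quad" legendlabel="Quadratic (DEGREE=2)"
          lineattrs=(pattern=dash);
        discretelegend "linear" "quad";
      endlayout;
    endgraph;
  end;
run;

proc sgrender data=sashelp.cars template=regressiondegree;
run;
```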
FREQ= numeric-column
specifies a variable in the input data set that represents the frequency of occurrence of the current observation, essentially treating the data set as if each observation appeared n times, where n is the value of the FREQ variable for the observation. Noninteger values of the FREQ variable are truncated to the largest integer less than the FREQ value. The observation is used in the analysis only if the value of the FREQ variable is greater than or equal to 1.
Default: no default
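The FREQ= rules can be tried with a small summarized data set. Everything in this sketch is hypothetical: the data set work.dose_counts, its columns, and its values were invented for illustration. According to the description above, the count of 2.7 should be treated as 2, and the observation with a count of 0.5 should be left out of the fit.

```sas
/* Hypothetical summarized data: one row per (dose, response) pair with a count */
data work.dose_counts;
  input dose response count;
  datalines;
1 2.1 4
2 3.9 2.7
3 6.2 0.5
4 7.8 3
5 10.1 1
;
run;

proc template;
  define statgraph regressionfreq;
    begingraph;
      entrytitle "Regression Fit with FREQ=";
      layout overlay;
        scatterplot x=dose y=response;
        /* Each observation is counted as if it appeared COUNT times */
        regressionplot x=dose y=response / freq=count;
      endlayout;
    endgraph;
  end;
run;

proc sgrender data=work.dose_counts template=regressionfreq;
run;
```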
MAXPOINTS= positive-integer
specifies the maximum number of predicted points generated for the regression curve as well as any confidence limits.
Default: 201
WEIGHT= numeric-column
specifies a variable in the input data set that contains values to be used as a priori weights for a regression fit. If an observation’s weight is zero, negative, or missing, the observation is deleted from the analysis.
Default: no default

Options

Statement Option      Description
CURVELABEL=           Specifies the label of the regression line.
CURVELABELATTRS=      Specifies the color and font attributes of the regression line label.
CURVELABELLOCATION=   Specifies the location of the regression line label relative to the plot area.
CURVELABELPOSITION=   Specifies the position of the regression line label relative to the regression line.
DATATRANSPARENCY=     Specifies the degree of the transparency of the regression line.
GROUP=                Creates a distinct set of regression lines from just the observations that correspond to each unique group value of the specified column.
INCLUDEMISSINGGROUP=  Specifies whether missing values of the group variable are included in the plot.
INDEX=                Specifies indices for mapping line attributes (color and pattern) to one of the GraphData1–GraphDataN style elements.
LEGENDLABEL=          Specifies a label for a legend.
LINEATTRS=            Specifies the properties of the regression line.
NAME=                 Assigns a name to a plot statement for reference in other template statements.
PRIMARY=              Specifies that the data columns for this plot be used for determining default axis features.
TIPFORMAT=            Specifies display formats for information defined by roles.
TIPLABEL=             Specifies display labels for information defined by roles.
XAXIS=                Specifies whether data are mapped to the primary X (bottom) axis or the secondary X2 (top) axis.
YAXIS=                Specifies whether data are mapped to the primary Y (left) axis or the secondary Y2 (right) axis.
CURVELABEL="string"
specifies a label for the regression line.
Default: no curve label is displayed
Interaction: If the GROUP= option is specified, this option is ignored.
The font and color attributes for the label are specified by the CURVELABELATTRS= option.
CURVELABELATTRS=style-element | style-element (text-options) | (text-options)
specifies the color and font attributes of the regression line labels. See General Syntax for Attribute Options for the syntax on using a style-element and Text Options for available text-options.
Default: The GraphValueText style element.
Interaction: For this option to take effect, the CURVELABEL= option must also be used.
Interaction: If the GROUP= option is specified, this option is ignored.
CURVELABELLOCATION=INSIDE | OUTSIDE
specifies the location of the regression line label relative to the plot area.
Default: INSIDE
INSIDE
inside the plot area
OUTSIDE
outside the plot area
Restriction: OUTSIDE cannot be used when the REGRESSIONPLOT is used in multi-cell layouts such as LATTICE, DATAPANEL, or DATALATTICE, where axes might be external to the grid.
Interaction: For this option to take effect, the CURVELABEL= option must also be specified.
Interaction: This option is used in conjunction with the CURVELABELPOSITION= option to determine where the line labels appear. For more information, see Location and Position of Curve Labels.
CURVELABELPOSITION=AUTO | MAX | MIN | START | END
specifies the position of the regression line label relative to the regression line.
Default: AUTO when CURVELABELLOCATION=OUTSIDE; END when CURVELABELLOCATION=INSIDE.
AUTO
Only used when CURVELABELLOCATION=OUTSIDE. The line label is positioned automatically near the line boundary along unused axes whenever possible (typically Y2 and X2) to avoid collision with tick values.
MAX
Forces the line label to appear near maximum line values (typically, upper right).
MIN
Forces the line label to appear near minimum line values (typically, lower left).
START
Only used when CURVELABELLOCATION=INSIDE. Forces the line label to appear near the beginning of the regression line. Particularly useful when the curve line has a spiral shape.
END
Only used when CURVELABELLOCATION=INSIDE. Forces the line label to appear near the end of the regression line. Particularly useful when the curve line has a spiral shape.
Restriction: The AUTO setting is ignored if CURVELABELLOCATION=INSIDE is specified. The START and END settings are ignored if CURVELABELLOCATION=OUTSIDE is specified.
Interaction: For this option to take effect, the CURVELABEL= option must also be specified.
Interaction: This option is used in conjunction with the CURVELABELLOCATION= option to determine where the line label appears. For more information, see Location and Position of Curve Labels.
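The curve label options are typically used together. The sketch below labels the fit line directly instead of using a legend; the template name regressionlabel, the label text, and the text attributes are illustrative choices only.

```sas
proc template;
  define statgraph regressionlabel;
    begingraph;
      entrytitle "Regression Fit with a Curve Label";
      layout overlay;
        scatterplot x=weight y=mpg_highway / datatransparency=.7;
        /* Place the label inside the plot area, near the maximum line values */
        regressionplot x=weight y=mpg_highway /
          curvelabel="Linear Fit"
          curvelabellocation=inside
          curvelabelposition=max
          curvelabelattrs=(color=black weight=bold);
      endlayout;
    endgraph;
  end;
run;

proc sgrender data=sashelp.cars template=regressionlabel;
run;
```

Because GROUP= is not specified here, the CURVELABEL= and CURVELABELATTRS= settings are not ignored.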
DATATRANSPARENCY=number
specifies the degree of the transparency of the regression line.
Default: 0
Range: 0 (opaque) to 1 (entirely transparent)
GROUP=column | discrete-attr-var | expression
creates a distinct set of regression lines from just the observations that correspond to each unique group value of the specified column.
discrete-attr-var
specifies a discrete attribute variable that is defined in a DISCRETEATTRVAR statement.
Restriction: A discrete attribute variable specification must be a direct reference to the attribute variable. It cannot be set by a dynamic variable.
Default: Each distinct group value might be represented in the graph by a different combination of line color and line pattern. Line colors vary according to the GraphData1:ContrastColor–GraphDataN:ContrastColor style references, and line patterns vary according to the GraphData1:LineStyle–GraphDataN:LineStyle style references.
Restriction: The input data must be sorted by the GROUP= column.
Interaction: The group values are mapped in the order of the data, unless the INDEX= option is used to alter the default sequence of line colors and line patterns.
Interaction: The INCLUDEMISSINGGROUP option controls whether missing group values are considered a distinct group value.
Tip: The LINEATTRS= option can be used to override the representations that are used to identify the groups. For example, LINEATTRS=(PATTERN=SOLID) can be used to assign the same pattern to all of the lines, letting the line color distinguish group values. Likewise, LINEATTRS=(COLOR=BLACK) can be used to assign the same color to all of the lines, letting the line pattern distinguish group values.
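A grouped fit might look like the following sketch. SASHELP.CARS is sorted first to follow the restriction above that the input data be sorted by the GROUP= column; the output data set work.cars_by_origin and the template name regressiongroup are names chosen for this example.

```sas
/* Sort by the grouping column before rendering */
proc sort data=sashelp.cars out=work.cars_by_origin;
  by origin;
run;

proc template;
  define statgraph regressiongroup;
    begingraph;
      entrytitle "Regression Fit by Origin";
      layout overlay;
        scatterplot x=weight y=mpg_highway / group=origin
          datatransparency=.7;
        /* One fit line per ORIGIN value; a solid pattern for every line
           leaves color as the only attribute that distinguishes groups */
        regressionplot x=weight y=mpg_highway / group=origin
          name="fits" lineattrs=(pattern=solid);
        discretelegend "fits";
      endlayout;
    endgraph;
  end;
run;

proc sgrender data=work.cars_by_origin template=regressiongroup;
run;
```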
INCLUDEMISSINGGROUP=boolean
specifies whether missing values of the group variable are included in the plot.
Default: TRUE
Interaction: For this option to take effect, the GROUP= option must also be specified.
Tip: Unless a discrete attribute map is in effect or the INDEX= option is used, the attributes of the missing group value are determined by the GraphMissing style element except when the MISSING= system option is used to specify a non-default missing character or when a user-defined format is applied to the missing group value. In those cases, the attributes of the missing group value are determined by a GraphData1–GraphDataN style element.
INDEX=numeric-column | expression
specifies indices for mapping line attributes (color and pattern) to one of the GraphData1–GraphDataN style elements.
Default: no default
Restriction: If the value of the numeric-column is missing or is less than 1, the observation is not used in the analysis. If the value is not an integer, only the integer portion is used.
Interaction: For this option to take effect, the GROUP= option must also be specified.
Interaction: All of the indexes for a specific group value must be the same. Otherwise, the results are unpredictable.
Interaction: If this option is not used, then the group values are mapped in the order of the data.
Interaction: The index values are 1-based indices. For the style elements in GraphData1–GraphDataN, if the index value is greater than N, then a modulo operation remaps that index value to a number less than N to determine which style element to use.
Discussion: Indexing can be used to collapse the number of groups that are represented in a graph. For more information, see Remapping Groups for Grouped Data.
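One way to experiment with INDEX= is to add an index column in a DATA step. The sketch below is an assumption-laden illustration: the column style_idx, the data set names, and the choice to give the "Asia" and "Europe" groups the same index are all invented for the example. Each group value gets a single index, as the interaction above requires.

```sas
proc sort data=sashelp.cars out=work.cars_by_origin;
  by origin;
run;

/* Hypothetical index column: USA uses GraphData1, all other origins GraphData2 */
data work.cars_indexed;
  set work.cars_by_origin;
  if origin = "USA" then style_idx = 1;
  else style_idx = 2;
run;

proc template;
  define statgraph regressionindex;
    begingraph;
      entrytitle "Regression Fit with Remapped Group Attributes";
      layout overlay;
        /* INDEX= takes effect only because GROUP= is also specified */
        regressionplot x=weight y=mpg_highway /
          group=origin index=style_idx name="fits";
        discretelegend "fits";
      endlayout;
    endgraph;
  end;
run;

proc sgrender data=work.cars_indexed template=regressionindex;
run;
```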
LEGENDLABEL= "string"
specifies a label for the legend item that is associated with this plot.
Default: The string specified on the NAME= option.
Restriction: This option applies only to an associated DISCRETELEGEND statement.
Interaction: If the GROUP= option is specified, this option is ignored.
LINEATTRS=style-element | style-element (line-options) | (line-options)
specifies the attributes of the regression line. See General Syntax for Attribute Options for the syntax on using a style-element and Line Options for available line-options.
Default: The GraphFit style element.
NAME="string"
assigns a name to a plot statement for reference in other template statements.
Default: no default
Restriction: The string is case sensitive, cannot contain spaces, and must define a unique name within the template.
Interaction: The string is used as the default legend label if the LEGENDLABEL= option is not used.
The specified name is used primarily in legend statements to coordinate the use of colors, marker symbols, and line patterns between the graph and the legend.
PRIMARY=boolean
specifies that the data columns for this plot and the plot type be used for determining default axis features.
Default: FALSE
Restriction: This option is ignored if the plot is placed under a GRIDDED or LATTICE layout block.
Details: This option is needed only when two or more plots within an overlay-type layout contribute to a common axis. For more information, see When Plots Share Data and a Common Axis.
TIPFORMAT=(role-format-list)
specifies display formats for tip columns.
Default: The column format of the variable assigned to the role, or BEST6. if no format is assigned to a numeric column.
(role-format-list)
a list of role-name = format pairs separated by blanks.
Example: TIPFORMAT=(Y=6.2)
Requirement: To generate tooltips, you must include an ODS GRAPHICS ON statement that has the IMAGEMAP option specified, and write the graphs to the ODS HTML destination.
The columns assigned to the roles X, Y, and GROUP (if assigned) are automatically included in the tooltip information.
TIPLABEL=(role-label-list)
specifies display labels for tip columns.
Default: The column label or column name of the variable assigned to the role.
(role-label-list)
a list of role-name = "string" pairs separated by blanks.
Example: TIPLABEL=(Y="Regression Fit")
Requirement: To generate tooltips, you must include an ODS GRAPHICS ON statement that has the IMAGEMAP option specified, and write the graphs to the ODS HTML destination.
The columns assigned to the roles X, Y, and GROUP (if assigned) are automatically included in the tooltip information.
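Putting the tooltip requirement together with TIPFORMAT= and TIPLABEL= gives something like the sketch below. The output file name regression_tips.html and the template name regressiontips are placeholders chosen for this example.

```sas
/* Tooltips need an image map and the ODS HTML destination */
ods graphics on / imagemap;
ods html file="regression_tips.html";

proc template;
  define statgraph regressiontips;
    begingraph;
      entrytitle "Regression Fit with Tooltips";
      layout overlay;
        scatterplot x=weight y=mpg_highway / datatransparency=.7;
        /* Relabel and reformat the Y role in the fit line's tooltip */
        regressionplot x=weight y=mpg_highway /
          tiplabel=(y="Regression Fit")
          tipformat=(y=6.2);
      endlayout;
    endgraph;
  end;
run;

proc sgrender data=sashelp.cars template=regressiontips;
run;

ods html close;
```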
XAXIS=X | X2
specifies whether data are mapped to the primary X (bottom) axis or to the secondary X2 (top) axis.
Default: X
Interaction: The overall plot specification and the layout type determine the axis display. For more information, see How Axis Features Are Determined.
YAXIS=Y | Y2
specifies whether data are mapped to the primary Y (left) axis or to the secondary Y2 (right) axis.
Default: Y
Interaction: The overall plot specification and the layout type determine the axis display. For more information, see How Axis Features Are Determined.
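As a sketch of the axis options, the template below maps a second fit to the Y2 axis so that two response variables can share one plot. The template name regressionaxes and the legend labels are made up for the example, and how the Y2 axis is actually displayed still depends on the layout, as noted above.

```sas
proc template;
  define statgraph regressionaxes;
    begingraph;
      entrytitle "Fits on the Y and Y2 Axes";
      layout overlay;
        scatterplot x=weight y=mpg_highway / datatransparency=.7;
        /* Default mapping: primary X (bottom) and primary Y (left) axes */
        regressionplot x=weight y=mpg_highway /
          name="hwy" legendlabel="Highway (Y axis)";
        /* Map the second fit to the secondary Y2 (right) axis */
        regressionplot x=weight y=mpg_city / yaxis=y2
          name="city" legendlabel="City (Y2 axis)"
          lineattrs=(pattern=dash);
        discretelegend "hwy" "city";
      endlayout;
    endgraph;
  end;
run;

proc sgrender data=sashelp.cars template=regressionaxes;
run;
```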