Categories of Statements

Overview

GTL statements generally fall into two main categories:
  • Plot, Legend, and Text statements that determine what items are drawn in the graph.
  • Layout statements that determine how or where the items in the graphs are placed.

Plot Statements—Terminology and Concepts

Overview

GTL has numerous plot statements that can be combined with one another in many different ways. In future releases of GTL, new layout and plot statements will be added to supplement those now available. GTL has been designed as a high-level toolkit that enables you to create a large variety of graphs by combining its constructs in different ways. As you might imagine, not all combinations of statements are possible, and most of the invalid combinations are caught during template compilation. Rather than trying to create graphs by trial and error, it is recommended that you understand a few basic "rules of assembly" to guide your efforts and make the language easier to work with. To that end, some new terminology is useful.

Plot Terminology

Computed Plots
Computed plots internally perform computational transformations on the input data and, as necessary, add new columns to a data object in order to render the requested plot. For example, a LOESSPLOT requires two numeric columns of raw input data (X=column and Y=column). A loess fit line is computed for these input point pairs, a new set of points on a fit line is generated, and a new column that contains the computed points is added to the data object. A smoothed line is drawn through the computed points. Most computed plots have several options to control the computation performed. Another form of computed plot is one with user-defined data transformations. For example, you can use an EVAL( ) function to compute a new column such as Y= eval(log10(column)). This transforms column values into corresponding logarithmic values. Why is it important to know whether a plot is computed? Certain layouts such as PROTOTYPE currently do not allow computed plots to be included.
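As a minimal sketch of both kinds of computation described above, the following template combines an EVAL( ) transformation with a LOESSPLOT. The template name COMPUTED_SKETCH is hypothetical, and SASHELP.CLASS is assumed only because it is used elsewhere in this chapter.

```sas
proc template;
  define statgraph computed_sketch;   /* hypothetical template name */
    begingraph;
      entrytitle "Loess Fit of log10(Weight) by Height";
      layout overlay;
        /* EVAL computes a new column from WEIGHT when the graph is rendered */
        scatterplot x=height y=eval(log10(weight));
        /* LOESSPLOT computes the points of the fit line internally */
        loessplot x=height y=eval(log10(weight));
      endlayout;
    endgraph;
  end;
run;

proc sgrender data=sashelp.class template=computed_sketch;
run;
```

Neither computed column has to exist in the input data set; both are added to the data object at render time.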
Parameterized Plots
Parameterized plots simply render the input data they are given. They are useful whenever you have input data that does not need to be preprocessed or that has already been summarized (possibly an output data set from a procedure like PROC FREQ). For example, BARCHARTPARM draws one bar per input observation: the X= column provides the bar tick value and the Y= column provides the bar length. So a bar chart with five bars requires a data set with five observations and two variables. A parameterized bar chart statement is useful when the computed BARCHART statement does not perform the type of computation you want, and you have done the summarization yourself. Many parameterized plots have a "PARM" suffix added to their name. Another common situation is when you want to draw a fit line and a confidence band from a set of data that already has the appropriate set of (X,Y) point coordinates. For these situations you would use a SERIESPLOT statement for the fit line and a BANDPLOT statement for the confidence band. Why is it important to know whether a plot is parameterized? Parameterized plots perform no additional computation on the input data. Thus, input data that does not meet the special requirements of the parameterized plot might result in incorrect output or a blank graph.
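The BARCHARTPARM case can be sketched as follows. The summarization is done first with PROC MEANS (one possible choice; any procedure or DATA step that produces one observation per bar would do). The names HEIGHTSTATS, MEANHEIGHT, and BARPARM_SKETCH are hypothetical.

```sas
/* Summarize the data yourself: one output observation per bar */
proc means data=sashelp.class nway noprint;
  class sex;
  var height;
  output out=heightstats mean=meanheight;   /* hypothetical names */
run;

proc template;
  define statgraph barparm_sketch;          /* hypothetical template name */
    begingraph;
      entrytitle "Mean Height by Sex";
      layout overlay;
        /* X= supplies the bar tick value, Y= supplies the bar length */
        barchartparm x=sex y=meanheight;
      endlayout;
    endgraph;
  end;
run;

proc sgrender data=heightstats template=barparm_sketch;
run;
```

Because BARCHARTPARM does no summarization, rendering this template against the raw SASHELP.CLASS data instead of HEIGHTSTATS would not produce the intended chart.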
Stand-alone Plots
A stand-alone plot is one that can be drawn without any other accompanying plot. In general, a plot is stand-alone if its input data defines a range of values for all axes that are needed to display the plot. For example, the observations plotted in a SCATTERPLOT normally span a certain data range on both the X and Y axes. This information is necessary to successfully draw the axes and the markers. Why is it important to know which plots are stand-alone? Most layouts need to know the extents of the X and Y axes in order to draw the plot.
Dependent Plots
A dependent plot is one that, by itself, does not provide enough information for the axes that are needed to successfully draw the plot. For example, the REFERENCELINE statement draws a straight line perpendicular to one axis at a given input point on the same axis. Because there is only one point provided, there is not enough information to determine the full range of data for this axis. Furthermore, no information is provided for the data range of the second axis. Thus, a REFERENCELINE statement does not provide enough information by itself to draw the axes and the plot. Such a plot needs to work with another "Stand-alone" plot, which will provide the necessary information to determine the data extents of the two axes.
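A minimal sketch of a dependent plot working with a stand-alone plot is shown here. The SCATTERPLOT supplies the data ranges for both axes, and the REFERENCELINE is drawn within them. The template name REFLINE_SKETCH and the reference value 100 are arbitrary choices for illustration.

```sas
proc template;
  define statgraph refline_sketch;   /* hypothetical template name */
    begingraph;
      entrytitle "Scatter Plot with a Reference Line";
      layout overlay;
        /* Stand-alone plot: defines the X and Y data ranges */
        scatterplot x=height y=weight;
        /* Dependent plot: one Y value, no axis ranges of its own */
        referenceline y=100 / lineattrs=(pattern=dash);
      endlayout;
    endgraph;
  end;
run;

proc sgrender data=sashelp.class template=refline_sketch;
run;
```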
Primary Plot
When you overlay two or more plots, the layout container determines the type of axis to use, the data range of all axes, and the default format and label to use for each axis. By default, the first encountered stand-alone plot is used to decide the axis type and axis format and label. In some cases, you desire a certain overlay stacking and must order your statements accordingly. This might result in undesirable axis properties. By adding the PRIMARY=TRUE option to a stand-alone plot, you can request that this plot be used to determine axis type and axis format and label. A dependent plot cannot be designated as primary.
Graphics Types
GTL supports both 2D and 3D graphics. Currently there are only two 3D plot statements (SURFACEPLOTPARM and BIHISTOGRAM3DPARM). 3D plot statements must be used in a 3D layout. 2D plot statements cannot be used in a 3D layout, and 3D plot statements cannot be used in a 2D layout. For more information on layouts, see Layout Containers.
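The following sketch places a 3D plot statement in a 3D layout. The data set GRID, its variables X, Y, and Z, and the template name SURFACE_SKETCH are all hypothetical; the DATA step simply generates an evenly spaced grid, ordered by Y and then X, on the assumption that SURFACEPLOTPARM expects gridded input.

```sas
/* Generate an evenly spaced grid of (X,Y) values with one Z value each */
data grid;                           /* hypothetical data set */
  do y = -2 to 2 by 0.25;
    do x = -2 to 2 by 0.25;
      z = exp(-(x*x + y*y));
      output;
    end;
  end;
run;

proc template;
  define statgraph surface_sketch;   /* hypothetical template name */
    begingraph;
      entrytitle "Surface from Gridded Data";
      /* A 3D plot statement must be placed in a 3D layout */
      layout overlay3d;
        surfaceplotparm x=x y=y z=z;
      endlayout;
    endgraph;
  end;
run;

proc sgrender data=grid template=surface_sketch;
run;
```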

Plot Statements Categorized by Type

Plot statements are generally categorized as stand-alone or dependent, computed or parameterized, and 2D or 3D. The following tables show the distribution of plots in these categories.
Stand-alone, 2D, Computed Plots
2D PLOTS: COMPUTED

| Statement | Required Arguments | Comments |
|---|---|---|
| BARCHART | One column | Horizontal or vertical. |
| BOXPLOT | One numeric-column | Horizontal or vertical. |
| HISTOGRAM | One numeric-column | Horizontal or vertical. |
| DENSITYPLOT | One numeric-column | Theoretical distribution curve (for example, NORMAL or KDE). |
| REGRESSIONPLOT | Two numeric-columns | Fit plot using linear, quadratic, or cubic regression. |
| LOESSPLOT | Two numeric-columns | Fit plot using loess. |
| PBSPLINEPLOT | Two numeric-columns | Fit plot using Penalized B-spline. |
| ELLIPSE | Two numeric-columns | Confidence or prediction ellipse for a set of points. |
| SCATTERPLOTMATRIX | Two or more numeric-columns | Grid of scatter plots. Might include computed ellipses, histograms, density curves. |

Stand-alone, 2D, Parameterized Plots
2D PLOTS: NONCOMPUTED / PARAMETERIZED

| Statement | Required Arguments | Comments |
|---|---|---|
| BANDPLOT | Three columns, at least two numeric limits | Area bounded by two straight or curved lines. |
| BARCHARTPARM | Two columns, Y must be numeric | Horizontal or vertical. Summarized data provided by user. |
| BLOCKPLOT | Two columns | Strip of X-axis-aligned rectangular blocks containing text. The X data must be sorted. |
| BOXPLOTPARM | One numeric-column and one string-column | Horizontal or vertical. Needs special data format. |
| CONTOURPLOTPARM | Three numeric-columns | Draws contour plot from pre-gridded data. Basic "gridding" feature is provided using an option. |
| ELLIPSEPARM | Five numbers or numeric-columns | Draws ellipse given center, slope, semi-major and semi-minor axis lengths. |
| FRINGEPLOT | One numeric-column | Draws a short line segment of equal length along the X or X2 axis for each observation's X value. |
| HISTOGRAMPARM | Two numeric-columns | Horizontal or vertical. The Y data must be non-negative. |
| NEEDLEPLOT | Two columns, Y must be numeric | Draws parallel, vertical line segments connecting data points to a baseline. |
| SCATTERPLOT | Two columns | Draws markers at data point locations. |
| SERIESPLOT | Two columns | Draws line segments to connect a set of data points. |
| STEPPLOT | Two columns, Y must be numeric | Draws stepped line segments to connect a set of data points. |
| VECTORPLOT | At least two and up to four numeric-columns, X and Y origins can be numeric constants. | Creates directed line segment(s) based on pairs of data points. |

Stand-alone, 3D, Parameterized Plots
3D PLOTS: NONCOMPUTED / PARAMETERIZED

| Statement | Required Arguments | Comments |
|---|---|---|
| SURFACEPLOTPARM | Three numeric-columns | Smooth surface. |
| BIHISTOGRAM3DPARM | Three numeric-columns | Bivariate histogram. The Z data must be non-negative. |

Dependent Plots

| Statement | Required Arguments | Comments |
|---|---|---|
| MODELBAND | CLM or CLI name of associated fit plot | Confidence bands. Used only in conjunction with a fit plot. |
| DROPLINE | (X,Y) point location, two columns, or one value and one column | Draws a perpendicular line from a data point to a specified axis. |
| LINEPARM | (X,Y) point location and slope. The three values can be provided in any combination of number and numeric-column | Draws line(s) given a data point and the slope of the line. |
| REFERENCELINE | X or Y location, column | Draws line(s) perpendicular to an axis. |

Plot Concepts

To illustrate the use of the different types of plot statements, consider the following template. In this template, named MODELFIT, a SCATTERPLOT is overlaid with a REGRESSIONPLOT. The REGRESSIONPLOT is a computed plot because it takes the input columns (HEIGHT and WEIGHT) and transforms them into two new columns that correspond to points on the requested fit line. By default, a linear regression (DEGREE=1) is performed with other statistical defaults. The model in this case is WEIGHT=HEIGHT, which in the plot statement is specified with X=HEIGHT (independent variable) and Y=WEIGHT (dependent variable). The number of observations generated for the fit line is around 200 by default.
Note: Plot statements have to be used in conjunction with Layout statements. To simplify our discussion, we will continue using the most basic layout statement: LAYOUT OVERLAY. This layout statement acts as a single container for all plot statements placed within it. Every plot is drawn on top of the previous one in the order that the plot statements are specified, with the last one drawn on top.
proc template;
  define statgraph modelfit;
    begingraph;
      entrytitle "Regression Fit Plot";
      layout overlay;
        scatterplot x=height y=weight /
                    primary=true;
        regressionplot x=height y=weight;
      endlayout;
    endgraph;
  end;
run;

proc sgrender data=sashelp.class 
              template=modelfit;
run;
Overlaying a REGRESSIONPLOT on a SCATTERPLOT
The REGRESSIONPLOT statement can also generate sets of points for the upper and lower confidence limits of the mean (CLM), and for the upper and lower confidence limits of individual predicted values (CLI) for each observation. The CLM="name" and CLI="name" options cause the extra computation. However, the confidence limits are not displayed by the regression plot. Instead, you must use the dependent plot statement MODELBAND, with the unique name as its required argument. Notice that the MODELBAND statement appears first in the template, ensuring that the band will appear behind the scatter points and fit line. A MODELBAND statement must be used in conjunction with a REGRESSIONPLOT, LOESSPLOT, or PBSPLINEPLOT statement.
layout overlay;
  modelband "myclm" ;
  scatterplot x=height y=weight /
    primary=true;
  regressionplot x=height y=weight /
    alpha=.01 clm="myclm" ;
endlayout;
Adding a MODELBAND Statement to the Layout
This is certainly the easiest way to construct this type of plot. However, you might want to construct a similar plot from an analysis by a statistical procedure that has many more options for controlling the fit. Most procedures create output data sets that can be used directly to create the plot you want. Here is an example of using non-computed, stand-alone plots to build the fit plot. First choose a procedure to do the analysis.
proc reg data=sashelp.class noprint;
  model weight=height / alpha=.01;
  output out=predict predicted=p lclm=lclm uclm=uclm;
run; quit;
The output data set, PREDICT, contains all the variables and observations in SASHELP.CLASS plus, for each observation, the computed variables P, LCLM, and UCLM.
Output Data from PROC REG
Now the template can use simple, non-computed SERIESPLOT and BANDPLOT statements for the presentation of fit line and confidence bands.
proc template;
  define statgraph fit;
    begingraph;
      entrytitle "Regression Fit Plot";
      layout overlay;
        bandplot x=height
          limitupper=uclm
          limitlower=lclm /
          fillattrs=GraphConfidence;
        scatterplot x=height y=weight /
          primary=true;
        seriesplot x=height y=p /
          lineattrs=GraphFit;
      endlayout;
    endgraph;
  end;
run;

proc sgrender data=predict template=fit;
run;
Using a Non-computed SERIESPLOT and BANDPLOT

Legend Statements

GTL supports two types of legends: a discrete legend that is used to identify graphical features such as grouped markers, lines, or overlaid plots; and a continuous legend that shows the range of numeric variation as a ramp of color values. Legend statements are dependent on one or more plot statements and must be associated with the plot(s) that they describe. The basic strategy for creating legends is to "link" the plot statement(s) to a legend statement by assigning a unique, case-sensitive name to the plot statement on its NAME= option and then referencing that name on the legend statement.

| Statement | Required Arguments | Comments |
|---|---|---|
| DISCRETELEGEND | Name(s) of associated plot(s) | Traditional legend with entries for grouped markers/lines or overlaid plots. |
| CONTINUOUSLEGEND | Name of an associated plot | Shows a numeric scale with a color ramp. Used in conjunction with contours, surfaces, and scatter plots. |

layout overlay;
  modelband "clm";
  scatterplot x=height y=weight /
    primary=true
    group=sex name="s" ; /* the name is case-sensitive */
  regressionplot x=height y=weight /
    alpha=.01 clm="clm";
  discretelegend "s" ;   /* case must match the case on NAME= */
endlayout;
Adding a Legend to the Graph
For more information, see Adding Legends to a Graph.
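The continuous legend follows the same naming strategy. The sketch below assumes that the MARKERCOLORGRADIENT= option of SCATTERPLOT is available in your release to map a numeric column to marker color; the plot name "ages" and the template name CONTLEGEND_SKETCH are hypothetical.

```sas
proc template;
  define statgraph contlegend_sketch;   /* hypothetical template name */
    begingraph;
      entrytitle "Weight by Height, Colored by Age";
      layout overlay;
        /* Assumption: MARKERCOLORGRADIENT= maps AGE to a color ramp */
        scatterplot x=height y=weight /
          markercolorgradient=age name="ages";
        /* The legend references the plot by its case-sensitive name */
        continuouslegend "ages" / title="Age";
      endlayout;
    endgraph;
  end;
run;

proc sgrender data=sashelp.class template=contlegend_sketch;
run;
```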

Text Statements

GTL supports statements that add text to predefined locations of the graph. SAS TITLE and FOOTNOTE statements do not contribute to the graph. However, there are comparable ENTRYTITLE and ENTRYFOOTNOTE statements. Like TITLE and FOOTNOTE statements, multiple instances of these statements can be used to create multi-line text.

| Statement | Required Arguments | Comments |
|---|---|---|
| ENTRYTITLE | String | Text to appear above graph. The ENTRYTITLE statement is specified inside the BEGINGRAPH block but outside of the outermost layout. |
| ENTRYFOOTNOTE | String | Text to appear below graph. The ENTRYFOOTNOTE statement is specified inside the BEGINGRAPH block but outside of the outermost layout. |
| ENTRY | String | Text to appear within graph. The ENTRY statement is specified inside a layout block. |

layout overlay;
  modelband "clm";
  scatterplot x=height y=weight /
    primary=true;
  regressionplot x=height y=weight /
    alpha=.05 clm="clm";
  entry "Band shows 95% CLM" /
    autoalign=auto;
endlayout;
Using Text in a Graph
For more information, see Adding and Changing Text in a Graph.
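As a small sketch of multi-line text, the following template uses two ENTRYTITLE statements and two ENTRYFOOTNOTE statements, all placed inside the BEGINGRAPH block but outside the layout. The template name TEXT_SKETCH and the text strings are arbitrary.

```sas
proc template;
  define statgraph text_sketch;   /* hypothetical template name */
    begingraph;
      /* Each ENTRYTITLE statement adds one line above the graph */
      entrytitle "Regression Fit Plot";
      entrytitle "Weight by Height";
      layout overlay;
        scatterplot x=height y=weight / primary=true;
        regressionplot x=height y=weight;
      endlayout;
      /* Each ENTRYFOOTNOTE statement adds one line below the graph */
      entryfootnote "Data: SASHELP.CLASS";
      entryfootnote "Linear regression with default options";
    endgraph;
  end;
run;

proc sgrender data=sashelp.class template=text_sketch;
run;
```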

Layout Containers

Layout statements, a key feature of the GTL, form "containers" that determine how the plots, legends, and text items are drawn in the graph. GTL supports many different layout statements that are suited to different uses. However, these fall into two main categories.
  • Single-cell layout statements place the plots, legends, and entries in a common region. The statements that are placed within these "overlay" containers are processed in order. Each plot is drawn on top of the previous plot, with the last one drawn on top.
  • Multi-cell layout statements partition the graph region into multiple smaller "cells." Each cell can be populated by an individual plot, an overlay, or a nested multi-cell layout. The layout of the "cells" is determined by the user, or by classification variables.
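The two categories can be sketched together: a multi-cell LAYOUT LATTICE that holds two single-cell LAYOUT OVERLAY containers. The template name LATTICE_SKETCH is hypothetical, and COLUMNS=2 is just one way to arrange the cells.

```sas
proc template;
  define statgraph lattice_sketch;   /* hypothetical template name */
    begingraph;
      entrytitle "Two Cells in a Lattice";
      /* Multi-cell layout: partitions the graph into two columns */
      layout lattice / columns=2;
        /* Cell 1: single-cell overlay, plots drawn in statement order */
        layout overlay;
          scatterplot x=height y=weight;
          regressionplot x=height y=weight;
        endlayout;
        /* Cell 2: another single-cell overlay */
        layout overlay;
          histogram weight;
        endlayout;
      endlayout;
    endgraph;
  end;
run;

proc sgrender data=sashelp.class template=lattice_sketch;
run;
```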
Layout blocks always begin with the LAYOUT keyword followed by a keyword indicating the purpose of the layout. All layout blocks end with an ENDLAYOUT statement. The following table summarizes the available layouts.

| Layout (Description) | Graphics Allowed and Cells Produced | Comments | Example |
|---|---|---|---|
| OVERLAY (Single Cell) | 2D, one cell | General purpose layout for superimposing 2D plots. | One-cell Layout |
| OVERLAYEQUATED (Single Cell) | 2D, one cell | Specialized OVERLAY with equated axes. | |
| PROTOTYPE (Single Cell) | 2D, one cell | Specialized LAYOUT used only as child layout of DATAPANEL or DATALATTICE. | |
| OVERLAY3D (Single Cell) | 3D, one cell | General purpose 3D layout for superimposing 3D plots. | Single-Cell 3D Layout |
| LATTICE (Advanced Multi-cell) | 2D or 3D, one or more cells | All cells must be predefined. Axes can be shared across columns or rows, and they can be external to the grid. Many grid labeling and alignment features. | Multi-cell Layout |
| GRIDDED (Simple Multi-cell) | 2D or 3D, one or more cells | All cells must be predefined. Axes independent for each cell. Very simple multi-cell container. | |
| DATAPANEL (Classification Panel) | 2D, one or more cells | Displays a panel of similar graphs based on data subsetted by classification variable(s). Number of cells is based on crossings of n classification variable(s). | Classification Panel |
| DATALATTICE (Classification Panel) | 2D, one or more cells | Displays a panel of similar graphs based on data subsetted by classification variable(s). Number of cells is based on crossings of one or two classification variables. | |

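A classification panel can be sketched as follows. LAYOUT DATAPANEL creates one cell per value of the classification variable, and the nested LAYOUT PROTOTYPE describes what is drawn in each cell. The template name PANEL_SKETCH is hypothetical; only a non-computed plot is used in the prototype.

```sas
proc template;
  define statgraph panel_sketch;   /* hypothetical template name */
    begingraph;
      entrytitle "Weight by Height, Paneled by Sex";
      /* One cell for each value of SEX */
      layout datapanel classvars=(sex) / columns=2;
        /* PROTOTYPE is valid only as a child of DATAPANEL or DATALATTICE */
        layout prototype;
          scatterplot x=height y=weight;
        endlayout;
      endlayout;
    endgraph;
  end;
run;

proc sgrender data=sashelp.class template=panel_sketch;
run;
```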
To learn more about layouts, refer to the appropriate chapter: