| TEMPLATE Procedure: Plot Statements (Experimental) |
| Types of Plot Statements |
Plot statements are classified in two ways:
- type of input data
- typical usage in an overlaid display

Depending on the type of input data, there are two types of plot statements:
- plot statements that use raw data. For example, the HISTOGRAM statement computes and displays a histogram from raw data.
- plot statements that use summary statistics or parameters. For example, the ELLIPSEPARM statement displays an ellipse based on specified slope and axes parameters.

In general, plot statements with the PARM suffix use summary statistics or parameters. Plot statements without the PARM suffix compute summary statistics or parameters from raw input data.
Depending on the typical usage in an overlaid display, there are two types of plot statements:
- plot statements that create a basic plot with axes
- plot statements that add to or enhance a basic plot

For example, a histogram overlaid with a normal density curve is created with a HISTOGRAM statement and a DENSITY statement inside a LAYOUT OVERLAY block. The HISTOGRAM statement creates the histogram and determines the axis labels and tick marks. The DENSITY statement adds a normal density estimate to the plot.
Similarly, a scatter plot with axes is created with a SCATTERPLOT statement inside a LAYOUT OVERLAY block. A regression line and confidence bands can be added to the scatter plot with a LINEPARM statement and a BAND statement. The axis labels and the location of the tick marks of the overlaid display are determined by the variables in the SCATTERPLOT statement.
In general, plot statements with the suffixes CHART, GRAM, or PLOT are used to create a basic plot with axes. Plot statements without these suffixes are typically used to add to or enhance a basic plot.
The following table lists the plot statements that are available in graph templates, classified by type of input data and by typical usage in an overlaid display.
| Plot Statement | Input: Raw Data | Input: Summary Statistics or Parameters | Usage: Create Basic Plot | Usage: Add to or Enhance Plot |
|---|---|---|---|---|
| BAND | X | | | X |
| BANDPLOT | X | | X | |
| BARCHARTPARM | | X | X | |
| BARPARM | X | | | X |
| BIHISTOGRAMPARM | | X | X | |
| BIHISTOPARM | | X | | X |
| BOX | X | | | X |
| BOXPARM | | X | | X |
| BOXPLOT | X | | X | |
| BOXPLOTPARM | | X | X | |
| CONTOURPARM | | X | | X |
| CONTOURPLOTPARM | | X | X | |
| DENSITY | X | | | X |
| DENSITYPLOT | X | | X | |
| ELLIPSE | X | | | X |
| ELLIPSEPARM | | X | | X |
| FRINGE | X | | | X |
| HISTO | X | | | X |
| HISTOGRAM | X | | X | |
| HISTOGRAMPARM | | X | X | |
| HISTOPARM | | X | | X |
| LINEPARM | | X | | X |
| NEEDLE | X | | | X |
| NEEDLEPLOT | X | | X | |
| SCATTER | X | | | X |
| SCATTERPLOT | X | | X | |
| SCATTERPLOTMATRIX | X | | X | |
| SERIES | X | | | X |
| SERIESPLOT | X | | X | |
| STEP | X | | | X |
| STEPPLOT | X | | X | |
| SURFACE | | X | | X |
| SURFACEPLOT | | X | X | |
| VECTOR | X | | | X |
| VECTORPLOT | X | | X | |

| Ignored Plot Options in a LAYOUT OVERLAY Block |
A LAYOUT OVERLAY block enables you to overlay multiple plots within a single layout area. For example, the following figure shows a scatter plot and a line that have been specified within a LAYOUT OVERLAY block.
Scatter Plot Overlaid with a Line
![[Scatter Plot Overlaid with a Line]](./images/scatterplotline.gif)
All plots within a LAYOUT OVERLAY block share the axes and other graph features that are common within the layout area, such as the graph background and wall.
To prevent multiple plots from specifying conflicting values within a LAYOUT OVERLAY block, the LAYOUT OVERLAY statement controls the graph features that are common within the layout area. If individual plot statements specify values for any of these features, the values are ignored.
For example, in the following LAYOUT OVERLAY block, the SCATTERPLOT statement uses the BACKGROUND= option to specify a blue background.
LAYOUT OVERLAY;
   SCATTERPLOT Y=Height X=Weight / BACKGROUND=BLUE;
   LINEPARM YINTERCEPT=Intercept SLOPE=Slope;
ENDLAYOUT;
Because the background color is shared by all plots within the layout area, the BACKGROUND= value on the SCATTERPLOT statement is ignored, and the LAYOUT OVERLAY statement controls the background color. In this example, the LAYOUT OVERLAY statement does not explicitly specify a background color, so the default background color is used.
Although the LAYOUT OVERLAY statement controls the common graph features, the individual plot statements retain control of the features that are specifically related to their own data. For example, in the following LAYOUT OVERLAY block, the LINEPARM statement uses the DATATRANSPARENCY= option to specify a level of transparency for the line.
LAYOUT OVERLAY;
   SCATTERPLOT Y=Weight X=Height;
   LINEPARM YINTERCEPT=Intercept SLOPE=Slope / DATATRANSPARENCY=0.5;
ENDLAYOUT;
Because the line transparency value does not affect any of the common graph features that the LINEPARM shares with the SCATTERPLOT, the DATATRANSPARENCY= value on the LINEPARM statement is honored in the graph.
A more complicated case within a LAYOUT OVERLAY block is the use of the TRANSPARENCY= option. For most plots, the TRANSPARENCY= option sets the transparency of both the common features and the features that are specific to the plot. For example, the following SCATTERPLOT statement specifies a transparency value of 0.2.
LAYOUT OVERLAY;
   SCATTERPLOT Y=Weight X=Height / TRANSPARENCY=0.2;
   LINEPARM YINTERCEPT=Intercept SLOPE=Slope;
ENDLAYOUT;
On the SCATTERPLOT statement, the TRANSPARENCY= option specifies the level of transparency for the scatter plot markers, marker labels, background, grid lines, wall, and axis features. However, within a LAYOUT OVERLAY block, the background, grid lines, wall, and axis features are common features that are controlled by the LAYOUT OVERLAY statement, so the TRANSPARENCY= value on the SCATTERPLOT statement is ignored for these common features. The TRANSPARENCY= value is still honored for the scatter plot markers and their labels (when present) because these features of the data elements are specific to the scatter plot.
For information about which statement options are ignored within a LAYOUT OVERLAY block, see the reference documentation for each plot statement.
| BOXPARM and BOXPLOTPARM Statement Box-and-Whisker Plots |
The BOXPARM and BOXPLOTPARM statements produce box-and-whisker plots. The bottom and top edges of the box are located at the 25th and 75th percentiles of the sample. A horizontal line within the box can be drawn at the 50th percentile (median), and the mean can also be displayed. The plot can use plot symbols to mark outliers, which are observations that are more extreme than the upper and lower fences. The fences are not visible in the plot but are located 1.5 interquartile ranges above and below the box. (The interquartile range is the distance between the 25th and the 75th sample percentiles.) Far outliers, which lie beyond the upper and lower far fences (3 interquartile ranges above and below the box), can also be identified. Finally, the plot can display whiskers, which are lines that are drawn from the box to the minimum and maximum data values that do not extend beyond the fences.
The following figure illustrates the box plot elements:
![[untitled graphic]](./images/boxplotparmfences.gif)
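The fence positions follow directly from the quartiles. The following DATA step is a minimal sketch of that arithmetic. The data set name, the variable names, and the quartile values are hypothetical and are not taken from the example data in this section.

```sas
/* Hypothetical quartiles, chosen only to illustrate the fence arithmetic. */
data Fences;
   q1  = 20;                          /* 25th percentile (bottom of box) */
   q3  = 30;                          /* 75th percentile (top of box)    */
   iqr = q3 - q1;                     /* interquartile range             */

   lower_fence     = q1 - 1.5 * iqr;  /* outliers lie below this value   */
   upper_fence     = q3 + 1.5 * iqr;  /* outliers lie above this value   */
   lower_far_fence = q1 - 3 * iqr;    /* far outliers lie below this     */
   upper_far_fence = q3 + 3 * iqr;    /* far outliers lie above this     */
run;

proc print data=Fences noobs;
run;
```

With these values the interquartile range is 10, so the fences fall at 5 and 45 and the far fences at -10 and 60.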
At a minimum, valid data for the BOXPARM and BOXPLOTPARM statements must provide a numeric column (Y=) that contains calculated statistics for an analysis, and a string column (STAT=) that identifies each statistic. The Y column must contain non-missing values for the Q1 (25th percentile) and Q3 (75th percentile) statistics. If Y values are missing or not supplied for other statistic values, then those statistics are not displayed in the plot.
For example, a petroleum company uses a turbine to heat water into steam that is pumped into the ground to make oil less viscous and easier to extract. This process occurs 20 times daily, and the amount of power (in kilowatts) used to heat the water to the desired temperature is recorded. The following data shows the statistics that are calculated for one day of this process:
![[untitled graphic]](./images/boxplotparmtable1.gif)
To plot the data from the above table, the following BOXPARM statement uses the Y= and STAT= arguments to generate a single box plot for the recorded statistics:
BOXPARM Y=PowerOutputs STAT=Statistic;
![[untitled graphic]](./images/boxplotparmcode1.gif)
If the data contains statistics for multiple days of the process, a third column in the data must be present to identify the days that the statistics were recorded. For example, the following data shows the statistics that are calculated for two days of this process:
![[untitled graphic]](./images/boxplotparmtable2.gif)
To plot the data from the above table, the BOXPARM statement needs the Y=, STAT=, and X= arguments to generate a separate box plot for each day that the statistics were recorded:
BOXPARM Y=PowerOutputs STAT=Statistic X=Day;
![[untitled graphic]](./images/boxplotparmcode2.gif)
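Because the example tables are shown only as images, the following DATA step is a hedged sketch of how input data in this shape might be built. The column names PowerOutputs, Statistic, and Day come from the statements above. The data set name and every numeric value are hypothetical, and the statistic labels other than Q1 and Q3 are assumptions that should be checked against the BOXPARM reference documentation.

```sas
/* Hypothetical summary statistics for two days; values are illustrative. */
/* Only Q1 and Q3 are named in this section. The other Statistic labels   */
/* (MIN, MEDIAN, MEAN, MAX) are assumptions.                              */
data PowerStats;
   length Statistic $ 8;
   input Day Statistic $ PowerOutputs;
   datalines;
1 MIN    3180
1 Q1     3340
1 MEDIAN 3487
1 MEAN   3498
1 Q3     3675
1 MAX    4050
2 MIN    3179
2 Q1     3333
2 MEDIAN 3471
2 MEAN   3478
2 Q3     3612
2 MAX    3849
;
run;
```

Each row holds one statistic for one day: a numeric Y= column, a string STAT= column, and an X= column that separates the days. For a single day of data, the Day column is not needed.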
| Remapping the Color and Marker Symbols by Using the INDEX= Option |
Indexing can be used to collapse the number of groups that are represented in a graph. For example, if there are 10 groups in the data, indexes 1 and 2 can be assigned to the first two groups, and index 3 can be assigned to all other groups. The third through tenth data groups are treated as a single group in the graph.
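The 10-group case described above can be sketched with a DATA step. The data set name and the variables GroupId and Index are hypothetical, and real data would carry the group identifier alongside the plotted variables.

```sas
/* GroupId is a hypothetical numeric identifier for the 10 data groups. */
data Collapsed;
   do GroupId = 1 to 10;
      /* Groups 1 and 2 keep their own index; groups 3 through 10 */
      /* all receive index 3 and are drawn as a single group.     */
      Index = min(GroupId, 3);
      output;
   end;
run;
```

A plot statement that names Index in its INDEX= option would then use only three combinations of color and marker symbol.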
Indexing can control the order in which colors and marker symbols are mapped to group values in a graph. This ordering method is only needed for coordinating the data display of multiple graphs when the default mapping would cause group values to be mismatched between graphs.
For example, consider two studies of three drugs, A, B, and C. If Study 1 uses all three drugs, then the first combination of color and marker symbol is mapped to Drug A. The second combination of color and marker symbol is mapped to Drug B, and the third is mapped to Drug C. If Study 2 omits Drug A, then the first combination of color and marker symbol is mapped to Drug B, and the second is mapped to Drug C. If the two graphs are viewed together, then this default mapping causes the group values to be mismatched. The visual attributes that represent Drug A in the first graph represent Drug B in the second graph, and those that represent Drug B in the first graph represent Drug C in the second graph.
The GROUP= option mappings can be made consistent between the two graphs by creating an index column for each study. For these example studies, the GROUP and INDEX columns are the following:
Study 1

| Drug1 | Index1 |
|---|---|
| A | 1 |
| B | 2 |
| B | 2 |
| C | 3 |

Study 2

| Drug2 | Index2 |
|---|---|
| B | 2 |
| C | 3 |
| C | 3 |

If the graph for Study 1 specifies INDEX=INDEX1 and the graph for Study 2 specifies INDEX=INDEX2, then the second combination of color and marker symbol is mapped to Drug B in both graphs, and the third combination of color and marker symbol is mapped to Drug C in both graphs.
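One way to build such index columns is to derive the index from the drug name itself, so that a drug receives the same index in every study regardless of which other drugs are present. The following DATA steps are a minimal sketch under that assumption. The data set names Study1 and Study2 are hypothetical, and the drug values match the columns shown above.

```sas
/* The position of the drug letter in 'ABC' gives A=1, B=2, C=3. */
data Study1;
   input Drug1 $ @@;
   Index1 = index('ABC', strip(Drug1));
   datalines;
A B B C
;
run;

/* Drug A is absent here, so Index2 contains only 2 and 3. */
data Study2;
   input Drug2 $ @@;
   Index2 = index('ABC', strip(Drug2));
   datalines;
B C C
;
run;
```

Deriving the index from a fixed lookup, rather than from the order in which values appear in each data set, is what keeps the mapping consistent between the two graphs.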
| Changing Line Patterns and Their Sequence by Using the LINEPATTERN= Option |
The LINEPATTERN= option can be used with or without the GROUP= option. If you do not specify the GROUP= option, then you can change a single line pattern by using the LINEPATTERN= option in a plot statement.
If you specify the GROUP= option and the GraphData1 - GraphData12 style elements do not set the LineStyle attribute, then the default line pattern sequence is used. The patterns are used in the order shown in the table below, and the sequence is repeated as many times as necessary to display the lines of the graph.
The following table shows the value of each line pattern and each line style, and the appearance of each line that results from its value.
Line Patterns
![[Line Patterns]](./images/linepattern.gif)
In the following example, the LINEPATTERN= option is not set. The graph has two lines, one for males and one for females. The lines use the default line sequence. The first line is a solid line and the second line is a dotted line. The lines are assigned the first two LinePattern values (from the default sequence), as shown in the above table.
proc template;
   define statgraph testPattern;
      layout Overlay;
         scatterPlot y=height x=weight / group=gender;
         lineparm yintercept=intercept slope=slope / group=gender;
      endlayout;
   end;
In the following example, the LINEPARM statement specifies LINEPATTERN=DASH. The graph has two lines, and both of them are dashed. The change affects both lines in the graph, but does not change the line patterns globally.
proc template;
   define statgraph testPattern;
      layout Overlay;
         scatterPlot y=height x=weight / group=gender;
         lineparm yintercept=intercept slope=slope / group=gender linepattern=dash;
      endlayout;
   end;
In the following example, the LINEPATTERN= option is not set, but values for the LineStyle attributes in the GraphData1 and GraphData2 style elements are set. Because the LINEPARM statement does not use the LINEPATTERN= option, a line pattern sequence is used for the lines, but the default sequence is overridden by the LineStyle settings in the GraphData1 and GraphData2 style elements. The graph has two lines. The first line has long dashes (style value 6) and the second line has short dashes (style value 4). This new line sequence is used as the default line sequence for any plot that uses the MyDefault style.
proc template;
   define statgraph testPattern;
      layout Overlay;
         scatterPlot y=height x=weight / group=gender;
         lineparm yintercept=intercept slope=slope / group=gender;
      endlayout;
   end;
   define style Styles.MyDefault;
      parent=Styles.Default;
      style GraphData1 / LineStyle=6;
      style GraphData2 / LineStyle=4;
   end;
| Changing Marker Symbols and Their Sequence |
You can specify a marker symbol for the plot points in a graph by using the MARKERSYMBOL= option. If you do not use the MARKERSYMBOL= option, then the default marker symbol is specified by the MarkerSymbol attribute of the GraphDataDefault style element.
To globally change the default marker symbol to a different default symbol, you change the MarkerSymbol attribute of the GraphDataDefault style element. You can specify any of the marker symbol values that are shown in the Marker Symbols table below.
To change the marker symbol in an individual plot statement, you use the MARKERSYMBOL= option. You can specify any of the marker symbol values shown in the Marker Symbols table below.
If you specify the GROUP= option on a plot statement, then a separate plot is generated for each unique value of the group variable, and a different marker symbol is used in each plot. The default sequence of marker symbols used in the plots is specified by the STANDARD marker set that is shown in the Marker Symbol Sets table below. The symbols in the set are repeated as many times as necessary to provide a marker symbol for each plot.
Marker Symbols
![[Marker Symbols]](./images/markertablestatgraph.gif)
Marker Symbol Sets
![[Marker Symbol Sets]](./images/markersymbolsetstatgraph.gif)
To change the default sequence of marker symbols for grouped plots, you specify the MarkerSymbol attribute in each of the GraphData1 - GraphData12 style elements. For example, if you want to set a sequence of DIAMOND, CROSS, and CIRCLE, you perform the following steps:
1. Assign the value of DIAMOND to the MarkerSymbol attribute in the GraphData1 style element.
2. Assign the value of CROSS to the MarkerSymbol attribute in the GraphData2 style element.
3. Assign the value of CIRCLE to the MarkerSymbol attribute in the GraphData3 style element.

style GraphData1 / MARKERSYMBOL=DIAMOND;
style GraphData2 / MARKERSYMBOL=CROSS;
style GraphData3 / MARKERSYMBOL=CIRCLE;