Statistical Graphics Using ODS
This example uses artificial data to illustrate two basic principles of template writing: that statement order matters and that one of the plotting statements is the primary statement. The data are a sample from a bivariate normal distribution. A custom graph template and PROC SGRENDER are used to plot the data along with vectors and ellipses. The plot consists of four components: a scatterplot of the data; vectors whose end points come from other variables in the data set; ellipses whose parameters are specified in the template; and reference lines whose locations are specified in the template. Initially, thick lines are used to show what happens at the places where the lines and points intersect.
The following steps create the input SAS data set:
data x;
   input x y;
   label x = 'Normal(0, 4)' y = 'Normal(0, 1)';
   datalines;
-4  0
 4  0
 0 -2
 0  2
;

data y(drop=i);
   do i = 1 to 2500;
      r1 = normal( 104 );
      r2 = normal( 104 ) * 2;
      output;
   end;
run;

data all;
   merge x y;
run;
The data set All contains four variables. The variables r1 and r2 contain the random data. These variables contain 2500 nonmissing observations. The data set also contains the variables x and y, which contain the end points for the vectors. These variables contain four nonmissing observations and 2496 observations that are all missing. A data set like this is not unusual when creating overlaid plots. Different overlays often require input data with very different sizes.
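The counts described above can be checked before plotting. The following step is a minimal sketch, not part of the original example; it assumes that the data set All was created by the preceding DATA steps and uses PROC MEANS only to report the number of nonmissing (N) and missing (NMISS) values for each variable.

```sas
/* Sketch: confirm the observation counts described in the text.       */
/* Assumes the data set All exists in the WORK library.                */
proc means data=all n nmiss;
   var x y r1 r2;   /* vector end points and random data */
run;
```

Based on the description above, x and y should each show 4 nonmissing and 2496 missing values, and r1 and r2 should each show 2500 nonmissing values.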
First, the data are plotted by using a template that is deliberately constructed to demonstrate a number of problems that can occur with statement order. The following steps create Output 21.11.1:
proc template;
   define statgraph Plot;
      begingraph;
         entrytitle 'Statement Order and the PRIMARY= Option';
         layout overlayequated / equatetype=fit;
            ellipseparm semimajor=eval(sqrt(4)) semiminor=1
                        slope=0 xorigin=0 yorigin=0 /
                        outlineattrs=GraphData2(pattern=solid thickness=5);
            ellipseparm semimajor=eval(2 * sqrt(4)) semiminor=2
                        slope=0 xorigin=0 yorigin=0 /
                        outlineattrs=GraphData5(pattern=solid thickness=5);
            vectorplot y=y x=x xorigin=0 yorigin=0 /
                       arrowheads=false lineattrs=GraphFit(thickness=5);
            scatterplot y=r1 x=r2 /
                        markerattrs=(symbol=circlefilled size=3);
            referenceline x=0 / lineattrs=(thickness=3);
            referenceline y=0 / lineattrs=(thickness=3);
         endlayout;
      endgraph;
   end;
run;

ods listing style=listing;

proc sgrender data=all template=plot;
run;
There are a number of problems with the plot in Output 21.11.1. The reference lines obliterate the vectors, and the data are on top of everything but the reference lines. It might be more reasonable to plot the reference lines first, the data next, the vectors next, and the ellipses last. The following steps do this and produce Output 21.11.2:
proc template;
   define statgraph Plot;
      begingraph;
         entrytitle 'Statement Order and the PRIMARY= Option';
         layout overlayequated / equatetype=fit;
            referenceline x=0 / lineattrs=(thickness=3);
            referenceline y=0 / lineattrs=(thickness=3);
            scatterplot y=r1 x=r2 /
                        markerattrs=(symbol=circlefilled size=3);
            vectorplot y=y x=x xorigin=0 yorigin=0 /
                       arrowheads=false lineattrs=GraphFit(thickness=5);
            ellipseparm semimajor=eval(sqrt(4)) semiminor=1
                        slope=0 xorigin=0 yorigin=0 /
                        outlineattrs=GraphData2(pattern=solid thickness=5);
            ellipseparm semimajor=eval(2 * sqrt(4)) semiminor=2
                        slope=0 xorigin=0 yorigin=0 /
                        outlineattrs=GraphData5(pattern=solid thickness=5);
         endlayout;
      endgraph;
   end;
run;

ods listing style=listing;

proc sgrender data=all template=plot;
run;
Output 21.11.2 looks better than Output 21.11.1, but the labels for the axes have changed. Output 21.11.1 has the labels of the variables x and y as axis labels, whereas Output 21.11.2 uses the names of the variables r1 and r2. This is because in Output 21.11.1, the first plot that is drawn from data variables is the vector plot of x and y (which have labels), and in Output 21.11.2, it is the scatter plot of r1 and r2 (which do not have labels). By default, the first such plot is the primary plot, and the primary plot is used to determine the axis type and labels. You can designate the vector plot as the primary plot with the PRIMARY=TRUE option. The following statements make the final plot, this time with default line thicknesses, and produce Output 21.11.3:
proc template;
   define statgraph Plot;
      begingraph;
         entrytitle 'Statement Order and the PRIMARY= Option';
         layout overlayequated / equatetype=fit;
            referenceline x=0;
            referenceline y=0;
            scatterplot y=r1 x=r2 /
                        markerattrs=(symbol=circlefilled size=3);
            vectorplot y=y x=x xorigin=0 yorigin=0 /
                       primary=true arrowheads=false lineattrs=GraphFit;
            ellipseparm semimajor=eval(sqrt(4)) semiminor=1
                        slope=0 xorigin=0 yorigin=0 /
                        outlineattrs=GraphData2(pattern=solid);
            ellipseparm semimajor=eval(2 * sqrt(4)) semiminor=2
                        slope=0 xorigin=0 yorigin=0 /
                        outlineattrs=GraphData5(pattern=solid);
         endlayout;
      endgraph;
   end;
run;

ods listing style=listing;

proc sgrender data=all template=plot;
run;
The axis labels in Output 21.11.3 and the overprinting of plot elements look better than in the previous plots. You can further adjust the line thicknesses if you want to emphasize or deemphasize components of this plot. The following list discusses the syntax of the GTL statements used in this example.
The template has an ENTRYTITLE statement that specifies the title.
The template has an equated overlay. This means that a centimeter on one axis represents the same data range as a centimeter on the other axis. An equated overlay is used instead of the more common LAYOUT OVERLAY because, with these data, the shape and geometry of the data have meaning even though the ranges of the two axis variables are different. The option EQUATETYPE=SQUARE can be used to make a square plot, but since the X-axis variable has a larger range than the Y-axis variable, and since the default plot size is wider than it is high, EQUATETYPE=FIT is specified instead. The axes are equated but use the available space.
A vertical reference line is drawn at X=0, and a horizontal reference line is drawn at Y=0.
The scatter plot is based on the Y-axis variable r1 and the X-axis variable r2. The markers are filled circles with a size of three pixels. This is smaller than the default size and works well with a plot that displays many points.
The vector plot is based on the Y-axis variable y and the X-axis variable x. The vectors are solid lines without arrowheads that emanate from the origin (X=0 and Y=0). The color and other line attributes such as thickness come from the attributes of the GraphFit style element. This is the primary plot, so the default axis labels are the variable labels for the X= and Y= variables if they exist, or the variable names if the variables do not have labels.
The plot also displays two ellipses with X=0 and Y=0 at their center. Their widths are expressions, and their heights are constant. The expressions are not needed in this example; they are used to illustrate the syntax. The SEMIMAJOR= option specifies half the length of the major axis for the ellipse, and the SEMIMINOR= option specifies half the length of the minor axis for the ellipse. The SLOPE= option specifies the slope of the major axis for the ellipse. The colors of the ellipses and other line properties are based on the GraphData2 and GraphData5 style elements, but the line pattern attribute from the style is overridden.
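To see how the EQUATETYPE= option discussed in the list above affects the layout, you can render a reduced version of the template with EQUATETYPE=SQUARE. The following steps are a sketch rather than part of the original example. The template name PlotSquare is hypothetical, the ellipses are omitted for brevity, and the steps assume that the data set All from the beginning of this example is available.

```sas
/* Sketch: same overlay as the final template, without the ellipses,   */
/* and with EQUATETYPE=SQUARE instead of EQUATETYPE=FIT.               */
/* PlotSquare is a hypothetical template name used for illustration.   */
proc template;
   define statgraph PlotSquare;
      begingraph;
         entrytitle 'EQUATETYPE=SQUARE';
         layout overlayequated / equatetype=square;
            referenceline x=0;
            referenceline y=0;
            scatterplot y=r1 x=r2 /
                        markerattrs=(symbol=circlefilled size=3);
            /* PRIMARY=TRUE keeps the labels of x and y on the axes */
            vectorplot y=y x=x xorigin=0 yorigin=0 /
                       primary=true arrowheads=false lineattrs=GraphFit;
         endlayout;
      endgraph;
   end;
run;

proc sgrender data=all template=PlotSquare;
run;
```

Comparing this plot with Output 21.11.3 should show why EQUATETYPE=FIT was preferred for these data: the square layout leaves part of the wider default graph area unused.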
Copyright © 2009 by SAS Institute Inc., Cary, NC, USA. All rights reserved.