PLOT Procedure

Concepts: PLOT Procedure

RUN Groups

PROC PLOT is an interactive procedure. Usually, SAS terminates a procedure after executing a RUN statement, but PROC PLOT remains active. After you start the PLOT procedure, you can continue to submit any valid statements without resubmitting the PROC PLOT statement. Thus, you can easily experiment with changing labels, values of tick marks, and so on. Any options submitted in the PROC PLOT statement remain in effect until you submit another PROC PLOT statement.
When you submit a RUN statement, PROC PLOT executes all the statements submitted since the last PROC PLOT or RUN statement. Each group of statements is called a RUN group. With each RUN group, PROC PLOT begins a new page and begins with the first item in the VPERCENT= and HPERCENT= lists, if any.
To terminate the procedure, submit a QUIT statement, a DATA statement, or a PROC statement. Like the RUN statement, each of these statements completes a RUN group. If you do not want to execute the statements in the RUN group, then use the RUN CANCEL statement, which terminates the procedure immediately.
You can use the BY statement interactively. The BY statement remains in effect until you submit another BY statement or terminate the procedure.
See Adjusting Labels on a Plot with the PLACEMENT= Option for an example of using RUN-group processing with PROC PLOT.
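As a minimal sketch of RUN-group processing, the following statements submit two RUN groups under a single PROC PLOT statement and then end the procedure with QUIT. The data set WORK.DEMO and its variables are hypothetical and exist only for illustration.

```sas
/* Hypothetical data set, for illustration only */
data work.demo;
   do x=1 to 10;
      y=x*x;
      output;
   end;
run;

proc plot data=work.demo;
   plot y*x;        /* first RUN group */
run;
   plot y*x='+';    /* second RUN group: PROC PLOT is still active */
run;
quit;               /* terminates the procedure */
```

Each RUN statement executes only the statements submitted since the previous RUN, so the second PLOT statement starts a new page.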

Generating Data with Program Statements

When you generate data to be plotted, a good rule is to generate fewer observations than the number of positions on the horizontal axis. PROC PLOT then uses the increment of the horizontal variable as the interval between tick marks.
Because PROC PLOT prints one character for each observation, using SAS program statements to generate the data set for PROC PLOT can enhance the effectiveness of continuous plots. For example, suppose that you want to generate data in order to plot the following equation, for x ranging from 0 to 100:
   $y = 2.54 + 3.83x$
You can submit these statements:
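   options linesize=80;
   /* Generate 51 observations: x = 0, 2, ..., 100 */
   data generate;
      do x=0 to 100 by 2;
         y=2.54+3.83*x;
         output;
      end;
   run;
   /* Plot y on the vertical axis against x on the horizontal axis */
   proc plot data=generate;
      plot y*x;
   run;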
If the plot is printed with a LINESIZE= value of 80, then about 75 positions are available on the horizontal axis for the X values. Thus, 2 is a good increment: 51 observations are generated, which is fewer than the 75 available positions on the horizontal axis.
However, if the plot is printed with a LINESIZE= value of 132, then an increment of 2 produces a plot in which the plotting symbols have space between them. For a smoother line, a better increment is 1, because 101 observations are generated.
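The wider case can be sketched the same way. This variant only changes the LINESIZE= value to 132 and the increment to 1, as the paragraph above suggests; everything else is taken from the original example.

```sas
options linesize=132;

/* Increment of 1 gives 101 observations: x = 0, 1, ..., 100 */
data generate;
   do x=0 to 100 by 1;
      y=2.54+3.83*x;
      output;
   end;
run;

proc plot data=generate;
   plot y*x;
run;
quit;
```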

Labeling Plot Points with Values of a Variable

Pointer Symbols

When you are using a label variable and do not specify a plotting symbol or if the value of the variable that you use as the plotting symbol is null ('00'x), PROC PLOT uses pointer symbols as plotting symbols. Pointer symbols associate a point with its label by pointing in the general direction of the label placement. PROC PLOT uses four different pointer symbols based on the value of the S= and V= suboptions in the PLACEMENT= option. The table below shows the pointer symbols:
Pointer Symbols
S=        V=      Symbol
LEFT      any     <
RIGHT     any     >
CENTER    >0      ^
CENTER    <=0     v
If you are using pointer symbols and multiple points coincide, then PROC PLOT uses the number of points as the plotting symbol if the number of points is between 2 and 9. If the number of points is more than 9, then the procedure uses an asterisk (*).
Note: Because of character set differences among operating environments, the pointer symbol for S=CENTER and V>0 might differ from the one shown here.
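The following sketch shows a label plot that relies on pointer symbols, because the plot request names a label variable but no plotting symbol. The data set WORK.CITIES, its values, and the particular PLACEMENT= expression are assumptions chosen for illustration.

```sas
/* Hypothetical data, for illustration only */
data work.cities;
   input name $ x y;
   datalines;
Alpha 1 2
Beta 3 5
Gamma 6 4
;
run;

proc plot data=work.cities;
   /* No plotting symbol is given, so PROC PLOT uses pointer symbols */
   plot y*x $ name / placement=((s=right left : h=2 -2));
run;
quit;
```

Which pointer symbol appears at each point depends on the S= and V= values of the placement state that PROC PLOT selects for that label.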

Understanding Penalties

PROC PLOT assesses the quality of placements with penalties. If all labels are plotted with zero penalty, then no labels collide and all labels are near their symbols. When it is not possible to place all labels with zero penalty, PROC PLOT tries to minimize the total penalty. The following table gives a description of the penalty, the default value of the penalty, the index that you use to reference the penalty, and the range of values that you can specify if you change the penalties. Each penalty is described in more detail in Index Values for Penalties.
Penalties Table
Penalty                                          Default Penalty   Index    Range
Not placing a blank                              1                 1        0-500
Bad split, no split character specified          1                 2        0-500
Bad split with split character                   50                3        0-500
Free horizontal shift, fhs                       2                 4        0-500
Free vertical shift, fvs                         1                 5        0-500
Vertical shift weight, vsw                       2                 6        0-500
Vertical or horizontal shift denominator, vhsd   5                 7        1-500
Collision state                                  500               8        0-10,000
(Reserved for future use)                                          9-14
Not placing the first character                  11                15       0-500
Not placing the second character                 10                16       0-500
Not placing the third character                  8                 17       0-500
Not placing the fourth character                 5                 18       0-500
Not placing the fifth through 200th character    2                 19-214   0-500
The following table contains the index values from the previous table with a description of the corresponding penalty.
Index Values for Penalties
1: A nonblank character in the plot collides with an embedded blank in a label, or there is not a blank or a plot boundary before or after each label fragment.
2: A split occurs on a nonblank or nonpunctuation character when you do not specify a split character.
3: A label is placed with a different number of lines than the L= suboption specifies, when you specify a split character.
4-7: A label is placed far away from the corresponding point. PROC PLOT calculates the penalty according to this (integer arithmetic) formula:
   $[\max(|H| - \mathit{fhs},\, 0) + \mathit{vsw} \times \max(|V| - (L + \mathit{fvs} + (V > 0))/2,\, 0)] \,/\, \mathit{vhsd}$
   Here H, V, and L are the values of the H=, V=, and L= suboptions of the PLACEMENT= option for the placement state, and (V > 0) is 1 when true and 0 otherwise. Notice that penalties 4 through 7 are actually just components of the formula used to determine the penalty. Changing the penalty for a free horizontal or free vertical shift to a large value such as 500 removes any penalty for a large horizontal or vertical shift. Plotting Date Values on an Axis illustrates a case in which removing the horizontal shift penalty is useful.
8: A label might collide with its own plotting symbol. If the plotting symbol is blank, then a collision state cannot occur. See Collision States for more information.
15-214: A label character does not appear in the plot. By default, the penalty for not printing the first character is greater than the penalty for not printing the second character, and so on. By default, the penalty for not printing the fifth and subsequent characters is the same.
Note: Labels can share characters without penalty.

Changing Penalties

You can change the default penalties with the PENALTIES= option in the PLOT statement. Because PROC PLOT considers penalties when it places labels, changing the default penalties can change the placement of the labels. For example, if you have labels that all begin with the same two-letter prefix, then you might want to increase the default penalty for not printing the third, fourth, and fifth characters and decrease the penalties for not printing the first and second characters. In the following example, the PENALTIES= option increases the default penalty for not printing the third, fourth, and fifth characters to 11, 10, and 8, and it decreases the penalties for not printing the first and second characters to 2:
   penalties(15 to 20)=2 2 11 10 8 2
This example extends the penalty list. The 20th penalty of 2 is the penalty for not printing the sixth through 200th character. When the last index i is greater than 18, the last penalty is used for the (i − 14)th character and beyond.
You can also extend the penalty list by just specifying the starting index. For example, the following PENALTIES= option is equivalent to the one above:
penalties(15)=2 2 11 10 8 2
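To show where the option sits in a complete step, here is a minimal sketch that applies the PENALTIES= list above to a label plot. The data set WORK.SITES and its labels, which all share the prefix "ST", are hypothetical.

```sas
/* Hypothetical data: every label begins with the prefix "ST" */
data work.sites;
   input name $ x y;
   datalines;
ST101 1 2
ST102 1.5 2
ST205 4 5
ST206 4 5.2
;
run;

proc plot data=work.sites;
   /* Favor keeping characters 3 through 5 over the shared prefix */
   plot y*x='+' $ name / penalties(15)=2 2 11 10 8 2;
run;
quit;
```

With these penalties, PROC PLOT should prefer to hide the common prefix rather than the distinguishing characters when labels compete for space.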

Collision States

Collision states are placement states that can cause a label to collide with its own plotting symbol. PROC PLOT usually avoids using collision states because of the large default penalty of 500 that is associated with them. PROC PLOT does not consider the actual length or splitting of any particular label when determining if a placement state is a collision state. The following are the rules that PROC PLOT uses to determine collision states:
  • When S=CENTER, placement states that do not shift the label up or down sufficiently so that all of the label is shifted onto completely different lines from the symbol are collision states.
  • When S=RIGHT, placement states that shift the label zero or more positions to the left without first shifting the label up or down onto completely different lines from the symbol are collision states.
  • When S=LEFT, placement states that shift the label zero or more positions to the right without first shifting the label up or down onto completely different lines from the symbol are collision states.
Note: A collision state cannot occur if you do not use a plotting symbol.

Reference Lines

PROC PLOT places labels and computes penalties before placing reference lines on a plot. The procedure does not attempt to avoid rows and columns that contain reference lines.

Hidden Label Characters

In addition to the number of hidden observations and hidden plotting symbols, PROC PLOT prints the number of hidden label characters. Label characters can be hidden by plotting symbols or other label characters.

Overlaying Label Plots

When you overlay a label plot and a nonlabel plot, PROC PLOT tries to avoid collisions between the labels and the characters of the nonlabel plot. When a label character collides with a character in a nonlabel plot, PROC PLOT adds the usual penalty to the penalty sum.
When you overlay two or more label plots, all label plots are treated as a single plot in avoiding collisions and computing hidden character counts. Labels of different plots never overprint, even with the OVP system option in effect.
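A minimal sketch of overlaying a label plot on a nonlabel plot follows. The data set WORK.OVERLAY_DEMO, its variables, and the plotting symbols are assumptions for illustration.

```sas
/* Hypothetical data, for illustration only */
data work.overlay_demo;
   input name $ x y z;
   datalines;
Alpha 1 2 3
Beta 3 5 4
Gamma 6 4 6
;
run;

proc plot data=work.overlay_demo;
   /* y*x is a label plot; z*x is a nonlabel plot drawn with '*' */
   plot y*x='+' $ name  z*x='*' / overlay;
run;
quit;
```

Here PROC PLOT tries to keep the NAME labels away from the '*' characters of the second plot.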

Computational Resources Used for Label Plots

This section uses the following variables to discuss how much time and memory PROC PLOT uses to construct label plots:
n      number of points with labels.
len    constant length of labels.
s      number of label pieces, or fragments.
p      number of placement states specified in the PLACE= option.

Time

For a given plot size, the time that is required to construct the plot is approximately proportional to $n \times \mathit{len}$. The amount of time required to split the labels is approximately proportional to $n s^2$. Generally, the more placement states that you specify, the more time that PROC PLOT needs to place the labels. However, increasing the number of horizontal and vertical shifts gives PROC PLOT more flexibility to avoid collisions, often resulting in less time used to place labels.

Memory

PROC PLOT uses 24p bytes of memory for the internal placement state list. PROC PLOT uses $n(84 + 5\,\mathit{len} + 4s(1 + 1.5(s + 1)))$ bytes for the internal list of labels. PROC PLOT builds all plots in memory; each printing position uses one byte of memory. If you run out of memory, then request fewer plots in each PLOT statement and put a RUN statement after each PLOT statement.

Specifying Variable Lists in Plot Requests

You can use SAS variable lists in plot requests. For example, the following are valid plot requests:
Plot Requests
Plot Request     What Is Plotted
(a -- d)         a*b a*c a*d b*c b*d c*d
(x1 - x4)        x1*x2 x1*x3 x1*x4 x2*x3 x2*x4 x3*x4
(_numeric_)      All combinations of numeric variables
y*(x1 - x4)      y*x1 y*x2 y*x3 y*x4
If both the vertical and horizontal specifications request more than one variable and if a variable appears in both lists, then it will not be plotted against itself. For example, the following statement does not plot B*B and C*C:
plot (a b c)*(b c d);
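As a sketch of what that request expands to, the second PLOT statement below spells out the seven combinations that remain once B*B and C*C are dropped. The data set WORK.ABCD and its values are hypothetical, and the explicit list is derived from the rule above rather than taken from SAS output.

```sas
/* Hypothetical data, for illustration only */
data work.abcd;
   do i=1 to 5;
      a=i;
      b=2*i;
      c=i*i;
      d=10-i;
      output;
   end;
   drop i;
run;

proc plot data=work.abcd;
   plot (a b c)*(b c d);               /* list form */
run;
   plot a*b a*c a*d b*c b*d c*b c*d;   /* explicit form, no b*b or c*c */
run;
quit;
```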

Specifying Combinations of Variables

The operator in a plot request is either an asterisk (*) or a colon (:). An asterisk combines the variables in the lists to produce all possible combinations of x and y variables. For example, the following plot requests are equivalent:
plot (y1-y2) * (x1-x2);

plot y1*x1 y1*x2 y2*x1 y2*x2;
A colon combines the variables pairwise. Thus, the first variables of each list combine to request a plot, as do the second, third, and so on. For example, the following plot requests are equivalent:
plot (y1-y2) : (x1-x2);

plot y1*x1 y2*x2;