The REG Procedure
Line Printer Scatter Plot Features
This section discusses the special options available with line printer scatter plots. Detailed examples of traditional graphics and options are given in the section Traditional Graphics.
The interactive PLOT statement available in PROC REG enables you to look at scatter plots of data and diagnostic statistics. These plots can help you to evaluate the model and detect outliers in your data. Several options enable you to place multiple plots on a single page, superimpose plots, and collect plots to be overlaid by later plots. The PAINT statement can be used to highlight points on a plot. See the section Painting Scatter Plots for more information about painting.
The Class data set introduced in the section Simple Linear Regression is used in the following examples.
You can superimpose several plots with the OVERLAY option. With the following statements, a plot of Weight against Height is overlaid with plots of the predicted values and the 95% prediction intervals. The model on which the statistics are based is the full model including Height and Age. These statements produce the plot in Figure 73.34:
proc reg data=Class lineprinter;
   model Weight=Height Age / noprint;
   plot (ucl. lcl. p.)*Height='-' Weight*Height
        / overlay symbol='o';
run;
[Figure 73.34: line printer plot with vertical axis U95 (Upper Bound of 95% C.I.) and horizontal axis Height; the statistics are plotted with '-' and Weight with 'o']
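The U95 and L95 curves in this plot are limits for an individual predicted value. As a rough illustration, the following Python sketch (not SAS code) computes predicted values and individual prediction limits with the textbook formula pred_i ± t * s * sqrt(1 + h_i) for a model with a single regressor, a simplification of the two-regressor model used above. The function name is hypothetical, and because the Python standard library has no t quantile function, the caller supplies the critical value (for example, about 3.182 for 95% limits with 3 degrees of freedom).

```python
import math


def prediction_limits(x, y, t_crit):
    """Fit y = b0 + b1*x by least squares and return (pred, lower, upper).

    lower/upper are individual (not mean) prediction limits:
        pred_i -/+ t_crit * s * sqrt(1 + h_i)
    t_crit is the two-sided t quantile with n - 2 degrees of freedom,
    supplied by the caller because it is not computed here.
    """
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = ybar - b1 * xbar
    pred = [b0 + b1 * xi for xi in x]
    sse = sum((yi - pi) ** 2 for yi, pi in zip(y, pred))
    s = math.sqrt(sse / (n - 2))  # root mean squared error, 2 parameters
    lower, upper = [], []
    for xi, pi in zip(x, pred):
        h = 1.0 / n + (xi - xbar) ** 2 / sxx  # leverage of this observation
        half_width = t_crit * s * math.sqrt(1.0 + h)
        lower.append(pi - half_width)
        upper.append(pi + half_width)
    return pred, lower, upper
```

For hypothetical data such as x = [1, 2, 3, 4, 5] and y = [2, 4, 5, 4, 5], the limits are widest at the ends of the x range, where the leverage h_i is largest.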
After the overlaid plot in Figure 73.34 has been produced, the statements
   plot;
run;
place each of the four plots on a separate page, whereas the statements
   plot / overlay;
run;
repeat the previous overlaid plot. In general, the statement
plot;
is equivalent to respecifying the most recent PLOT statement without any options. However, the COLLECT, HPLOTS=, SYMBOL=, and VPLOTS= options apply across PLOT statements and remain in effect.
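This rule amounts to a small amount of state carried from one PLOT statement to the next. The following Python sketch (not SAS code) models it with a hypothetical PlotSession class: a PLOT statement with no requests reuses the most recent requests, the four options named above persist, and any other option, such as OVERLAY, applies only to the statement that specifies it. The class and the lowercase option keys are illustrative assumptions, not part of PROC REG.

```python
# Options that, according to the text, stay in effect across PLOT statements.
STICKY_OPTIONS = {"collect", "hplots", "symbol", "vplots"}


class PlotSession:
    """Hypothetical model of how a bare PLOT statement reuses earlier settings."""

    def __init__(self):
        self.last_requests = []  # e.g. ["r.*p.", "student.*p."]
        self.sticky = {}         # persistent options and their current values

    def plot(self, requests=None, **options):
        """Return (requests, effective_options) for one PLOT statement."""
        if requests:  # new requests replace the remembered ones
            self.last_requests = list(requests)
        for name, value in options.items():
            if name in STICKY_OPTIONS:  # remembered for later statements
                self.sticky[name] = value
        effective = dict(self.sticky)
        # Non-persistent options (such as overlay) apply to this statement only.
        effective.update(
            {k: v for k, v in options.items() if k not in STICKY_OPTIONS}
        )
        return list(self.last_requests), effective
```

Under these assumptions, calling plot() after plot(reqs, overlay=True, symbol="o") returns the same requests with only symbol="o" in effect, and plot(overlay=True) restores the overlaid display.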
The next example shows how you can overlay plots of statistics before and after a change in the model. For the full model involving Height and Age, the ordinary residuals and the studentized residuals are plotted against the predicted values. The COLLECT option causes these plots to be collected, that is, retained for later redisplay. The HPLOTS=2 option places the two plots side by side on one page. The symbol 'f' identifies the points in these plots as coming from the full model. These statements produce Figure 73.35:
   plot r.*p. student.*p. / collect hplots=2 symbol='f';
run;
[Figure 73.35: side-by-side plots of RESIDUAL and STUDENT against PRED for the full model, symbol 'f']
Note that these plots are not overlaid. The COLLECT option does not overlay the plots in one PLOT statement but retains them so that they can be overlaid by later plots. When the COLLECT option appears in a PLOT statement, the plots in that statement become the first plots in the collection.
Next, the model is reduced by deleting the Age variable. The PLOT statement requests the same plots as before but labels the points with the symbol 'r', denoting the reduced model. The following statements produce Figure 73.36:
   delete Age;
   plot r.*p. student.*p. / symbol='r';
run;
[Figure 73.36: RESIDUAL and STUDENT against PRED; full-model points ('f') overlaid with reduced-model points ('r'), with '?' where points from both models coincide]
Notice that the COLLECT option causes the corresponding plots to be overlaid. Also notice that the DELETE statement changes the model label from MODEL1 to MODEL1.1. The points labeled 'f' are from the full model, and the points labeled 'r' are from the reduced model. Positions labeled '?' contain at least one point from each model. In this example, the OVERLAY option cannot be used because the plots to be overlaid come from two different models and therefore cannot all be specified in one PLOT statement. With the COLLECT option, changes to the model or to the data used to fit the model do not affect plots collected before the changes; collected plots are always reproduced exactly as they first appear. (Similarly, a PAINT statement does not affect plots collected before the PAINT statement is issued.)
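The overlaying rule can be pictured on a character grid. In the following Python sketch (not SAS code), each plot is drawn into a dictionary of character cells, and a cell that receives points from more than one plot is printed as '?'. The function names, the grid size, and the rounding of values to cells are assumptions made for illustration. Because the grid stores characters rather than data, refitting a model afterward cannot change it, which is one way to see why collected plots are reproduced exactly as they first appear.

```python
def to_cell(x, y, xlim, ylim, ncols, nrows):
    """Map a data point to an integer (column, row) cell; row 0 is the top line."""
    (x0, x1), (y0, y1) = xlim, ylim
    col = round((x - x0) / (x1 - x0) * (ncols - 1))
    row = round((y1 - y) / (y1 - y0) * (nrows - 1))
    return col, row


def add_layer(grid, points, symbol, layer, **axes):
    """Draw one plot onto grid, which maps (col, row) -> (symbol, layer).

    A cell that already holds a point from a different layer becomes '?'.
    """
    for x, y in points:
        cell = to_cell(x, y, **axes)
        if cell not in grid:
            grid[cell] = (symbol, layer)
        elif grid[cell][1] != layer:
            grid[cell] = ("?", None)  # points from more than one plot share this cell
    return grid


def render(grid, ncols, nrows):
    """Return the grid as a list of text lines."""
    lines = [[" "] * ncols for _ in range(nrows)]
    for (col, row), (symbol, _) in grid.items():
        lines[row][col] = symbol
    return ["".join(line) for line in lines]
```

For example, drawing hypothetical points (0, 0) and (1, 1) as 'f' and then (1, 1) and (2, 2) as 'r' on a 3-by-3 grid yields the lines "  r", " ? ", and "f  ".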
The previous example overlays the residual plots for two different models. You might prefer to see them side by side on the same page. This can also be done with the COLLECT option by using a blank plot. Continuing from the last example, the COLLECT, HPLOTS=2, and SYMBOL='r' options are still in effect. In the following PLOT statement, the CLEAR option deletes the collected plots and enables the specified plot to begin a new collection. The plot created is the residual plot for the reduced model. These statements produce Figure 73.37:
   plot r.*p. / clear;
run;
[Figure 73.37: RESIDUAL against PRED for the reduced model, symbol 'r']
The next statements add the variable Age back to the model and place the residual plot for the full model next to the plot for the reduced model. Notice that the first plot request creates a blank plot by placing nothing between the quotes. Because the COLLECT option is in effect, this blank plot is superimposed on the residual plot for the reduced model. The second request creates the residual plot for the full model, which becomes the second plot in the collection. The result is the desired side-by-side plots. The NOCOLLECT option turns off the collection process after the specified plots are added and displayed, so any PLOT statements that follow show only the newly specified plots. These statements produce Figure 73.38:
   add Age;
   plot r.*p.='' r.*p.='f' / nocollect;
run;
Frequently, when the COLLECT option is in effect, you want the current and following PLOT statements to show only the specified plots. To do this, use both the CLEAR and NOCOLLECT options in the current PLOT statement.
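The interplay of COLLECT, CLEAR, and NOCOLLECT can be summarized as a small state machine. The following Python sketch (not SAS code) models it with a hypothetical PlotCollection class in which each plot is a dictionary from character cells to symbols. Two details are assumptions: the merge marks a cell holding two different symbols as '?', and specifying COLLECT when no collection is active starts an empty one. New plots overlay the collected plot in the same position and extra plots extend the collection, which is how the blank plot in the previous example moves the full-model plot into the second position.

```python
class PlotCollection:
    """Hypothetical model of COLLECT, CLEAR, and NOCOLLECT bookkeeping."""

    def __init__(self):
        self.collecting = False
        self.plots = []  # collected plots; each maps (col, row) -> symbol

    @staticmethod
    def _merge(base, new):
        """Overlay new onto base in place; two different symbols in a cell give '?'."""
        for cell, symbol in new.items():
            if cell in base and base[cell] != symbol:
                base[cell] = "?"
            else:
                base.setdefault(cell, symbol)

    def plot(self, new_plots, collect=False, clear=False, nocollect=False):
        """Process one PLOT statement and return the plots it would display."""
        if collect and not self.collecting:
            self.collecting, self.plots = True, []  # assumed: start a new collection
        if clear:
            self.plots = []  # CLEAR: discard collected plots, keep collecting
        if not self.collecting:
            return [dict(p) for p in new_plots]  # shown alone, nothing retained
        for i, p in enumerate(new_plots):
            if i < len(self.plots):
                self._merge(self.plots[i], p)  # overlay the corresponding plot
            else:
                self.plots.append(dict(p))  # extend the collection
        shown = [dict(p) for p in self.plots]
        if nocollect:
            self.collecting, self.plots = False, []  # stop after displaying
        return shown
```

Replaying the sequence of this section (collect, delete, clear, then add with nocollect) against small hypothetical plots produces the same succession of displays that the text describes.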
Painting Scatter Plots
Painting scatter plots is a useful interactive tool that enables you to mark points of interest. Painting can be used to identify extreme points in scatter plots or to reveal the relationship between two scatter plots. The Class data set (from the section Simple Linear Regression) is used to illustrate some of these applications.
The following statements produce the scatter plot of the studentized residuals against the predicted values in Figure 73.39.
proc reg data=Class lineprinter;
   model Weight=Age Height / noprint;
   plot student.*p.;
run;
[Figure 73.39: Studentized Residual (STUDENT) against Predicted Value of Weight (PRED)]
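The STUDENT. statistic is the residual divided by its estimated standard error, e_i / (s * sqrt(1 - h_i)), where s is the root mean squared error and h_i is the leverage. The following Python sketch (not SAS code) computes it for a model with a single regressor, a simplification of the two-regressor model above; the function name and any data passed to it are hypothetical.

```python
import math


def studentized_residuals(x, y):
    """Return (pred, resid, leverage, student) for the fit y = b0 + b1*x."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    b0 = ybar - b1 * xbar
    pred = [b0 + b1 * xi for xi in x]
    resid = [yi - pi for yi, pi in zip(y, pred)]
    s = math.sqrt(sum(e * e for e in resid) / (n - 2))  # root MSE, p = 2
    lev = [1.0 / n + (xi - xbar) ** 2 / sxx for xi in x]  # hat diagonal h_i
    # Each residual divided by its own standard error s * sqrt(1 - h_i).
    student = [e / (s * math.sqrt(1.0 - h)) for e, h in zip(resid, lev)]
    return pred, resid, lev, student
```

For hypothetical data x = [1, 2, 3, 4, 5] and y = [2, 4, 5, 4, 5], the leverages sum to 2 (the number of parameters), and the middle observation has a studentized residual of 1.25.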
Then, the following statements identify the observation 'Henry' in the scatter plot and produce the plot in Figure 73.40:
   paint Name='Henry' / symbol='H';
   plot;
run;
[Figure 73.40: the plot in Figure 73.39 with the observation Henry plotted as 'H']
Next, the following statements identify observations whose studentized residuals are 2 or more in absolute value:
   paint student.>=2 or student.<=-2 / symbol='s';
   plot;
run;
The log lists the numbers of the observations that satisfy these conditions, along with the painting symbol and the number of observations found. Note that the painting from the previous PAINT statement (Henry, 'H') also remains in effect for this plot. Figure 73.41 shows the scatter plot produced by the preceding statements.
[Figure 73.41: the plot in Figure 73.39 with Henry plotted as 'H' and observations with studentized residuals of 2 or more in absolute value plotted as 's']
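The painting steps above can be modeled as rules applied to observations in order. The following Python sketch (not SAS code) uses a hypothetical paint function that assigns a symbol to every observation satisfying a condition and records, for each rule, the symbol, the observation numbers found, and their count, which mirrors what the log reports. Earlier rules stay in effect, as Henry's 'H' does in Figure 73.41; the assumption that a later rule overwrites an earlier symbol for the same observation is made only for illustration.

```python
def paint(observations, rules):
    """Apply paint rules in order.

    observations: list of dicts, one per observation (numbered from 1).
    rules: list of (predicate, symbol) pairs, one per PAINT statement.
    Returns (symbols, log): symbols maps observation number -> symbol
    (unpainted observations are absent and keep their default symbol),
    and log holds (symbol, observation numbers found, count) per rule.
    """
    symbols = {}
    log = []
    for predicate, symbol in rules:
        found = [i for i, obs in enumerate(observations, start=1) if predicate(obs)]
        for i in found:
            symbols[i] = symbol  # assumption: a later rule overwrites an earlier symbol
        log.append((symbol, found, len(found)))
    return symbols, log
```

With hypothetical observations whose studentized residuals are 0.1 (Henry), 2.3, -0.5, and -2.0, the two rules of this section paint observation 1 as 'H' and observations 2 and 4 as 's'.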
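The following statements relate two different scatter plots. Each observation is painted according to its studentized residual ('p' for values of 1 or more, 's' for values strictly between -1 and 1, 'n' for values of -1 or less), and the same symbols then appear in the plot of Cook's D against leverage (H). These statements produce the plot in Figure 73.42.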
   paint student.>=1 / symbol='p';
   paint student.<1 and student.>-1 / symbol='s';
   paint student.<=-1 / symbol='n';
   plot student.*p. cookd.*h. / hplots=2;
run;
[Figure 73.42: left, STUDENT against PRED; right, COOKD against H; points plotted with the symbols 'p', 's', and 'n' from the PAINT statements]
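The link between the two panels comes from giving each observation the same symbol in both. The following Python sketch (not SAS code) computes Cook's D from studentized residuals and leverages with the standard identity D_i = (STUDENT_i^2 / p) * h_i / (1 - h_i), where p is the number of model parameters, and assigns 'p', 's', or 'n' with the same thresholds as the three PAINT statements. The function names are hypothetical.

```python
def cooks_distance(student, leverage, p):
    """Cook's D from studentized residuals: D_i = student_i**2 / p * h_i / (1 - h_i)."""
    return [r * r / p * h / (1.0 - h) for r, h in zip(student, leverage)]


def paint_symbol(r):
    """Mirror the three PAINT statements: 'p' (>= 1), 'n' (<= -1), 's' otherwise."""
    if r >= 1:
        return "p"
    if r <= -1:
        return "n"
    return "s"


def linked_points(student, leverage, p):
    """Return (h, cookd, symbol) triples for a COOKD*H plot painted by STUDENT."""
    d = cooks_distance(student, leverage, p)
    return [(h, di, paint_symbol(r)) for h, di, r in zip(leverage, d, student)]
```

For studentized residuals of about -1.414, 0.5, and 1.25 with leverages 0.6, 0.3, and 0.2 (p = 2), the sketch gives the symbols n, s, and p and Cook's D values of 1.5, about 0.054, and about 0.195.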
Copyright © 2009 by SAS Institute Inc., Cary, NC, USA. All rights reserved.