A common use for a lattice is to create a graph that shows different subsets
of the same input data. In some cases, those subsets are already defined
in the input data. More frequently, however, you will have to transform
the input data to make it suitable for the graph that you are trying to
create, for example by creating new variables that represent subsets of
the data.
The graph that is shown in Stock Plot is based on data from SASHELP.STOCKS,
which contains several years of monthly stock information for three
companies. The data set contains columns for STOCK, DATE, VOLUME, and
ADJCLOSE (Adjusted Closing Price). However, it does not have the volume
and price information in the form that is needed for the graph. The
LATTICE layout does not support per-cell subsets of the input data.
Therefore, to make the content of each cell different, you must create
separate variables for each cell that supply the appropriate date,
volume, and price information. The following DATA step performs the
necessary input data transformations:
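data stock;
   set sashelp.stocks;
   /* Keep only the Microsoft data for 2004 and 2005 */
   where stock eq "Microsoft" and year(date) in (2004 2005);
   format Date2004 Date2005 date.
          Price2004 Price2005 dollar6.;
   /* These labels become the X-axis labels in the graph */
   label Date2004="2004" Date2005="2005";
   /* Copy each observation into the variables for its year;
      the other year's variables remain missing */
   if year(date) = 2004 then do;
      Date2004=date;
      Vol2004=volume*10**-6;   /* scale volume to millions */
      Price2004=adjclose;
   end;
   else if year(date) = 2005 then do;
      Date2005=date;
      Vol2005=volume*10**-6;   /* scale volume to millions */
      Price2005=adjclose;
   end;
   keep Date2004 Date2005 Vol2004
        Vol2005 Price2004 Price2005;
run;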
The data is filtered for Microsoft and for the years 2004 and 2005. Next,
separate date, volume, and stock price variables are created for each
year. Because the volumes are large, they are scaled to millions; this
scaling will be noted in the graph. This coding results in a "sparse"
data set, in which each observation has values only for the variables of
its own year. This is nevertheless the correct organization for the
lattice, because observations with missing X or Y values are not plotted,
so each cell shows only the data for its own year.
Obs  Date2004  Date2005  Price2004  Price2005  Vol2004  Vol2005
  1         .   01DEC05          .        $26        .  62.8924
  2         .   01NOV05          .        $27        .  71.4692
  3         .   03OCT05          .        $25        .  72.1325
  4         .   01SEP05          .        $25        .  66.9765
  5         .   01AUG05          .        $27        .  65.5300
  6         .   01JUL05          .        $25        .  69.0466
  7         .   01JUN05          .        $25        .  62.9567
  8         .   02MAY05          .        $25        .  62.6998
  9         .   01APR05          .        $25        .  77.0902
 10         .   01MAR05          .        $24        .  72.8997
 11         .   01FEB05          .        $25        .  75.9923
 12         .   03JAN05          .        $26        .  79.6428
 13   01DEC04         .        $26          .  84.4881        .
 14   01NOV04         .        $26          .  86.4461        .
 15   01OCT04         .        $25          .  65.7429        .
 16   01SEP04         .        $24          .  57.7253        .
 17   02AUG04         .        $24          .  52.1046        .
 18   01JUL04         .        $25          .  76.6667        .
 19   01JUN04         .        $25          .  77.0683        .
 20   03MAY04         .        $23          .  58.9425        .
 21   01APR04         .        $23          .  77.3867        .
 22   01MAR04         .        $22          .  77.1119        .
 23   02FEB04         .        $23          .  57.3859        .
 24   02JAN04         .        $24          .  63.6359        .
The key point is that every plot in every cell must use variables that
contain only the information appropriate for that cell. You cannot use
WHERE clauses within the template definition to create subsets of the
data.
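Because each cell needs its own variables, the per-cell columns do not have to be written out one IF/THEN block per year. As a sketch (not the approach used in this example), the same sparse layout can be built with DATA step arrays whose index bounds are the years themselves, which makes it easier to add more years. The data set name STOCK_ARR and the helper variable YR are hypothetical names chosen for illustration. The PROC MEANS step that follows should report 12 nonmissing and 12 missing values for each of the six variables, matching the listing of the STOCK data set.

```sas
/* Sketch: build the same sparse data set with arrays indexed by year.
   STOCK_ARR and YR are hypothetical names used only for illustration. */
data stock_arr;
   set sashelp.stocks;
   where stock eq "Microsoft" and year(date) in (2004 2005);
   /* Bounds 2004:2005 let the year itself serve as the array index */
   array d{2004:2005} Date2004 Date2005;
   array v{2004:2005} Vol2004 Vol2005;
   array p{2004:2005} Price2004 Price2005;
   format Date2004 Date2005 date. Price2004 Price2005 dollar6.;
   label Date2004="2004" Date2005="2005";
   yr = year(date);
   /* Fill only the current year's variables; the other year's
      variables stay missing because they are reset each iteration */
   d{yr} = date;
   v{yr} = volume / 1e6;     /* scale volume to millions */
   p{yr} = adjclose;
   keep Date2004 Date2005 Vol2004 Vol2005 Price2004 Price2005;
run;

/* Check the sparse structure: each variable should have
   12 nonmissing and 12 missing values */
proc means data=stock_arr n nmiss;
   var Date2004 Date2005 Vol2004 Vol2005 Price2004 Price2005;
run;
```

With this layout, adding a year means widening the array bounds and listing the new variables, rather than adding another IF/THEN block.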
The following
initial template defines the lattice:
proc template;
   define statgraph lattice1;
      begingraph;
         entrytitle "Microsoft Stock Performance";
         /* 2x2 grid; cells are filled left to right, top to bottom */
         layout lattice / columns=2 rows=2;
            /* define row 1: stock price for 2004 and 2005 */
            seriesplot y=price2004 x=date2004 / lineattrs=GraphData1;
            seriesplot y=price2005 x=date2005 / lineattrs=GraphData1;
            /* define row 2: trading volume (millions) for 2004 and 2005 */
            needleplot y=vol2004 x=date2004 /
               lineattrs=GraphData2(thickness=2px pattern=solid);
            needleplot y=vol2005 x=date2005 /
               lineattrs=GraphData2(thickness=2px pattern=solid);
         endlayout;
      endgraph;
   end;
run;
proc sgrender data=stock template=lattice1;
run;
Note that because Date2004 and Date2005 have an associated SAS date
format, a TIME axis is used, and the variable labels (2004 and 2005) are
used as the X-axis labels.
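Because the X-axis labels come from the variable labels, one way to change them without editing the template is to relabel the variables in the STOCK data set before rendering. The following is a minimal sketch, assuming that STOCK is in the WORK library and that the LATTICE1 template has already been compiled; the label text "Year 2004" and "Year 2005" is an arbitrary example.

```sas
/* Change the variable labels in place; the lattice cells take
   their X-axis labels from these labels */
proc datasets library=work nolist;
   modify stock;
      label Date2004="Year 2004" Date2005="Year 2005";
quit;

/* Render the graph again so that the new labels are used */
proc sgrender data=stock template=lattice1;
run;
```

Relabeling with PROC DATASETS changes only the data set's descriptor information, so the data values and the template stay untouched.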