A common use for a lattice
is to create a graph that shows different subsets of the same input
data. In some cases, those subsets are already defined in the input
data. However, you frequently have to transform the input data to
make it suitable for the graph that you are trying to create. This
might require any or all of the following:
-
-
-
- creating new variables that represent subsets of the data.
The graph that is shown in Stock Plot is based on data from
SASHELP.STOCKS, which contains several years of monthly stock
information for three companies. The data set includes columns for
STOCK, DATE, VOLUME, and ADJCLOSE (adjusted closing price). However,
the volume and price information is not in the form that the graph
needs. The LATTICE layout does not support subsets of the input data
on a per-cell basis. So, to make the content of each cell different,
unique variables must be created for each cell to provide the
appropriate date, volume, and price information. The following DATA
step performs the necessary input data transformations:
data stock;
   set sashelp.stocks;
   /* keep only the Microsoft observations for 2004 and 2005 */
   where stock eq "Microsoft" and year(date) in (2004 2005);
   format Date2004 Date2005 date.
          Price2004 Price2005 dollar6.;
   /* the labels become the X-axis labels in the graph */
   label Date2004="2004" Date2005="2005";
   if year(date) = 2004 then do;
      Date2004=date;
      Vol2004=volume*10**-6;   /* scale volume to millions */
      Price2004=adjclose;
   end;
   else if year(date) = 2005 then do;
      Date2005=date;
      Vol2005=volume*10**-6;   /* scale volume to millions */
      Price2005=adjclose;
   end;
   keep Date2004 Date2005 Vol2004
        Vol2005 Price2004 Price2005;
run;
The WHERE statement keeps only the Microsoft observations for the
years 2004 and 2005. Next, separate date, volume, and price variables
are created for each year. Because the volumes are large, they are
scaled to millions. This scaling is noted in the graph. This coding
results in a "sparse" data set, but it is the correct organization
for the lattice because observations with missing X or Y values are
not plotted.
Obs   Date2004   Date2005   Price2004   Price2005   Vol2004   Vol2005
  1          .    01DEC05           .         $26         .   62.8924
  2          .    01NOV05           .         $27         .   71.4692
  3          .    03OCT05           .         $25         .   72.1325
  4          .    01SEP05           .         $25         .   66.9765
  5          .    01AUG05           .         $27         .   65.5300
  6          .    01JUL05           .         $25         .   69.0466
  7          .    01JUN05           .         $25         .   62.9567
  8          .    02MAY05           .         $25         .   62.6998
  9          .    01APR05           .         $25         .   77.0902
 10          .    01MAR05           .         $24         .   72.8997
 11          .    01FEB05           .         $25         .   75.9923
 12          .    03JAN05           .         $26         .   79.6428
 13    01DEC04          .         $26           .   84.4881         .
 14    01NOV04          .         $26           .   86.4461         .
 15    01OCT04          .         $25           .   65.7429         .
 16    01SEP04          .         $24           .   57.7253         .
 17    02AUG04          .         $24           .   52.1046         .
 18    01JUL04          .         $25           .   76.6667         .
 19    01JUN04          .         $25           .   77.0683         .
 20    03MAY04          .         $23           .   58.9425         .
 21    01APR04          .         $23           .   77.3867         .
 22    01MAR04          .         $22           .   77.1119         .
 23    02FEB04          .         $23           .   57.3859         .
 24    02JAN04          .         $24           .   63.6359         .
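A listing like the one above can be reproduced, and the sparse layout
checked, with standard procedures. The following is only a minimal
sketch. It assumes that the STOCK data set created by the preceding
DATA step is available in the WORK library.

```sas
/* list the transformed data set */
proc print data=stock;
run;

/* count the nonmissing (N) and missing (NMISS) values of each variable */
proc means data=stock n nmiss;
run;
```

Given the listing above, each of the six variables should report 12
nonmissing and 12 missing values, one set of 12 for each year.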
The key point is that every plot in every cell must use variables
that contain only the information appropriate for that cell. You
cannot use WHERE clauses within the template definition to form
subsets of the data.
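If more years (and therefore more cells) were needed, the IF/ELSE
blocks could become repetitive. One possible alternative, shown here
only as a sketch, indexes arrays by year to build the same per-cell
variables. The output data set name STOCK_ARR and the temporary
variable YR are hypothetical names chosen for this illustration.

```sas
data stock_arr;                     /* hypothetical output name */
   set sashelp.stocks;
   where stock eq "Microsoft" and year(date) in (2004 2005);

   /* one array element per year; the subscripts are the year values */
   array dates{2004:2005}  Date2004  Date2005;
   array vols{2004:2005}   Vol2004   Vol2005;
   array prices{2004:2005} Price2004 Price2005;

   format Date2004 Date2005 date.
          Price2004 Price2005 dollar6.;
   label Date2004="2004" Date2005="2005";

   yr = year(date);                 /* hypothetical helper variable */
   dates{yr}  = date;
   vols{yr}   = volume / 1e6;       /* scale volume to millions */
   prices{yr} = adjclose;

   keep Date2004 Date2005 Vol2004
        Vol2005 Price2004 Price2005;
run;
```

Because the array variables are not read from the input data set, they
are reset to missing at the start of each DATA step iteration, so each
observation has values for only one year. The column order in this
data set differs from the listing above because the ARRAY statements
define the variables in a different order.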
The following initial
template defines the lattice:
proc template;
   define statgraph lattice1;
      begingraph;
         entrytitle "Microsoft Stock Performance";
         layout lattice / columns=2 rows=2;
            /* define row 1 */
            seriesplot y=price2004 x=date2004 / lineattrs=GraphData1;
            seriesplot y=price2005 x=date2005 / lineattrs=GraphData1;
            /* define row 2 */
            needleplot y=vol2004 x=date2004 /
               lineattrs=GraphData2(thickness=2px pattern=solid);
            needleplot y=vol2005 x=date2005 /
               lineattrs=GraphData2(thickness=2px pattern=solid);
         endlayout;
      endgraph;
   end;
run;

proc sgrender data=stock template=lattice1;
run;
Note that because Date2004 and Date2005 have an associated SAS date
format, a TIME axis is used, and the variable labels are used as the
X-axis labels.