Sashelp Data Sets


Leukemia Data Sets

The Sashelp.LeuTrain and Sashelp.LeuTest data sets provide microarray data from Golub et al. (1999) and Zou and Hastie (2005). The Sashelp.LeuTrain data set contains values for 7,129 genes in 38 training samples, and the Sashelp.LeuTest data set contains values for the same 7,129 genes in 34 testing samples. Among the 38 training samples, 27 are type 1 leukemia (acute lymphoblastic leukemia, coded in the data as 1) and 11 are type 2 leukemia (acute myeloid leukemia, coded in the data as -1).
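The sample counts described above can be cross-checked in a single table. The following is a minimal sketch, not part of the original example: the data set name work.LeuAll and the variable role are hypothetical names introduced only for illustration.

```sas
/* Stack the response variable from both data sets and tag each row */
/* with its source. work.LeuAll and role are illustrative names.    */
data work.LeuAll;
   length role $5;
   set sashelp.LeuTrain(keep=y in=inTrain)
       sashelp.LeuTest(keep=y);
   role = ifc(inTrain, 'Train', 'Test');
run;

/* Cross-tabulate source by leukemia type, showing counts only */
title 'Leukemia Type by Training/Test Role';
proc freq data=work.LeuAll;
   tables role*y / norow nocol nopercent;
run;
```

Only the variable y is read from each data set, so the step avoids copying the 7,129 gene variables.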

The following steps display information about the Sashelp.LeuTrain data set and create Figure B.12:

title 'Leukemia Training Data';
proc contents data=sashelp.LeuTrain varnum;
   ods select position;
run;

title 'The First Five Observations and 11 Variables';
proc print data=sashelp.LeuTrain(obs=5);
   var y x1-x10;
run;

title 'Leukemia Type Variable';
proc freq data=sashelp.LeuTrain;
   tables y;
run;

Figure B.12: Leukemia Training Data

The First Five Observations and 11 Variables

Obs y x1 x2 x3 x4 x5 x6 x7 x8 x9 x10
1 1 -1.46240 -0.64514 -0.83593 -1.47040 -0.91997 -1.58430 0.71239 -0.54229 1.05090 0.23649
2 1 -0.66480 0.20615 -0.36857 0.25822 -0.47567 -0.35497 -1.11940 -0.29251 -0.37542 -0.38760
3 1 -0.20049 0.37994 -2.38280 0.43960 -1.22700 -1.76220 0.10464 -1.80750 0.49292 -1.67000
4 1 -0.25776 0.27994 1.83920 -1.62950 -1.28750 -1.26510 0.76334 -0.61645 -0.31578 -0.32193
5 1 -0.56457 -0.39588 -0.98372 -0.83741 -0.41477 0.14834 -0.03550 -0.10022 -0.75753 0.37068

Leukemia Type Variable

y     Frequency    Percent    Cumulative Frequency    Cumulative Percent
-1           11      28.95                      11                 28.95
1            27      71.05                      38                100.00



The results of the PROC CONTENTS step are not displayed here. They show that the data set contains 7,130 variables: the response y and the gene variables x1-x7129.
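Because the PROC CONTENTS listing is long, the variable count can also be obtained directly. The following sketch assumes that the standard DICTIONARY.COLUMNS table is available in PROC SQL; the column alias n_variables is an illustrative name.

```sas
/* Count the variables in Sashelp.LeuTrain by using the SQL dictionary. */
/* Library and member names are stored in uppercase in the dictionary.  */
proc sql;
   select count(*) as n_variables
      from dictionary.columns
      where libname='SASHELP' and memname='LEUTRAIN';
quit;
```

If the data set matches the description above, the query should report 7,130 variables.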

The following steps display information about the Sashelp.LeuTest data set and create Figure B.13:

title 'Leukemia Test Data';
proc contents data=sashelp.LeuTest varnum;
   ods select position;
run;

title 'The First Five Observations and 11 Variables';
proc print data=sashelp.LeuTest(obs=5);
   var y x1-x10;
run;

title 'Leukemia Type Variable';
proc freq data=sashelp.LeuTest;
   tables y;
run;

Figure B.13: Leukemia Test Data

The First Five Observations and 11 Variables

Obs y x1 x2 x3 x4 x5 x6 x7 x8 x9 x10
1 1 -1.38240 0.06288 0.62252 1.61210 0.52179 0.11516 -1.85270 -0.39956 0.88007 -0.86565
2 1 0.65192 -0.35476 2.29630 1.64980 0.50211 -0.37315 1.76820 -1.74270 1.63080 0.60171
3 1 0.65409 1.41340 0.22593 -0.06719 0.30015 0.76964 -0.26212 0.94481 -0.51884 -0.60999
4 1 1.07220 0.01959 0.16875 0.84779 0.24533 0.79682 0.41442 0.35122 -0.70177 1.85410
5 1 2.12480 1.66370 -0.35986 1.15850 0.89379 0.56310 -0.92476 0.56790 -0.56039 -2.12400

Leukemia Type Variable

y     Frequency    Percent    Cumulative Frequency    Cumulative Percent
-1           14      41.18                      14                 41.18
1            20      58.82                      34                100.00



The results of the PROC CONTENTS step are not displayed here. They show that the data set contains 7,130 variables: the response y and the gene variables x1-x7129.
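Because both data sets are described as having the same variables, it can be useful to confirm this before scoring test data with a model built on the training data. The following sketch, which again assumes the standard DICTIONARY.COLUMNS table, lists any variable names that appear in Sashelp.LeuTrain but not in Sashelp.LeuTest.

```sas
/* List variable names that are in LeuTrain but missing from LeuTest. */
/* Names are uppercased so that the comparison ignores case.          */
title 'Variables in LeuTrain That Are Not in LeuTest';
proc sql;
   select upcase(name) as name
      from dictionary.columns
      where libname='SASHELP' and memname='LEUTRAIN'
   except
   select upcase(name) as name
      from dictionary.columns
      where libname='SASHELP' and memname='LEUTEST';
quit;
```

A query that returns no rows is consistent with the two data sets sharing the variables y and x1-x7129. Swapping the two member names checks the opposite direction.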