EXAMPLE:
In this example, a one-way analysis of variance is performed for each of two BY groups (A and B). To show that analyzing summary data is equivalent to analyzing the original data, the example begins with an analysis of the unsummarized data set. The data are then summarized, and the summarized data are analyzed with the %SUM_GLM macro.
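Why the equivalence holds: in a one-way model, the analysis of variance depends on the data only through each group's count n_i, mean ybar_i, and standard deviation s_i. The notation below is ours, not the macro's: there are k groups, N observations in total, and y_ij is observation j of group i.

```latex
\begin{aligned}
N &= \sum_{i=1}^{k} n_i, \qquad \bar{y} = \frac{1}{N}\sum_{i=1}^{k} n_i\,\bar{y}_i \\
SS_{\mathrm{Model}} &= \sum_{i=1}^{k} n_i\,(\bar{y}_i - \bar{y})^2 \\
SS_{\mathrm{Error}} &= \sum_{i=1}^{k}\sum_{j=1}^{n_i} (y_{ij} - \bar{y}_i)^2 = \sum_{i=1}^{k} (n_i - 1)\,s_i^2 \\
F &= \frac{SS_{\mathrm{Model}}/(k-1)}{SS_{\mathrm{Error}}/(N-k)}
\end{aligned}
```

In a one-way model the LS-means are the group means, and their standard errors are sqrt(MSE/n_i), so they can also be recovered from these summaries. The identity explains why an exact reproduction is possible. It does not describe how %SUM_GLM is implemented internally.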
data fulldata;
input bygroup $ treat response @@;
cards;
A 1 7.6 A 1 8.3 A 1 7.6
A 2 8.5 A 2 8.7 A 2 7.7 A 2 8.3 A 2 8.7
A 3 6.8 A 3 6.7 A 3 6.6 A 3 6.4
A 4 7.4 A 4 6.5 A 4 6.8
B 1 15.5 B 1 13.8 B 1 14.2 B 1 17.3
B 2 10.6 B 2 12.6 B 2 15.7 B 2 12.6 B 2 13.5 B 2 11.8
B 3 20.5 B 3 17.7 B 3 19.1 B 3 21.1 B 3 16.9 B 3 18.7
B 4 16.4 B 4 13.8 B 4 17.4 B 4 18.8 B 4 19.1
B 5 16.1 B 5 14.4 B 5 13.0
;
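The trailing @@ in the INPUT statement holds each data line so that several (bygroup, treat, response) triples can be read from it. The Python sketch below is a rough analogue under the assumption that the CARDS lines are available as strings: it reads the tokens three at a time and continues across line boundaries. The helper name read_triples is hypothetical.

```python
# Data lines copied from the CARDS section above
DATA_LINES = [
    "A 1 7.6 A 1 8.3 A 1 7.6",
    "A 2 8.5 A 2 8.7 A 2 7.7 A 2 8.3 A 2 8.7",
    "A 3 6.8 A 3 6.7 A 3 6.6 A 3 6.4",
    "A 4 7.4 A 4 6.5 A 4 6.8",
    "B 1 15.5 B 1 13.8 B 1 14.2 B 1 17.3",
    "B 2 10.6 B 2 12.6 B 2 15.7 B 2 12.6 B 2 13.5 B 2 11.8",
    "B 3 20.5 B 3 17.7 B 3 19.1 B 3 21.1 B 3 16.9 B 3 18.7",
    "B 4 16.4 B 4 13.8 B 4 17.4 B 4 18.8 B 4 19.1",
    "B 5 16.1 B 5 14.4 B 5 13.0",
]


def read_triples(lines):
    """Rough analogue of INPUT bygroup $ treat response @@ (list input):
    read (bygroup, treat, response) triples from the whitespace-separated
    token stream, continuing across line boundaries."""
    tokens = [tok for line in lines for tok in line.split()]
    if len(tokens) % 3:
        raise ValueError("incomplete observation at end of data")
    return [
        (tokens[i], int(tokens[i + 1]), float(tokens[i + 2]))
        for i in range(0, len(tokens), 3)
    ]


observations = read_triples(DATA_LINES)

# Count observations per BY group
counts = {}
for bygroup, _treat, _response in observations:
    counts[bygroup] = counts.get(bygroup, 0) + 1
print(len(observations), counts)  # 39 {'A': 15, 'B': 24}
```

The counts agree with the corrected total degrees of freedom reported below (14 for group A and 23 for group B).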
The following statements perform the one-way analysis of the original, unsummarized data. For brevity, the ODS SELECT statement limits the display to the overall ANOVA, fit statistics, LS-means, and LS-mean comparison tables.
ods select overallanova fitstatistics lsmeans diff;
proc glm data=fulldata;
title "One-way analysis of unsummarized data";
by bygroup;
class treat;
model response = treat;
lsmeans treat / stderr tdiff e;
run;
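The ANOVA tables that follow can be cross-checked with a minimal Python sketch. It applies the textbook one-way formulas to the raw observations within each BY group. This is not PROC GLM's algorithm, and one_way_anova is a hypothetical helper. It should agree with the GLM sums of squares, for example 8.34716667 (Model) and 1.52216667 (Error) for group A.

```python
# Raw observations from the DATA step above: {bygroup: {treat: [response, ...]}}
DATA = {
    "A": {
        1: [7.6, 8.3, 7.6],
        2: [8.5, 8.7, 7.7, 8.3, 8.7],
        3: [6.8, 6.7, 6.6, 6.4],
        4: [7.4, 6.5, 6.8],
    },
    "B": {
        1: [15.5, 13.8, 14.2, 17.3],
        2: [10.6, 12.6, 15.7, 12.6, 13.5, 11.8],
        3: [20.5, 17.7, 19.1, 21.1, 16.9, 18.7],
        4: [16.4, 13.8, 17.4, 18.8, 19.1],
        5: [16.1, 14.4, 13.0],
    },
}


def one_way_anova(groups):
    """Return (df_model, ss_model, df_error, ss_error, f) for {level: [y, ...]}."""
    all_y = [y for ys in groups.values() for y in ys]
    n_total, k = len(all_y), len(groups)
    grand_mean = sum(all_y) / n_total
    ss_model = ss_error = 0.0
    for ys in groups.values():
        level_mean = sum(ys) / len(ys)
        ss_model += len(ys) * (level_mean - grand_mean) ** 2  # between-treatment SS
        ss_error += sum((y - level_mean) ** 2 for y in ys)     # within-treatment SS
    df_model, df_error = k - 1, n_total - k
    f = (ss_model / df_model) / (ss_error / df_error)
    return df_model, ss_model, df_error, ss_error, f


# One analysis per BY group, as with the BY statement in PROC GLM
for bygroup, groups in DATA.items():
    df_m, ss_m, df_e, ss_e, f = one_way_anova(groups)
    print(f"bygroup={bygroup}: Model DF={df_m} SS={ss_m:.8f}; "
          f"Error DF={df_e} SS={ss_e:.8f}; F={f:.2f}")
```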
Following are the analysis results for each BY group.
The GLM Procedure
Dependent Variable: response
bygroup=A

                             Sum of
Source             DF       Squares     Mean Square    F Value    Pr > F
Model               3    8.34716667      2.78238889      20.11    <.0001
Error              11    1.52216667      0.13837879
Corrected Total    14    9.86933333

R-Square     Coeff Var      Root MSE    response Mean
0.845768      4.955502      0.371993         7.506667
The GLM Procedure
Least Squares Means
bygroup=A

treat   response LSMEAN    Standard Error    Pr > |t|    LSMEAN Number
1            7.83333333        0.21477026      <.0001                1
2            8.38000000        0.16636032      <.0001                2
3            6.62500000        0.18599650      <.0001                3
4            6.90000000        0.21477026      <.0001                4

Least Squares Means for Effect treat
t for H0: LSMean(i)=LSMean(j) / Pr > |t|
Dependent Variable: response

i/j                    1                   2                   3                   4
1                           -2.01228  0.0693    4.252983  0.0014    3.072894  0.0106
2        2.01228  0.0693                        7.032927  <.0001    5.447881  0.0002
3       -4.25298  0.0014    -7.03293  <.0001                        -0.96792  0.3539
4       -3.07289  0.0106    -5.44788  0.0002     0.96792  0.3539
The GLM Procedure
Dependent Variable: response
bygroup=B

                             Sum of
Source             DF       Squares     Mean Square    F Value    Pr > F
Model               4   130.3183333      32.5795833      10.61    0.0001
Error              19    58.3200000       3.0694737
Corrected Total    23   188.6383333

R-Square     Coeff Var      Root MSE    response Mean
0.690837      11.04776      1.751991         15.85833
The GLM Procedure
Least Squares Means
bygroup=B

treat   response LSMEAN    Standard Error    Pr > |t|    LSMEAN Number
1            15.2000000         0.8759957      <.0001                1
2            12.8000000         0.7152475      <.0001                2
3            19.0000000         0.7152475      <.0001                3
4            17.1000000         0.7835144      <.0001                4
5            14.5000000         1.0115127      <.0001                5

Least Squares Means for Effect treat
t for H0: LSMean(i)=LSMean(j) / Pr > |t|
Dependent Variable: response

i/j                    1                   2                   3                   4                   5
1                           2.122193  0.0472    -3.36014  0.0033    -1.61665  0.1224    0.523128  0.6069
2       -2.12219  0.0472                        -6.12943  <.0001    -4.05323  0.0007    -1.37225  0.1860
3       3.360139  0.0033    6.129434  <.0001                         1.79096  0.0892    3.632416  0.0018
4       1.616648  0.1224    4.053226  0.0007    -1.79096  0.0892                        2.032086  0.0564
5       -0.52313  0.6069    1.372246  0.1860    -3.63242  0.0018    -2.03209  0.0564
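In a one-way model, each LS-mean is the treatment mean, and its standard error is sqrt(MSE/n_i). The TDIFF statistic for LS-means i and j is (ybar_i - ybar_j) / sqrt(MSE * (1/n_i + 1/n_j)), referred to a t distribution with the error degrees of freedom. The minimal Python check below uses the group A error mean square, means, and counts copied from the output above. P-values are omitted because they need a t-distribution CDF from outside the standard library. The helper names are hypothetical.

```python
import math

# BY group A values copied from the PROC GLM and PROC MEANS output
MSE = 0.13837879                      # Error mean square
MEANS = {1: 7.83333333, 2: 8.38, 3: 6.625, 4: 6.9}
COUNTS = {1: 3, 2: 5, 3: 4, 4: 3}


def lsmean_stderr(i):
    """Standard error of the LS-mean for treatment i in a one-way model."""
    return math.sqrt(MSE / COUNTS[i])


def t_diff(i, j):
    """t statistic for H0: LSMean(i) = LSMean(j)."""
    se = math.sqrt(MSE * (1 / COUNTS[i] + 1 / COUNTS[j]))
    return (MEANS[i] - MEANS[j]) / se


for i in MEANS:
    print(i, f"{lsmean_stderr(i):.8f}")
for i in MEANS:
    print(i, [f"{t_diff(i, j):.5f}" if i != j else "" for j in MEANS])
```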
|
These statements display the summary statistics for each BY group of the original data.
proc sort data=fulldata;
by bygroup;
run;
proc means data=fulldata mean std;
by bygroup;
class treat;
var response;
title "Summary statistics from original data";
run;
The MEANS Procedure

bygroup=A

treat    N Obs            Mean         Std Dev
1            3       7.8333333       0.4041452
2            5       8.3800000       0.4147288
3            4       6.6250000       0.1707825
4            3       6.9000000       0.4582576

bygroup=B

treat    N Obs            Mean         Std Dev
1            4      15.2000000       1.5769168
2            6      12.8000000       1.7216271
3            6      19.0000000       1.6037456
4            5      17.1000000       2.1424285
5            3      14.5000000       1.5524175
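Each cell of this listing gives the count, the mean, and the sample standard deviation with divisor n - 1 (the PROC MEANS default, VARDEF=DF). Python's statistics module computes the same quantities. Below is a minimal sketch for BY group B, with the data copied from the DATA step; summarize is a hypothetical helper.

```python
import statistics

# BY group B responses from the DATA step, keyed by treat
GROUP_B = {
    1: [15.5, 13.8, 14.2, 17.3],
    2: [10.6, 12.6, 15.7, 12.6, 13.5, 11.8],
    3: [20.5, 17.7, 19.1, 21.1, 16.9, 18.7],
    4: [16.4, 13.8, 17.4, 18.8, 19.1],
    5: [16.1, 14.4, 13.0],
}


def summarize(groups):
    """Return {treat: (count, mean, std)}; statistics.stdev uses the n - 1 divisor."""
    return {
        treat: (len(ys), statistics.mean(ys), statistics.stdev(ys))
        for treat, ys in groups.items()
    }


for treat, (n, mean, std) in summarize(GROUP_B).items():
    print(f"{treat}  {n:2d}  {mean:11.7f}  {std:10.7f}")
```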
The following DATA step creates the input data set of summary statistics. Use this method when you have only a listing of summary statistics, such as the PROC MEANS output above.
data summary;
input count means std bygroup $ treat;
cards;
3 7.8333 0.4041 A 1
5 8.3800 0.4147 A 2
4 6.6250 0.1708 A 3
3 6.9000 0.4583 A 4
4 15.200 1.5769 B 1
6 12.800 1.7216 B 2
6 19.000 1.6038 B 3
5 17.100 2.1424 B 4
3 14.500 1.5524 B 5
;
Because the unsummarized data are available in this example, the summary data set could also be created with PROC SUMMARY, as follows:
proc summary data=fulldata nway;
class bygroup treat;
var response;
output out=summary2 mean=means std=std n=count;
run;
proc print;
run;
Obs    bygroup    treat    _TYPE_    _FREQ_      means        std    count
  1          A        1         3         3     7.8333    0.40415        3
  2          A        2         3         5     8.3800    0.41473        5
  3          A        3         3         4     6.6250    0.17078        4
  4          A        4         3         3     6.9000    0.45826        3
  5          B        1         3         4    15.2000    1.57692        4
  6          B        2         3         6    12.8000    1.72163        6
  7          B        3         3         6    19.0000    1.60375        6
  8          B        4         3         5    17.1000    2.14243        5
  9          B        5         3         3    14.5000    1.55242        3
Sorting is unnecessary here because the data are already in order. In general, however, the input data set must be sorted by the BY variables before the macro analyzes it.
proc sort data=summary;
by bygroup;
run;
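Ordinary BY processing treats each run of adjacent observations with the same BY value as one group, and it expects those runs in ascending order. When they are not in order, a procedure typically stops with a "not sorted" error; the NOTSORTED option relaxes this check. Assuming the macro relies on ordinary BY processing, the Python sketch below mimics that check with itertools.groupby. by_groups is a hypothetical helper, not part of the macro, and the rows are a subset of the summary records.

```python
from itertools import groupby
from operator import itemgetter


def by_groups(records, key=itemgetter(0)):
    """Yield (by_value, rows) for each run of adjacent records with the same BY value.

    Mirrors ordinary (not NOTSORTED) BY processing: if a BY value is smaller
    than the one before it, the input is not in ascending order, so raise.
    """
    previous = None
    for by_value, run in groupby(records, key=key):
        if previous is not None and by_value < previous:
            raise ValueError(f"data not sorted by BY variable at {by_value!r}")
        previous = by_value
        yield by_value, list(run)


# Subset of the summary records: (bygroup, treat, count)
sorted_rows = [("A", 1, 3), ("A", 2, 5), ("B", 1, 4), ("B", 2, 6)]
unsorted_rows = [("B", 1, 4), ("A", 1, 3), ("B", 2, 6), ("A", 2, 5)]

print([(b, len(rows)) for b, rows in by_groups(sorted_rows)])  # [('A', 2), ('B', 2)]
try:
    list(by_groups(unsorted_rows))
except ValueError as err:
    print("error:", err)
# Sorting first (the analogue of PROC SORT) restores one run per BY value
print([(b, len(rows)) for b, rows in by_groups(sorted(unsorted_rows))])  # [('A', 2), ('B', 2)]
```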
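The following statements define the SUM_GLM macro and run it on the summary statistics to reproduce the analyses of the original data. The analysis of the unsummarized data differs slightly because the summary statistics above were entered with limited precision. For example, the group B model sum of squares is reproduced exactly because the group B treatment means were entered exactly. The group B error sum of squares depends on the rounded standard deviations and therefore differs slightly.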
/* Define the SUM_GLM macro */
%inc "<location of your file containing the SUM_GLM macro>";
ods select overallanova fitstatistics lsmeans diff;
%sum_glm(Data=summary,
N=count,
Mean=means,
StdDev=std,
LSopts=stderr tdiff e,
By=bygroup,
Group=Treat)
The GLM Procedure
Dependent Variable: y
bygroup=A

                             Sum of
Source             DF       Squares     Mean Square    F Value    Pr > F
Model               3    8.34710134      2.78236711      20.11    <.0001
Error              11    1.52209368      0.13837215
Corrected Total    14    9.86919502

R-Square     Coeff Var      Root MSE           y Mean
0.845773      4.955387      0.371984         7.506660
The GLM Procedure
Least Squares Means
bygroup=A

treat          y LSMEAN    Standard Error    Pr > |t|    LSMEAN Number
1            7.83330000        0.21476511      <.0001                1
2            8.38000000        0.16635634      <.0001                2
3            6.62500000        0.18599204      <.0001                3
4            6.90000000        0.21476511      <.0001                4

Least Squares Means for Effect treat
t for H0: LSMean(i)=LSMean(j) / Pr > |t|
Dependent Variable: y

i/j                    1                   2                   3                   4
1                           -2.01245  0.0693    4.252967  0.0014    3.072858  0.0106
2       2.012451  0.0693                        7.033096  <.0001    5.448011  0.0002
3       -4.25297  0.0014     -7.0331  <.0001                        -0.96794  0.3539
4       -3.07286  0.0106    -5.44801  0.0002    0.967943  0.3539
The GLM Procedure
Dependent Variable: y
bygroup=B

                             Sum of
Source             DF       Squares     Mean Square    F Value    Pr > F
Model               4   130.3183333      32.5795833      10.61    0.0001
Error              19    58.3196484       3.0694552
Corrected Total    23   188.6379817

R-Square     Coeff Var      Root MSE           y Mean
0.690838      11.04773      1.751986         15.85833
The GLM Procedure
Least Squares Means
bygroup=B

treat          y LSMEAN    Standard Error    Pr > |t|    LSMEAN Number
1            15.2000000         0.8759930      <.0001                1
2            12.8000000         0.7152453      <.0001                2
3            19.0000000         0.7152453      <.0001                3
4            17.1000000         0.7835120      <.0001                4
5            14.5000000         1.0115096      <.0001                5

Least Squares Means for Effect treat
t for H0: LSMean(i)=LSMean(j) / Pr > |t|
Dependent Variable: y

i/j                    1                   2                   3                   4                   5
1                             2.1222  0.0472    -3.36015  0.0033    -1.61665  0.1224    0.523129  0.6069
2        -2.1222  0.0472                        -6.12945  <.0001    -4.05324  0.0007    -1.37225  0.1860
3       3.360149  0.0033    6.129452  <.0001                        1.790966  0.0892    3.632427  0.0018
4       1.616653  0.1224    4.053238  0.0007    -1.79097  0.0892                        2.032092  0.0564
5       -0.52313  0.6069     1.37225  0.1860    -3.63243  0.0018    -2.03209  0.0564
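The macro's ANOVA tables can be reproduced from the nine summary records alone, using SS_Model = sum of n_i (ybar_i - ybar)^2 and SS_Error = sum of (n_i - 1) s_i^2. The Python sketch below uses the rounded values typed into the DATA step for the SUMMARY data set. It therefore matches the macro output above (for example, 8.34710134 and 1.52209368 for group A) rather than the unsummarized analysis. It is an independent check, not the macro's source code, and anova_from_summary is a hypothetical helper.

```python
import math

# (count, mean, std) per treat, as entered in the DATA step for SUMMARY
SUMMARY = {
    "A": [(3, 7.8333, 0.4041), (5, 8.3800, 0.4147),
          (4, 6.6250, 0.1708), (3, 6.9000, 0.4583)],
    "B": [(4, 15.200, 1.5769), (6, 12.800, 1.7216), (6, 19.000, 1.6038),
          (5, 17.100, 2.1424), (3, 14.500, 1.5524)],
}


def anova_from_summary(cells):
    """One-way ANOVA from (n_i, mean_i, sd_i) triples; returns a dict of statistics."""
    n_total = sum(n for n, _, _ in cells)
    k = len(cells)
    grand_mean = sum(n * m for n, m, _ in cells) / n_total
    ss_model = sum(n * (m - grand_mean) ** 2 for n, m, _ in cells)
    ss_error = sum((n - 1) * s ** 2 for n, _, s in cells)  # pooled within-group SS
    df_model, df_error = k - 1, n_total - k
    mse = ss_error / df_error
    return {
        "ss_model": ss_model,
        "ss_error": ss_error,
        "f": (ss_model / df_model) / mse,
        "r_square": ss_model / (ss_model + ss_error),
        "root_mse": math.sqrt(mse),
        "mean": grand_mean,
    }


for bygroup, cells in SUMMARY.items():
    stats = anova_from_summary(cells)
    print(bygroup, {name: round(value, 6) for name, value in stats.items()})
```

Because the counts and group B means are exact, any remaining discrepancy with the unsummarized analysis comes only from the rounded inputs.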
|