Illustrates how to create a data set of descriptive statistics on
multiple input variables such that each row of the data set
corresponds to an input variable with a column for each
computed statistic. This effectively reshapes the default output
data set from PROC UNIVARIATE, MEANS, or SUMMARY.
REQUIREMENTS:
Version 6 and later of Base SAS.
DETAILS:
When you use the UNIVARIATE, MEANS, or SUMMARY procedure to compute
multiple statistics on a set of variables, the OUT= data set
contains a single observation (one per BY group when you use a BY
statement). Each variable in this observation holds the value of one
statistic for one of the original variables. Often you want this
data set rearranged to look more like the displayed output from
PROC MEANS, with one observation for each original variable and one
variable for each computed statistic.
You can make these procedures produce an output data set of the
desired form by using multiple OUTPUT statements, one per computed
statistic, so that each statistic is written to its own data set
under the original variable names. A subsequent DATA step
concatenates these data sets, and PROC TRANSPOSE then turns the
result into the final form.
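The concatenate-and-transpose step can be seen in isolation in the following minimal sketch. It assumes two hypothetical one-row data sets, MEAN and STD, that already hold each statistic under the original variable names; the values are copied from the Results for the full iris data rather than computed here.

```sas
/* Hypothetical one-row inputs, one per statistic, each using the
   original variable names. Values are copied from the Results. */
data mean;
   input sepallen sepalwid petallen petalwid;
   datalines;
58.4333 30.5733 37.58 11.9933
;
data std;
   input sepallen sepalwid petallen petalwid;
   datalines;
8.28066 4.35866 17.6530 7.62238
;
/* Stack the statistics: row 1 holds means, row 2 standard deviations */
data stats;
   set mean std;
run;
/* Each listed variable becomes a row; COL1 and COL2 follow the
   stacking order above, so they are renamed MEAN and STD */
proc transpose data=stats name=varname
   out=reshaped(rename=(col1=mean col2=std));
   var sepallen sepalwid petallen petalwid;
run;
proc print data=reshaped;
   title "Reshaped toy data set";
run;
```

Adding a third data set to the SET statement would add a COL3 that needs its own RENAME= entry; the order of the SET statement and the RENAME= list must agree.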
Unless you need percentiles other than those that the MEANS and
SUMMARY procedures provide, those procedures compute statistics more
efficiently than UNIVARIATE. If you request percentiles with the
PCTLPTS= option in UNIVARIATE, you need to create a separate output
data set for each percentile and rename its variables back to the
original variable names, as this example illustrates.
If you use PROC MEANS or PROC SUMMARY, the syntax is a little
simpler: you can omit the list of variable names after each
statistic keyword, because these procedures name the output
variables after the original analysis variables by default.
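A minimal sketch of the PROC MEANS form follows, assuming the IRIS data set from the DATA step in this sample already exists. It is restricted to statistics that PROC MEANS computes directly (MEAN, STD, N, P10), and the output data set names simply mirror the UNIVARIATE code. Each statistic keyword followed by an empty equal sign keeps the analysis variable names, so no renaming is needed.

```sas
/* Assumes the IRIS data set created in this sample exists */
proc means data=iris noprint;
   var sepallen sepalwid petallen petalwid;
   /* A keyword with no names (for example, mean=) reuses the
      analysis variable names for the output variables */
   output out=mean mean=;
   output out=std  std=;
   output out=n    n=;
   output out=p10  p10=;
run;
/* Stack the one-row data sets in a fixed order */
data stats;
   set mean std n p10;
run;
/* _TYPE_ and _FREQ_ are not in the VAR list, so they are not transposed */
proc transpose data=stats name=varname
   out=reshaped(rename=(col1=mean col2=std col3=n col4=p10));
   var sepallen sepalwid petallen petalwid;
run;
proc print data=reshaped;
   var varname mean std n p10;
   title "Reshaped MEANS output data set";
run;
```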
These sample files and code examples are provided by SAS Institute
Inc. "as is" without warranty of any kind, either express or implied, including
but not limited to the implied warranties of merchantability and fitness for a
particular purpose. Recipients acknowledge and agree that SAS Institute shall
not be liable for any damages whatsoever arising out of their use of this material.
In addition, SAS Institute will provide no support for the materials contained herein.
The following SAS code produces the results shown after it.
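data iris;
   /* Declare the length so longer species names are not truncated
      to the length of the first value assigned ('SETOSA') */
   length species $10;
   input sepallen sepalwid petallen petalwid specno @@;
   if specno=1 then species='SETOSA';
   else if specno=2 then species='VERSICOLOR';
   else species='VIRGINICA';
   drop specno;
   datalines;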
50 33 14 02 1 64 28 56 22 3 65 28 46 15 2 67 31 56 24 3
63 28 51 15 3 46 34 14 03 1 69 31 51 23 3 62 22 45 15 2
59 32 48 18 2 46 36 10 02 1 61 30 46 14 2 60 27 51 16 2
65 30 52 20 3 56 25 39 11 2 65 30 55 18 3 58 27 51 19 3
68 32 59 23 3 51 33 17 05 1 57 28 45 13 2 62 34 54 23 3
77 38 67 22 3 63 33 47 16 2 67 33 57 25 3 76 30 66 21 3
49 25 45 17 3 55 35 13 02 1 67 30 52 23 3 70 32 47 14 2
64 32 45 15 2 61 28 40 13 2 48 31 16 02 1 59 30 51 18 3
55 24 38 11 2 63 25 50 19 3 64 32 53 23 3 52 34 14 02 1
49 36 14 01 1 54 30 45 15 2 79 38 64 20 3 44 32 13 02 1
67 33 57 21 3 50 35 16 06 1 58 26 40 12 2 44 30 13 02 1
77 28 67 20 3 63 27 49 18 3 47 32 16 02 1 55 26 44 12 2
50 23 33 10 2 72 32 60 18 3 48 30 14 03 1 51 38 16 02 1
61 30 49 18 3 48 34 19 02 1 50 30 16 02 1 50 32 12 02 1
61 26 56 14 3 64 28 56 21 3 43 30 11 01 1 58 40 12 02 1
51 38 19 04 1 67 31 44 14 2 62 28 48 18 3 49 30 14 02 1
51 35 14 02 1 56 30 45 15 2 58 27 41 10 2 50 34 16 04 1
46 32 14 02 1 60 29 45 15 2 57 26 35 10 2 57 44 15 04 1
50 36 14 02 1 77 30 61 23 3 63 34 56 24 3 58 27 51 19 3
57 29 42 13 2 72 30 58 16 3 54 34 15 04 1 52 41 15 01 1
71 30 59 21 3 64 31 55 18 3 60 30 48 18 3 63 29 56 18 3
49 24 33 10 2 56 27 42 13 2 57 30 42 12 2 55 42 14 02 1
49 31 15 02 1 77 26 69 23 3 60 22 50 15 3 54 39 17 04 1
66 29 46 13 2 52 27 39 14 2 60 34 45 16 2 50 34 15 02 1
44 29 14 02 1 50 20 35 10 2 55 24 37 10 2 58 27 39 12 2
47 32 13 02 1 46 31 15 02 1 69 32 57 23 3 62 29 43 13 2
74 28 61 19 3 59 30 42 15 2 51 34 15 02 1 50 35 13 03 1
56 28 49 20 3 60 22 40 10 2 73 29 63 18 3 67 25 58 18 3
49 31 15 01 1 67 31 47 15 2 63 23 44 13 2 54 37 15 02 1
56 30 41 13 2 63 25 49 15 2 61 28 47 12 2 64 29 43 13 2
51 25 30 11 2 57 28 41 13 2 65 30 58 22 3 69 31 54 21 3
54 39 13 04 1 51 35 14 03 1 72 36 61 25 3 65 32 51 20 3
61 29 47 14 2 56 29 36 13 2 69 31 49 15 2 64 27 53 19 3
68 30 55 21 3 55 25 40 13 2 48 34 16 02 1 48 30 14 01 1
45 23 13 03 1 57 25 50 20 3 57 38 17 03 1 51 38 15 03 1
55 23 40 13 2 66 30 44 14 2 68 28 48 14 2 54 34 17 02 1
51 37 15 04 1 52 35 15 02 1 58 28 51 24 3 67 30 50 17 2
63 33 60 25 3 53 37 15 02 1
;
/* Usual output of statistics for all input variables */
proc univariate data=iris noprint;
   var sepallen sepalwid petallen petalwid;
   output out=out mean=m1-m4 std=s1-s4 n=n1-n4 p10=p10_1-p10_4
          pctlpts=20 40 pctlpre=sl sw pl pw;
run;
proc print data=out;
   title "Usual UNIVARIATE output data set";
run;
/* Create data set with variables as rows, statistics as columns */
proc univariate data=iris noprint;
   var sepallen sepalwid petallen petalwid;
   /* Create one data set per statistic and reuse the original
      variable names. */
   output mean=sepallen sepalwid petallen petalwid
          out=mean;
   output std=sepallen sepalwid petallen petalwid
          out=std;
   output n=sepallen sepalwid petallen petalwid
          out=n;
   output p10=sepallen sepalwid petallen petalwid
          out=p10;
   /* For percentiles in UNIVARIATE, create one data set per
      percentile and rename its variables to the original names */
   output pctlpts=20 pctlpre=a b c d
          out=p20 (rename=(a20=sepallen b20=sepalwid
                           c20=petallen d20=petalwid));
   output pctlpts=40 pctlpre=a b c d
          out=p40 (rename=(a40=sepallen b40=sepalwid
                           c40=petallen d40=petalwid));
run;
data stats;
   set mean std n p10 p20 p40;
run;
proc transpose data=stats name=varname
   /* Apply new variable names in the same order as the
    * concatenation of data sets in the DATA step above. */
   out=reshaped (rename=(col1=mean col2=std col3=n col4=p10
                         col5=p20 col6=p40));
   var sepallen sepalwid petallen petalwid;
run;
proc print data=reshaped;
   var varname mean std n p10 p20 p40;
   title "Reshaped UNIVARIATE output data set";
run;
/************** With BY processing ***************/
/* Usual output of statistics for all input variables */
proc sort data=iris out=iris2;
   by species;
run;
proc univariate data=iris2 noprint;
   var sepallen sepalwid petallen petalwid;
   by species;
   output out=out mean=m1-m4 std=s1-s4 n=n1-n4 p10=p10_1-p10_4
          pctlpts=20 40 pctlpre=sl sw pl pw;
run;
proc print data=out;
   title "Usual UNIVARIATE output data set";
run;
/* Create data set with variables as rows, statistics as columns */
proc univariate data=iris2 noprint;
   by species;
   var sepallen sepalwid petallen petalwid;
   /* Create one data set per statistic and reuse the original
      variable names. */
   output mean=sepallen sepalwid petallen petalwid
          out=mean;
   output std=sepallen sepalwid petallen petalwid
          out=std;
   output n=sepallen sepalwid petallen petalwid
          out=n;
   output p10=sepallen sepalwid petallen petalwid
          out=p10;
   /* For percentiles in UNIVARIATE, create one data set per
      percentile and rename its variables to the original names */
   output pctlpts=20 pctlpre=a b c d
          out=p20 (rename=(a20=sepallen b20=sepalwid
                           c20=petallen d20=petalwid));
   output pctlpts=40 pctlpre=a b c d
          out=p40 (rename=(a40=sepallen b40=sepalwid
                           c40=petallen d40=petalwid));
run;
data stats;
   /* BY interleaves the data sets, so each species keeps its
      statistics together in the order listed on SET */
   set mean std n p10 p20 p40;
   by species;
run;
proc transpose data=stats name=varname
   /* Apply new variable names in the same order as the
    * concatenation of data sets in the DATA step above. */
   out=reshaped (rename=(col1=mean col2=std col3=n col4=p10
                         col5=p20 col6=p40));
   var sepallen sepalwid petallen petalwid;
   by species;
run;
proc print data=reshaped;
   var varname mean std n p10 p20 p40;
   by species;
   title "Reshaped UNIVARIATE output data set";
run;
Results without BY processing:
Usual UNIVARIATE output data set

Obs   n1   n2   n3   n4       m1       m2       m3       m4       s1       s2       s3       s4
  1  150  150  150  150  58.4333  30.5733    37.58  11.9933  8.28066  4.35866  17.6530  7.62238

Obs  p10_1  p10_2  p10_3  p10_4  sl20  sl40  sw20  sw40  pl20  pl40  pw20  pw40
  1     48     25     14      2    50    56    27    30    15    39     2  11.5
Reshaped UNIVARIATE output data set

Obs  varname      mean      std    n  p10  p20   p40
  1  sepallen  58.4333   8.2807  150   48   50  56.0
  2  sepalwid  30.5733   4.3587  150   25   27  30.0
  3  petallen  37.5800  17.6530  150   14   15  39.0
  4  petalwid  11.9933   7.6224  150    2    2  11.5
Results when using BY processing:
Usual UNIVARIATE output data set

Obs  species     n1  n2  n3  n4     m1     m2     m3     m4       s1       s2       s3       s4
  1  SETOSA      50  50  50  50  50.06  34.28  14.62   2.46  3.52490  3.79064  1.73664  1.05386
  2  VERSICOLOR  50  50  50  50  59.36  27.70  42.60  13.26  5.16171  3.13798  4.69911  1.97753
  3  VIRGINICA   50  50  50  50  65.88  29.74  55.52  20.26  6.35880  3.22497  5.51895  2.74650

Obs  species     p10_1  p10_2  p10_3  p10_4  sl20  sl40  sw20  sw40  pl20  pl40  pw20  pw40
  1  SETOSA       45.5   30.0   13.0    1.5    47  49.5    31    34    13  14.0   2.0     2
  2  VERSICOLOR   53.0   23.0   35.5   10.0    55  57.0    25    27    39  42.0  11.5    13
  3  VIRGINICA    58.0   25.5   49.0   17.5    61  64.0    27    29    51  53.5  18.0    19
Reshaped UNIVARIATE output data set

species=SETOSA

Obs  varname     mean      std   n   p10   p20   p40
  1  sepallen   50.06  3.52490  50  45.5    47  49.5
  2  sepalwid   34.28  3.79064  50  30.0    31  34.0
  3  petallen   14.62  1.73664  50  13.0    13  14.0
  4  petalwid    2.46  1.05386  50   1.5     2   2.0

species=VERSICOLOR

Obs  varname     mean      std   n   p10   p20   p40
  5  sepallen   59.36  5.16171  50  53.0  55.0    57
  6  sepalwid   27.70  3.13798  50  23.0  25.0    27
  7  petallen   42.60  4.69911  50  35.5  39.0    42
  8  petalwid   13.26  1.97753  50  10.0  11.5    13

species=VIRGINICA

Obs  varname     mean      std   n   p10   p20   p40
  9  sepallen   65.88  6.35880  50  58.0    61  64.0
 10  sepalwid   29.74  3.22497  50  25.5    27  29.0
 11  petallen   55.52  5.51895  50  49.0    51  53.5
 12  petalwid   20.26  2.74650  50  17.5    18  19.0