25016 - Reshapes the default output data set from PROC UNIVARIATE, MEANS, or SUMMARY

Reshape the OUT= data set from PROC UNIVARIATE, MEANS, or SUMMARY

PURPOSE:

Illustrates how to create a data set of descriptive statistics on multiple input variables such that each row of the data set corresponds to an input variable with a column for each computed statistic. This effectively reshapes the default output data set from PROC UNIVARIATE, MEANS, or SUMMARY.

REQUIREMENTS:

Version 6 and later of Base SAS.

DETAILS:

When you use the UNIVARIATE, MEANS, or SUMMARY procedure to compute multiple statistics on a set of variables, the OUT= data set contains a single observation (or one observation per BY group when a BY statement is used). Each variable in this observation contains the value of one statistic for one of the original variables. Often, you want this data set rearranged so that it looks more like the displayed output from PROC MEANS, with one observation for each original variable and one variable for each computed statistic.

You can use these procedures to create an output data set of the desired form by taking advantage of the ability to use multiple OUTPUT statements to create a separate data set for each computed statistic. The resulting data sets are concatenated in a subsequent DATA step and then transposed into the final form with PROC TRANSPOSE.

Note that unless you want percentiles other than the default set, the MEANS or SUMMARY procedure is more efficient at computing statistics than UNIVARIATE. If you compute such percentiles with PROC UNIVARIATE, you need to create a separate output data set for each percentile and rename the variables back to the original variable names, as illustrated in this example.

If you use PROC MEANS or PROC SUMMARY, the syntax can be simplified further by omitting the list of variable names after each statistic, because these procedures use the original variable names by default. For example:

      proc summary data=iris;
         var sepallen sepalwid petallen petalwid;
         output out=mean mean=;
         output out=std  std=;
         output out=n    n=;
         run;
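
As a rough sketch of how this simplified syntax might be carried through to the reshaped form, the following assumes that the iris data set from the sample code below already exists in the WORK library. The data set names stats_ms and reshaped_ms are illustrative only and do not appear in the sample. P10 is included because it is one of the percentile keywords that PROC SUMMARY accepts directly in the OUTPUT statement.

```sas
/* Sketch only: assumes WORK.IRIS has been created by the sample code. */
proc summary data=iris;
   var sepallen sepalwid petallen petalwid;
   /* An empty list after the keyword keeps the original variable names */
   output out=mean mean=;
   output out=std  std=;
   output out=n    n=;
   output out=p10  p10=;
   run;

/* Stack the one-row data sets; the row order determines COL1-COL4 below */
data stats_ms;
   set mean std n p10;
   run;

/* The VAR statement keeps _TYPE_ and _FREQ_ out of the transposed rows */
proc transpose data=stats_ms name=varname
   out=reshaped_ms (rename=(col1=mean col2=std col3=n col4=p10));
   var sepallen sepalwid petallen petalwid;
   run;

proc print data=reshaped_ms;
   var varname mean std n p10;
   run;
```

Because PROC SUMMARY adds the automatic variables _TYPE_ and _FREQ_ to each output data set, the VAR statement in PROC TRANSPOSE is what limits the reshaped rows to the four analysis variables.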

These sample files and code examples are provided by SAS Institute Inc. "as is" without warranty of any kind, either express or implied, including but not limited to the implied warranties of merchantability and fitness for a particular purpose. Recipients acknowledge and agree that SAS Institute shall not be liable for any damages whatsoever arising out of their use of this material. In addition, SAS Institute will provide no support for the materials contained herein.

The results of running the following SAS code are shown after the code.


data iris;
   input sepallen sepalwid petallen petalwid specno @@;
   if specno=1 then species='SETOSA    ';
   else if specno=2 then species='VERSICOLOR';
   else species='VIRGINICA ';
   drop specno;
   datalines;
50 33 14 02 1 64 28 56 22 3 65 28 46 15 2 67 31 56 24 3
63 28 51 15 3 46 34 14 03 1 69 31 51 23 3 62 22 45 15 2
59 32 48 18 2 46 36 10 02 1 61 30 46 14 2 60 27 51 16 2
65 30 52 20 3 56 25 39 11 2 65 30 55 18 3 58 27 51 19 3
68 32 59 23 3 51 33 17 05 1 57 28 45 13 2 62 34 54 23 3
77 38 67 22 3 63 33 47 16 2 67 33 57 25 3 76 30 66 21 3
49 25 45 17 3 55 35 13 02 1 67 30 52 23 3 70 32 47 14 2
64 32 45 15 2 61 28 40 13 2 48 31 16 02 1 59 30 51 18 3
55 24 38 11 2 63 25 50 19 3 64 32 53 23 3 52 34 14 02 1
49 36 14 01 1 54 30 45 15 2 79 38 64 20 3 44 32 13 02 1
67 33 57 21 3 50 35 16 06 1 58 26 40 12 2 44 30 13 02 1
77 28 67 20 3 63 27 49 18 3 47 32 16 02 1 55 26 44 12 2
50 23 33 10 2 72 32 60 18 3 48 30 14 03 1 51 38 16 02 1
61 30 49 18 3 48 34 19 02 1 50 30 16 02 1 50 32 12 02 1
61 26 56 14 3 64 28 56 21 3 43 30 11 01 1 58 40 12 02 1
51 38 19 04 1 67 31 44 14 2 62 28 48 18 3 49 30 14 02 1
51 35 14 02 1 56 30 45 15 2 58 27 41 10 2 50 34 16 04 1
46 32 14 02 1 60 29 45 15 2 57 26 35 10 2 57 44 15 04 1
50 36 14 02 1 77 30 61 23 3 63 34 56 24 3 58 27 51 19 3
57 29 42 13 2 72 30 58 16 3 54 34 15 04 1 52 41 15 01 1
71 30 59 21 3 64 31 55 18 3 60 30 48 18 3 63 29 56 18 3
49 24 33 10 2 56 27 42 13 2 57 30 42 12 2 55 42 14 02 1
49 31 15 02 1 77 26 69 23 3 60 22 50 15 3 54 39 17 04 1
66 29 46 13 2 52 27 39 14 2 60 34 45 16 2 50 34 15 02 1
44 29 14 02 1 50 20 35 10 2 55 24 37 10 2 58 27 39 12 2
47 32 13 02 1 46 31 15 02 1 69 32 57 23 3 62 29 43 13 2
74 28 61 19 3 59 30 42 15 2 51 34 15 02 1 50 35 13 03 1
56 28 49 20 3 60 22 40 10 2 73 29 63 18 3 67 25 58 18 3
49 31 15 01 1 67 31 47 15 2 63 23 44 13 2 54 37 15 02 1
56 30 41 13 2 63 25 49 15 2 61 28 47 12 2 64 29 43 13 2
51 25 30 11 2 57 28 41 13 2 65 30 58 22 3 69 31 54 21 3
54 39 13 04 1 51 35 14 03 1 72 36 61 25 3 65 32 51 20 3
61 29 47 14 2 56 29 36 13 2 69 31 49 15 2 64 27 53 19 3
68 30 55 21 3 55 25 40 13 2 48 34 16 02 1 48 30 14 01 1
45 23 13 03 1 57 25 50 20 3 57 38 17 03 1 51 38 15 03 1
55 23 40 13 2 66 30 44 14 2 68 28 48 14 2 54 34 17 02 1
51 37 15 04 1 52 35 15 02 1 58 28 51 24 3 67 30 50 17 2
63 33 60 25 3 53 37 15 02 1
;

/* Usual output of statistics for all input variables */
proc univariate data=iris noprint;
   var sepallen sepalwid petallen petalwid;
   output out=out mean=m1-m4 std=s1-s4 n=n1-n4 p10=p10_1-p10_4
                  pctlpts=20 40 pctlpre=sl sw pl pw;
   run;

proc print data=out;
   title "Usual UNIVARIATE output data set";
   run;


/* Create data set with variables as rows, statistics as columns */
proc univariate data=iris noprint;
   var sepallen sepalwid petallen petalwid;
   /* Create one data set per statistic and use original
      variable names. */
   output mean=sepallen sepalwid petallen petalwid
          out=mean;
   output std=sepallen sepalwid petallen petalwid
          out=std;
   output n=sepallen sepalwid petallen petalwid
          out=n;
   output p10=sepallen sepalwid petallen petalwid
          out=p10;
   /* For percentiles in UNIVARIATE, create one data set per
      percentile and set names to original variable names */
   output pctlpts=20 pctlpre=a b c d
          out=p20 (rename=(a20=sepallen b20=sepalwid
                           c20=petallen d20=petalwid));
   output pctlpts=40 pctlpre=a b c d
          out=p40 (rename=(a40=sepallen b40=sepalwid
                           c40=petallen d40=petalwid));
   run;

data stats;
   set mean std n p10 p20 p40;
   run;

proc transpose data=stats name=varname
   /* Apply new variable names in same order as the 
    * concatenation of data sets in the DATA step above. */
   out=reshaped (rename=(col1=mean col2=std col3=n col4=p10
                         col5=p20 col6=p40));
   var sepallen sepalwid petallen petalwid;
   run;

proc print data=reshaped;
   var varname mean std n p10 p20 p40;
   title "Reshaped UNIVARIATE output data set";
   run;


/************** With BY processing ***************/


/* Usual output of statistics for all input variables */
proc sort data=iris out=iris2;
   by species;
   run;

proc univariate data=iris2 noprint;
   var sepallen sepalwid petallen petalwid;
   by species;
   output out=out mean=m1-m4 std=s1-s4 n=n1-n4 p10=p10_1-p10_4
                  pctlpts=20 40 pctlpre=sl sw pl pw;
   run;

proc print data=out;
   title "Usual UNIVARIATE output data set";
   run;


/* Create data set with variables as rows, statistics as columns */
proc univariate data=iris2 noprint;
   by species;
   var sepallen sepalwid petallen petalwid;
   /* Create one data set per statistic and use original
      variable names. */
   output mean=sepallen sepalwid petallen petalwid
          out=mean;
   output std=sepallen sepalwid petallen petalwid
          out=std;
   output n=sepallen sepalwid petallen petalwid
          out=n;
   output p10=sepallen sepalwid petallen petalwid
          out=p10;
   /* For percentiles in UNIVARIATE, create one data set per
      percentile and set names to original variable names */
   output pctlpts=20 pctlpre=a b c d
          out=p20 (rename=(a20=sepallen b20=sepalwid
                           c20=petallen d20=petalwid));
   output pctlpts=40 pctlpre=a b c d
          out=p40 (rename=(a40=sepallen b40=sepalwid
                           c40=petallen d40=petalwid));
   run;

data stats;
   set mean std n p10 p20 p40;
   by species;
   run;

proc transpose data=stats name=varname
   /* Apply new variable names in same order as the 
    * concatenation of data sets in the DATA step above. */
   out=reshaped(rename=(col1=mean col2=std col3=n col4=p10
                        col5=p20 col6=p40));
   var sepallen sepalwid petallen petalwid;
   by species;
   run;

proc print data=reshaped;
   var varname mean std n p10 p20 p40;
   by species;
   title "Reshaped UNIVARIATE output data set";
   run;
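
The RENAME= lists in the PROC TRANSPOSE steps above depend on the order of the data sets in the SET statement. One possible alternative, sketched here and not part of the original sample, tags each row with the name of its source data set and lets an ID statement name the columns. It assumes a SAS release that supports the INDSNAME= option in the SET statement, and that the BY-group data sets mean, std, n, p10, p20, and p40 created above still exist in the WORK library. The names stats_id, reshaped_id, src, and stat are illustrative.

```sas
/* Sketch only: assumes the BY-group data sets created above are in WORK. */
data stats_id;
   length stat $ 8;
   /* INDSNAME= stores the contributing data set name, e.g. WORK.MEAN */
   set mean std n p10 p20 p40 indsname=src;
   by species;
   /* Keep the member name (MEAN, STD, ...) as the statistic label */
   stat = scan(src, 2, '.');
   run;

proc transpose data=stats_id name=varname out=reshaped_id;
   by species;
   /* Column names come from STAT, so the SET order no longer matters */
   id stat;
   var sepallen sepalwid petallen petalwid;
   run;

proc print data=reshaped_id;
   by species;
   var varname mean std n p10 p20 p40;
   run;
```

With this arrangement, adding or reordering statistics should only require changing the OUTPUT statements and the SET list, not a separate RENAME= list.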

Results without BY processing:

Usual UNIVARIATE output data set

Obs	n1	n2	n3	n4	m1	m2	m3	m4	s1	s2	s3	s4	p10_1	p10_2	p10_3	p10_4	sl20	sl40	sw20	sw40	pl20	pl40	pw20	pw40
1	150	150	150	150	58.4333	30.5733	37.58	11.9933	8.28066	4.35866	17.6530	7.62238	48	25	14	2	50	56	27	30	15	39	2	11.5

Reshaped UNIVARIATE output data set

Obs	varname	mean	std	n	p10	p20	p40
1	sepallen	58.4333	8.2807	150	48	50	56.0
2	sepalwid	30.5733	4.3587	150	25	27	30.0
3	petallen	37.5800	17.6530	150	14	15	39.0
4	petalwid	11.9933	7.6224	150	2	2	11.5

Results when using BY processing:

Usual UNIVARIATE output data set

Obs	species	n1	n2	n3	n4	m1	m2	m3	m4	s1	s2	s3	s4	p10_1	p10_2	p10_3	p10_4	sl20	sl40	sw20	sw40	pl20	pl40	pw20	pw40
1	SETOSA	50	50	50	50	50.06	34.28	14.62	2.46	3.52490	3.79064	1.73664	1.05386	45.5	30.0	13.0	1.5	47	49.5	31	34	13	14.0	2.0	2
2	VERSICOLOR	50	50	50	50	59.36	27.70	42.60	13.26	5.16171	3.13798	4.69911	1.97753	53.0	23.0	35.5	10.0	55	57.0	25	27	39	42.0	11.5	13
3	VIRGINICA	50	50	50	50	65.88	29.74	55.52	20.26	6.35880	3.22497	5.51895	2.74650	58.0	25.5	49.0	17.5	61	64.0	27	29	51	53.5	18.0	19

Reshaped UNIVARIATE output data set

species=SETOSA

Obs	varname	mean	std	n	p10	p20	p40
1	sepallen	50.06	3.52490	50	45.5	47	49.5
2	sepalwid	34.28	3.79064	50	30.0	31	34.0
3	petallen	14.62	1.73664	50	13.0	13	14.0
4	petalwid	2.46	1.05386	50	1.5	2	2.0

species=VERSICOLOR

Obs	varname	mean	std	n	p10	p20	p40
5	sepallen	59.36	5.16171	50	53.0	55.0	57
6	sepalwid	27.70	3.13798	50	23.0	25.0	27
7	petallen	42.60	4.69911	50	35.5	39.0	42
8	petalwid	13.26	1.97753	50	10.0	11.5	13

species=VIRGINICA

Obs	varname	mean	std	n	p10	p20	p40
9	sepallen	65.88	6.35880	50	58.0	61	64.0
10	sepalwid	29.74	3.22497	50	25.5	27	29.0
11	petallen	55.52	5.51895	50	49.0	51	53.5
12	petalwid	20.26	2.74650	50	17.5	18	19.0

Type:	Sample
Topic:	SAS Reference ==> Procedures ==> MEANS
	SAS Reference ==> Procedures ==> UNIVARIATE
	Analytics ==> Descriptive Statistics
	SAS Reference ==> Procedures ==> SUMMARY

Date Modified:	2019-05-01 14:26:26
Date Created:	2005-01-13 15:03:43

Operating System and Release Information

Product Family	Product	Host	SAS Release (Starting)	SAS Release (Ending)
SAS System	Base SAS	All	n/a	n/a
SAS System	SAS/STAT	All	n/a	n/a
