• Print  |
  • Feedback  |

Knowledge Base


TS-378

How Can I dynamically order the variables in the MEANS procedure output by the descending mean values?

By default, the variables in the following MEANS procedure output are listed in the order they appear in the input data set:

Variable          Mean       Std Dev

SQFEET            1447           352
BEDROOMS             3             1
BATHS                2             1
PRICE            82720         23787

How can I dynamically order the variables by the descending mean values to obtain output as follows?
Variable          Mean       Std Dev

PRICE            82720         23787
SQFEET            1447           352
BEDROOMS             3             1
BATHS                2             1

Two possible methods are the macro facility and the SQL procedure. You can use the TRANSPOSE procedure with each of these techniques to accomplish this. PROC MEANS lists all numeric variables in the order they are physically stored in the input data set. To control the order, a VAR statement must be used. It could be time consuming to run a PROC MEANS and look at the mean values of all the variables to write the VAR statement needed. However, the macro facility enables you to dynamically generate a PROC MEANS VAR statement. An alternative solution is to use PROC TRANSPOSE to create an appropriate input data set for PROC SQL, which can then create the desired output. The following numeric variables and their values from the SASUSER.HOUSES data set are used in the example:

SASUSER.HOUSES

SQFEET  BEDROOMS  BATHS  PRICE

1250       2      1.0   64000
1190       1      1.0   65850
1400       2      1.5   80050
1810       4      3.0  107250
1500       3      3.0   86650
1615       4      3.0   94450
1305       3      1.5   73650
1390       3      2.5   79350
1040       2      1.0   55850
2105       4      2.5  127150
1535       3      3.0   89100
1240       2      1.0   69250
 720       1      1.0   34550
1745       4      2.5  102950
1860       2      2.0  110700

Macro Processing Solution
The macro solution involves creating an output data set from PROC MEANS that contains the ordering statistic. The data set is then transposed, a process that turns the dataset on its side making the variable names into values of a variable called _NAME_ and each statistic into values of a separate variable called COL1. Once the variable names are values of another variable, they can be sorted by the descending values of another variable and then stored in macro variables via CALL SYMPUT. The macro facility then uses the macro variables to generate the desired VAR statement. The following example demonstrates this process:

/* create an output data set and name the statistic */ /* result by which to order */ proc means data=sasuser.houses noprint;

output out=temp mean=; /* transpose the data set so that each variable */ /* name becomes the value of _NAME_ and each mean */ /* the value of COL1 */

proc transpose data=temp(drop=_freq_ _type_) out=temp; /* sort the data set so the means are in */ /* descending order */

proc sort data=temp out=temp;

by descending col1;

/* create a unique macro variable for each observation */ /* to record the order in which the variables should */ /* appear */ data _null_;

set temp end=last;
call symput('v'||left(put(_n_,3.)),_name_); if last then call symput('max',put(_n_,3.)); run;

/* use macro to rewrite the PROC MEANS step using a */ /* VAR statement with the customized order of the */ /* variables and display the final ordering sequence */ %macro gencode;

options nonumber nodate nolabel;
proc means data=sasuser.houses mean std maxdec=0; title "FINAL ORDERING SEQUENCE"; var %do i=1 %to &max; &&v&i %end; ; run;
%mend gencode;

/* invoke the macro */ %gencode

The following SAS syntax is generated by the macro GENCODE where the highlighted code represents the values returned from the macro variables:

proc means data=sasuser.houses mean maxdec=0;
   title "final ordering sequence";
   var price sqfeet bedrooms baths;
run;

To better understand how this example works, examine the changes in the WORK.TEMP data set created by PROC MEANS before and after processing with PROC TRANSPOSE. This is the WORK.TEMP data set after the PROC MEANS step:

_TYPE_   _FREQ_   SQFEET   BEDROOMS   BATHS   PRICE

   0       15      1447       3         2     82720

Each numeric variable in the SASUSER.HOUSES data set has been summarized to a single mean value. The variables _TYPE_ and _FREQ_ are generated by default and are dropped when used with PROC TRANSPOSE. The variables and their values are transposed so that all the means are stored in a single variable that can be sorted. The following is the WORK.TEMP data set following the PROC TRANSPOSE step:

_NAME_      COL1
SQFEET      1447
BEDROOMS       3
BATHS          2
PRICE      82720

Now that the mean values are in the variable COL1 and the variable names are in a variable _NAME_, the means can be sorted in descending order using PROC SORT. The following is the WORK.TEMP data set following the PROC SORT step:

_NAME_       COL1
PRICE       82720
SQFEET       1447
BEDROOMS        3
BATHS           2

At this point the variable names are in order of the descending mean values. The next step is to create a unique macro variable for each observation in WORK.TEMP to store each _NAME_ value. The CALL SYMPUT routine accomplishes this. The process is made dynamic by concatenating the observation number to a prefix. Since there are four observations in the WORK.TEMP data set, this creates macro variables V1-V4. An additional macro variable, MAX, contains the number of macro variables created. The following table shows the macro variable names and their values, as created in the previous example:

VARIABLE   VALUE

   V1      PRICE
   V2      SQFEET
   V3      BEDROOMS
   V4      BATHS
   MAX     4

The macro variables are used by the macro GENCODE to generate the desired VAR statement. This is accomplished by incrementing the macro variable I from 1 to &MAX. Since &MAX resolves to 4, the %DO loop increments from 1 to 4. Within the %DO loop, the values of macro variables V1-V4 are resolved by first resolving the macro variable I to create the macro variable references V1 to V4. The macro variable references are then resolved to their corresponding values. Once the processing has completed, the VAR statement has been generated.

SQL Procedure Solution
PROC SQL provides an alternative solution that requires less syntax than the previous macro solution. The solution involves transposing the original data set so that each variable name becomes the value of _NAME_ and all numeric variable values become the values of COL1. This allows PROC SQL to generate all the desired statistics and then order the output by one of the statistics. The following example demonstrates this process:

   /* transpose SASUSER.HOUSES so that each variable      */
   /* name is the value of _NAME_ and each numeric value  */
   /* is the value COL1.  This is accomplished using the  */

   /* BY statement to transpose each observation          */
   /* individually.                                       */
proc transpose data=sasuser.houses
               out=temp2(keep=_name_ col1);

by notsorted street;
var _numeric_;

/* Generate the final ordering sequence by using */ /* PROC SQL to first calculate the desired statistics */ /* and then order by the descending values of the mean */ /* statistic */ options label;

proc sql;

select _name_ label='Variable', mean(col1) as mean label='Mean' format=12., std(col1) as std label='Std Dev' format=12.

from temp2
group _name_
order by mean desc;
run;

The following is a partial listing of the transposed data set:

_NAME_          COL1
SQFEET        1250.0
BEDROOMS         2.0
BATHS            1.0
PRICE        64000.0
SQFEET        1190.0
BEDROOMS         1.0
BATHS            1.0
PRICE        65850.0
SQFEET        1400.0
BEDROOMS         2.0
BATHS            1.5
PRICE        80050.0
SQFEET        1810.0
BEDROOMS         4.0
BATHS            3.0
PRICE       107250.0

Once the data set is transposed, PROC SQL is used to generate the final output. The statistics are generated on COL1 for each group of _NAME_. Once the statistics are generated, the results are ordered by the descending values of the
mean statistic.

For more information on the macro facility see the SAS Guide to Macro Processing, Version 6, Second Edition.

For more information on PROC SQL, see the SAS Guide to the SQL Procedure, Usage and Reference, Version 6, First Edition. For more information on the MEANS and TRANSPOSE procedures, consult the SAS Procedures Guide, Version 6, Third Edition.

For additional information on the use of PROC TRANSPOSE, refer to the article Rediscovering the TRANSPOSE Procedure in Observations, Second Quarter, 1993. Answer provided by Jason Sharpe. Jason is a technical support analyst specializing in base procedures and the macro facility. Copyright (r) 1993 by SAS Institute Inc.

SAS, SAS/ACCESS, SAS/AF, SAS/EIS, SAS/FSP, and Observations are registered trademarks or trademarks of SAS Institute Inc., in the USA and other countries. DB/2, IBM, and OS/2 are registered trademarks or trademarks of International Business Machines Corporation. (r) indicates USA registration.

Other brand and product names are registered trademarks or trademarks of their respective companies.