• Print  |
  • Feedback  |

Knowledge Base


TS-407

Ordering Data Values: Procedure Defaults and Options

by Jason R. Sharpe

Jason R. Sharpe is a technical support analyst at SAS Institute. His areas of expertise include base SAS¨ software procedures and the macro facility. Jason has BS degrees in electrical engineering and computer engineering from North Carolina State University.

Abstract
Controlling the order of data values often makes reports easier to read and understand. This is the first of a three-part series about ordering data values. The goal of this article is to outline the similarities and differences in the way some base SAS procedures order data values. Parts 2 and 3, which will appear in upcoming issues of Observations¨, will discuss techniques to achieve ordering schemes not attainable with procedure defaults or options.

Introduction
One of the most challenging parts of an application can be displaying a report with items occurring in a specific order. For example, you may want to display categories by the descending values of a dependent variable, or possibly in a customized sequence that is neither ascending nor descending.
Fortunately, the techniques for achieving these ordering schemes are not as varied as the ordering schemes themselves. You can control most ordering schemes by simply specifying an option or a combination of options. For other schemes you may need to create formats or manipulate the data values. Knowing what tools are available and how they can be used enables you to display data values in practically any desired order. To fully understand the techniques that I will describe in Parts 2 and 3, it is important to first understand the procedure defaults and options that are available. Some procedures offer an option, ORDER=, to control the ordering scheme. You should try ORDER= and the other available procedure options first, before you resort to using formats or data manipulation to achieve the desired results.
This article primarily deals with controlling the order of data values in the CHART, FREQ, MEANS, REPORT, SUMMARY, and TABULATE procedures. However, information contained in this article can be useful for other procedures as well. Some specific procedures and options that deal with ordering but are not covered in this article are the base SAS procedures SQL and TRANTAB, and the ORDER= option for SAS/GRAPH¨ procedures.
This article assumes you have at least an intermediate knowledge of the DATA step and the following base SAS procedures: CHART, FORMAT, FREQ, MEANS, REPORT, SORT, SUMMARY, and TABULATE. If you encounter syntax that is unfamiliar to you, please consult the appropriate documentation before continuing. Most examples will use the SASUSER.HOUSES data set that is distributed with SAS/ASSIST¨ software. The SASUSER.HOUSES data set and the system options used in the examples, appear in the appendix.

Procedure Defaults and Options
Even if you do not specifically request an ordering scheme for your report, one is applied. For example, when you produce a chart with PROC CHART, the midpoints are listed in ascending order by the internal values of the chart variable as long as no formats or ordering options are applied. This is the default behavior of PROC CHART. On the surface, the default may appear simple to understand. However, when additional elements are added to the step, the results may not be what you expect. Fortunately, most procedures have similar defaults. Even though the output of PROC CHART looks very different from the output of PROC TABULATE, the ordering schemes are similar. This is also true for the FREQ, MEANS, REPORT, and SUMMARY procedures. The conditions that can affect a procedures ordering scheme are:

  • host-specific collating sequences
  • the BY statement
  • a single classification variable
  • multiple classification variables
  • formats
  • the ORDER= option
  • supplemental ordering options

Understanding how each one individually affects your output makes it easier to understand their effect when you combine one or more of the conditions.

Host-Specific Collating Sequences
The most fundamental part of an ordering scheme is the collating sequence, which determines what character comes before another ("1" before "A" or "A" before "1", for example). You should familiarize yourself with the collating sequence that your operating system uses. MVS uses the EBCDIC collating sequence, but UNIX uses the ASCII collating sequence. Thus, data values could be ordered differently depending on the platform or operating system you are running on. The following example demonstrates the difference in ordering data values with an EBCDIC versus ASCII collating sequence:

data sample;

   input x $1.;
   cards;
1
a
A
z
Z
\
;


   /* OUTPUT 1a-1b      */
proc freq;

   tables x / nopercent nocum;
run;

EBCDIC                 ASCII


              X   Frequency         X   Frequency
              -------------         -------------
              a          1          1          1
              z          1          A          1
              A          1          Z          1
              \          1          \          1
              Z          1          a          1
              1          1          z          1

Output 1a              Output 1b

Output 1a-1b -- Demonstration of Host-Specific Collating SequencesOutput 1 shows the differences in the ordering scheme for the output produced using an EBCDIC collating sequence as opposed to an ASCII collating sequence. EBCDIC and ASCII are two of the

The BY Statement
Many SAS procedures support the BY statement. The BY statement tells the procedure how the data are sorted, which subsequently controls the order in which groups of data are read into a procedure. The effect is similar to creating a subset of your data for each BY group. The BY statement, when used with a procedure other than SORT, indicates how the data were sorted before being read into the procedure. The following example demonstrates this:

proc sort data=sasuser.houses out=houses;

   by style;


/* OUTPUT 2  */
proc freq data=houses;

   by style;
   tables bedrooms / nopercent;
run;
------------------- STYLE=CONDO ---------------------

                                   Cumulative
             BEDROOMS   Frequency   Frequency
             --------------------------------
                    2          2           2
                    3          1           3
                    4          1           4

     ------------------- STYLE=RANCH ---------------------

                                   Cumulative
             BEDROOMS   Frequency   Frequency
             --------------------------------
                    1          1           1
                    2          1           2
                    3          2           4
Output 2        Partial Listing of the BY StatementÕs Effect on Ordering

A separate table is generated for each value of STYLE. Thus, each value of STYLE is a BY group. The tables are ordered by the ascending alphabetic values of STYLE.

The SORT procedure has a DESCENDING option that you can use to sort the data in descending order. It also has other options such as ASCII or EBCDIC that you can use to control the collating sequence.

The BY statement has the options DESCENDING and NOTSORTED. If you have used PROC SORT with the DESCENDING option, then you must use the DESCENDING option in a BY statement in any subsequent procedure step. The NOTSORTED option is useful when you want to list a variable in a BY statement whose values are grouped together, but not sorted in either ascending or descending order, or when you use a collat

For the remainder of this article, the term BY group will denote that a BY statement is being used. For the most part, a BY group will be like a sub-data set in itself. In certain situations, this allows ordering schemes to vary from one BY group to the ne

A Single Classification Variable
Classification variables organize data into groups. These variables are different from variables in a BY statement because their values are grouped and ordered after all the data values in the entire data set or BY group have been read by the procedure. Th

Procedure       Statement/Option
CHART           HBAR, VBAR, PIE, BLOCK, STAR statements
                GROUP=, SUBGROUP= options
FREQ            TABLES statement
MEANS           CLASS statement
REPORT          DEFINE statement with GROUP, ORDER, or ACROSS option
SUMMARY         CLASS statement
TABULATE        CLASS statement

Table 1 Procedures that Accept Defining Classification Variables

For most procedures, with the exception of PROC REPORT, the default ordering scheme for a single class variable with no formats or ordering options applied is ascending. The following example demonstrates this default action:

/* OUTPUT 3  */
proc means data=sasuser.houses nway mean;

   class style;
   var sqfeet;
run;

Default behavior with PROC REPORT will be discussed later in the section "ORDER=FORMATTED".

               Analysis Variable : SQFEET
               STYLE     N Obs          Mean
               -----------------------------
               CONDO         4       1688.75
               RANCH         4       1251.25
               SPLIT         3       1370.00
               TWOSTORY      4       1458.75
               -----------------------------

Output 3 Default Ordering For A Single Classification Variable

Multiple Classification Variables
Multiple classification variables can be used to subgroup data in a hierarchical structure. When this occurs, only one variable can be considered the highest-order grouping variable. The remaining variables form subgroups of the immediately preceding higher-order variable. Before studying the default ordering scheme, you should know how to distinguish the order of precedence between the classification variables that will be used.

Notice in the following table that precedence in PROC REPORT is set by the order in which the variables appear in the COLUMN statement, and not by the order of the DEFINE statements. Also notice that in the PROC TABULATE syntax, the order of precedence is set by the order in which the variables appear on the TABLE statement, not the CLASS statement. Table 2 summarizes variable precedence with the highest order class variable being A:

Procedure   Syntax example

CHART       proc chart;
               vbar A / group=B subgroup=C;
FREQ        proc freq;
               tables A * B * C * D;
MEANS       proc means;
               class A B C D;
REPORT      proc report;
               column A B C D;
               define C / group;
               define A / group;
               define B / group;
               define D / group;
SUMMARY     proc summary;
               class A B C D;
TABULATE    proc tabulate;
               class D C B A;
               table A,B, C * D;

Table 2 Distinguishing Class Variables and Precedence

The overall ordering scheme when multiple class variables are used is determined first by developing a master order for each class variable for the entire data set or BY group, and then by applying the master order to each subgroup of the hierarchy. This means that the order does not change from one class subgroup to the next.

In the following example, the master order for STYLE and BEDROOMS is determined separately. STYLE is the highest- order class variable. Thus, BEDROOMS will form subgroups of each value of STYLE. The order for BEDROOMS is not determined again for each value of STYLE. Instead, the order is taken from the master order determined before generating any subgroups. Table 3 summarizes the master ordering scheme for the variables STYLE and BEDROOMS when using ORDER=FREQ with PROC TABULATE:

STYLE   BEDROOMS
RANCH   2
CONDO   4
TWOSTORY        3
SPLIT   1

Table 3 Master Ordering Schemes for STYLE and BEDROOMS

The master ordering schemes are developed independently for STYLE and BEDROOMS. The schemes in Table 3 were developed using ORDER=FREQ, which is discussed in detail in the section ORDER=FREQ. The following example demonstrates the use of the above master ordering schemes when STYLE and BEDROOMS are used simultaneously:

/* OUTPUT 4  */
proc tabulate data=sasuser.houses format=3. noseps

              order=freq;
   class style bedrooms;
   table style*bedrooms, n / rts=23;
run;

           |                            | N |
           |----------------------------+---|
           |STYLE         BEDROOMS      |   |
           |RANCH         2             |  1|
           |              3             |  2|
           |              1             |  1|
           |CONDO         2             |  2|
           |              4             |  1|
           |              3             |  1|
           |TWOSTORY      2             |  2|
           |              4             |  2|
           |SPLIT         4             |  1|
           |              3             |  1|
           |              1             |  1|
           ----------------------------------
Output 4        Ordering Multiple Classification Variables

Although it may not be apparent, STYLE and BEDROOMS are ordered based on the master ordering schemes in Table 3. If you repeat this example using the PRINTMISS option in the TABLE statement in PROC TABULATE, the master ordering scheme is apparent:

/* OUTPUT 5  */
proc tabulate data=sasuser.houses format=3. noseps

              order=freq;
   class style bedrooms;
   table style*bedrooms, n / rts=23 printmiss;
run;

             |                     | N |
             |---------------------+---|
             |STYLE      BEDROOMS  |   |
             |RANCH      2         |  1|
             |           4         |  .|
             |           3         |  2|
             |           1         |  1|
             |CONDO      2         |  2|
             |           4         |  1|
             |           3         |  1|
             |           1         |  .|
             |TWOSTORY   2         |  2|
             |           4         |  2|
             |           3         |  .|
             |           1         |  .|
             |SPLIT      2         |  .|
             |           4         |  1|
             |           3         |  1|
             |           1         |  1|
             ---------------------------
Output 5        Verifying Master Ordering Sequences

The order for each class variable, regardless of whether it is in a subgroup, is based on the master order for that variable.

Formats
With formats you can make class levels appear or not appear in a sequential order in your reports. The default ordering scheme for a variable with an applied format is the same for most procedures. The exceptions are PROC CHART and PROC REPORT.

For the FREQ, MEANS, SUMMARY, and TABULATE procedures the default is to list the values of the class variables in ascending order by their internal values, not their formatted values. Internal, or unformatted, values are the actual values that are stored in the SAS data set.

The default for PROC REPORT in Version 6 of the SAS System is to list the values of the classification variables by their formatted values. Please note this may change in a future release to be consistent with other procedures that offer the ORDER= option. This special case is discussed in more detail in the section ORDER=FORMATTED.

PROC CHART behaves differently than the other procedures that use classification variables. The PROC CHART ordering scheme when formats are applied depends on whether the classification variable is character or numeric. Table 4 summarizes the default ordering schemes for the base SAS procedures when formats are applied to the classification variables:

PROCEDURE  VARIABLE   FORMAT  ORDERED BY

CHART      Numeric    No      Internal value
                      Yes     Internal value
           Character  No      Internal value
                      Yes     Formatted value
FREQ       Num or Chr Y or N  Internal value
MEANS      Num or Chr Y or N  Internal value
REPORT     Numeric    No      Formatted value* **
                      Yes     Formatted value**
           Character  No      Internal value
                      Yes     Formatted value**
SUMMARY    Num or Chr Y or N  Internal value
TABULATE   Num or Chr Y or N  Internal value


* If no format is assigned, PROC REPORT assumes a BESTw.d format.
** The default may change to INTERNAL in a future release of the SAS System.

Table 4 Default Ordering Algorithms When Formats Are Applied

Lowest Internal Values
One common misunderstanding associated with formats occurs when they are used to group multiple values of a class variable. When this occurs, the internal value that applies to a particular class level is the lowest internal value that was encountered in the data for that format range, not the lowest range value specified in the VALUE statement used to create the format. This is true whether you are producing printed output or an output data set. The following example demonstrates this:

proc format;

   value numf 0,3,4='GROUP A' 1,2='GROUP B';


/* OUTPUT 6  */
proc report data=sasuser.houses nowd headskip;

   column bedrooms;
   define bedrooms / group format=numf. order=internal;
run;
BEDROOMS
GROUP B
GROUP A

Output 6        Ordering by the Lowest Internal Value Encountered

GROUP B appears before GROUP A because the lowest internal value encountered for GROUP B was 1, and for GROUP A was 3. The lowest possible internal value that could be in GROUP A is 0, but it did not exist in the data.

Another difficult situation with lowest internal values occurs when a format contains groups or ranges that are independent from one another. Consider the following example:

data sample;

   length dept $ 5;
   input dept id;
   cards;
PET   100
PET   110
PET   120
PET   199
PLANT 200
PLANT 210
PLANT 220
PLANT 299
;

proc format;
   value idfmt
         100='CAT'    110='DOG' 120='FISH' 199='OTHER'
         200='CACTUS' 210='IVY' 220='FERN' 299='OTHER';

/* OUTPUT 7  */
proc tabulate data=sample noseps;

   class dept id;
   table dept*id, n;
   format id idfmt.;
run;

               |              |     N      |
               |--------------+------------|
               |DEPT   ID     |            |
               |PET    CAT    |        1.00|
               |       DOG    |        1.00|
               |       FISH   |        1.00|
               |       OTHER  |        1.00|
               |PLANT  OTHER  |        1.00|
               |       CACTUS |        1.00|
               |       IVY    |        1.00|
               |       FERN   |        1.00|
               -----------------------------
Output 7        Independent Groups of Format Ranges in a Single Format

Notice in Output 7 that for DEPT=PET, the value OTHER appears last in the sequence; it appears first for DEPT=PLANT. This is because the master ordering sequence for ID is determined before subgroups are created. Since ID=199 and ID=299 have the same forma

Missing Values
Missing values can also affect formatted ordering behavior. If missing values exist in the data and in a format range, they can have one of two possible effects on the output: they can cause the entire format group to be treated as invalid, or they can cause the entire format range to have the lowest internal value (possibly resulting in the format range appearing first in an ordering scheme). The particular effect missing values can have depends on the procedure and options that you specify.

Most procedures that use class variables provide the MISSING option, which enables you to specify whether missing values are to be considered valid class levels. PROC FREQ has an additional option, MISSPRINT, which displays missing class levels but does not use them in calculating statistics.

The timing in which a procedure classifies missing values as valid or invalid can also yield different results. Formats can group missing class levels with nonmissing class levels, causing nonmissing class levels to be treated as missing class levels. The MEANS, REPORT, SUMMARY, and TABULATE procedures classify missing values as valid or invalid before applying formats. Thus, if you do not specify MISSING, and you use a format range that groups nonmissing class levels with missing class levels, then the nonmissing class levels will still be treated as valid, whereas the missing class levels will not. PROC FREQ, on the other hand, applies formats before classifying missing values as valid or invalid. Thus, if MISSING or MISSPRINT is not used and a format range would group nonmissing class levels with missing class levels, both the nonmissing and missing class levels would be considered invalid. The following example demonstrates the difference in effect between PROC FREQ and PROC REPORT:

proc format;
   value bedfmt 1='ONE' 2='TWO' other='OTHER';

data houses;

   set sasuser.houses end=last;
   output;
   if last then do;

      bedrooms=.;
      output;

   end;
   format bedrooms bedfmt.;


   /* OUTPUT 8a */
proc print data=houses;

   title "WORK.HOUSES";
   var bedrooms;
   format bedrooms;


   /* OUTPUT 8b */
proc freq data=houses;

   title1 "PROC FREQ";
   title2 "Without MISSING Specified";
   tables bedrooms / nocum nopercent;
run;


   /* OUTPUT 8c */
proc report data=houses nowd headline;

   title1 "PROC REPORT";
   title2 "Without MISSING Specified";
   column bedrooms n;
   define bedrooms / group width=8;
run;

            WORK.HOUSES
          OBS    BEDROOMS
            1        2
            2        1
            3        2
            4        4
            5        3
            6        4
            7        3
            8        3
            9        2
           10        4
           11        3
           12        2
           13        1
           14        4
           15        2
           16        .
Output 8a       WORK.HOUSES Data Set


Output 8b-8c    Difference in Timing for Evaluating Missing Values


     PROC FREQ                     PROC REPORT
   Without MISSING Specified      Without MISSING Specified

     BEDROOMS   Frequency            BEDROOMS          N
     --------------------            -------------------
     ONE               2                  ONE          2
     TWO               5                OTHER          8
                                          TWO          5


    Frequency Missing = 9


Output 8b                      Output 8c

Notice that PROC FREQ does not include the class level OTHER, whereas PROC REPORT does. This is because PROC FREQ applies the BEDFMT format to BEDROOMS before classifying missing values as invalid. PROC REPORT, on the other hand, classifies missing values as invalid before applying the format, allowing the nonmissing class levels that would normally be grouped with the missing class levels to still be treated as valid. You can verify this by observing the frequency counts. A total of 16 observations reside in the WORK.HOUSES data set. PROC FREQ reports nine invalid class levels, whereas PROC REPORT treats only one class level as invalid. If the lowest internal value in a format group is a missing value, then the entire group will be treated as missing. Because missing values are considered the lowest possible internal value for either numeric or character variables, missing values cause an entire format group to have the lowest internal; thus, missing values rank first when ordering by internal values. To demonstrate this point, add the MISSING option to the previous example:

/* OUTPUT 9a */
proc freq data=houses;

title1 "PROC FREQ";
title2 "With MISSING Specified";
tables bedrooms / nocum nopercent missing ; run;

/* OUTPUT 9b */
proc report data=houses nowd headline missing;

title1 "PROC REPORT";
title2 "With MISSING Specified";
column bedrooms n;
define bedrooms / group width=8;
run;

I      PROC FREQ                      PROC REPORT
    With MISSING Specified          With MISSING Specified

     BEDROOMS   Frequency            BEDROOMS          N
     --------------------            -------------------
     OTHER             9                  ONE          2
     ONE               2                OTHER          9
     TWO               5                  TWO          5

Output 9a                        Output 9b
Output 9a-9b Effect of Missing Values on Ordering Format Groups

In the results for PROC FREQ, notice that the group OTHER is ordered first in the sequence; it appears second in the PROC REPORT results. By default, PROC FREQ orders by internal values, and PROC REPORT orders by formatted values. Also notice the frequency counts for OTHER are the same in both procedures. In PROC FREQ it corresponds to the missing frequency in Output 8. For PROC REPORT, the frequency increases by one to account for the single missing value that was not included in Output 8.

BY Variables
If a format is applied to a variable in the BY statement, then the values of the BY variable will be grouped unless the internal values are not sequential in the format ranges. The following example demonstrates:

options formdlim=' ';

proc sort data=sasuser.houses out=houses;

by bedrooms;

proc format;

value numf 3='GROUP A' 1,2,4='GROUP B';

/* OUTPUT 10 */
proc report data=houses nowd headskip;

by bedrooms;
format bedrooms numf.;
column price;
run;
------------------- BEDROOMS=GROUP B -----------------------

                              PRICE
                           $480,250
BEDROOMS=GROUP A ----------------------- PRICE $328,750 BEDROOMS=GROUP B ----------------------- PRICE $431,800 Output 10 Ordering with a Formatted BY Variable

Notice that even though the values of 1, 2, and 4 are in the same format range, they are not grouped together because the values in the range do not represent a sequence. Thus, three BY groups are created rather than two.

The ORDER= Option
The ORDER= option enables you to choose the ordering scheme the procedure uses. It is available with the FREQ, MEANS, REPORT, SUMMARY, and TABULATE procedures. The SAS/STAT procedures CATMOD, GLM, LIFEREG, LOGISTIC, and PROBIT also support the ORDER= option. This section demonstrates each value of the ORDER= option: INTERNAL, FORMATTED, DATA, and FREQ. The following table summarizes the default settings for this option, for each base SAS procedure that this article discusses:

Procedure Default

CHART      Not Available
FREQ       INTERNAL
MEANS      INTERNAL
REPORT     FORMATTED*
SUMMARY    INTERNAL
TABULATE   INTERNAL

*  Please note that due to the variation of this default, it may be changed to
ORDER=INTERNAL in a future release of the SAS System.

Table 5 Defaults for the ORDER= Option

ORDER=INTERNAL
ORDER=INTERNAL with no other ordering options specified causes class-variable values to be listed in ascending order by their internal, or unformatted, values. The following example demonstrates this:

proc format;

value numf 1='ONE' 2='TWO' 3='THREE' 4='FOUR';

/* OUTPUT 11 */
proc report data=sasuser.houses nowd headskip;

column bedrooms;
define bedrooms / group format=numf8. order=internal; run;

BEDROOMS

                         ONE
                         TWO
                       THREE
                        FOUR
Output 11       Demonstration of ORDER=INTERNAL

Notice that the values for bedroom are listed as ONE, TWO, THREE, FOUR, corresponding to the internal order of 1, 2, 3, 4. If you donÕt specify ORDER=INTERNAL, the values would be listed in alphabetical order by the formatted values: FOUR, ONE, THREE, TWO.

ORDER=FORMATTED
ORDER=FORMATTED with no other ordering options specified behaves like ORDER=INTERNAL, except that it orders by the formatted values rather than the internal values (if a format has been specified). The following example demonstrates this:

proc format;

value numf 1='ONE' 2='TWO' 3='THREE' 4='FOUR';

/* OUTPUT 12 */
proc freq data=sasuser.houses order=formatted;

tables bedrooms / nopercent;
format bedrooms numf.;
run;

                                 Cumulative
           BEDROOMS   Frequency   Frequency
           --------------------------------
           FOUR              4           4
           ONE               2           6
           THREE             4          10
           TWO               5          15

Output 12       Demonstration of ORDER=FORMATTED

Notice that the values for BEDROOMS are ordered as FOUR, ONE, THREE, TWO (alphabetical order of the formatted values). If ORDER=FORMATTED is used and a class variable does not have a format associated with it, then the classification variable will be ordered by its internal values.

Special Case: PROC REPORT--As mentioned previously, ORDER=FORMATTED is the default setting for PROC REPORT. This setting can pose problems because PROC REPORT uses a default format of BESTw. for numeric variables with unspecified formats. The following example demonstrates what can happen:

/* OUTPUT 13a */
proc report data=sasuser.houses nowd headskip;

title1 "ORDER=FORMATTED";
title2 "(default)";
title4 "FORMAT=BEST9.";
title5 "(default)";
column baths;
define baths / group width=9;

/* OUTPUT 13b */
proc report data=sasuser.houses nowd headskip;

title1 "ORDER=FORMATTED";
title2 "(default)";
title4 "FORMAT=3.1";
title5 "(specified)";
column baths;
define baths / group width=9 format=3.1;

/* OUTPUT 13c */
proc report data=sasuser.houses nowd headskip;

title1 "ORDER =INTERNAL";
title2 "(specified)";
title4 "FORMAT=BEST9.";
title5 "(default)";
column baths;
define baths / group width=9 order=internal; run;

    ORDER=FORMATTED     ORDER=FORMATTED     ORDER =INTERNAL
       (default)           (default)          (specified)

     FORMAT=BEST9.          FORMAT=3.1       FORMAT=BEST9.
       (default)          (specified)          (default)

           BATHS               BATHS               BATHS

               1                 1.0                   1
               2                 1.5                 1.5
               3                 2.0                   2
             1.5                 2.5                 2.5
             2.5                 3.0                   3

        Output 13a      Output 13b      Output 13c

Output 13a-13c Demonstration of PROC REPORT Defaults

Notice that in Output 13a, the default format is BEST9. If a format is not specified for a numeric variable, then a BESTw. format is applied. Since the default setting for PROC REPORT is ORDER=FORMATTED, the values will be sorted using the formatted values. This can be complicated by character comparisons that sometimes give misleading results. In Output 13a the values actually being compared are " 1", " 2", " 3", "1.5", and "2.5". Since a blank character sorts before a number or a period (.), the values " 1", " 2", and " 3" sort before the values 1.5 or 2.5. This problem is corrected in Output 13b because there are no leading blanks in the comparisons. This problem is also corrected in Output 13c because the internal values, not the formatted values, are used to determine the order.

ORDER=DATA
ORDER=DATA specifies that the order is set according to how the data are initially read in by the procedure. This setting can be complicated depending on the use of BY statements or multiple classification variables. The following example demonstrates a simple case:

/* OUTPUT 14 */
proc tabulate data=sasuser.houses order=data

format=3. noseps;
class style;
table style, n;
run;

                   |                 | N |
                   |-----------------+---|
                   |STYLE            |   |
                   |RANCH            |  4|
                   |SPLIT            |  3|
                   |CONDO            |  4|
                   |TWOSTORY         |  4|
                   -----------------------
Output 14       Simple Demonstration of ORDER=DATA

Notice that the order for STYLE is neither ascending nor descending. STYLE=RANCH appears first because it was the first value encountered in the input data set; STYLE=SPLIT appears second because it was encountered second, and so on. If you use a BY statement, the order is reset at the beginning of each new BY group, as if a new data set were being processed. When you use multiple classification variables, the order is determined independently for each classification variable across the entire data set or BY group. The ordering scheme is then developed first using the master order of the highest order variable, then the next highest, and so on. The following example demonstrates this point:

/* OUTPUT 15 */
proc tabulate data=sasuser.houses order=data

format=3. noseps;
class style bedrooms;
table style*bedrooms, n;
run;

                   |                 | N |
                   |-----------------+---|
                   |STYLE    BEDROOMS|   |
                   |RANCH            |   |
                   |         2       |  1|
                   |         1       |  1|
                   |         3       |  2|
                   |SPLIT    1       |  1|
                   |         4       |  1|
                   |         3       |  1|
                   |CONDO    2       |  2|
                   |         4       |  1|
                   |         3       |  1|
                   |TWOSTORY 2       |  2|
                   |         4       |  2|
                   -----------------------

Output 15     ORDER=DATA with Multiple

Classification Variables

Notice that the order for STYLE is RANCH, SPLIT, CONDO, TWOSTORY. The order for BEDROOMS is 2, 1, 4, 3. This is not clearly apparent in the output because no value of STYLE has all four values of BEDROOMS. The order of values is established independently for each variable, and not according to the subgroups. You can verify this by adding the PRINTMISS option to the TABLE statement, or by comparing the order of STYLE and BEDROOMS in the SASUSER.HOUSES data set (listed in the appendix) with Output 15.

ORDER=FREQ
ORDER=FREQ specifies that classification variables are to be ordered by the frequencies of each of their values. With the exception of PROC FREQ, missing class levels are ordered by their frequency counts just as nonmissing class levels are. PROC FREQ, however, always lists missing class levels first, regardless of their frequency counts. All base SAS procedures, except PROC REPORT, list the frequencies in descending order. PROC REPORT, by default, lists the frequencies in ascending order. PROC REPORT has an additional option, DESCENDING. If you use DESCENDING in conjunction with ORDER=FREQ, then the class levels will be ordered by their descending frequency counts.

If two class levels have the same frequency, a secondary ordering algorithm will be used. All base SAS procedures, except PROC FREQ, use ORDER=DATA as a secondary ordering method. If duplicate frequency counts occur with PROC FREQ when using ORDER=FREQ, then PROC FREQ will use the ORDER=FORMATTED method as a secondary ordering method. Furthermore for PROC FREQ, if a format has not been applied, then the tie will be broken using the ORDER=INTERNAL method. If you use PROC REPORT with ORDER=FREQ and DESCENDING, and a tie occurs, then the ORDER=DATA method will be used, but the levels will be listed in reverse order of occurrence in the data. The following table summarizes the behavior across the base SAS procedures that use the ORDER= option:

Procedure Direction       Format  Event of Tie
FREQ      Descending      No      ORDER=INTERNAL
                          Yes     ORDER=FORMATTED
MEANS     Descending      Y or N  ORDER=DATA
REPORT    Ascending       Y or N  ORDER=DATA
          Descending      Y or N  ORDER=DATA*
SUMMARY   Descending      Y or N  ORDER=DATA
TABULATE  Descending      Y or N  ORDER=DATA

*       If you specify DESCENDING, then both the primary method, ORDER=FREQ, and
the secondary method, ORDER=DATA, list levels in descending order.


Table 6 Secondary Ordering Algorithms  for ORDER=FREQ

The following series of examples demonstrates the differences in behavior when you use ORDER=FREQ:

proc format;

value bedfmt 1='ONE' 2='TWO' 3='THREE' 4='FOUR';

/* OUTPUT 16a */
proc tabulate data=sasuser.houses

order=freq noseps format=3.; title1 "PROC TABULATE";
title2 "Without Format";
class bedrooms;
table bedrooms, n;

/* OUTPUT 16b */
proc freq data=sasuser.houses order=freq;

title1 "PROC FREQ";
title2 "Without Format";
tables bedrooms / nocum nopercent;

/* OUTPUT 16c */
proc report data=sasuser.houses nowd headline; title1 "PROC REPORT";
title2 "Without Format";
title3 "Without DESCENDING";
column bedrooms n;
define bedrooms / group order=freq;

/* OUTPUT 16d */
proc report data=sasuser.houses nowd headline;

title1 "PROC REPORT";
title2 "Without Format";
title3 "With DESCENDING";
column bedrooms n;
define bedrooms / group order=freq descending;

/* OUTPUT 16e */
proc tabulate data=sasuser.houses

order=freq noseps format=3.; title1 "PROC TABULATE";
title2 "With Format";
class bedrooms;
table bedrooms, n;
format bedrooms bedfmt.;

/* OUTPUT 16f */
proc freq data=sasuser.houses order=freq;

title1 "PROC FREQ";
title2 "With Format";
tables bedrooms / nocum nopercent;
format bedrooms bedfmt.;

/* OUTPUT 16g */
proc report data=sasuser.houses nowd headline;

title1 "PROC REPORT";
title2 "With Format";
title3 "Without DESCENDING";
column bedrooms n;
define bedrooms / group order=freq format=bedfmt8.;

/* OUTPUT 16h */
proc report data=sasuser.houses nowd headline;

title1 "PROC REPORT";
title2 "With Format";
title3 "With DESCENDING";
column bedrooms n;
define bedrooms / group order=freq

format=bedfmt8. descending; run;

You should note two main differences in Example 16. First, PROC TABULATE and PROC FREQ list the data values in descending order of the frequency counts. PROC REPORT requires you to specify the DEFINE statement option, DESCENDING. Second, PROC TABULATE and PROC REPORT use the ORDER=DATA method to handle tied values, whereas PROC FREQ uses ORDER=FORMATTED. If you donÕt specify a format, PROC FREQ uses ORDER=INTERNAL. Now, look at what happens in Output 16h with PROC REPORT when DESCENDING is applied. Not only are data values listed in descending order of frequency counts, but the data values that had the same frequency are listed in reverse order of occurrence. In other words, ORDER=DATA is used to handle ties, but the data values are listed in reverse order.

As with ORDER=DATA, when you add a BY statement, the order for classification variables is established as though each BY group were a separate data set. When you use multiple classification variables with ORDER=FREQ, the order is determined independently for each classification variable. The overall ordering scheme is then applied first by using the order of the highest order variable, then the next-highest order variable, and so on. Now, consider the frequencies for BATHS:

proc tabulate data=sasuser.houses

order=freq format=3. noseps; class baths;
table baths, n;
run;

If you make BEDROOMS the highest order variable and BATHS the next-highest order variable, you expect the rows to be ordered by the descending frequency of values first in BEDROOMS, then in BATHS. The following example demonstrates this:

proc tabulate data=sasuser.houses

order=freq format=3. noseps;
class baths bedrooms;
table bedrooms*baths, n;
run;

Notice that the N statistic does not list the frequencies of the combinations in descending order. Instead, the order is set first by the descending frequencies of BEDROOMS and then by BATHS. You can verify this by comparing the order of each variable against the orders used in Output 16a and Output 17. The master ordering sequence for BEDROOMS is 2, 4, 3, 1. The master ordering sequence for BATHS is 1, 3, 2.5, 1.5, 2.

            PROC TABULATE                  PROC FREQ

            Without Format               Without Format


         |              | N |
         |--------------+---|           BEDROOMS   Frequency
         |BEDROOMS      |   |           --------------------
         |2             |  5|                 2          5
         |4             |  4|                 3          4
         |3             |  4|                 4          4
         |1             |  2|                 1          2
         --------------------

         Output 16a                     Output 16b

-------------------------------------------------------------------------------

          PROC REPORT                      PROC REPORT

        Without Format                    Without Format

       Without DESCENDING                With DESCENDING


       BEDROOMS          N            BEDROOMS          N
       -------------------            -------------------
         1          2                   2          5
         4          4                   3          4
         3          4                   4          4
         2          5                   1          2

        Output 16c                     Output  16d



-------------------------------------------------------------------------------



             PROC TABULATE                      PROC FREQ

             With Format                    With Format



         |              | N |
         |--------------+---|           BEDROOMS   Frequency
         |BEDROOMS      |   |           --------------------
         |TWO           |  5|           TWO               5
         |FOUR          |  4|           FOUR              4
         |THREE         |  4|           THREE             4
         |ONE           |  2|           ONE               2
         --------------------

         Output 16e                     Output 16f

-------------------------------------------------------------------------------



             PROC REPORT                   PROC REPORT

            With Format                    With Format

         Without DESCENDING              With DESCENDING


        BEDROOMS          N            BEDROOMS          N
        -------------------            -------------------
           ONE          2                 TWO          5
          FOUR          4               THREE          4
         THREE          4                FOUR          4
           TWO          5                 ONE          2


            Output 16g                     Output      16h

         Output 16a-16h       Differences When Using ORDER=FREQ

Supplemental Ordering Options
In addition to the ORDER= option, some procedures have other supplemental ordering options. PROC REPORT, for example, has an additional ordering option, DESCENDING, which orders the classification variable in descending order by either its internal or formatted values. To use the internal values, specify ORDER=INTERNAL in addition to the option DESCENDING. Likewise, to use the formatted values, specify the option ORDER=FORMATTED with DESCENDING.

                   |                 | N |
                   |-----------------+---|
                   |BATHS            |   |
                   |1                |  5|
                   |3                |  4|
                   |2.5              |  3|
                   |1.5              |  2|
                   |2                |  1|
                   -----------------------

Output 17  Demonstration of ORDER=FREQ with BATHS

PROC CHART does not have an ORDER= option, but it does have the options DESCENDING and MIDPOINTS=. These can be used to affect the ordering of the classification variables. For more information on these supplemental ordering options, refer to the SAS Procedures Guide, Version 6, Third Edition; SAS Guide to the REPORT Procedure, Usage and Reference, Version 6, First Edition; and SAS Guide to TABULATE Processing, Second Edition.

                   |                 | N |
                   |-----------------+---|
                   |BEDROOMS BATHS   |   |
                   |2        1       |  3|
                   |         1.5     |  1|
                   |         2       |  1|
                   |4        3       |  2|
                   |         2.5     |  2|
                   |3        3       |  2|
                   |         2.5     |  1|
                   |         1.5     |  1|
                   |1        1       |  2|
                   -----------------------
Output 18       Demonstration of ORDER=FREQ with Multiple Classification Variables

Conclusion
Ordering values can bring additional meaning to your reports. Understanding the default ordering schemes and available options can help you create optimal results in your reports. However, in certain circumstances the procedure might not provide an ordering option that meets your needs. In such a case, you can use ordering techniques to manipulate the data, or formats to achieve the order that you want. These techniques will be discussed in subsequent issues of Observations.
SAS, SAS/ASSIST, SAS/GRAPH, SAS/STAT, and Observations are registered trademarks or trademarks of SAS Institute Inc. in the USA and other countries. (r) indicates USA registration.
Other brand and product names are registered trademarks or trademarks of their respective companies.

Appendix
The following OPTIONS and TITLE statements are used with all examples in this article that do not explicitly contain other settings:

options nodate nonumber nolabel ls=64 ps=90; title;

The following is the SASUSER.HOUSES data set, which is used to supply the data for most examples in this article:

STYLE    SQFEET BEDROOMS BATHS STREET                  PRICE
RANCH     1250      2     1.0  Sheppard Avenue       $64,000
SPLIT     1190      1     1.0  Rand Street           $65,850
CONDO     1400      2     1.5  Market Street         $80,050
TWOSTORY  1810      4     3.0  Garris Street        $107,250
RANCH     1500      3     3.0  Kemble Avenue         $86,650
SPLIT     1615      4     3.0  West Drive            $94,450
SPLIT     1305      3     1.5  Graham Avenue         $73,650
CONDO     1390      3     2.5  Hampshire Avenue      $79,350
TWOSTORY  1040      2     1.0  Sanders Road          $55,850
CONDO     2105      4     2.5  Jeans Avenue         $127,150
RANCH     1535      3     3.0  State Highway         $89,100
TWOSTORY  1240      2     1.0  Fairbanks Circle      $69,250
RANCH      720      1     1.0  Nicholson Drive       $34,550
TWOSTORY  1745      4     2.5  Highland Road        $102,950
CONDO     1860      2     2.0  Arcata Avenue        $110,700