TS-DOC: TS-410 - Ordering Data Values: Double Mapping and Combining Techniques

TS-410

Ordering Data Values: Double Mapping and Combining Techniques

by Jason R. Sharpe
Jason R. Sharpe is a technical support analyst at SAS Institute. His areas of expertise include base SAS software procedures and the macro facility. Jason has BS degrees in electrical engineering and computer engineering from North Carolina State University.

Abstract
This is the final article in a three-part series about ordering data values. This article discusses double mapping and combining techniques to obtain complex ordering schemes. Combining techniques is the use of multiple techniques simultaneously. Double mapping uses ordering variables based on combinations of class variable values to obtain a specific ordering scheme.

Introduction
This article addresses controlling the order of data values in the REPORT and TABULATE procedures. However, information in this article can also be used with other procedures. This article assumes you have an advanced knowledge of PROC REPORT and PROC TABULATE, the FORMAT, SUMMARY, and SORT procedures, and the DATA step. You should also be familiar with the information in parts 1 and 2. Part 1, ÒOrdering Data Values: Procedure Defaults and OptionsÓ (Observations, Fourth Quarter 1993), discusses the similarities and differences between the ways procedures order data values. Part 2, ÒOrdering Data Values: Dummy Data and Mapping Techniques" (Observations, First Quarter 1994), demonstrates techniques for controlling the order of data values when procedure defaults and options are insufficient. The dummy data method involves concatenating a dummy data set in front of an original data set to force the desired order with the ORDER=DATA option. Mapping involves changing the data values to numbers that occur in the desired order and then using formats to display the original values. If you encounter syntax or terminology that is unfamiliar to you, consult the appropriate documentation or article before continuing. Most examples use the SASUSER.HOUSES data set that is distributed with SAS/ASSIST¨ software. A printout of the SASUSER.HOUSES data set and the system options used in the examples are presented in the Appendix.

Combining Techniques
In part 2 of this series, dummy data and mapping techniques are introduced, demonstrating how to obtain ordering schemes when procedure defaults and options are insufficient. Another approach to ordering data values is to use these techniques simultaneously to control the ordering. In other words, it might be easy to control the order for one variable using a dummy data set, but another variable would be more easily controlled with mapping. Because PROC REPORT allows you to specify a separate ORDER= setting for each class variable, you can use both dummy data and mapping simultaneously. Please note that this will not work with the FREQ, MEANS, SUMMARY, and TABULATE procedures because they do not allow separate ORDER= option specifications for each classification variable. An example that preserves the collating sequence from an ASCII to an EBCDIC platform demonstrates the combination of dummy data and mapping techniques.

Part 1 of this series discusses system specific collating sequences where data values may be ordered differently depending on the operating system or platform you use. Under the MVS operating system, an EBCDIC platform, a sorts before A, whereas in the UNIX environment, an ASCII platform, A is ordered before a. To circumvent this difference, you can use PROC SORT to create a dummy data set ordered by the collating sequence of your choice. Then, simply use PROC REPORT and select ORDER=DATA for the variable that needs the dummy data set. The following example demonstrates the different ordering schemes due to collating sequences:

data sample;

input code $1. date date7.;
cards;

A 01JAN93 /* Fri        */
a 06JAN93 /* Wed        */
a 11JAN93 /* Mon        */
a 18JAN93 /* Mon        */
A 22JAN93 /* Fri        */
A 26JAN93 /* Tue        */
a 26JAN93 /* Tue        */
A 04FEB93 /* Thu        */
A 08FEB93 /* Mon        */
A 08FEB93 /* Mon        */
a 12FEB93 /* Fri        */
a 19FEB93 /* Fri        */
A 22FEB93 /* Mon        */

;

/* OUTPUT 1a and 1b */
proc report data=sample nowindows headskip;

column code date n;
define code / group format=$4.;
define date / group format=weekdate3. order=internal

"DAY";
run;

Output 1a was generated on MVS, an EBCDIC platform, and Output 1b was generated on UNIX, an ASCII platform:

CODE DAY N CODE DAY N

a     Fri  2         A     Fri  2
      Wed  1               Mon  3
      Mon  2               Tue  1
      Tue  1               Thu  1
A     Fri  2         a     Fri  2
      Mon  3               Wed  1
      Tue  1               Mon  2
      Thu  1               Tue  1
 Output 1a            Output 1b

Output 1a-1b Default Ordering Sequence on MVS and UNIX

Notice that the ordering schemes are different. First, in Output 1a, a appears before A whereas in Output 1b, A comes before a. This variation is due to differences between collating sequences. For more information on collating sequences refer to Chapter 31, "The SORT Procedure," in the SAS Procedures Guide, Version 6, Third Edition.

Next, notice that the days of the week do not appear in standard calendar sequence in either output. This occurs because DATE is ordered by the lowest internal value encountered in each format group. For more information on this, refer to the section "Formats" in part 1 of this series.

The goal of this example is to order CODE with an EBCDIC collating sequence, regardless of platform, and to arrange DATE in a standard calendar sequence. This is accomplished using both a dummy data set and mapping techniques. The Mapping Process
First, generate a temporary data set with the additional variables DAY, LABEL, FMTNAME, and TYPE that are to be used for mapping. The data set WORK.SAMPLE is used as input.

data temp;

set sample;
retain fmtname 'dayfmt' type 'N';
day=weekday(date);
label=put(date,weekdate3.);
run;

The following is a listing of the data set WORK.TEMP:

WORK.TEMP

CODE DATE FMTNAME TYPE DAY LABEL

 A      -18627    dayfmt      N       6      Fri
 a      -18622    dayfmt      N       4      Wed
 a      -18617    dayfmt      N       2      Mon
 a      -18610    dayfmt      N       2      Mon
 A      -18606    dayfmt      N       6      Fri
 A      -18602    dayfmt      N       3      Tue
 a      -18602    dayfmt      N       3      Tue
 A      -18593    dayfmt      N       5      Thu
 A      -18589    dayfmt      N       2      Mon
 A      -18589    dayfmt      N       2      Mon
 a      -18585    dayfmt      N       6      Fri
 a      -18578    dayfmt      N       6      Fri
 A      -18575    dayfmt      N       2      Mon

Output Data Set

Next, sort WORK.TEMP using PROC SORT with the NODUPKEY option to ensure that there are no duplicate values of DAY.

proc sort

data=temp(keep=fmtname day label type) out=control nodupkey;
by day;
run;
The following is a listing of the WORK.CONTROL data set:

WORK.CONTROL

FMTNAME TYPE DAY LABEL

dayfmt      N       2      Mon
dayfmt      N       3      Tue
dayfmt      N       4      Wed
dayfmt      N       5      Thu
dayfmt      N       6      Fri

Output Data Set Once the duplicate values are removed, DAY is used as the START variable for a control data set, and a format is created using WORK.CONTROL as input to PROC FORMAT: proc format cntlin=control(rename=(day=start)); run;

The Dummy Data Process
To control the collating sequence for CODE, use the dummy data technique. To generate a dummy data set with the correct collating sequence, use PROC SORT with the appropriate collating sequence option, EBCDIC in this example. List CODE and DAY in the KEEP= option to ensure that only class variables are included in the dummy data set. This prevents the need to explicitly assign analysis variables to missing in the concatenating process. Specify NODUPKEY to reduce the size of the dummy data set.

proc sort

data=temp(keep=code day) out=dummy ebcdic nodupkey; by code day;
run;
The following is a listing of the WORK.DUMMY data set: WORK.DUMMY

CODE DAY

Output Data Set

Notice that a is before A, which agrees with Output 1a obtained with an EBCDIC collating sequence. Finally, concatenate WORK.DUMMY and WORK.SAMPLE to generate the final data set. Create the VALID variable to differentiate the dummy observations from the actual observations and maintain the correct frequency counts in the final report. Use DAY to obtain the desired calendar sequence for DATE. DAY is also part of the mapping process. The following code demonstrates:

data sample2;

set dummy sample(in=is_valid);
if is_valid then do;

      valid=1;
      day=weekday(date);

end;
drop date;
run;

The following are the WORK.DUMMY, WORK.SAMPLE, and WORK.SAMPLE2 data sets that illustrate the results of the concatenation. Note that in the WORK.SAMPLE data set DATE is displayed with the WEEKDATE3. format.

WORK.DUMMY WORK.SAMPLE WORK.SAMPLE2

CODE DAY CODE DATE CODE DAY VALID

a        2     A      Fri     a        2       .
a        3     a      Wed     a        3       .
a        4     a      Mon     a        4       .
a        6     a      Mon     a        6       .
A        2     A      Fri     A        2       .
A        3     A      Tue     A        3       .
A        5     a      Tue     A        5       .
A        6     A      Thu     A        6       .
               A      Mon     A        6       1
               A      Fri     a        4       1
               a      Fri     a        2       1
               A      Mon     a        2       1
                              A        6       1
                              A        3       1
                              a        3       1
                              A        5       1
                              A        2       1
                              A        2       1
                              a        6       1
                              a        6       1
                              A        2       1

Output Data Sets

The observations in WORK.SAMPLE2 where VALID is missing come from WORK.DUMMY. The variable DAY is nonmissing in these observations, ensuring that the dummy observations are not ignored because DAY is used as a GROUP variable in PROC REPORT. Replace the variable DATE with DAY, which is mapped to the correct day of the week to be displayed with the DAYFMT. format.

Generate the Final Report
Finally, use PROC REPORT to display the desired ordering scheme using WORK.SAMPLE2 as input. Define CODE with the ORDER=DATA option for the dummy data process, and define DAY with the ORDER=INTERNAL option, formatting it with the DAYFMT. format, as required for the mapping process.

/* OUTPUT 2 */
proc report data=sample2 nowindows headskip;

column code day valid;
define code / group order=data format=$4.; define day / group order=internal format=dayfmt4.;

   define valid / n                    format=1.
                  "N";

run;

The following is the final report with the values in the desired order:

CODE DAY N

a      Mon   2
       Tue   1
       Wed   1
       Fri   2
A      Mon   3
       Tue   1
       Thu   1
       Fri   2

Output 2        Desired Ordering Scheme Obtained by Combining Techniques

Notice that a is before A according to the EBCDIC collating sequence and that the values of DAY are listed in standard calendar sequence. Thus, the two separate ordering schemes are obtained by combining dummy data and mapping techniques.

Double Mapping
Double mapping is a process in which class values are replaced with numeric values corresponding to the desired position in the ordering scheme. The original values are then displayed using formats. Double mapping differs from the previously discussed single mapping in that double mapping uses values based on combinations of class variables, whereas single mapping uses values based only on a single class variable. Thus, values created from double mapping range from 1 to the maximum number of possible combinations of class values. For single mapping, the values range from 1 to the number of unique values for the variable being mapped.

When Is Double Mapping Necessary?
Double mapping is necessary when the desired ordering scheme requires that the order for a lower-order class variable vary within the subgroups of the higher-order class variable. Consider the following example: /* OUTPUT 3 */
proc tabulate data=sasuser.houses noseps

format=3. order=freq;
class bedrooms style;
table bedrooms*(style=' ' all), n / rts=20; run;

Initial Order Desired Order

| | N | | N | |------------------+---| |------------------+---|

|BEDROOMS          |   |   |BEDROOMS          |   |
|2        RANCH    |  1|   |2        CONDO    |  2|
|         CONDO    |  2|   |         TWOSTORY |  2|
|         TWOSTORY |  2|   |         RANCH    |  1|
|         ALL      |  5|   |         ALL      |  5|
|4        CONDO    |  1|   |4        TWOSTORY |  2|
|         TWOSTORY |  2|   |         SPLIT    |  1|
|         SPLIT    |  1|   |         CONDO    |  1|
|         ALL      |  4|   |         ALL      |  4|
|3        RANCH    |  2|   |3        RANCH    |  2|
|         CONDO    |  1|   |         SPLIT    |  1|
|         SPLIT    |  1|   |         CONDO    |  1|
|         ALL      |  4|   |         ALL      |  4|
|1        RANCH    |  1|   |1        RANCH    |  1|
|         SPLIT    |  1|   |         SPLIT    |  1|
|         ALL      |  2|   |         ALL      |  2|

Output 3 Figure 1

Notice in Output 3 that BEDROOMS is listed in order of descending frequency counts of ALL, but STYLE is the same order for each subgroup of BEDROOMS. Figure 1 shows the values of STYLE in the desired order. The order for STYLE in Output 3 is based on the master ordering scheme developed independently and based on the overall frequency counts for all subgroups combined, not on the frequency counts of STYLE in a given subgroup. The following example demonstrates: /* OUTPUT 4 */
proc tabulate

data=sasuser.houses noseps format=3. order=freq; class style;
table style=' ', n / rts=10;
run;


         | N |   
|--------+---|

|RANCH   |  4|                                            
|CONDO   |  4|                                            
|TWOSTORY|  4|                                          
|SPLIT   |  3| ----
---------- Output
Output 4       Master Ordering Sequence for STYLE

Thus, the master ordering scheme for BEDROOMS when using ORDER=FREQ with PROC TABULATE is 2, 4, 3, 1, and for STYLE it is RANCH, CONDO, TWOSTORY, and SPLIT. For values of STYLE to be listed by descending frequencies within each subgroup of BEDROOMS, you must use double mapping. For more information on master ordering schemes, review the section "Multiple Class Variables" in part 1 of this series.

What Is Double Mapping and How Does It Work? Double mapping is a process that creates additional variables that make it easy to obtain complex ordering schemes. Once these variables exist, various reports with unique ordering schemes can be obtained. Three types of reports that can be obtained using double mapping are demonstrated in the section "Producing Reports with a Double Mapped Data Set." Before learning the double mapping process, however, you should become familiar with what a double mapped data set is and how it obtains certain ordering schemes.

To arrange the data values as in Figure 1, the internal values need to correspond to positions in the desired ordering scheme. Figure 2 illustrates the sequence number assignments needed as internal values. Note that the variable STYLE has two sequence number assignments. This is necessary to ensure that internal values are not grouped together by their formatted values. The formatted values are the original values of BEDROOMS and STYLE.

Figure 2 Sequence Number Assignments for Double Mapping

ORDER1 is the sequence assignment for BEDROOMS. ORDER2 is the sequence assignment for STYLE in relation to the entire report. ORDER3 is the sequence assignment for STYLE within each subgroup of BEDROOMS. ORDER3 serves two purposes in double mapping. First, it prevents internal values of ORDER2 from being combined together by their formatted values. Notice that ORDER2 values of 1, 6, and 9 all correspond to STYLE=CONDO. Without ORDER3, the values of 1, 6, and 9 would be grouped together in the final report, causing CONDO to appear first in each subgroup of BEDROOMS. The second purpose of ORDER3 is to preserve the order of STYLE in each subgroup of BEDROOMS. ORDER3 provides the ability to generate unique reports with BEDROOMS and STYLE in various row/column configurations. This is discussed in more detail in the section "Producing Reports with a Double Mapped Data Set." Now assume that ORDER1, ORDER2, and ORDER3 are variables in a data set and have corresponding formats where the values of BEDROOMS and STYLE are the formatted values. This is equivalent to the following solution, which is hard coded for illustration purposes only:

proc format;

/* formats corresponding to order 1 */ value ordr1fmt

1='2' 2='4' 3='3' 4='1';

/* formats corresponding to order 2 */ value ordr2fmt

1,6,9 ='CONDO' 2,4 ='TWOSTORY' 3,7,10='RANCH' 5,8,11='SPLIT';

data dbl_map;

input order1 order2 order3 count; cards;

1     1    1      2
1     2    2      2
1     3    3      1
2     4    1      2
2     5    2      1
2     6    3      1
3     7    1      2
3     8    2      1
3     9    3      1
4    10    1      1
4    11    2      1

; run;

At this point, WORK.DBL_MAP is a summarized double mapped data set and can be used to obtain the desired ordering sequence, as in Figure 1. The following example demonstrates the use of WORK.DBL_MAP to obtain a report with the data values arranged in the order illustrated in Figures 1 and 2:

/* OUTPUT 5 */
proc tabulate data=dbl_map noseps format=3.;

options label;
class order1 order2 order3;
freq count;
table order1*(order3*order2=' ' all), n / rts=29; label order1='BEDROOMS';
format order1 ordr1fmt. order2 ordr2fmt.; run;

| | N | |---------------------------+---|

|BEDROOMS ORDER3            |   |
|2        1        CONDO    |  2|
|         2        TWOSTORY |  2|
|         3        RANCH    |  1|
|         ALL               |  5|
|4        ORDER3            |   |
|         1        TWOSTORY |  2|
|         2        SPLIT    |  1|
|         3        CONDO    |  1|
|         ALL               |  4|
|3        ORDER3            |   |
|         1        RANCH    |  2|
|         2        SPLIT    |  1|
|         3        CONDO    |  1|
|         ALL               |  4|
|1        ORDER3            |   |
|         1        RANCH    |  1|
|         2        SPLIT    |  1|
|         ALL               |  2|

Output 5 Desired Ordering Sequence Using PROC TABULATE

Notice that the desired ordering scheme is obtained, but a new column, ORDER3, appears in the report. ORDER3 is necessary to prevent ORDER2 from being ordered by lowest internal value for each format group. Without ORDER3, ORDER2 would display according to the master ordering sequence of CONDO, TWOSTORY, RANCH, SPLIT. Suppress the display of ORDER3 by using PROC REPORT rather than PROC TABULATE to create the list-style report as follows: /* OUTPUT 6 */
proc report data=dbl_map nowindows headline;

column order1 order3 order2 n;
freq count;
define order1 / group order=internal format=ordr1fmt8.

"BEDROOMS";
define order2 / group order=internal format=ordr2fmt.

"STYLE";
define order3 / group

noprint;
break after order1 / ol summarize skip; run;
BEDROOMS STYLE N

       2     CONDO          2
          TWOSTORY          2
             RANCH          1
--------            ---------
       2                    5

       4  TWOSTORY          2
             SPLIT          1
             CONDO          1
--------            ---------
       4                    4

       3     RANCH          2
             SPLIT          1
             CONDO          1
--------            ---------
       3                    4

       1     RANCH          1
             SPLIT          1
--------            ---------
       1                    2

Output 6        Desired Ordering Sequence Obtained with PROC REPORT

Double Mapping a Data Set
The following seven steps provide a detailed explanation of how to double map any SAS data set without hard coding.
Step 1: Summarize and Create the Ordering Statistic Start the double mapping process by summarizing the data and calculating the statistic by which to order using the SUMMARY procedure. This step has two variations depending on whether you plan to order your report by a statistic based on an analysis variable or by frequency counts based on combinations of class values. If you want to order your report by a statistic based on an analysis variable, then use the following version of step 1 that uses PROC SUMMARY to create an output data set with a variable containing the mean of PRICE called ORDRSTAT: /* step 1a: summarize and compute ORDRSTAT */ proc summary data=sasuser.houses;

class bedrooms style;
var price;
output out=summary mean=ordrstat;
run;

WORK.SUMMARY after Step 1a

BEDROOMS STYLE _TYPE_ _FREQ_ ORDRSTAT

    .                      0        15       82720.00
    .       CONDO          1         4       99312.50
    .       RANCH          1         4       68575.00
    .       SPLIT          1         3       77983.33
    .       TWOSTORY       1         4       83825.00
    1                      2         2       50200.00
    2                      2         5       75970.00
    3                      2         4       82187.50
    4                      2         4      107950.00
    1       RANCH          3         1       34550.00
    1       SPLIT          3         1       65850.00
    2       CONDO          3         2       95375.00
    2       RANCH          3         1       64000.00
    2       TWOSTORY       3         2       62550.00
    3       CONDO          3         1       79350.00
    3       RANCH          3         2       87875.00
    3       SPLIT          3         1       73650.00
    4       CONDO          3         1      127150.00
    4       SPLIT          3         1       94450.00
    4       TWOSTORY       3         2      105100.00

Output Data Set The CLASS statement generates summarizations on all combinations of class values and also on individual variables. The MEAN= option in the OUTPUT statement calculates the mean value of PRICE for each of the combinations. Keep the _FREQ_ variable because it maintains the original frequency counts in the final reports. You can drop _FREQ_ if you merge the final
double mapped data set back into the primary data set at the end of the double mapping process. To order your report based on the frequency counts of combinations of class values, use the second variation of step 1, which renames _FREQ_ to ORDRSTAT as follows:

   /* step 1b: summarize and create ORDRSTAT by renaming        */
   /*          _FREQ_   */

proc summary data=sasuser.houses;

class bedrooms style;
output out=summary(rename=(_freq_=ordrstat)); run;

Step 2: Sort WORK.SUMMARY by ORDRSTAT
The second step is to sort WORK.SUMMARY by ORDRSTAT. To order the data by descending values of ORDRSTAT, add the DESCENDING option to the BY statement. The following example demonstrates:

/* step 2: sort by ORDRSTAT */
proc sort data=summary;

by descending ordrstat;
run;

Step 3: Separate SUMMARY into CONTROL1 and CONTROL2 This step separates WORK.SUMMARY into the data sets WORK.CONTROL1 and WORK.CONTROL2 that can be merged later to create a double mapped data set. WORK.CONTROL1 contains the highest order class variable, BEDROOMS, and its corresponding sequence variable ORDER1. WORK.CONTROL2 contains both BEDROOMS and STYLE and a sequence variable, ORDER2, based on the combinations of BEDROOMS and STYLE. Both WORK.CONTROL1 and WORK.CONTROL2 must also contain the variables FMTNAME and TYPE for the generation of formats. The following code demonstrates how to accomplish these tasks in one DATA step:

/* step 3: create CONTROL1 and CONTROL2 */ data control1(drop=order2 _type_ style)

control2(drop=order1 _type_);
set summary;
retain type 'N';
if _type_=2 then do;

      order1+1;
      fmtname='ordr1fmt';
      output control1;

end;
else if _type_=3 then do;

      order2+1;
      fmtname='ordr2fmt';
      output control2;

end;
run;

The mean for each value of BEDROOMS is output to WORK.CONTROL1 by selecting observations with _TYPE_=2. Likewise, mean values for each combination of BEDROOMS and STYLE are output to WORK.CONTROL2 by selecting observations with _TYPE_=3. The variables ORDER1 and ORDER2 are incremented for each observation output to their corresponding data sets. ORDER1 and ORDER2 are used later in place of BEDROOMS and STYLE because their internal values correspond to the desired ordering sequence. Notice in the data sets below that ORDRSTAT is in descending sequence.

Step 4: Create Formats
The next step in the double mapping process is creating the formats. Use the WORK.CONTROL1 and WORK.CONTROL2 data sets as input control data sets to PROC FORMAT by renaming the ORDER1 and ORDER2 variables to START and renaming the variables BEDROOMS and STYLE to LABEL. In addition to the ORDR1FMT. and ORDR2FMT. formats, create the formats BLANK. and RANK. The BLANK. format blanks out nonapplicable values, and the RANK. format displays a meaningful value for the variable ORDER3, which is created in a later step. The following example creates these formats:

/* step 4: create the formats */
proc format

cntlin=control1(rename=(order1=start bedrooms=label)); proc format

cntlin=control2(rename=(order2=start style =label)); value blank

      .,0 =' '
      other=(|dollar8.|);
   picture rank
      1   ='1st'
      2   ='2nd'
      3   ='3rd'
      4-20='09th';

run;

Notice in the BLANK format the assignment of the OTHER category: OTHER=(|DOLLAR8.|). This allows you to associate another format with a particular format range as documented in SAS Technical Report P-222, Changes and Enhancements to Base SAS Software, Release 6.07. For more information on formats and control data sets, refer to the SAS Procedures Guide; "Creating Formats with Control Data Sets (Observations, First Quarter 1992); or part 2 of this series.

WORK.CONTROL1 after Step 3

ORDER1 BEDROOMS ORDRSTAT FMTNAME TYPE _FREQ_

   1          4         107950    ordr1fmt     N         4
   2          3          82187    ordr1fmt     N         4
   3          2          75970    ordr1fmt     N         5
   4          1          50200    ordr1fmt     N         2

WORK.CONTROL2 after Step 3

BEDROOMS ORDER2 STYLE ORDRSTAT FMTNAME TYPE _FREQ_

    4          1      CONDO        127150     ordr2fmt     N         1
    4          2      TWOSTORY     105100     ordr2fmt     N         2
    2          3      CONDO         95375     ordr2fmt     N         2
    4          4      SPLIT         94450     ordr2fmt     N         1
    3          5      RANCH         87875     ordr2fmt     N         2
    3          6      CONDO         79350     ordr2fmt     N         1
    3          7      SPLIT         73650     ordr2fmt     N         1
    1          8      SPLIT         65850     ordr2fmt     N         1
    2          9      RANCH         64000     ordr2fmt     N         1
    2         10      TWOSTORY      62550     ordr2fmt     N         2
    1         11      RANCH         34550     ordr2fmt     N         1

Output Data Set

Step 5: Merge the Control Data Sets
This step creates WORK.SUMMARY2 by match merging WORK.CONTROL1 and WORK.CONTROL2 by the highest order class variable BEDROOMS:

/* step 5: create the WORK.SUMMARY2 */ proc sort data=control1;

by bedrooms;

proc sort data=control2;

by bedrooms;

data summary2;

merge control1 control2;
by bedrooms;
drop fmtname type;
run;

WORK.SUMMARY2 after Step 5

ORDER1 BEDROOMS ORDER2 STYLE ORDRSTAT _FREQ_

   4          1          8      SPLIT         65850        1
   4          1         11      RANCH         34550        1
   3          2          3      CONDO         95375        2
   3          2          9      RANCH         64000        1
   3          2         10      TWOSTORY      62550        2
   2          3          5      RANCH         87875        2
   2          3          6      CONDO         79350        1
   2          3          7      SPLIT         73650        1
   1          4          1      CONDO        127150        1
   1          4          2      TWOSTORY     105100        2
   1          4          4      SPLIT         94450        1

Output Data Set

Step 6: Create a Summarized Double Mapped Data Set
This step creates a summarized double mapped data set such as ones used earlier to generate Outputs 5 and 6. The data set can either be used directly with a reporting procedure, or it can be merged with the original data set (SASUSER.HOUSES) to create an unsummarized double mapped data set later in step 7. This type of double mapped data set is useful if you need to use variables in the primary data set that are dropped in the double mapping process.

Create a new variable, ORDER3, to prevent ORDER2 from being ordered according to the master ordering scheme for the procedure, which is based on the lowest internal values in a format group. Sort WORK.SUMMARY2 by ORDER1 and ORDER2 and use BY group processing to increment the value of ORDER3 within each subgroup of ORDER1. This step has two variations depending on the size of the primary data set and whether you want a double mapped data set that has not been summarized. One version creates an index based on the combinations of BEDROOMS and STYLE. The index can be used to perform direct access processing to avoid sorting the primary data set. If the primary data set is large, this can save computing resources.

If you want a summarized double mapped data set or if the primary data set is small, then use the following version of
step 6:

/* step 6a: create WORK.DBL_MAP1 to be used either */

   /*          directly with a procedure or merged with a       */
   /*          small primary data set in step 7a        */

proc sort data=summary2;

by order1 order2;

data dbl_map1;

set summary2;
by order1 order2;
if first.order1 then order3=0;
order3+1;
run;

Notice in the resulting data set that ORDRSTAT is in descending order and that ORDER1 and ORDER2 are in ascending numeric order. Also notice that ORDER3 is in ascending numeric order within each subgroup of BEDROOMS. Thus, the proper assignment of sequence numbers similar to those in Figure 2 has been accomplished.

WORK.DBL_MAP1 after Step 6a

ORDER1 BEDROOMS ORDER2 ORDER3 STYLE ORDRSTAT _FREQ_

   1          4          1         1      CONDO        127150        1
   1          4          2         2      TWOSTORY     105100        2
   1          4          4         3      SPLIT         94450        1
   2          3          5         1      RANCH         87875        2
   2          3          6         2      CONDO         79350        1
   2          3          7         3      SPLIT         73650        1
   3          2          3         1      CONDO         95375        2
   3          2          9         2      RANCH         64000        1
   3          2         10         3      TWOSTORY      62550        2
   4          1          8         1      SPLIT         65850        1
   4          1         11         2      RANCH         34550        1

Output Data Set

If you do plan to merge the double mapped data set with a large primary data set, then use the following version of step 6:

 /* step 6b: create WORK.DBL_MAP1 to be merged with a         */ /*
large primary data in step 6b */

proc sort data=summary2;

by order1 order2;

data dbl_map1(index=(o3_index=(bedrooms style)));

set summary2;
by order1 order2;
if first.order1 then order3=0;
order3+1;
run;

Note that creating and maintaining an index file can consume significant resources. Step 6b should only be used if the primary data set is large and thus more costly to sort than it would be to create an index file on the summarized data set. Step 7: Create a Double Mapped Data Set That Is Not Summarized (Optional) This step is useful only when you need to use variables in the primary data set that were dropped in the double mapping process. For example, you may wish to list the values of SQFEET and BATHS in your report. Since these variables were not used in the double mapping process, their values
have to be retrieved by merging the summarized double map-ped data set WORK.DBL_MAP1 with the primary data set SASUSER.HOUSES. If the primary data set is relatively small, then use the following version of step 7, which uses WORK.DBL_MAP1 created by step 6a:

/* step 7a: create a double mapped data set that has */

 /*
not been summarized using WORK.DBL_MAP1     */ /*          from step 6a as input
*/

proc sort data=dbl_map1(drop=_freq_);

by bedrooms style;

proc sort data=sasuser.houses out=houses;

by bedrooms style;

data dbl_map2;

merge dbl_map1 houses;
by bedrooms style;
run;

If the primary data set is large, then use the following version of step 7, which inputs the WORK.DBL_MAP1 data set created by step 6b: /* step 7b: create a double mapped data set that has */

/*          not been summarized using WORK.DBL_MAP1     */
/*          from step 6b as input       */

data dbl_map2;
set sasuser.houses;
set dbl_map1 key=o3_index;
run;

Notice in step 7a that both WORK.DBL_MAP1 and SASUSER.HOUSES must be sorted before merging. Sorting is not necessary in step 7b because the KEY= option performs direct access processing and does not require sorting. For more information on the KEY= option, refer to SAS Technical Report P-222.

WORK.DBL_MAP2 after Step 7a

ORDER1    BEDROOMS    ORDER2    ORDER3    STYLE       ORDRSTAT
PRICE  SQFEET   BATHS

4       1       11      2      RANCH         34550      $34,550      720    1.0
4       1        8      1      SPLIT         65850      $65,850     1190    1.0
3       2        3      1      CONDO         95375      $80,050     1400    1.5
3       2        3      1      CONDO         95375     $110,700     1860    2.0
3       2        9      2      RANCH         64000      $64,000     1250    1.0
3       2       10      3      TWOSTORY      62550      $55,850     1040    1.0
3       2       10      3      TWOSTORY      62550      $69,250     1240    1.0
2       3        6      2      CONDO         79350      $79,350     1390    2.5
2       3        5      1      RANCH         87875      $86,650     1500    3.0
2       3        5      1      RANCH         87875      $89,100     1535    3.0
2       3        7      3      SPLIT         73650      $73,650     1305    1.5
1       4        1      1      CONDO        127150     $127,150     2105    2.5
1       4        4      3      SPLIT         94450      $94,450     1615    3.0
1       4        2      2      TWOSTORY     105100     $107,250     1810    3.0
1       4        2      2      TWOSTORY     105100     $102,950     1745    2.5

Output Data Set

Producing Reports with a Double Mapped Data Set Once a data set is double mapped, there are many reporting possibilities. The following sections illustrate three types of reports, list-style, multicolumn, and top-rank, that can be created using a double mapped data set as input.

List-style Report
A list-style report is a report where each combination of class variable values define a row. A double mapped data set can add meaning to these types of reports by allowing different sequences of class values in subgroups of higher order class variables. Figure 3 illustrates the structure of a list-style report:

Figure 3 Structure of a List-style Report

The following example demonstrates the creation of a list-style report using WORK.DBL_MAP2 as input: /* OUTPUT 7 */
proc report data=dbl_map2 nowindows headskip;

column order1 order3 order2 price sqfeet baths; define order1 / "BEDROOMS" group format=ordr1fmt8.

order=internal;
define order3 / noprint group

order=internal;
define order2 / "STYLE" group format=ordr2fmt8.

order=internal left;
define price / "PRICE" mean format=blank.; define sqfeet / "SQFEET" mean format=6.; define baths / "BATHS" mean format=6.1; break after order1 / ol summarize skip; run;

Notice in Output 7 that BEDROOMS and STYLE are both listed by descending values of PRICE. SQFEET and BATHS are listed to demonstrate that variables not used in the double mapping process can still be displayed in the final report.

BEDROOMS STYLE PRICE SQFEET BATHS

         4  CONDO     $127,150    2105     2.5
            TWOSTORY  $105,100    1778     2.8
            SPLIT      $94,450    1615     3.0
  --------            --------  ------  ------
         4            $107,950    1819     2.8

         3  RANCH      $87,875    1518     3.0
            CONDO      $79,350    1390     2.5
            SPLIT      $73,650    1305     1.5
  --------            --------  ------  ------
         3             $82,188    1433     2.5

         2  CONDO      $95,375    1630     1.8
            RANCH      $64,000    1250     1.0
            TWOSTORY   $62,550    1140     1.0
  --------            --------  ------  ------
         2             $75,970    1358     1.3

         1  SPLIT      $65,850    1190     1.0
            RANCH      $34,550     720     1.0
  --------            --------  ------  ------
         1             $50,200     955     1.0

Output 7        List-style Report

Multicolumn Report
A multicolumn report has a separate set of columns for each value of the highest order class variable. The report is unique because within each set of columns, the rows are ordered by the descending value of ORDRSTAT. Figure 4 illustrates the structure of a multicolumn report:

Figure 4 Structure of a Multicolumn Report

The following demonstrates how to generate a multicolumn report with PROC REPORT using WORK.DBL_MAP2 as input:

/* OUTPUT 8 */ proc report data=dbl_map2 nowindows spacing=1;

column order3 order1,("--" order2 ordrstat); define order3 / group order=internal

noprint; define order1 / across order=internal format=ordr1fmt.

      "--/BEDROOMS/--";
   define order2   / mean   left           format=ordr2fmt.
      "STYLE/--";
   define ordrstat / mean                  format=blank.
      "PRICE/--";

rbreak after / dol summarize; compute after;

      _c2_ = .;
      _c4_ = .;
      _c6_ = .;
      _c8_ = .;

endcomp; run;

BEDROOMS

4 3 2 1

STYLE PRICE STYLE PRICE STYLE PRICE STYLE PRICE

CONDO    $127,150 RANCH     $87,875 CONDO     $95,375 SPLIT     $65,850
TWOSTORY $105,100 CONDO     $79,350 RANCH     $64,000 RANCH     $34,550
SPLIT     $94,450 SPLIT     $73,650 TWOSTORY  $62,550
======== ======== ======== ======== ======== ======== ========  ========
         $107,950           $82,188           $75,970           $50,200

Notice that ORDER2 is used as an ANALYSIS variable defined with a MEAN statistic. This is necessary because only
ANALYSIS and COMPUTED variables can be added below ACROSS variables. ORDER2 is defined as an ANALYSIS variable because it is numeric. Use the MEAN statistic to ensure that the original values of ORDER2 are unchanged so that the formats will map correctly. Since ORDER2 has the same value in each observation grouped for each crossing, the MEAN statistic returns the correct sequence value. It is this feature of the double mapping process that makes it an extremely powerful report generating technique.

In the COMPUTE AFTER block, certain columns are set to missing. These columns represent the different STYLE columns in the report. Since ORDER2 is defined as an ANALYSIS variable, PROC REPORT summarizes its values on the RBREAK line. Since these values have no meaning on the RBREAK line, they are reassigned to missing. The missing
values are then displayed as blanks by system option MISSING=' '.

Top-Rank Reports
A top-rank report lists rows in order of a statistic computed on ORDRSTAT or on each value of the highest order class variables. The remaining columns from left to right list the second order class variable in order of statistical values computed on ORDRSTAT for the combinations of the class variables. You can generate this report with either PROC REPORT or
PROC TABULATE. This type of report is useful when you want to review results in a top-down, left-right manner. It also enables you to highlight the most relevant information in your report by subsetting on the highest rankings in the report. Figure 5 illustrates the structure of a top-rank report:

Figure 5 Structure of a Top-Rank Report

The following example demonstrates a top-rank report obtained with PROC REPORT using the summarized WORK.DBL_MAP1 output in step 6a:

/* OUTPUT 9 */ proc report data=dbl_map1 nowindows spacing=2;

where (order3 <= 3); freq _freq_; column ("--/BEDROOMS/--" order1 ordrstat=tot)

order3,("--" order2 ordrstat); define order1 / group format=ordr1fmt. order=internal

      "#/-";
   define tot      / mean   format=blank9.
      "AVG PRICE/--";
   define order3   / across format=rank.     order=internal
      "TOP 3 STYLES/--";
   define order2   / mean   format=ordr2fmt. left
      "STYLE/--";
   define ordrstat / mean   format=blank.    spacing=1
      "PRICE/--";

run;

BEDROOMS 1st 2nd 3rd

# AVG PRICE STYLE PRICE STYLE PRICE STYLE PRICE

  4   $107,950  CONDO    $127,150  TWOSTORY $105,100  SPLIT     $94,450
  3    $82,188  RANCH     $87,875  CONDO     $79,350  SPLIT     $73,650
  2    $75,970  CONDO     $95,375  RANCH     $64,000  TWOSTORY  $62,550
  1    $50,200  SPLIT     $65,850  RANCH     $34,550

Notice that this report uses summarized WORK.DBL_MAP1. This highlights the fact it is not always necessary to merge the double mapped data set with the primary data. Be sure to use _FREQ_ in a FREQ statement when using a summarized double mapped data set; if this is omitted, the overall mean values for BEDROOMS would reflect the mean of the values in the summarized WORK.DBL_MAP1 and not the actual means that would have been obtained had SASUSER.HOUSES been used. Also, associate the RANK. format with ORDER3. The format is designed to give meaningful descriptions to the sequence values displayed by ORDER3. Output 9 shows the top-rank report generated.

Because ORDER3 is actually displayed in the output, PROC TABULATE can also be used to generate a top-rank report. The following example demonstrates how to obtain this report with the TABULATE procedure:

/* OUTPUT 10 */ proc tabulate data=dbl_map1 format=blank. noseps;

options label; where (order3<=3); freq _freq_; class order1 order3; var order2 ordrstat; table order1=' ', ordrstat*mean

      order3*(order2*mean*f=ordr2fmt. ordrstat*mean)
      / rts=7 box='COUNT OF   +---+ BED ROOMS';

format order1 ordr1fmt. order3 rank.; label

      order2  ='STYLE'
      order3  ='TOP 3 STYLES'
      ordrstat='PRICE';
   keylabel
      mean    =' ';

run;


|COUNT|        |                    TOP 3 STYLES                     |
|OF   |        |-----------------------------------------------------|
|+---+|        |       1st       |       2nd       |       3rd       |
|BED  |        |-----------------+-----------------+-----------------|

|ROOMS| PRICE  | STYLE  | PRICE  | STYLE  | PRICE  | STYLE  | PRICE  |

|-----+--------+--------+--------+--------+--------+--------+--------|
|4    |$107,950|CONDO   |$127,150|TWOSTORY|$105,100|SPLIT   | $94,450|

|3    | $82,188|RANCH   | $87,875|CONDO   | $79,350|SPLIT   | $73,650|

|2    | $75,970|CONDO   | $95,375|RANCH   | $64,000|TWOSTORY| $62,550|


|1    | $50,200|SPLIT   | $65,850|RANCH   | $34,550|        |        |

Output 10 Top-Rank Report Generated with PROC TABULATE

Simplifying the Double Mapping Process
The double mapping process can be consolidated with the use of the SQL procedure. For example, the SQL procedure can consolidate steps 5 and 6 as follows:

/* alternative approach to steps 5 and 6 */ proc sql;

create view summary2 as select c1.order1, c1.bedrooms, c2.order2, c2.style,

c2.ordrstat from control1 c1, control2 c2 where c1.bedrooms=c2.bedrooms order by c1.order1, c2.order2; quit;

data dbl_map1;

set summary2; by order1 order2; if first.order1 then order3=0; order3+1; run;

The above example uses one procedure step and one DATA step as opposed to three procedure steps and two DATA steps as previously demonstrated. There are other points in the process that may also be consolidated. Experienced programmers are encouraged to consolidate steps where they see fit.

The process can also be generalized for any data set using the macro facility. Because the only changes that would be needed are the data set and variable names, the macro facility can be used to make the substitutions so that users unfamiliar with the process can still double map a data set. You could even provide users with a choice of reports to generate.

A detailed explanation of the SQL procedure and the macro facility are beyond the scope of this article. However, their use is highly recommended. For more information on PROC SQL, see the SAS Guide to the SQL Procedure, Usage and Reference, Version 6, First Edition. For more information on the macro facility, see the SAS Guide to Macro Processing, Version 6, Second Edition.

Conclusion
Controlling the order of data values can bring additional meaning to your reports. Understanding the default ordering schemes and available options can help you to control the order of data values in your reports. However, there will be situations where the procedure does not provide an ordering option that meets your needs. In these cases, you can use ordering techniques to achieve the sequence that you want. Dummy data, mapping, and double mapping techniques provide solutions to many of these situations. In other cases, desired ordering schemes can be obtained by combining these techniques. With a little planning and some creativity, virtually any ordering scheme can be obtained.

SAS, SAS/ASSIST, and Observations are registered trademarks or trademarks of SAS Institute Inc. in the USA and other countries. (R) indicates USA registration. Other brand and product names are registered trademarks or trademarks of their respective companies.

Appendix

The following OPTIONS and TITLE statements are used with all examples in this article that do not explicitly contain other settings:

options nodate nonumber nolabel ls=80 missing=' '; title;

The following is the SASUSER.HOUSES data set, which is used to supply the data for most examples in this article:

SASUSER.HOUSES

STYLE SQFEET BEDROOMS BATHS STREET PRICE

RANCH        1250         2        1.0     Sheppard Avenue      $64,000
SPLIT        1190         1        1.0     Rand Street          $65,850
CONDO        1400         2        1.5     Market Street        $80,050
TWOSTORY     1810         4        3.0     Garris Street       $107,250
RANCH        1500         3        3.0     Kemble Avenue        $86,650
SPLIT        1615         4        3.0     West Drive           $94,450
SPLIT        1305         3        1.5     Graham Avenue        $73,650
CONDO        1390         3        2.5     Hampshire Avenue     $79,350
TWOSTORY     1040         2        1.0     Sanders Road         $55,850
CONDO        2105         4        2.5     Jeans Avenue        $127,150
RANCH        1535         3        3.0     State Highway        $89,100
TWOSTORY     1240         2        1.0     Fairbanks Circle     $69,250
RANCH         720         1        1.0     Nicholson Drive      $34,550
TWOSTORY     1745         4        2.5     Highland Road       $102,950
CONDO        1860         2        2.0     Arcata Avenue       $110,700

Output Data Set

Resources