|
Ordering Data Values: Double Mapping and Combining Techniques
by Jason R. Sharpe Abstract Introduction Combining Techniques Part 1 of this series discusses system specific collating sequences where data values may be ordered differently depending on the operating system or platform you use. Under the MVS operating system, an EBCDIC platform, a sorts before A, whereas in the UNIX environment, an ASCII platform, A is ordered before a. To circumvent this difference, you can use PROC SORT to create a dummy data set ordered by the collating sequence of your choice. Then, simply use PROC REPORT and select ORDER=DATA for the variable that needs the dummy data set. The following example demonstrates the different ordering schemes due to collating sequences: data sample;
input code $1. date date7.; A 01JAN93 /* Fri */ a 06JAN93 /* Wed */ a 11JAN93 /* Mon */ a 18JAN93 /* Mon */ A 22JAN93 /* Fri */ A 26JAN93 /* Tue */ a 26JAN93 /* Tue */ A 04FEB93 /* Thu */ A 08FEB93 /* Mon */ A 08FEB93 /* Mon */ a 12FEB93 /* Fri */ a 19FEB93 /* Fri */ A 22FEB93 /* Mon */ ;
/* OUTPUT 1a and 1b */
column code date n;
"DAY"; Output 1a was generated on MVS, an EBCDIC platform, and Output 1b was generated on UNIX, an ASCII platform: CODE DAY N CODE DAY N
a Fri 2 A Fri 2
Wed 1 Mon 3
Mon 2 Tue 1
Tue 1 Thu 1
A Fri 2 a Fri 2
Mon 3 Wed 1
Tue 1 Mon 2
Thu 1 Tue 1
Output 1a Output 1b
Output 1a-1b Default Ordering Sequence on MVS and UNIX Notice that the ordering schemes are different. First, in Output 1a, a appears before A whereas in Output 1b, A comes before a. This variation is due to differences between collating sequences. For more information on collating sequences refer to Chapter 31, "The SORT Procedure," in the SAS Procedures Guide, Version 6, Third Edition. Next, notice that the days of the week do not appear in standard calendar sequence in either output. This occurs because DATE is ordered by the lowest internal value encountered in each format group. For more information on this, refer to the section "Formats" in part 1 of this series. The goal of this example is to order CODE with an EBCDIC collating
sequence, regardless of platform, and to arrange DATE in a standard calendar
sequence. This is accomplished using both a dummy data set and mapping
techniques. The Mapping Process data temp;
set sample; The following is a listing of the data set WORK.TEMP: WORK.TEMP CODE DATE FMTNAME TYPE DAY LABEL A -18627 dayfmt N 6 Fri a -18622 dayfmt N 4 Wed a -18617 dayfmt N 2 Mon a -18610 dayfmt N 2 Mon A -18606 dayfmt N 6 Fri A -18602 dayfmt N 3 Tue a -18602 dayfmt N 3 Tue A -18593 dayfmt N 5 Thu A -18589 dayfmt N 2 Mon A -18589 dayfmt N 2 Mon a -18585 dayfmt N 6 Fri a -18578 dayfmt N 6 Fri A -18575 dayfmt N 2 Mon Output Data Set Next, sort WORK.TEMP using PROC SORT with the NODUPKEY option to ensure that there are no duplicate values of DAY. proc sort data=temp(keep=fmtname day label type) out=control nodupkey; WORK.CONTROL FMTNAME TYPE DAY LABEL dayfmt N 2 Mon dayfmt N 3 Tue dayfmt N 4 Wed dayfmt N 5 Thu dayfmt N 6 Fri Output Data Set Once the duplicate values are removed, DAY is used as the START variable for a control data set, and a format is created using WORK.CONTROL as input to PROC FORMAT: proc format cntlin=control(rename=(day=start)); run; The Dummy Data Process proc sort
data=temp(keep=code day) out=dummy ebcdic nodupkey;
by code day; CODE DAY a 2 a 3 a 4 a 6 A 2 A 3 A 5 A 6 Output Data Set Notice that a is before A, which agrees with Output 1a obtained with an EBCDIC collating sequence. Finally, concatenate WORK.DUMMY and WORK.SAMPLE to generate the final data set. Create the VALID variable to differentiate the dummy observations from the actual observations and maintain the correct frequency counts in the final report. Use DAY to obtain the desired calendar sequence for DATE. DAY is also part of the mapping process. The following code demonstrates: data sample2;
set dummy sample(in=is_valid);
valid=1;
day=weekday(date);
end; The following are the WORK.DUMMY, WORK.SAMPLE, and WORK.SAMPLE2 data sets that illustrate the results of the concatenation. Note that in the WORK.SAMPLE data set DATE is displayed with the WEEKDATE3. format. WORK.DUMMY WORK.SAMPLE WORK.SAMPLE2 CODE DAY CODE DATE CODE DAY VALID
a 2 A Fri a 2 .
a 3 a Wed a 3 .
a 4 a Mon a 4 .
a 6 a Mon a 6 .
A 2 A Fri A 2 .
A 3 A Tue A 3 .
A 5 a Tue A 5 .
A 6 A Thu A 6 .
A Mon A 6 1
A Fri a 4 1
a Fri a 2 1
A Mon a 2 1
A 6 1
A 3 1
a 3 1
A 5 1
A 2 1
A 2 1
a 6 1
a 6 1
A 2 1
Output Data Sets The observations in WORK.SAMPLE2 where VALID is missing come from WORK.DUMMY. The variable DAY is nonmissing in these observations, ensuring that the dummy observations are not ignored because DAY is used as a GROUP variable in PROC REPORT. Replace the variable DATE with DAY, which is mapped to the correct day of the week to be displayed with the DAYFMT. format. Generate the Final Report
/* OUTPUT 2 */
column code day valid;
define valid / n format=1.
"N";
run;
The following is the final report with the values in the desired order: CODE DAY N
a Mon 2
Tue 1
Wed 1
Fri 2
A Mon 3
Tue 1
Thu 1
Fri 2
Output 2 Desired Ordering Scheme Obtained by Combining Techniques
Notice that a is before A according to the EBCDIC collating sequence and that the values of DAY are listed in standard calendar sequence. Thus, the two separate ordering schemes are obtained by combining dummy data and mapping techniques. Double Mapping When Is Double Mapping Necessary?
format=3. order=freq; Initial Order Desired Order | | N | | N | |------------------+---| |------------------+---| |BEDROOMS | | |BEDROOMS | | |2 RANCH | 1| |2 CONDO | 2| | CONDO | 2| | TWOSTORY | 2| | TWOSTORY | 2| | RANCH | 1| | ALL | 5| | ALL | 5| |4 CONDO | 1| |4 TWOSTORY | 2| | TWOSTORY | 2| | SPLIT | 1| | SPLIT | 1| | CONDO | 1| | ALL | 4| | ALL | 4| |3 RANCH | 2| |3 RANCH | 2| | CONDO | 1| | SPLIT | 1| | SPLIT | 1| | CONDO | 1| | ALL | 4| | ALL | 4| |1 RANCH | 1| |1 RANCH | 1| | SPLIT | 1| | SPLIT | 1| | ALL | 2| | ALL | 2| Output 3 Figure 1 Notice in Output 3 that BEDROOMS is listed in order of descending frequency
counts of ALL, but STYLE is the same order for each subgroup of BEDROOMS.
Figure 1 shows the values of STYLE in the desired order. The order for STYLE in
Output 3 is based on the master ordering scheme developed independently and
based on the overall frequency counts for all subgroups combined, not on the
frequency counts of STYLE in a given subgroup. The following example
demonstrates:
/* OUTPUT 4 */
data=sasuser.houses noseps format=3. order=freq;
class style;
| N |
|--------+---|
Thus, the master ordering scheme for BEDROOMS when using ORDER=FREQ with PROC TABULATE is 2, 4, 3, 1, and for STYLE it is RANCH, CONDO, TWOSTORY, and SPLIT. For values of STYLE to be listed by descending frequencies within each subgroup of BEDROOMS, you must use double mapping. For more information on master ordering schemes, review the section "Multiple Class Variables" in part 1 of this series. What Is Double Mapping and How Does It Work? Double mapping is a process that creates additional variables that make it easy to obtain complex ordering schemes. Once these variables exist, various reports with unique ordering schemes can be obtained. Three types of reports that can be obtained using double mapping are demonstrated in the section "Producing Reports with a Double Mapped Data Set." Before learning the double mapping process, however, you should become familiar with what a double mapped data set is and how it obtains certain ordering schemes. To arrange the data values as in Figure 1, the internal values need to correspond to positions in the desired ordering scheme. Figure 2 illustrates the sequence number assignments needed as internal values. Note that the variable STYLE has two sequence number assignments. This is necessary to ensure that internal values are not grouped together by their formatted values. The formatted values are the original values of BEDROOMS and STYLE. Figure 2 Sequence Number Assignments for Double Mapping ORDER1 is the sequence assignment for BEDROOMS. ORDER2 is the sequence assignment for STYLE in relation to the entire report. ORDER3 is the sequence assignment for STYLE within each subgroup of BEDROOMS. ORDER3 serves two purposes in double mapping. First, it prevents internal values of ORDER2 from being combined together by their formatted values. Notice that ORDER2 values of 1, 6, and 9 all correspond to STYLE=CONDO. Without ORDER3, the values of 1, 6, and 9 would be grouped together in the final report, causing CONDO to appear first in each subgroup of BEDROOMS. The second purpose of ORDER3 is to preserve the order of STYLE in each subgroup of BEDROOMS. ORDER3 provides the ability to generate unique reports with BEDROOMS and STYLE in various row/column configurations. This is discussed in more detail in the section "Producing Reports with a Double Mapped Data Set." Now assume that ORDER1, ORDER2, and ORDER3 are variables in a data set and have corresponding formats where the values of BEDROOMS and STYLE are the formatted values. This is equivalent to the following solution, which is hard coded for illustration purposes only: proc format; /* formats corresponding to order 1 */ value ordr1fmt 1='2' 2='4' 3='3' 4='1'; /* formats corresponding to order 2 */ value ordr2fmt 1,6,9 ='CONDO' 2,4 ='TWOSTORY' 3,7,10='RANCH' 5,8,11='SPLIT'; data dbl_map; input order1 order2 order3 count; cards; 1 1 1 2 1 2 2 2 1 3 3 1 2 4 1 2 2 5 2 1 2 6 3 1 3 7 1 2 3 8 2 1 3 9 3 1 4 10 1 1 4 11 2 1 ; run; At this point, WORK.DBL_MAP is a summarized double mapped data set and can be used to obtain the desired ordering sequence, as in Figure 1. The following example demonstrates the use of WORK.DBL_MAP to obtain a report with the data values arranged in the order illustrated in Figures 1 and 2: /* OUTPUT 5 */ options label; | | N | |---------------------------+---| |BEDROOMS ORDER3 | | |2 1 CONDO | 2| | 2 TWOSTORY | 2| | 3 RANCH | 1| | ALL | 5| |4 ORDER3 | | | 1 TWOSTORY | 2| | 2 SPLIT | 1| | 3 CONDO | 1| | ALL | 4| |3 ORDER3 | | | 1 RANCH | 2| | 2 SPLIT | 1| | 3 CONDO | 1| | ALL | 4| |1 ORDER3 | | | 1 RANCH | 1| | 2 SPLIT | 1| | ALL | 2| Output 5 Desired Ordering Sequence Using PROC TABULATE Notice that the desired ordering scheme is obtained, but a new column,
ORDER3, appears in the report. ORDER3 is necessary to prevent ORDER2 from being
ordered by lowest internal value for each format group. Without ORDER3, ORDER2
would display according to the master ordering sequence of CONDO, TWOSTORY,
RANCH, SPLIT. Suppress the display of ORDER3 by using PROC REPORT rather than
PROC TABULATE to create the list-style report as follows:
/* OUTPUT 6 */
column order1 order3 order2 n;
"BEDROOMS";
"STYLE";
noprint;
2 CONDO 2
TWOSTORY 2
RANCH 1
-------- ---------
2 5
4 TWOSTORY 2
SPLIT 1
CONDO 1
-------- ---------
4 4
3 RANCH 2
SPLIT 1
CONDO 1
-------- ---------
3 4
1 RANCH 1
SPLIT 1
-------- ---------
1 2
Output 6 Desired Ordering Sequence Obtained with PROC REPORT
Double Mapping a Data Set
class bedrooms style; WORK.SUMMARY after Step 1a BEDROOMS STYLE _TYPE_ _FREQ_ ORDRSTAT
. 0 15 82720.00
. CONDO 1 4 99312.50
. RANCH 1 4 68575.00
. SPLIT 1 3 77983.33
. TWOSTORY 1 4 83825.00
1 2 2 50200.00
2 2 5 75970.00
3 2 4 82187.50
4 2 4 107950.00
1 RANCH 3 1 34550.00
1 SPLIT 3 1 65850.00
2 CONDO 3 2 95375.00
2 RANCH 3 1 64000.00
2 TWOSTORY 3 2 62550.00
3 CONDO 3 1 79350.00
3 RANCH 3 2 87875.00
3 SPLIT 3 1 73650.00
4 CONDO 3 1 127150.00
4 SPLIT 3 1 94450.00
4 TWOSTORY 3 2 105100.00
Output Data Set The CLASS statement generates summarizations on all
combinations of class values and also on individual variables. The MEAN= option
in the OUTPUT statement calculates the mean value of PRICE for each of the
combinations. Keep the _FREQ_ variable because it maintains the original
frequency counts in the final reports. You can drop _FREQ_ if you merge the
final /* step 1b: summarize and create ORDRSTAT by renaming */ /* _FREQ_ */ proc summary data=sasuser.houses; class bedrooms
style; Step 2: Sort WORK.SUMMARY by ORDRSTAT /* step 2: sort by ORDRSTAT */ by descending
ordrstat; Step 3: Separate SUMMARY into CONTROL1 and CONTROL2 This step separates WORK.SUMMARY into the data sets WORK.CONTROL1 and WORK.CONTROL2 that can be merged later to create a double mapped data set. WORK.CONTROL1 contains the highest order class variable, BEDROOMS, and its corresponding sequence variable ORDER1. WORK.CONTROL2 contains both BEDROOMS and STYLE and a sequence variable, ORDER2, based on the combinations of BEDROOMS and STYLE. Both WORK.CONTROL1 and WORK.CONTROL2 must also contain the variables FMTNAME and TYPE for the generation of formats. The following code demonstrates how to accomplish these tasks in one DATA step: /* step 3: create CONTROL1 and CONTROL2 */ data control1(drop=order2 _type_ style)
control2(drop=order1 _type_);
order1+1;
fmtname='ordr1fmt';
output control1;
end;
order2+1;
fmtname='ordr2fmt';
output control2;
end; The mean for each value of BEDROOMS is output to WORK.CONTROL1 by selecting observations with _TYPE_=2. Likewise, mean values for each combination of BEDROOMS and STYLE are output to WORK.CONTROL2 by selecting observations with _TYPE_=3. The variables ORDER1 and ORDER2 are incremented for each observation output to their corresponding data sets. ORDER1 and ORDER2 are used later in place of BEDROOMS and STYLE because their internal values correspond to the desired ordering sequence. Notice in the data sets below that ORDRSTAT is in descending sequence. Step 4: Create Formats
/* step 4: create the formats */ cntlin=control1(rename=(order1=start bedrooms=label)); proc format cntlin=control2(rename=(order2=start style =label)); value blank
.,0 =' '
other=(|dollar8.|);
picture rank
1 ='1st'
2 ='2nd'
3 ='3rd'
4-20='09th';
run; Notice in the BLANK format the assignment of the OTHER category: OTHER=(|DOLLAR8.|). This allows you to associate another format with a particular format range as documented in SAS Technical Report P-222, Changes and Enhancements to Base SAS Software, Release 6.07. For more information on formats and control data sets, refer to the SAS Procedures Guide; "Creating Formats with Control Data Sets (Observations, First Quarter 1992); or part 2 of this series. WORK.CONTROL1 after Step 3 ORDER1 BEDROOMS ORDRSTAT FMTNAME TYPE _FREQ_ 1 4 107950 ordr1fmt N 4 2 3 82187 ordr1fmt N 4 3 2 75970 ordr1fmt N 5 4 1 50200 ordr1fmt N 2 WORK.CONTROL2 after Step 3 BEDROOMS ORDER2 STYLE ORDRSTAT FMTNAME TYPE _FREQ_
4 1 CONDO 127150 ordr2fmt N 1
4 2 TWOSTORY 105100 ordr2fmt N 2
2 3 CONDO 95375 ordr2fmt N 2
4 4 SPLIT 94450 ordr2fmt N 1
3 5 RANCH 87875 ordr2fmt N 2
3 6 CONDO 79350 ordr2fmt N 1
3 7 SPLIT 73650 ordr2fmt N 1
1 8 SPLIT 65850 ordr2fmt N 1
2 9 RANCH 64000 ordr2fmt N 1
2 10 TWOSTORY 62550 ordr2fmt N 2
1 11 RANCH 34550 ordr2fmt N 1
Output Data Set Step 5: Merge the Control Data Sets /* step 5: create the WORK.SUMMARY2 */ proc sort data=control1; by bedrooms; proc sort data=control2; by bedrooms; data summary2;
merge control1 control2; WORK.SUMMARY2 after Step 5 ORDER1 BEDROOMS ORDER2 STYLE ORDRSTAT _FREQ_ 4 1 8 SPLIT 65850 1 4 1 11 RANCH 34550 1 3 2 3 CONDO 95375 2 3 2 9 RANCH 64000 1 3 2 10 TWOSTORY 62550 2 2 3 5 RANCH 87875 2 2 3 6 CONDO 79350 1 2 3 7 SPLIT 73650 1 1 4 1 CONDO 127150 1 1 4 2 TWOSTORY 105100 2 1 4 4 SPLIT 94450 1 Output Data Set Step 6: Create a Summarized Double Mapped Data Set Create a new variable, ORDER3, to prevent ORDER2 from being ordered according to the master ordering scheme for the procedure, which is based on the lowest internal values in a format group. Sort WORK.SUMMARY2 by ORDER1 and ORDER2 and use BY group processing to increment the value of ORDER3 within each subgroup of ORDER1. This step has two variations depending on the size of the primary data set and whether you want a double mapped data set that has not been summarized. One version creates an index based on the combinations of BEDROOMS and STYLE. The index can be used to perform direct access processing to avoid sorting the primary data set. If the primary data set is large, this can save computing resources. If you want a summarized double mapped data set or if the primary data set
is small, then use the following version of /* step 6a: create WORK.DBL_MAP1 to be used either */ /* directly with a procedure or merged with a */ /* small primary data set in step 7a */ proc sort data=summary2; by order1 order2; data dbl_map1;
set summary2; Notice in the resulting data set that ORDRSTAT is in descending order and that ORDER1 and ORDER2 are in ascending numeric order. Also notice that ORDER3 is in ascending numeric order within each subgroup of BEDROOMS. Thus, the proper assignment of sequence numbers similar to those in Figure 2 has been accomplished. WORK.DBL_MAP1 after Step 6a ORDER1 BEDROOMS ORDER2 ORDER3 STYLE ORDRSTAT _FREQ_ 1 4 1 1 CONDO 127150 1 1 4 2 2 TWOSTORY 105100 2 1 4 4 3 SPLIT 94450 1 2 3 5 1 RANCH 87875 2 2 3 6 2 CONDO 79350 1 2 3 7 3 SPLIT 73650 1 3 2 3 1 CONDO 95375 2 3 2 9 2 RANCH 64000 1 3 2 10 3 TWOSTORY 62550 2 4 1 8 1 SPLIT 65850 1 4 1 11 2 RANCH 34550 1 Output Data Set If you do plan to merge the double mapped data set with a large primary data set, then use the following version of step 6: /* step 6b: create WORK.DBL_MAP1 to be merged with a */ /* large primary data in step 6b */ proc sort data=summary2; by order1 order2; data dbl_map1(index=(o3_index=(bedrooms style))); set summary2; Note that creating and maintaining an index file can consume significant
resources. Step 6b should only be used if the primary data set is large and
thus more costly to sort than it would be to create an index file on the
summarized data set. Step 7: Create a Double Mapped Data Set That Is Not
Summarized (Optional) This step is useful only when you need to use variables in
the primary data set that were dropped in the double mapping process. For
example, you may wish to list the values of SQFEET and BATHS in your report.
Since these variables were not used in the double mapping process, their
values /* step 7a: create a double mapped data set that has */ /* not been summarized using WORK.DBL_MAP1 */ /* from step 6a as input */ proc sort data=dbl_map1(drop=_freq_); by bedrooms style; proc sort data=sasuser.houses out=houses; by bedrooms style; data dbl_map2; merge dbl_map1 houses; If the primary data set is large, then use the following version of step 7, which inputs the WORK.DBL_MAP1 data set created by step 6b: /* step 7b: create a double mapped data set that has */ /* not been summarized using WORK.DBL_MAP1 */ /* from step 6b as input */
data dbl_map2; Notice in step 7a that both WORK.DBL_MAP1 and SASUSER.HOUSES must be sorted before merging. Sorting is not necessary in step 7b because the KEY= option performs direct access processing and does not require sorting. For more information on the KEY= option, refer to SAS Technical Report P-222. WORK.DBL_MAP2 after Step 7a ORDER1 BEDROOMS ORDER2 ORDER3 STYLE ORDRSTAT PRICE SQFEET BATHS 4 1 11 2 RANCH 34550 $34,550 720 1.0 4 1 8 1 SPLIT 65850 $65,850 1190 1.0 3 2 3 1 CONDO 95375 $80,050 1400 1.5 3 2 3 1 CONDO 95375 $110,700 1860 2.0 3 2 9 2 RANCH 64000 $64,000 1250 1.0 3 2 10 3 TWOSTORY 62550 $55,850 1040 1.0 3 2 10 3 TWOSTORY 62550 $69,250 1240 1.0 2 3 6 2 CONDO 79350 $79,350 1390 2.5 2 3 5 1 RANCH 87875 $86,650 1500 3.0 2 3 5 1 RANCH 87875 $89,100 1535 3.0 2 3 7 3 SPLIT 73650 $73,650 1305 1.5 1 4 1 1 CONDO 127150 $127,150 2105 2.5 1 4 4 3 SPLIT 94450 $94,450 1615 3.0 1 4 2 2 TWOSTORY 105100 $107,250 1810 3.0 1 4 2 2 TWOSTORY 105100 $102,950 1745 2.5 Output Data Set Producing Reports with a Double Mapped Data Set Once a data set is double mapped, there are many reporting possibilities. The following sections illustrate three types of reports, list-style, multicolumn, and top-rank, that can be created using a double mapped data set as input. List-style Report Figure 3 Structure of a List-style Report The following example demonstrates the creation of a list-style report using
WORK.DBL_MAP2 as input:
/* OUTPUT 7 */ column order1 order3 order2 price sqfeet baths; define order1 / "BEDROOMS" group format=ordr1fmt8.
order=internal;
order=internal;
order=internal left; Notice in Output 7 that BEDROOMS and STYLE are both listed by descending values of PRICE. SQFEET and BATHS are listed to demonstrate that variables not used in the double mapping process can still be displayed in the final report. BEDROOMS STYLE PRICE SQFEET BATHS
4 CONDO $127,150 2105 2.5
TWOSTORY $105,100 1778 2.8
SPLIT $94,450 1615 3.0
-------- -------- ------ ------
4 $107,950 1819 2.8
3 RANCH $87,875 1518 3.0
CONDO $79,350 1390 2.5
SPLIT $73,650 1305 1.5
-------- -------- ------ ------
3 $82,188 1433 2.5
2 CONDO $95,375 1630 1.8
RANCH $64,000 1250 1.0
TWOSTORY $62,550 1140 1.0
-------- -------- ------ ------
2 $75,970 1358 1.3
1 SPLIT $65,850 1190 1.0
RANCH $34,550 720 1.0
-------- -------- ------ ------
1 $50,200 955 1.0
Output 7 List-style Report
Multicolumn Report Figure 4 Structure of a Multicolumn Report The following demonstrates how to generate a multicolumn report with PROC REPORT using WORK.DBL_MAP2 as input: /* OUTPUT 8 */ proc report data=dbl_map2 nowindows spacing=1; column order3 order1,("--" order2 ordrstat); define order3 / group order=internal noprint; define order1 / across order=internal format=ordr1fmt.
"--/BEDROOMS/--";
define order2 / mean left format=ordr2fmt.
"STYLE/--";
define ordrstat / mean format=blank.
"PRICE/--";
rbreak after / dol summarize; compute after;
_c2_ = .;
_c4_ = .;
_c6_ = .;
_c8_ = .;
endcomp; run; BEDROOMS 4 3 2 1 STYLE PRICE STYLE PRICE STYLE PRICE STYLE PRICE
CONDO $127,150 RANCH $87,875 CONDO $95,375 SPLIT $65,850
TWOSTORY $105,100 CONDO $79,350 RANCH $64,000 RANCH $34,550
SPLIT $94,450 SPLIT $73,650 TWOSTORY $62,550
======== ======== ======== ======== ======== ======== ======== ========
$107,950 $82,188 $75,970 $50,200
Notice that ORDER2 is used as an ANALYSIS variable defined with a MEAN
statistic. This is necessary because only In the COMPUTE AFTER block, certain columns are set to missing. These
columns represent the different STYLE columns in the report. Since ORDER2 is
defined as an ANALYSIS variable, PROC REPORT summarizes its values on the
RBREAK line. Since these values have no meaning on the RBREAK line, they are
reassigned to missing. The missing Top-Rank Reports Figure 5 Structure of a Top-Rank Report The following example demonstrates a top-rank report obtained with PROC REPORT using the summarized WORK.DBL_MAP1 output in step 6a: /* OUTPUT 9 */ proc report data=dbl_map1 nowindows spacing=2; where (order3 <= 3); freq _freq_; column ("--/BEDROOMS/--" order1 ordrstat=tot) order3,("--" order2 ordrstat); define order1 / group format=ordr1fmt. order=internal
"#/-";
define tot / mean format=blank9.
"AVG PRICE/--";
define order3 / across format=rank. order=internal
"TOP 3 STYLES/--";
define order2 / mean format=ordr2fmt. left
"STYLE/--";
define ordrstat / mean format=blank. spacing=1
"PRICE/--";
run; BEDROOMS 1st 2nd 3rd # AVG PRICE STYLE PRICE STYLE PRICE STYLE PRICE 4 $107,950 CONDO $127,150 TWOSTORY $105,100 SPLIT $94,450 3 $82,188 RANCH $87,875 CONDO $79,350 SPLIT $73,650 2 $75,970 CONDO $95,375 RANCH $64,000 TWOSTORY $62,550 1 $50,200 SPLIT $65,850 RANCH $34,550 Notice that this report uses summarized WORK.DBL_MAP1. This highlights the fact it is not always necessary to merge the double mapped data set with the primary data. Be sure to use _FREQ_ in a FREQ statement when using a summarized double mapped data set; if this is omitted, the overall mean values for BEDROOMS would reflect the mean of the values in the summarized WORK.DBL_MAP1 and not the actual means that would have been obtained had SASUSER.HOUSES been used. Also, associate the RANK. format with ORDER3. The format is designed to give meaningful descriptions to the sequence values displayed by ORDER3. Output 9 shows the top-rank report generated. Because ORDER3 is actually displayed in the output, PROC TABULATE can also be used to generate a top-rank report. The following example demonstrates how to obtain this report with the TABULATE procedure: /* OUTPUT 10 */ proc tabulate data=dbl_map1 format=blank. noseps; options label; where (order3<=3); freq _freq_; class order1 order3; var order2 ordrstat; table order1=' ', ordrstat*mean
order3*(order2*mean*f=ordr2fmt. ordrstat*mean)
/ rts=7 box='COUNT OF +---+ BED ROOMS';
format order1 ordr1fmt. order3 rank.; label
order2 ='STYLE'
order3 ='TOP 3 STYLES'
ordrstat='PRICE';
keylabel
mean =' ';
run; |COUNT| | TOP 3 STYLES | |OF | |-----------------------------------------------------| |+---+| | 1st | 2nd | 3rd | |BED | |-----------------+-----------------+-----------------| |ROOMS| PRICE | STYLE | PRICE | STYLE | PRICE | STYLE | PRICE | |-----+--------+--------+--------+--------+--------+--------+--------| |4 |$107,950|CONDO |$127,150|TWOSTORY|$105,100|SPLIT | $94,450| |3 | $82,188|RANCH | $87,875|CONDO | $79,350|SPLIT | $73,650| |2 | $75,970|CONDO | $95,375|RANCH | $64,000|TWOSTORY| $62,550| |1 | $50,200|SPLIT | $65,850|RANCH | $34,550| | |Output 10 Top-Rank Report Generated with PROC TABULATE
Simplifying the Double Mapping Process /* alternative approach to steps 5 and 6 */ proc sql; create view summary2 as select c1.order1, c1.bedrooms, c2.order2, c2.style, c2.ordrstat from control1 c1, control2 c2 where c1.bedrooms=c2.bedrooms order by c1.order1, c2.order2; quit; data dbl_map1; set summary2; by order1 order2; if first.order1 then order3=0; order3+1; run; The above example uses one procedure step and one DATA step as opposed to three procedure steps and two DATA steps as previously demonstrated. There are other points in the process that may also be consolidated. Experienced programmers are encouraged to consolidate steps where they see fit. The process can also be generalized for any data set using the macro facility. Because the only changes that would be needed are the data set and variable names, the macro facility can be used to make the substitutions so that users unfamiliar with the process can still double map a data set. You could even provide users with a choice of reports to generate. A detailed explanation of the SQL procedure and the macro facility are beyond the scope of this article. However, their use is highly recommended. For more information on PROC SQL, see the SAS Guide to the SQL Procedure, Usage and Reference, Version 6, First Edition. For more information on the macro facility, see the SAS Guide to Macro Processing, Version 6, Second Edition. Conclusion SAS, SAS/ASSIST, and Observations are registered trademarks or trademarks of SAS Institute Inc. in the USA and other countries. (R) indicates USA registration. Other brand and product names are registered trademarks or trademarks of their respective companies. Appendix The following OPTIONS and TITLE statements are used with all examples in this article that do not explicitly contain other settings: options nodate nonumber nolabel ls=80 missing=' '; title; The following is the SASUSER.HOUSES data set, which is used to supply the data for most examples in this article: SASUSER.HOUSES STYLE SQFEET BEDROOMS BATHS STREET PRICE RANCH 1250 2 1.0 Sheppard Avenue $64,000 SPLIT 1190 1 1.0 Rand Street $65,850 CONDO 1400 2 1.5 Market Street $80,050 TWOSTORY 1810 4 3.0 Garris Street $107,250 RANCH 1500 3 3.0 Kemble Avenue $86,650 SPLIT 1615 4 3.0 West Drive $94,450 SPLIT 1305 3 1.5 Graham Avenue $73,650 CONDO 1390 3 2.5 Hampshire Avenue $79,350 TWOSTORY 1040 2 1.0 Sanders Road $55,850 CONDO 2105 4 2.5 Jeans Avenue $127,150 RANCH 1535 3 3.0 State Highway $89,100 TWOSTORY 1240 2 1.0 Fairbanks Circle $69,250 RANCH 720 1 1.0 Nicholson Drive $34,550 TWOSTORY 1745 4 2.5 Highland Road $102,950 CONDO 1860 2 2.0 Arcata Avenue $110,700 Output Data Set |