SORT Procedure

Example 4: Retaining the First Observation of Each BY Group

Features:

PROC SORT statement option: NODUPKEY

BY statement

Other features:

PROC PRINT

Data set: Account
Note: The EQUALS option, which is the default, must be in effect to ensure that the first observation for each BY group is the one that is retained by the NODUPKEY option. If the NOEQUALS option has been specified, then one observation for each BY group will still be retained by the NODUPKEY option, but not necessarily the first observation.
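
As a minimal sketch of the note above, EQUALS can be stated explicitly in the PROC SORT statement so that the step does not rely on the default setting. The data set and variable names are the ones used in this example. Adding EQUALS here is an illustration only and is not part of the original program.

/* Illustration only: state EQUALS explicitly so that NODUPKEY */
/* keeps the first observation of each BY group.               */
proc sort data=account out=towns nodupkey equals;
   by town;
run;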

Details

In this example, PROC SORT creates an output data set that contains only the first observation of each BY group. The NODUPKEY option prevents an observation from being written to the output data set when its BY value is identical to the BY value of the last observation written to the output data set. The resulting report contains one observation for each town where the businesses are located.
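
To illustrate what "the first observation of each BY group" means, the following sketch produces a comparable result in two steps: a plain sort, followed by a DATA step that keeps an observation only when FIRST.TOWN is true. The data set names SORTED and TOWNS_FIRST are hypothetical and do not appear in the original example. The sketch assumes that EQUALS is in effect for the sort, so that observations keep their original relative order within each town.

/* Hypothetical two-step equivalent, shown for illustration only. */
proc sort data=account out=sorted;
   by town;
run;

data towns_first;
   set sorted;
   by town;
   /* FIRST.TOWN is 1 only for the first observation of each town. */
   if first.town;
run;

The single PROC SORT step with NODUPKEY shown in this example is the shorter form. The two-step version only makes the selection rule visible.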

Program

proc sort data=account out=towns nodupkey;
   by town;
run;
proc print data=towns;
   var town company debt accountnumber;
   title 'Towns of Customers with Past-Due Accounts';
run;

Program Description

Create the output data set TOWNS but include only the first observation of each BY group. NODUPKEY writes only the first observation of each BY group to the new data set TOWNS. If you use the VMS operating environment sort, then the observation that is written to the output data set is not always the first observation of the BY group.

proc sort data=account out=towns nodupkey;

Sort by one variable. The BY statement specifies that observations should be ordered by town.

   by town;
run;

Print the output data set TOWNS. PROC PRINT prints the data set TOWNS.

proc print data=towns;

Specify the variables to be printed. The VAR statement specifies the variables to be printed and their column order in the output.

   var town company debt accountnumber;

Specify the title.

   title 'Towns of Customers with Past-Due Accounts';
run;

Output

The output data set contains only four observations, one for each town in the input data set.
Towns of Customers with Past-Due Accounts
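
To check which observations NODUPKEY left out of TOWNS, one option is the DUPOUT= option of PROC SORT, which writes the eliminated observations to a separate data set. The following is a sketch only. The data set name DROPPED and the second title are hypothetical and are not part of the original example.

/* Illustration only: write the eliminated observations to DROPPED. */
proc sort data=account out=towns dupout=dropped nodupkey;
   by town;
run;

proc print data=dropped;
   var town company debt accountnumber;
   title 'Observations Not Written to TOWNS';
run;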