Usage Note 1566: Why duplicate observations occur when using PROC SORT with the NODUPRECS option
A common misconception is that the PROC SORT option NODUPRECS (aliased as NODUP) compares each observation in a data set with every other observation in order to eliminate duplicate observations.
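In fact, the NODUPRECS option causes PROC SORT to compare all variable values of each observation only with those of the previous observation written to the output data set. Identical observations that are not adjacent after sorting are therefore not detected and remain in the output.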
If you use the NODUPRECS option, you must sort the data set by enough of the variables to ensure that the observations are in the correct order to remove all duplicates.
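The behavior described above can be illustrated with a minimal sketch. The data set names (WORK.DEMO, WORK.BY_ID, WORK.BY_BOTH) and the variables ID and VALUE are hypothetical, and the sketch assumes the default EQUALS behavior of PROC SORT, which keeps the input order of observations within a BY group.

```sas
/* Hypothetical sample data: observations 1 and 3 are identical. */
data work.demo;
  input id $ value;
  datalines;
A 1
A 2
A 1
;
run;

/* Sorted by ID only, the rows stay in the order A 1, A 2, A 1.  */
/* Each row differs from the one written just before it, so      */
/* NODUPRECS is expected to keep all three observations.         */
proc sort data=work.demo out=work.by_id noduprecs;
  by id;
run;

/* Sorted by ID and VALUE, the two A 1 rows become adjacent,     */
/* so NODUPRECS is expected to remove one of them.               */
proc sort data=work.demo out=work.by_both noduprecs;
  by id value;
run;
```

Under these assumptions, WORK.BY_ID should still contain the duplicate, while WORK.BY_BOTH should not.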
To reliably remove all duplicate observations, use the NODUPKEY option and sort the data set by all variables (BY _ALL_;).
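As a minimal sketch of this approach, the following step sorts a small, hypothetical data set (WORK.DEMO, with illustrative variables ID and VALUE) by every variable and lets NODUPKEY drop observations whose BY values repeat. Because every variable is a BY variable, identical observations are always adjacent.

```sas
/* Hypothetical sample data: observations 1 and 3 are identical. */
data work.demo;
  input id $ value;
  datalines;
A 1
A 2
A 1
;
run;

/* BY _ALL_ makes every variable a sort key, so NODUPKEY         */
/* removes any observation that matches another on all values.   */
proc sort data=work.demo out=work.dedup nodupkey;
  by _all_;
run;
```

Writing the result to a separate OUT= data set, as here, leaves the original data set unchanged.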
Alternatively, use PROC SQL with a SELECT DISTINCT query, which returns each distinct row only once regardless of the order of the input.
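A minimal sketch of the PROC SQL alternative follows. The data set and variable names (WORK.DEMO, WORK.DEDUP_SQL, ID, VALUE) are hypothetical and serve only as an illustration.

```sas
/* Hypothetical sample data: observations 1 and 3 are identical. */
data work.demo;
  input id $ value;
  datalines;
A 1
A 2
A 1
;
run;

/* SELECT DISTINCT * keeps one copy of each distinct row,        */
/* comparing on all columns, without requiring a prior sort.     */
proc sql;
  create table work.dedup_sql as
  select distinct *
  from work.demo;
quit;
```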
Operating System and Release Information
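Product Family | Product | System | Release Reported | Release Fixed* |
SAS System | Base SAS | z/OS | 9.3 TS1M2 | |
SAS System | Base SAS | Microsoft® Windows® for x64 | 9.3 TS1M2 | |
SAS System | Base SAS | 64-bit Enabled AIX | 9.3 TS1M2 | |
SAS System | Base SAS | 64-bit Enabled HP-UX | 9.3 TS1M2 | |
SAS System | Base SAS | 64-bit Enabled Solaris | 9.3 TS1M2 | |
SAS System | Base SAS | HP-UX IPF | 9.3 TS1M2 | |
SAS System | Base SAS | Linux for x64 | 9.3 TS1M2 | |
SAS System | Base SAS | Solaris for x64 | 9.3 TS1M2 | |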
* For software releases that are not yet generally available, the Fixed Release is the software release in which the problem is planned to be fixed.
A more reliable way to eliminate duplicate observations is to use the NODUPKEY option and sort by all variables (BY _ALL_;).
Type: | Usage Note |
Priority: | |
Topic: | Common Programming Tasks ==> Sorting Data |
Date Modified: | 2000-01-06 08:55:02 |
Date Created: | 2000-01-06 08:55:02 |