1566 - Why duplicate observations occur when using PROC SORT with the NODUPRECS option

A common misconception is that the PROC SORT option NODUPRECS (aliased as NODUP) compares each observation in a data set with every other observation in order to eliminate duplicate observations.

In fact, the NODUPRECS option causes PROC SORT to compare all variable values of each observation only with those of the previous observation written to the output data set. Identical observations that are not adjacent after the sort are therefore both kept.
If you use the NODUPRECS option, you must sort the data set by enough variables to ensure that identical observations end up next to each other; otherwise, not all duplicates are removed.
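
The behavior described above can be reproduced with a small example. The following is a minimal sketch: the data set names WORK.TEST, WORK.SORTED_A, and WORK.SORTED_AB and the variables A and B are hypothetical and chosen only for illustration. It assumes the default EQUALS behavior, which keeps observations that have the same BY value in their input order; with NOEQUALS the order within a BY group, and therefore the result, can differ.

```sas
/* Hypothetical test data: observations 1 and 3 are exact duplicates. */
data work.test;
   input a b;
   datalines;
1 1
1 2
1 1
;
run;

/* Sorting by A alone leaves the two identical observations separated */
/* by the (1,2) observation, so NODUPRECS never compares them.        */
proc sort data=work.test out=work.sorted_a noduprecs;
   by a;
run;

/* Sorting by both variables places the duplicates next to each other, */
/* so the adjacent comparison can detect and remove one of them.       */
proc sort data=work.test out=work.sorted_ab noduprecs;
   by a b;
run;
```

Under these assumptions, WORK.SORTED_A is expected to keep all three observations, while WORK.SORTED_AB should keep two. The SAS log reports how many duplicate observations were deleted in each step.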

To reliably remove all duplicate observations, use the NODUPKEY option and sort the data set by all variables (BY _ALL_;).
Alternatively, use PROC SQL with a SELECT DISTINCT statement.
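
Both approaches can be sketched as follows. The data set names WORK.HAVE, WORK.NODUP_SORT, and WORK.NODUP_SQL and the variables X and Y are hypothetical; substitute your own names.

```sas
/* Hypothetical input: the first and last observations are identical. */
data work.have;
   input x y;
   datalines;
5 10
5 20
7 10
5 10
;
run;

/* Option 1: with BY _ALL_, every variable is part of the sort key, so */
/* NODUPKEY keeps one observation per unique combination of values.    */
proc sort data=work.have out=work.nodup_sort nodupkey;
   by _all_;
run;

/* Option 2: SELECT DISTINCT returns one row per unique combination of */
/* all selected columns.                                               */
proc sql;
   create table work.nodup_sql as
   select distinct *
   from work.have;
quit;
```

Neither approach depends on how the observations were ordered before the step. Note that the PROC SORT output is ordered by all variables, which might differ from the order you ultimately want, so a further sort by the intended BY variables might be needed.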

Operating System and Release Information

Product Family	Product	System	Reported Release	Fixed Release*
SAS System	Base SAS	z/OS	9.3 TS1M2
		Microsoft® Windows® for x64	9.3 TS1M2
		64-bit Enabled AIX	9.3 TS1M2
		64-bit Enabled HP-UX	9.3 TS1M2
		64-bit Enabled Solaris	9.3 TS1M2
		HP-UX IPF	9.3 TS1M2
		Linux for x64	9.3 TS1M2
		Solaris for x64	9.3 TS1M2

* For software releases that are not yet generally available, the Fixed Release is the software release in which the problem is planned to be fixed.

Type:	Usage Note
Priority:
Topic:	Common Programming Tasks ==> Sorting Data

Date Modified:	2000-01-06 08:55:02
Date Created:	2000-01-06 08:55:02
