Usage Note 1566: Why duplicate observations occur when using the NODUP SORT option
A common misconception about the NODUP option of PROC SORT is that it
compares each observation in a data set with every other observation in
order to eliminate duplicate observations.
In fact, the NODUP option causes PROC SORT to compare all variable values
for an observation only to the PREVIOUS one written to the output data set.
Therefore, when using the NODUP option, the data set must be sorted
by enough variables to ensure that identical observations end up next to
each other, so that all duplicates are removed.
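When it is not obvious which variables are "enough", one option is to list every variable in the BY statement. The sketch below does this with the special variable list _ALL_. SASHELP.CLASS (a sample data set shipped with SAS) and the output name WORK.CLASS_NODUP are stand-ins chosen for illustration and are not part of the original note.

```sas
/* Sort by every variable so that identical observations are adjacent, */
/* then let NODUP drop each observation that matches the previous one. */
/* SASHELP.CLASS and WORK.CLASS_NODUP are illustrative names only.     */
proc sort data=sashelp.class out=work.class_nodup nodup;
   by _all_;
run;
```

Sorting by all variables can be expensive on wide or large data sets, so a shorter BY list that still groups identical observations together may be preferable when one is known.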
The following example shows how a duplicate observation can be written
to the output data set when the NODUP option is used and the data set has
not been sorted by enough variables:
X Y
1 1
1 2
1 1
If the data set is sorted only by X, then the SORT procedure will write
the first record out, compare the second record to the first, and write
the second out because it is not the same as the first. It will then
compare the third record to the second and write it to the output
data set because it is not the same as the second. The output data set
now contains duplicate observations because observation 1 and
observation 3 are the same. If you sort by both X and Y, then the data
would be in the following order before NODUP is applied, the two identical
observations would be adjacent, and the second one would be dropped:
X Y
1 1
1 1
1 2
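The example above can be reproduced with a short program. This is a minimal sketch: the data set names WORK.EXAMPLE, WORK.BY_X, and WORK.BY_XY are made up for illustration, and it assumes the default EQUALS behavior of PROC SORT, which keeps observations with equal BY values in their original relative order.

```sas
/* Build the three-observation example from this note. */
data work.example;
   input x y;
   datalines;
1 1
1 2
1 1
;
run;

/* Sorted by X only: the two (1,1) observations are not adjacent,   */
/* so each differs from the one written before it and, as described */
/* above, all three observations are expected in WORK.BY_X.         */
proc sort data=work.example out=work.by_x nodup;
   by x;
run;

/* Sorted by X and Y: the two (1,1) observations become adjacent,   */
/* so the second is expected to be dropped, leaving (1,1) and (1,2).*/
proc sort data=work.example out=work.by_xy nodup;
   by x y;
run;
```

Comparing the observation counts reported in the SAS log for the two PROC SORT steps is a quick way to check whether the BY list was sufficient.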
Operating System and Release Information
| SAS System | Base SAS | Microsoft Windows NT Workstation | 8 TS M0 | |
| | | Microsoft Windows 95/98 | 8 TS M0 | |
| | | z/OS | 8 TS M0 | |
* For software releases that are not yet generally available, the Fixed
Release is the software release in which the problem is planned to be
fixed.
| Type: | Usage Note |
| Priority: | |
| Topic: | Common Programming Tasks ==> Sorting Data |
| Date Modified: | 2000-01-06 08:55:02 |
| Date Created: | 2000-01-06 08:55:02 |