37581 - How can I eliminate duplicate observations from a large data set without sorting

Usage Note 37581: How can I eliminate duplicate observations from a large data set without sorting?

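Using a CLASS statement in PROC SUMMARY does not require the data set to be sorted in advance. With the NWAY option, the output data set contains one observation for each distinct combination of CLASS variable values, so duplicate observations collapse into a single row. The _FREQ_ variable in the output data set gives the number of input observations that had that combination. Two caveats apply: only the CLASS variables are carried into the output, so every variable that defines a duplicate must be listed in the CLASS statement, and observations with a missing value for any CLASS variable are excluded unless the MISSING option is specified. The PROC PRINT listing of the resulting data set appears later in this note.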

    /* Example */

    /* Create a data set with duplicate observations */
    data test;
    input x y z;
    cards;
    1 1 1
    1 1 1
    1 2 1
    1 2 2
    2 2 2
    2 2 2
    2 2 2
    2 2 1
    ;
    run;
    
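    /* NWAY keeps one row per distinct x*y*z combination */
    proc summary data=test nway;
    class x y z;
    /* _FREQ_ holds the duplicate count; _TYPE_ is constant under NWAY, so drop it */
    output out=test1(drop=_type_);
    run;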

    proc print data=test1;
    run;
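
If only the de-duplicated rows are wanted and the counts are not needed, a variation on the step above can drop _FREQ_ as well. The following is a minimal sketch, not part of the original note: the output name test_nodup is hypothetical, and the MISSING option is added on the assumption that rows with missing CLASS values should be kept rather than silently excluded.

```sas
/* Sketch: distinct rows only, no count column.              */
/* test_nodup is a hypothetical output data set name.        */
/* MISSING treats missing CLASS values as a valid level, so  */
/* observations with a missing x, y, or z are not discarded. */
proc summary data=test nway missing;
    class x y z;
    output out=test_nodup(drop=_type_ _freq_);
run;

proc print data=test_nodup;
run;
```

The rows come back ordered by the CLASS variable values, not in their original input order.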

Operating System and Release Information

Product Family	Product	System	SAS Release Reported	SAS Release Fixed*
SAS System	Base SAS	z/OS
		OpenVMS VAX
		Microsoft® Windows® for 64-Bit Itanium-based Systems
		Microsoft Windows Server 2003 Datacenter 64-bit Edition
		Microsoft Windows Server 2003 Enterprise 64-bit Edition
		Microsoft Windows XP 64-bit Edition
		Microsoft® Windows® for x64
		OS/2
		Microsoft Windows 7
		Microsoft Windows 95/98
		Microsoft Windows 2000 Advanced Server
		Microsoft Windows 2000 Datacenter Server
		Microsoft Windows 2000 Server
		Microsoft Windows 2000 Professional
		Microsoft Windows NT Workstation
		Microsoft Windows Server 2003 Datacenter Edition
		Microsoft Windows Server 2003 Enterprise Edition
		Microsoft Windows Server 2003 Standard Edition
		Microsoft Windows Server 2008
		Microsoft Windows XP Professional
		Windows Millennium Edition (Me)
		Windows Vista
		64-bit Enabled AIX
		64-bit Enabled HP-UX
		64-bit Enabled Solaris
		ABI+ for Intel Architecture
		AIX
		HP-UX
		HP-UX IPF
		IRIX
		Linux
		Linux for x64
		Linux on Itanium
		OpenVMS Alpha
		OpenVMS on HP Integrity
		Solaris
		Solaris for x64
		Tru64 UNIX

* For software releases that are not yet generally available, the Fixed Release is the software release in which the problem is planned to be fixed.

Obs    x    y    z    _FREQ_

 1     1    1    1       2
 2     1    2    1       1
 3     1    2    2       1
 4     2    2    1       1
 5     2    2    2       3
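
Because _FREQ_ records how many input observations shared each combination, the same output can also show which combinations were duplicated in the first place. A small sketch, assuming the test1 data set created by the example in this note:

```sas
/* Sketch: list only the combinations that occurred more than once */
/* in the input. Assumes test1 from the PROC SUMMARY step exists.  */
proc print data=test1;
    where _freq_ > 1;
run;
```

With the sample data this selects the combinations 1 1 1 and 2 2 2, whose _FREQ_ values in the listing above are 2 and 3.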

Date Modified:	2009-10-26 14:25:20
Date Created:	2009-10-26 08:30:04

Type:	Usage Note
Priority:
