Sample 24626: Identify duplicate and nonduplicate observations in a data set and write each to separate data sets
Use FIRST.variable and LAST.variable (made available with BY-group processing) to determine whether a BY group contains a single observation or duplicate observations.
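Note: Beginning with SAS 9.1, the DUPOUT= option in the PROC SORT statement names an output data set that receives the observations that NODUPKEY removes. That data set does not contain *all* the observations that share a duplicated BY value: the first observation in each BY group stays in the sorted output, and only the later observations in the group are written to the DUPOUT= data set. See the Base SAS(R) 9.1 Procedures Guide for more details.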
proc sort data=dsn dupout=new_dsn nodupkey;
   by var;
run;
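The following is a minimal sketch of how NODUPKEY and DUPOUT= divide the observations. The data set names DSN_UNIQUE and DSN_REMOVED, the variable VALUE, and the data values are hypothetical and chosen only for illustration. The OUT= option is added here so that the input data set is not overwritten by the sorted, de-duplicated result.

```sas
/* Hypothetical input: three observations share VAR=1 */
data dsn;
   input var value $;
   datalines;
1 a
1 b
1 c
2 d
;
run;

/* OUT= receives the first observation of each BY group.      */
/* DUPOUT= receives the observations that NODUPKEY removes.   */
proc sort data=dsn out=dsn_unique dupout=dsn_removed nodupkey;
   by var;
run;

proc print data=dsn_unique;
   title 'Observations kept by NODUPKEY';
run;

proc print data=dsn_removed;
   title 'Observations written to the DUPOUT= data set';
run;
```

In this sketch, DSN_UNIQUE should hold one observation for each value of VAR, and DSN_REMOVED should hold the remaining two observations with VAR=1. Neither data set contains the complete set of duplicates, which is why the DATA step approach shown in this sample is needed when every duplicated observation must be captured.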
These sample files and code examples are provided by SAS Institute
Inc. "as is" without warranty of any kind, either express or implied, including
but not limited to the implied warranties of merchantability and fitness for a
particular purpose. Recipients acknowledge and agree that SAS Institute shall
not be liable for any damages whatsoever arising out of their use of this material.
In addition, SAS Institute will provide no support for the materials contained herein.
data clasdata ;
input id name $ class $ ;
datalines;
4567 Denise ENGL201
4567 Denise ENGL201
1234 Lynn CHEM101
1234 Lynn CHEM101
1234 Lynn MATH102
5678 Rick CHEM101
5678 Rick HIST300
5678 Rick HIST300
3456 Amber CHEM101
3456 Amber Math102
3456 Amber Math102
2345 Ginny CHEM101
2345 Ginny ENGL201
2345 Ginny MATH102
;
/* Sort CLASDATA by NAME and CLASS */
proc sort data=clasdata ;
   by name class ;
run;
data dups nodups ;
   set clasdata ;
   by name class ;
   /* FIRST.CLASS and LAST.CLASS are both 1 only when the          */
   /* combination of NAME and CLASS occurs once in the data set.   */
   /* Write that observation to NODUPS; otherwise write it to DUPS. */
   if first.class and last.class then output nodups ;
   else output dups ;
run;
proc print data=dups;
   title 'RESULTS of DUPS data set';
run;
proc print data=nodups;
   title 'RESULTS of NODUPS data set';
run;
RESULTS of DUPS data set

Obs     id     name      class
  1    3456    Amber     Math102
  2    3456    Amber     Math102
  3    4567    Denise    ENGL201
  4    4567    Denise    ENGL201
  5    1234    Lynn      CHEM101
  6    1234    Lynn      CHEM101
  7    5678    Rick      HIST300
  8    5678    Rick      HIST300

RESULTS of NODUPS data set

Obs     id     name      class
  1    3456    Amber     CHEM101
  2    2345    Ginny     CHEM101
  3    2345    Ginny     ENGL201
  4    2345    Ginny     MATH102
  5    1234    Lynn      MATH102
  6    5678    Rick      CHEM101
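
As an alternative sketch (not part of the original sample), the same split can be expressed with PROC SQL, which does not require the data to be sorted first. It assumes that CLASDATA has been created by the program above. The table names DUPS_SQL and NODUPS_SQL are hypothetical. SAS remerges the COUNT(*) summary value with the detail rows and writes a note about the remerge to the log.

```sas
proc sql;
   /* Rows whose NAME and CLASS combination occurs more than once */
   create table dups_sql as
   select *
   from clasdata
   group by name, class
   having count(*) > 1;

   /* Rows whose NAME and CLASS combination occurs exactly once */
   create table nodups_sql as
   select *
   from clasdata
   group by name, class
   having count(*) = 1;
quit;
```

With the sample data, DUPS_SQL should contain the same eight rows as DUPS, and NODUPS_SQL the same six rows as NODUPS. Row order is not guaranteed unless an ORDER BY clause is added. As in the DATA step version, the comparison is case-sensitive, so Math102 and MATH102 are treated as different values of CLASS.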
| Type: | Sample |
| Topic: | SAS Reference ==> DATA Step Data Management ==> Manipulation and Transformation ==> BY-group processing |
| Date Modified: | 2009-10-13 13:32:18 |
| Date Created: | 2004-09-30 14:08:59 |
Operating System and Release Information
| SAS System | Base SAS | All | n/a | n/a |