Practical Problem-Solving with PROC SQL |
Problem |
You want to count the number of duplicate rows in a table and generate an output column that shows how many times each row occurs.
Background Information |
There is one input table, called DUPLICATES, that contains the following data:
Sample Input Table for Counting Duplicates
Sample Data for Counting Duplicates First Obs LastName Name City State 1 Smith John Richmond Virginia 2 Johnson Mary Miami Florida 3 Smith John Richmond Virginia 4 Reed Sam Portland Oregon 5 Davis Karen Chicago Illinois 6 Davis Karen Chicago Illinois 7 Thompson Jennifer Houston Texas 8 Smith John Richmond Virginia 9 Johnson Mary Miami Florida
Solution |
Use the following PROC SQL code to count the duplicate rows:
proc sql; title 'Duplicate Rows in DUPLICATES Table'; select *, count(*) as Count from Duplicates group by LastName, FirstName, City, State having count(*) > 1;
PROC SQL Output for Counting Duplicates
Duplicate Rows in DUPLICATES Table LastName FirstName City State Count ------------------------------------------------- Davis Karen Chicago Illinois 2 Johnson Mary Miami Florida 2 Smith John Richmond Virginia 3
How It Works |
This solution uses a query that
selects all columns
counts all rows
groups all of the rows in the Duplicates table by matching rows
excludes the rows that have no duplicates.
Note: You must include all of the columns in your table in the GROUP BY clause to find exact duplicates.
Copyright © 2008 by SAS Institute Inc., Cary, NC, USA. All rights reserved.