| Practical Problem-Solving with PROC SQL |
| Problem |
You want to count the number of duplicate rows in a table and generate an output column that shows how many times each row occurs.
| Background Information |
There is one input table, called DUPLICATES, that contains the following data:
Sample Input Table for Counting Duplicates
Sample Data for Counting Duplicates
First
Obs LastName Name City State
1 Smith John Richmond Virginia
2 Johnson Mary Miami Florida
3 Smith John Richmond Virginia
4 Reed Sam Portland Oregon
5 Davis Karen Chicago Illinois
6 Davis Karen Chicago Illinois
7 Thompson Jennifer Houston Texas
8 Smith John Richmond Virginia
9 Johnson Mary Miami Florida
| Solution |
Use the following PROC SQL code to count the duplicate rows:
proc sql; title 'Duplicate Rows in DUPLICATES Table'; select *, count(*) as Count from Duplicates group by LastName, FirstName, City, State having count(*) > 1;
PROC SQL Output for Counting Duplicates
Duplicate Rows in DUPLICATES Table
LastName FirstName City State Count
-------------------------------------------------
Davis Karen Chicago Illinois 2
Johnson Mary Miami Florida 2
Smith John Richmond Virginia 3
| How It Works |
This solution uses a query that
selects all columns
counts all rows
groups all of the rows in the Duplicates table by matching rows
excludes the rows that have no duplicates.
Note: You must include all of the columns in your table
in the GROUP BY clause to find exact duplicates. ![[cautionend]](../../../../common/61991/HTML/default/images/cautend.gif)
Copyright © 2008 by SAS Institute Inc., Cary, NC, USA. All rights reserved.