Previous Page | Next Page

Practical Problem-Solving with PROC SQL

Counting Duplicate Rows in a Table


Problem

You want to count the number of duplicate rows in a table and generate an output column that shows how many times each row occurs.


Background Information

There is one input table, called DUPLICATES, that contains the following data:

Sample Input Table for Counting Duplicates

                      Sample Data for Counting Duplicates

                                 First
              Obs    LastName    Name        City        State

               1     Smith       John        Richmond    Virginia
               2     Johnson     Mary        Miami       Florida 
               3     Smith       John        Richmond    Virginia
               4     Reed        Sam         Portland    Oregon  
               5     Davis       Karen       Chicago     Illinois
               6     Davis       Karen       Chicago     Illinois
               7     Thompson    Jennifer    Houston     Texas   
               8     Smith       John        Richmond    Virginia
               9     Johnson     Mary        Miami       Florida 

Solution

Use the following PROC SQL code to count the duplicate rows:

proc sql;
   title 'Duplicate Rows in DUPLICATES Table';
   select *, count(*) as Count
      from Duplicates
      group by LastName, FirstName, City, State
      having count(*) > 1;
     

PROC SQL Output for Counting Duplicates

                       Duplicate Rows in DUPLICATES Table

               LastName  FirstName  City      State        Count
               -------------------------------------------------
               Davis     Karen      Chicago   Illinois         2
               Johnson   Mary       Miami     Florida          2
               Smith     John       Richmond  Virginia         3

How It Works

This solution uses a query that

Note:   You must include all of the columns in your table in the GROUP BY clause to find exact duplicates.  [cautionend]

Previous Page | Next Page | Top of Page