SQL SERVER – 2005 – 2008 – Delete Duplicate Rows

This method is improved  method as it not only uses CTE and ROW_NUMBER, but also demonstrates the power of CTE with DELETE statement. We will have a comprehensive discussion about it later in this article. For now, let us first create a sample table from which we will delete records.

/* Create Table with 7 entries - 3 are duplicate entries */
CREATE TABLE DuplicateRcordTable (Col1 INT, Col2 INT)
INSERT INTO DuplicateRcordTable
SELECT 1, 1
UNION ALL
SELECT 1, 1 --duplicate
UNION ALL
SELECT 1, 1 --duplicate
UNION ALL
SELECT 1, 2
UNION ALL
SELECT 1, 2 --duplicate
UNION ALL
SELECT 1, 3
UNION ALL
SELECT 1, 4
GO

The above table has total 7 records, out of which 3 are duplicate records. Once the duplicates are removed we will have only 4 records left.

/* It should give you 7 rows */
SELECT *
FROM DuplicateRcordTable
GO

The most interesting part of this is yet to come. We will use CTE that will re-generate the same table with additional column, which is row number. In our case, we have Col1 and Col2 and both the columns qualify as duplicate rows. It may be a different set of rows for each different query like this. Another point to note here is that once CTE is created  DELETE statement can be run on it. We will put a condition here – when we receive more than onerows of record, we will remove the row which is not the first one. When DELETE command is executed over CTE it in fact deletes from the base table used in CTE.

/* Delete Duplicate records */
WITH CTE (COl1,Col2, DuplicateCount)
AS
(
SELECT COl1,Col2,
ROW_NUMBER() OVER(PARTITION BY COl1,Col2 ORDER BY Col1) ASDuplicateCount
FROM DuplicateRcordTable
)
DELETE
FROM CTE
WHERE DuplicateCount > 1
GO

It is apparent that after delete command has been run, we will have only 4 records, which is almost the same result which we would have got with DISTINCT, with this resultset. If we had more than 2 columns and we had to run unique on only two columns, our distinct might have not worked here . In this case, we would have to use above the mentioned method.

/* It should give you Distinct 4 records */
SELECT *
FROM DuplicateRcordTable
GO

This method is a breeze and we can use this for SQL Server version 2005 and the laterversions.

=====================================================

If we have to delete duplicates more than two columns

DELETE  SUB
FROM    ( SELECT    ROW_NUMBER() OVER ( PARTITION BY Col1,
                                        Col2, Col3 ORDER BY Col1 ) cnt
          FROM      DuplicateRcordTable
        ) SUB
WHERE   SUB.Cnt > 1
Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s