Caveman's Blog

My commitment to learning.

SQL Server: Delete duplicates from a table.

with one comment


This article discusses how to remove duplicate rows from a table in a SQL Server database. For example consider the following “employee” table of variable type. In this table, empId is the row identifier, empName and Salary are employee details.


--Create a temporary table variable
declare @employee Table(empId int identity(1,1), empName Varchar(50), Salary int)
declare @empName varchar(50)
declare @Salary int

Let us add a few rows in this table. Since we have not defined a unique key in this table, there is a scope for multiple records to have similar empName and Salary values. The following script will insert 7 rows into the employee table, with 3 duplicate rows. You can see from the screen capture below that the rows 5, 6 and 7 have duplicate data.

-- Insert dummy data(empName, Salary)
Insert into @employee Values('test1', 1000)
Insert into @employee Values('test2', 2000)
Insert into @employee Values('test3', 3000)
Insert into @employee Values('test4', 4000)
Insert into @employee Values('test1', 1000)
Insert into @employee Values('test1', 1000)
Insert into @employee Values('test3', 3000)

There are many approaches for deleting the duplicate data in a table. I am about to demonstrate one of the ways using a cursor to de-duplicate the table. Following is the process of de-duplication:

1. The first step of the process is to identify the data that is redundant.

2. Followed by fetching all the identifiers of each of the distinct set of data.
3. Then delete all but one of the rows identified by the primary keys that was fetched in the last step.
4. Repeat the steps 2 and 3 on each of the unique set of data from step 1 using the cursor.

-- delete the duplicates
DECLARE my_cursor CURSOR
FOR select distinct a.empName, a.Salary from  @employee a
where 1 < (select count(*) from @employee b where a.empName = b.empName and a.Salary = b.Salary)

OPEN my_cursor
FETCH NEXT FROM my_cursor into @empName, @Salary

WHILE @@FETCH_STATUS = 0
BEGIN

delete from @employee where empName = @empName and Salary = @Salary and empId not in
(select top 1 empId from @employee where empName = @empName and Salary = @Salary)

FETCH NEXT FROM my_cursor into @empName, @Salary
end

close my_cursor
deallocate my_cursor

Following is a screen capture of the table after the duplicates have been removed:

Tip: The following script can be used to get a list of duplicate tables in a database. The duplication in this case is assumed to be caused by the Schema. The same script can be tweaked easily to identify other SQL Server duplicate objects.

select a.Table_Schema, a.Table_Name from  INFORMATION_SCHEMA.TABLES a
where 1 < (select count(*) from INFORMATION_SCHEMA.TABLES b where a.Table_Name = b.Table_Name)
Advertisements

One Response

Subscribe to comments with RSS.

  1. sql find duplicates values in table

    -sara
    sql find duplicates values in table

    flashflex

    July 19, 2011 at 2:06 am


Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

%d bloggers like this: