SQL Server: Delete duplicates from a table.
This article discusses how to remove duplicate rows from a table in a SQL Server database. For example consider the following “employee” table of variable type. In this table, empId is the row identifier, empName and Salary are employee details.
--Create a temporary table variable declare @employee Table(empId int identity(1,1), empName Varchar(50), Salary int) declare @empName varchar(50) declare @Salary int
Let us add a few rows in this table. Since we have not defined a unique key in this table, there is a scope for multiple records to have similar empName and Salary values. The following script will insert 7 rows into the employee table, with 3 duplicate rows. You can see from the screen capture below that the rows 5, 6 and 7 have duplicate data.
-- Insert dummy data(empName, Salary) Insert into @employee Values('test1', 1000) Insert into @employee Values('test2', 2000) Insert into @employee Values('test3', 3000) Insert into @employee Values('test4', 4000) Insert into @employee Values('test1', 1000) Insert into @employee Values('test1', 1000) Insert into @employee Values('test3', 3000)
There are many approaches for deleting the duplicate data in a table. I am about to demonstrate one of the ways using a cursor to de-duplicate the table. Following is the process of de-duplication:
1. The first step of the process is to identify the data that is redundant.
2. Followed by fetching all the identifiers of each of the distinct set of data.
3. Then delete all but one of the rows identified by the primary keys that was fetched in the last step.
4. Repeat the steps 2 and 3 on each of the unique set of data from step 1 using the cursor.
-- delete the duplicates DECLARE my_cursor CURSOR FOR select distinct a.empName, a.Salary from @employee a where 1 < (select count(*) from @employee b where a.empName = b.empName and a.Salary = b.Salary) OPEN my_cursor FETCH NEXT FROM my_cursor into @empName, @Salary WHILE @@FETCH_STATUS = 0 BEGIN delete from @employee where empName = @empName and Salary = @Salary and empId not in (select top 1 empId from @employee where empName = @empName and Salary = @Salary) FETCH NEXT FROM my_cursor into @empName, @Salary end close my_cursor deallocate my_cursor
Following is a screen capture of the table after the duplicates have been removed:
Tip: The following script can be used to get a list of duplicate tables in a database. The duplication in this case is assumed to be caused by the Schema. The same script can be tweaked easily to identify other SQL Server duplicate objects.
select a.Table_Schema, a.Table_Name from INFORMATION_SCHEMA.TABLES a where 1 < (select count(*) from INFORMATION_SCHEMA.TABLES b where a.Table_Name = b.Table_Name)