Deleting multiple duplicate rows from a table

Time: 2022-08-24 21:30:01

I'm sure this has been asked before, but I was having a hard time finding it.

I have multiple groups of duplicates in one table (3 records for one value, 2 for another, etc.), that is, several code values that each appear in more than one row.

Below is what I came up with to delete them, but I have to run the script once for each duplicate row there is:

set rowcount 1
delete from Table
where code in (
  select code from Table 
  group by code
  having (count(code) > 1)
)
set rowcount 0

This works to a degree: I have to run it once for every group of duplicates, and each run deletes only 1 row (which is all I need right now).

I appreciate your help/comments!

4 answers

#1


7  

If you have a key column on the table, then you can use this to uniquely identify the "distinct" rows in your table.

Just use a subquery to identify the list of IDs for the unique rows, and then delete everything outside of this set. Something along the lines of:

create table #TempTable
(
    ID int identity(1,1) not null primary key,
    SomeData varchar(100) not null
)

insert into #TempTable(SomeData) values('someData1')
insert into #TempTable(SomeData) values('someData1')
insert into #TempTable(SomeData) values('someData2')
insert into #TempTable(SomeData) values('someData2')
insert into #TempTable(SomeData) values('someData2')
insert into #TempTable(SomeData) values('someData3')
insert into #TempTable(SomeData) values('someData4')

select * from #TempTable

--Records to be deleted
SELECT ID
FROM #TempTable
WHERE ID NOT IN
(
    select MAX(ID)
    from #TempTable
    group by SomeData
)

--Delete them
DELETE
FROM #TempTable
WHERE ID NOT IN
(
    select MAX(ID)
    from #TempTable
    group by SomeData
)

--Final Result Set
select * from #TempTable

drop table #TempTable;

Alternatively, you could use a CTE, for example:

WITH UniqueRecords AS
(
    select MAX(ID) AS ID
    from #TempTable
    group by SomeData
)
DELETE A
FROM #TempTable A
    LEFT outer join UniqueRecords B on
        A.ID = B.ID
WHERE B.ID IS NULL

#2


2  

It is frequently more efficient to copy the unique rows into a new, temporary table, drop the source table, and rename the new table back to the original name.

I reused the definition and data of #TempTable, here called SrcTable instead, since a temporary table cannot be renamed into a regular one.

create table SrcTable
(
    ID int identity(1,1) not null primary key,
    SomeData varchar(100) not null
)

insert into SrcTable(SomeData) values('someData1')
insert into SrcTable(SomeData) values('someData1')
insert into SrcTable(SomeData) values('someData2')
insert into SrcTable(SomeData) values('someData2')
insert into SrcTable(SomeData) values('someData2')
insert into SrcTable(SomeData) values('someData3')
insert into SrcTable(SomeData) values('someData4')

(The table definition and data above are taken from John Sansom's previous answer.)

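-- Clone the "unique" part: keep the row with the highest ID for each SomeData value.
SELECT * INTO TempTable
FROM SrcTable -- original table
WHERE ID IN
(
    SELECT MAX(ID) AS ID
    FROM SrcTable
    GROUP BY SomeData
);
GO

-- Drop the original table.
DROP TABLE SrcTable;
GO

-- Rename the copy back to the original name.
EXEC sp_rename 'TempTable', 'SrcTable';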
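
One thing to keep in mind with this approach: SELECT ... INTO copies the data but not the constraints or indexes of the source table, so the rebuilt SrcTable has no primary key. A minimal sketch of restoring it after the rename; the constraint name PK_SrcTable is an assumption made for illustration:

```sql
-- Run after the rename. SELECT ... INTO did not carry over the primary key,
-- so add it back. PK_SrcTable is a hypothetical constraint name.
ALTER TABLE SrcTable
    ADD CONSTRAINT PK_SrcTable PRIMARY KEY CLUSTERED (ID);
```

Anything else that was attached to the original table (other indexes, triggers, foreign keys, permissions) is lost with the DROP TABLE and would need to be recreated in the same way.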

#3


1  

Alternatively, you can use the ROW_NUMBER() function to filter out duplicates:

;WITH [CTE_DUPLICATES] AS 
(
SELECT RN = ROW_NUMBER() OVER (PARTITION BY SomeData ORDER BY SomeData)
FROM #TempTable
) 
DELETE FROM [CTE_DUPLICATES] WHERE RN > 1
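
Because the query above orders by the same column it partitions by, which row of each group survives is not defined. A minimal sketch, assuming the #TempTable (ID, SomeData) from answer #1 has been created and populated and not yet dropped, that previews the rows first and then keeps the row with the highest ID in each group:

```sql
-- Assumes #TempTable (ID, SomeData) from answer #1 exists and is populated.
-- Preview: rows with RN > 1 are the ones the DELETE below would remove.
;WITH DuplicateRanks AS
(
    SELECT ID,
           SomeData,
           RN = ROW_NUMBER() OVER (PARTITION BY SomeData ORDER BY ID DESC)
    FROM #TempTable
)
SELECT ID, SomeData, RN
FROM DuplicateRanks
WHERE RN > 1;

-- Delete: same ranking, so only the highest ID per SomeData value is kept.
;WITH DuplicateRanks AS
(
    SELECT RN = ROW_NUMBER() OVER (PARTITION BY SomeData ORDER BY ID DESC)
    FROM #TempTable
)
DELETE FROM DuplicateRanks
WHERE RN > 1;
```

Ordering by ID DESC matches the MAX(ID) choice in answer #1; ordering by ID ASC would keep the oldest row instead.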

#4


0  

SET ROWCOUNT 1    
DELETE Table    
FROM Table a    
WHERE (SELECT COUNT(*) FROM Table b WHERE b.Code = a.Code ) > 1    
WHILE @@rowcount > 0    
  DELETE Table    
  FROM Table a    
  WHERE (SELECT COUNT(*) FROM Table b WHERE b.Code = a.Code ) > 1    
SET ROWCOUNT 0

This deletes duplicate rows one at a time until each Code value appears only once. You can add further columns to the comparison if duplicates should be determined by them as well.
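
As a sketch of comparing on more than one column: assuming a hypothetical table dbo.Items with columns Code and Name (neither name comes from the question), the loop below treats rows as duplicates only when both columns match. It uses DELETE TOP (1) instead of SET ROWCOUNT, since Microsoft's documentation states that SET ROWCOUNT will stop affecting DELETE, INSERT and UPDATE statements in a future release of SQL Server.

```sql
-- dbo.Items, Code and Name are hypothetical, for illustration only.
-- Deletes one duplicate per iteration until every (Code, Name) pair is unique.
DECLARE @deleted int = 1;

WHILE @deleted > 0
BEGIN
    DELETE TOP (1) a
    FROM dbo.Items AS a
    WHERE (SELECT COUNT(*)
           FROM dbo.Items AS b
           WHERE b.Code = a.Code
             AND b.Name = a.Name) > 1;

    -- Stop once an iteration finds nothing left to delete.
    SET @deleted = @@ROWCOUNT;
END;
```

Rows in which Code or Name is NULL never compare equal under this condition, so they are left untouched. On large tables a row-at-a-time loop is likely to be slow, and the set-based approaches in the other answers are probably the better fit.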
