从表中删除重复记录 - SQL查询

时间:2022-08-01 15:40:49

I need to delete duplicate rows only from the table, like I have 3 duplicate rows in the table, my query will delete 2 rows from 3 duplicated rows.

我只需要从表中删除重复的行,就像我在表中有3个重复的行,我的查询将从3个重复的行中删除2行。

How can I get this? Please help me.

我怎么能得到这个?请帮帮我。

7 个解决方案

#1


12  

Please try the below query, it will definitely meet your objective

请尝试以下查询,它一定会符合您的目标

SET ROWCOUNT 1
DELETE test
FROM test a
WHERE (SELECT COUNT(*) FROM test b WHERE b.name = a.name) > 1
WHILE @@rowcount > 0
  DELETE test
  FROM test a
  WHERE (SELECT COUNT(*) FROM test b WHERE b.name = a.name) > 1
SET ROWCOUNT 0

where test is your table name

test是你的表名

#2


4  

This works in SQL Server although it isn't a single statement:

这在SQL Server中有效,尽管它不是单个语句:

Declare @cnt int; 
Select @cnt=COUNT(*) From DupTable Where (Col1=1);  -- Assumes you are trying to delete the duplicates where some condition (e.g. Col1=1) is true.
Delete Top (@cnt-1) From DupTable

It also doesn't require any extra assumptions (like the existance of another column that makes each row unique). After all, Santanu did say that the rows were duplicates and not just the one column.

它也不需要任何额外的假设(比如存在使每行唯一的另一列的存在)。毕竟,Santanu确实说这些行是重复的而不仅仅是一列。

However, the right answer, in my view, is to get a real table structure. That is, add an IDENTITY column to this table so that you can use a single SQL command to do your work. Like this:

但是,在我看来,正确的答案是获得一个真正的表结构。也就是说,在此表中添加IDENTITY列,以便您可以使用单个SQL命令来完成工作。喜欢这个:

ALTER TABLE dbo.DupTable ADD
    IDCol int NOT NULL IDENTITY (1, 1)
GO

Then the delete is trivial:

然后删除是微不足道的:

DELETE FROM DupTable WHERE IDCol NOT IN 
   (SELECT MAX(IDCol) FROM DupTable GROUP BY Col1, Col2, Col3)

#3


3  

DELETE FROM Table t1, Table t2 WHERE t1.colDup = t2.colDup AND t1.date < t2.date

Will delete every duplicate row from Table (on column colDup) except the oldest (i.e. lowset date).

将删除表中的每个重复行(在列colDup上),除了最旧的(即低设置日期)。

#4


2  

DELETE FROM `mytbl`
    INNER JOIN (
        SELECT 1 FROM `mytbl`
        GROUP BY `duplicated_column` HAVING COUNT(*)=2
    ) USING(`id`)

Edit:

编辑:

My bad, the above query won't work.

我的不好,上面的查询将无法正常工作。

Assuming table structure:

假设表结构:

id int auto_increment

id int auto_increment

num int # <-- this is the column with duplicated values

num int#< - 这是具有重复值的列

The following query would work in MySQL (i checked):

以下查询可以在MySQL中工作(我检查过):

DELETE `mytbl` FROM `mytbl` 
    INNER JOIN 
    (
        SELECT `num` FROM `mytbl`
        GROUP BY `num` HAVING COUNT(*)=2
    ) AS `tmp` USING (`num`)

The query would delete the rows that have 2 (not more or else) duplicated values in the num column.

该查询将删除num列中具有2(不多于或其他)重复值的行。

Edit (again):

编辑(再次):

I suggest to add a key on the num column.

我建议在num列上添加一个键。

Edit(#3):

编辑(#3):

In case that the author wanted to delete the duplicated rows, the following should work for MySQL (it worked for me):

如果作者想要删除重复的行,以下应该适用于MySQL(它适用于我):

DELETE `delete_duplicated_rows` FROM `delete_duplicated_rows`
    NATURAL JOIN (
        SELECT *
        FROM `delete_duplicated_rows`
        GROUP BY `num1` HAVING COUNT(*)=2
    ) AS `der`

While assuming table structure is:

假设表结构是:

CREATE TABLE `delete_duplicated_rows` (
  `num1` tinyint(4) DEFAULT NOT NULL,
  `num2` tinyint(4) DEFAULT NOT NULL
) ENGINE=MyISAM;

#5


1  

If you have the id's of the rows you want to delete then...

如果您要删除要删除的行的ID,则...

DELETE FROM table WHERE id IN (1, 4, 7, [id numbers to delete...])

#6


1  

I think each table has unique identifier. So if it exists then you can write following query: Delete Table1 from Table1 t1 where 2 >= (select count(id) from Table1 where dupColumn = t1.dupColumn) and t1.id not in (select max (id) from Table1 where dupColumn = t1.dupColumn)

我认为每个表都有唯一的标识符。因此,如果它存在,那么你可以写下面的查询:从Table1 t1删除Table1,其中2> =(从表1中选择count(id),其中dupColumn = t1.dupColumn)和t1.id不在(从表1中选择max(id) dupColumn = t1.dupColumn)

OOps. It seems it is possible to use second filter only Delete Table1 from Table1 t1 where t1.id not in (select max (id) from Table1 where dupColumn = t1.dupColumn)

哎呀。似乎可以仅使用第二个过滤器从Table1 t1中删除Table1,其中t1.id不在(从表1中选择max(id)其中dupColumn = t1.dupColumn)

#7


1  

  -- Just to demonstrates Marks example          
    . 
        -- START === 1.0.dbo..DuplicatesTable.TableCreate.sql
    /****** Object:  Table [dbo].[DuplicatesTable] 
        Script Date: 03/29/2010 21:24:02 ******/
      IF EXISTS (SELECT * FROM sys.objects 
     WHERE 
object_id = OBJECT_ID(N'[dbo].[DuplicatesTable]') 
AND type in (N'U'))
        DROP TABLE [dbo].[DuplicatesTable]
    GO

    /****** Object:  Table [dbo].[DuplicatesTable]    
Script Date: 03/29/2010 21:24:02 ******/
    SET ANSI_NULLS ON
    GO

    SET QUOTED_IDENTIFIER ON
    GO

    CREATE TABLE [dbo].[DuplicatesTable](
        [ColA] [varchar](10) NOT NULL, -- the name of the DuplicatesTable
        [ColB] [varchar](10) NULL,  -- the description of the e DuplicatesTable 
     ) 


    /* 
    <doc> 
    Models a DuplicatesTable for 
    </doc>

    */


    GO


    --============================================================ DuplicatesTable START
    declare @ScriptFileName varchar(2000)
    SELECT @ScriptFileName = '$(ScriptFileName)'
    SELECT @ScriptFileName + ' --- DuplicatesTable START =========================================' 
    declare @TableName varchar(200)
    select @TableName = 'DuplicatesTable'

    SELECT 'SELECT name from sys.tables where name =''' + @TableName + ''''
    SELECT name from sys.tables 
    where name = @TableName

    DECLARE @TableCount INT 
    SELECT @TableCount  = COUNT(name ) from sys.tables 
        where name =@TableName

    if @TableCount=1
    SELECT ' DuplicatesTable PASSED. The Table ' + @TableName + ' EXISTS ' 
    ELSE 
    SELECT ' DuplicatesTable FAILED. The Table ' + @TableName + ' DOES NOT EXIST ' 
    SELECT @ScriptFileName + ' --- DuplicatesTable END =========================================' 
    --============================================================ DuplicatesTable END

    GO


    -- END ===  1.0.dbo..DuplicatesTable.TableCreate.sql

    . 
    -- START === 1.1..dbo..DuplicatesTable.TableInsert.sql

    BEGIN TRANSACTION;
    INSERT INTO [dbo].[DuplicatesTable]([ColA], [ColB])
    SELECT   N'ColA', N'ColB' UNION ALL
    SELECT N'ColA', N'ColB' UNION ALL
    SELECT  N'ColA', N'ColB' UNION ALL
    SELECT  N'ColA', N'ColB' UNION ALL
    SELECT  N'ColA', N'ColB' UNION ALL
    SELECT  N'ColA', N'ColB' UNION ALL
    SELECT  N'ColA', N'ColB' UNION ALL
    SELECT  N'ColA1', N'ColB1' UNION ALL
    SELECT  N'ColA1', N'ColB1' UNION ALL
    SELECT  N'ColA1', N'ColB1' UNION ALL
    SELECT  N'ColA1', N'ColB1' UNION ALL
    SELECT  N'ColA1', N'ColB1' UNION ALL
    SELECT  N'ColA1', N'ColB1' UNION ALL
    SELECT  N'ColA1', N'ColB1'
    COMMIT;
    RAISERROR (N'[dbo].[DuplicatesTable]: Insert Batch: 1.....Done!', 10, 1) WITH NOWAIT;
    GO


    -- END ===  1.1..dbo..DuplicatesTable.TableInsert.sql

    . 
    -- START === 2.0.RemoveDuplicates.Script.sql
    ALTER TABLE dbo.DuplicatesTable ADD
            DuplicatesTableId int NOT NULL IDENTITY (1, 1)
    GO

    -- Then the delete is trivial:
    DELETE FROM dbo.DuplicatesTable WHERE DuplicatesTableId NOT IN 
         (SELECT MAX(DuplicatesTableId) FROM dbo.DuplicatesTable GROUP BY ColA , ColB)

         Select * from DuplicatesTable ;  
    -- END ===  2.0.RemoveDuplicates.Script.sql

#1


12  

Please try the below query, it will definitely meet your objective

请尝试以下查询,它一定会符合您的目标

SET ROWCOUNT 1
DELETE test
FROM test a
WHERE (SELECT COUNT(*) FROM test b WHERE b.name = a.name) > 1
WHILE @@rowcount > 0
  DELETE test
  FROM test a
  WHERE (SELECT COUNT(*) FROM test b WHERE b.name = a.name) > 1
SET ROWCOUNT 0

where test is your table name

test是你的表名

#2


4  

This works in SQL Server although it isn't a single statement:

这在SQL Server中有效,尽管它不是单个语句:

Declare @cnt int; 
Select @cnt=COUNT(*) From DupTable Where (Col1=1);  -- Assumes you are trying to delete the duplicates where some condition (e.g. Col1=1) is true.
Delete Top (@cnt-1) From DupTable

It also doesn't require any extra assumptions (like the existance of another column that makes each row unique). After all, Santanu did say that the rows were duplicates and not just the one column.

它也不需要任何额外的假设(比如存在使每行唯一的另一列的存在)。毕竟,Santanu确实说这些行是重复的而不仅仅是一列。

However, the right answer, in my view, is to get a real table structure. That is, add an IDENTITY column to this table so that you can use a single SQL command to do your work. Like this:

但是,在我看来,正确的答案是获得一个真正的表结构。也就是说,在此表中添加IDENTITY列,以便您可以使用单个SQL命令来完成工作。喜欢这个:

ALTER TABLE dbo.DupTable ADD
    IDCol int NOT NULL IDENTITY (1, 1)
GO

Then the delete is trivial:

然后删除是微不足道的:

DELETE FROM DupTable WHERE IDCol NOT IN 
   (SELECT MAX(IDCol) FROM DupTable GROUP BY Col1, Col2, Col3)

#3


3  

DELETE FROM Table t1, Table t2 WHERE t1.colDup = t2.colDup AND t1.date < t2.date

Will delete every duplicate row from Table (on column colDup) except the oldest (i.e. lowset date).

将删除表中的每个重复行(在列colDup上),除了最旧的(即低设置日期)。

#4


2  

DELETE FROM `mytbl`
    INNER JOIN (
        SELECT 1 FROM `mytbl`
        GROUP BY `duplicated_column` HAVING COUNT(*)=2
    ) USING(`id`)

Edit:

编辑:

My bad, the above query won't work.

我的不好,上面的查询将无法正常工作。

Assuming table structure:

假设表结构:

id int auto_increment

id int auto_increment

num int # <-- this is the column with duplicated values

num int#< - 这是具有重复值的列

The following query would work in MySQL (i checked):

以下查询可以在MySQL中工作(我检查过):

DELETE `mytbl` FROM `mytbl` 
    INNER JOIN 
    (
        SELECT `num` FROM `mytbl`
        GROUP BY `num` HAVING COUNT(*)=2
    ) AS `tmp` USING (`num`)

The query would delete the rows that have 2 (not more or else) duplicated values in the num column.

该查询将删除num列中具有2(不多于或其他)重复值的行。

Edit (again):

编辑(再次):

I suggest to add a key on the num column.

我建议在num列上添加一个键。

Edit(#3):

编辑(#3):

In case that the author wanted to delete the duplicated rows, the following should work for MySQL (it worked for me):

如果作者想要删除重复的行,以下应该适用于MySQL(它适用于我):

DELETE `delete_duplicated_rows` FROM `delete_duplicated_rows`
    NATURAL JOIN (
        SELECT *
        FROM `delete_duplicated_rows`
        GROUP BY `num1` HAVING COUNT(*)=2
    ) AS `der`

While assuming table structure is:

假设表结构是:

CREATE TABLE `delete_duplicated_rows` (
  `num1` tinyint(4) DEFAULT NOT NULL,
  `num2` tinyint(4) DEFAULT NOT NULL
) ENGINE=MyISAM;

#5


1  

If you have the id's of the rows you want to delete then...

如果您要删除要删除的行的ID,则...

DELETE FROM table WHERE id IN (1, 4, 7, [id numbers to delete...])

#6


1  

I think each table has unique identifier. So if it exists then you can write following query: Delete Table1 from Table1 t1 where 2 >= (select count(id) from Table1 where dupColumn = t1.dupColumn) and t1.id not in (select max (id) from Table1 where dupColumn = t1.dupColumn)

我认为每个表都有唯一的标识符。因此,如果它存在,那么你可以写下面的查询:从Table1 t1删除Table1,其中2> =(从表1中选择count(id),其中dupColumn = t1.dupColumn)和t1.id不在(从表1中选择max(id) dupColumn = t1.dupColumn)

OOps. It seems it is possible to use second filter only Delete Table1 from Table1 t1 where t1.id not in (select max (id) from Table1 where dupColumn = t1.dupColumn)

哎呀。似乎可以仅使用第二个过滤器从Table1 t1中删除Table1,其中t1.id不在(从表1中选择max(id)其中dupColumn = t1.dupColumn)

#7


1  

  -- Just to demonstrates Marks example          
    . 
        -- START === 1.0.dbo..DuplicatesTable.TableCreate.sql
    /****** Object:  Table [dbo].[DuplicatesTable] 
        Script Date: 03/29/2010 21:24:02 ******/
      IF EXISTS (SELECT * FROM sys.objects 
     WHERE 
object_id = OBJECT_ID(N'[dbo].[DuplicatesTable]') 
AND type in (N'U'))
        DROP TABLE [dbo].[DuplicatesTable]
    GO

    /****** Object:  Table [dbo].[DuplicatesTable]    
Script Date: 03/29/2010 21:24:02 ******/
    SET ANSI_NULLS ON
    GO

    SET QUOTED_IDENTIFIER ON
    GO

    CREATE TABLE [dbo].[DuplicatesTable](
        [ColA] [varchar](10) NOT NULL, -- the name of the DuplicatesTable
        [ColB] [varchar](10) NULL,  -- the description of the e DuplicatesTable 
     ) 


    /* 
    <doc> 
    Models a DuplicatesTable for 
    </doc>

    */


    GO


    --============================================================ DuplicatesTable START
    declare @ScriptFileName varchar(2000)
    SELECT @ScriptFileName = '$(ScriptFileName)'
    SELECT @ScriptFileName + ' --- DuplicatesTable START =========================================' 
    declare @TableName varchar(200)
    select @TableName = 'DuplicatesTable'

    SELECT 'SELECT name from sys.tables where name =''' + @TableName + ''''
    SELECT name from sys.tables 
    where name = @TableName

    DECLARE @TableCount INT 
    SELECT @TableCount  = COUNT(name ) from sys.tables 
        where name =@TableName

    if @TableCount=1
    SELECT ' DuplicatesTable PASSED. The Table ' + @TableName + ' EXISTS ' 
    ELSE 
    SELECT ' DuplicatesTable FAILED. The Table ' + @TableName + ' DOES NOT EXIST ' 
    SELECT @ScriptFileName + ' --- DuplicatesTable END =========================================' 
    --============================================================ DuplicatesTable END

    GO


    -- END ===  1.0.dbo..DuplicatesTable.TableCreate.sql

    . 
    -- START === 1.1..dbo..DuplicatesTable.TableInsert.sql

    BEGIN TRANSACTION;
    INSERT INTO [dbo].[DuplicatesTable]([ColA], [ColB])
    SELECT   N'ColA', N'ColB' UNION ALL
    SELECT N'ColA', N'ColB' UNION ALL
    SELECT  N'ColA', N'ColB' UNION ALL
    SELECT  N'ColA', N'ColB' UNION ALL
    SELECT  N'ColA', N'ColB' UNION ALL
    SELECT  N'ColA', N'ColB' UNION ALL
    SELECT  N'ColA', N'ColB' UNION ALL
    SELECT  N'ColA1', N'ColB1' UNION ALL
    SELECT  N'ColA1', N'ColB1' UNION ALL
    SELECT  N'ColA1', N'ColB1' UNION ALL
    SELECT  N'ColA1', N'ColB1' UNION ALL
    SELECT  N'ColA1', N'ColB1' UNION ALL
    SELECT  N'ColA1', N'ColB1' UNION ALL
    SELECT  N'ColA1', N'ColB1'
    COMMIT;
    RAISERROR (N'[dbo].[DuplicatesTable]: Insert Batch: 1.....Done!', 10, 1) WITH NOWAIT;
    GO


    -- END ===  1.1..dbo..DuplicatesTable.TableInsert.sql

    . 
    -- START === 2.0.RemoveDuplicates.Script.sql
    ALTER TABLE dbo.DuplicatesTable ADD
            DuplicatesTableId int NOT NULL IDENTITY (1, 1)
    GO

    -- Then the delete is trivial:
    DELETE FROM dbo.DuplicatesTable WHERE DuplicatesTableId NOT IN 
         (SELECT MAX(DuplicatesTableId) FROM dbo.DuplicatesTable GROUP BY ColA , ColB)

         Select * from DuplicatesTable ;  
    -- END ===  2.0.RemoveDuplicates.Script.sql