How to delete duplicates in a SQL table based on multiple fields

Time: 2022-08-27 04:19:02

I have a table of games, which is described as follows:

+---------------+-------------+------+-----+---------+----------------+
| Field         | Type        | Null | Key | Default | Extra          |
+---------------+-------------+------+-----+---------+----------------+
| id            | int(11)     | NO   | PRI | NULL    | auto_increment |
| date          | date        | NO   |     | NULL    |                |
| time          | time        | NO   |     | NULL    |                |
| hometeam_id   | int(11)     | NO   | MUL | NULL    |                |
| awayteam_id   | int(11)     | NO   | MUL | NULL    |                |
| locationcity  | varchar(30) | NO   |     | NULL    |                |
| locationstate | varchar(20) | NO   |     | NULL    |                |
+---------------+-------------+------+-----+---------+----------------+

But each game has a duplicate entry somewhere in the table, because each game was in the schedules of both teams. Is there an SQL statement I can use to find and delete all the duplicates, i.e. rows with identical date, time, hometeam_id, awayteam_id, locationcity, and locationstate fields?

8 solutions

#1


40  

You can delete the data with a subquery: find all rows that are duplicates and delete all but the one with the smallest id. In MySQL, which does not let a DELETE read its own target table in a subquery, use a multi-table DELETE with an inner join against a derived table of the duplicate groups (the functional equivalent of EXISTS), like so:

delete games from games inner join 
    (select  min(id) minid, date, time,
             hometeam_id, awayteam_id, locationcity, locationstate
     from games 
     group by date, time, hometeam_id, 
              awayteam_id, locationcity, locationstate
     having count(1) > 1) as duplicates
   on (duplicates.date = games.date
   and duplicates.time = games.time
   and duplicates.hometeam_id = games.hometeam_id
   and duplicates.awayteam_id = games.awayteam_id
   and duplicates.locationcity = games.locationcity
   and duplicates.locationstate = games.locationstate
   and duplicates.minid <> games.id)

To test, first replace "delete games from games" with "select * from games" to see which rows would be deleted. Don't just run a delete on your DB :-)
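Following that advice, here is a minimal sketch of the preview-then-delete workflow. It assumes Python's built-in sqlite3 module with an in-memory SQLite database as a stand-in for MySQL, and the sample rows are hypothetical. SQLite has no multi-table DELETE ... JOIN, so the sketch runs the answer's join as a SELECT and then deletes the ids it returned.

```python
import sqlite3

# Hypothetical sample data: ids 1/2 and 3/4 are duplicate games, 5 is unique.
ROWS = [
    (1, "2022-08-27", "19:00", 10, 20, "Boston", "MA"),
    (2, "2022-08-27", "19:00", 10, 20, "Boston", "MA"),
    (3, "2022-08-28", "13:00", 30, 10, "Denver", "CO"),
    (4, "2022-08-28", "13:00", 30, 10, "Denver", "CO"),
    (5, "2022-08-29", "18:30", 20, 30, "Austin", "TX"),
]


def make_db():
    """Create an in-memory games table mirroring the question's columns."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE games (id INTEGER PRIMARY KEY, date TEXT NOT NULL,"
        " time TEXT NOT NULL, hometeam_id INTEGER NOT NULL,"
        " awayteam_id INTEGER NOT NULL, locationcity TEXT NOT NULL,"
        " locationstate TEXT NOT NULL)"
    )
    conn.executemany("INSERT INTO games VALUES (?, ?, ?, ?, ?, ?, ?)", ROWS)
    return conn


# The answer's join, run as a SELECT so nothing is modified yet.
PREVIEW_SQL = """
SELECT games.id
FROM games
JOIN (SELECT MIN(id) AS minid, date, time, hometeam_id, awayteam_id,
             locationcity, locationstate
      FROM games
      GROUP BY date, time, hometeam_id, awayteam_id, locationcity, locationstate
      HAVING COUNT(1) > 1) AS duplicates
  ON duplicates.date = games.date
 AND duplicates.time = games.time
 AND duplicates.hometeam_id = games.hometeam_id
 AND duplicates.awayteam_id = games.awayteam_id
 AND duplicates.locationcity = games.locationcity
 AND duplicates.locationstate = games.locationstate
 AND duplicates.minid <> games.id
ORDER BY games.id
"""

conn = make_db()
to_delete = [row[0] for row in conn.execute(PREVIEW_SQL)]
print("would delete:", to_delete)

# SQLite has no DELETE ... JOIN, so delete the previewed ids directly.
conn.executemany("DELETE FROM games WHERE id = ?", [(i,) for i in to_delete])
remaining = [row[0] for row in conn.execute("SELECT id FROM games ORDER BY id")]
print("remaining:", remaining)
```

Only the non-minimal copies (ids 2 and 4) are returned by the preview, and the unique game is never touched because the HAVING clause excludes single-row groups.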

#2


12  

You can try a query like this:

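DELETE FROM table_name AS t1
WHERE EXISTS (
 SELECT 1 FROM table_name AS t2
 WHERE t2.date = t1.date
 AND t2.time = t1.time
 AND t2.hometeam_id = t1.hometeam_id
 AND t2.awayteam_id = t1.awayteam_id
 AND t2.locationcity = t1.locationcity
 AND t2.locationstate = t1.locationstate
 AND t2.id < t1.id )

This deletes every row for which an identical copy with a smaller id exists, so only the row with the smallest id of each game remains in the database. MySQL rejects this form (error 1093) because the subquery reads from the table being deleted from; it works in databases such as PostgreSQL, and in MySQL you can use the join-based delete from answer #1 instead.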
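A minimal sketch of this EXISTS-based delete, keeping the smallest id per game, assuming Python's sqlite3 module as the database (SQLite, unlike MySQL, accepts a subquery on the table being deleted from) and hypothetical sample rows. To avoid relying on a DELETE alias, the sketch leaves the outer table unaliased as games and aliases only the inner copy as t2.

```python
import sqlite3

# Hypothetical rows: ids 1-3 are one game, 4-5 another, 6 appears once.
ROWS = [
    (1, "2022-08-27", "19:00", 10, 20, "Boston", "MA"),
    (2, "2022-08-27", "19:00", 10, 20, "Boston", "MA"),
    (3, "2022-08-27", "19:00", 10, 20, "Boston", "MA"),
    (4, "2022-08-28", "13:00", 30, 10, "Denver", "CO"),
    (5, "2022-08-28", "13:00", 30, 10, "Denver", "CO"),
    (6, "2022-08-29", "18:30", 20, 30, "Austin", "TX"),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE games (id INTEGER PRIMARY KEY, date TEXT NOT NULL,"
    " time TEXT NOT NULL, hometeam_id INTEGER NOT NULL,"
    " awayteam_id INTEGER NOT NULL, locationcity TEXT NOT NULL,"
    " locationstate TEXT NOT NULL)"
)
conn.executemany("INSERT INTO games VALUES (?, ?, ?, ?, ?, ?, ?)", ROWS)

# Remove a row whenever an identical game with a smaller id exists.
# Inside the subquery, "games" refers to the outer row being tested,
# because the inner copy is only visible under its alias t2.
conn.execute("""
DELETE FROM games
WHERE EXISTS (
    SELECT 1 FROM games AS t2
    WHERE t2.date = games.date
      AND t2.time = games.time
      AND t2.hometeam_id = games.hometeam_id
      AND t2.awayteam_id = games.awayteam_id
      AND t2.locationcity = games.locationcity
      AND t2.locationstate = games.locationstate
      AND t2.id < games.id)
""")

remaining = [row[0] for row in conn.execute("SELECT id FROM games ORDER BY id")]
print(remaining)
```

Because the smallest id of a group never has a smaller twin, it is never deleted, so the result does not depend on the order in which rows are examined, even for groups of three or more copies.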

#3


7  

The best thing that worked for me was to recreate the table.

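CREATE TABLE newtable SELECT MIN(id) AS id, field1, field2 FROM oldtable GROUP BY field1, field2;

Here field1, field2 stand for all the columns that define a duplicate (date, time, hometeam_id, awayteam_id, locationcity, locationstate in the question). Selecting them explicitly together with MIN(id), instead of SELECT *, keeps the query valid under MySQL's ONLY_FULL_GROUP_BY mode, which is enabled by default since MySQL 5.7 and rejects columns that are neither grouped nor aggregated. You can then drop the old table and rename the new one with RENAME TABLE. Note that CREATE TABLE ... SELECT does not copy indexes or the AUTO_INCREMENT attribute, so add the primary key back afterwards.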
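A minimal sketch of the recreate-and-rename approach, assuming Python's sqlite3 module as a stand-in for MySQL and hypothetical sample rows. The sketch lists the grouped columns explicitly with MIN(id) rather than using SELECT *, uses the hypothetical name games_dedup for the new table, and uses SQLite's ALTER TABLE ... RENAME TO in place of MySQL's RENAME TABLE. In SQLite too, a table created with CREATE TABLE ... AS SELECT has no PRIMARY KEY.

```python
import sqlite3

# Hypothetical sample data: ids 1/2 and 3/4 are duplicate games, 5 is unique.
ROWS = [
    (1, "2022-08-27", "19:00", 10, 20, "Boston", "MA"),
    (2, "2022-08-27", "19:00", 10, 20, "Boston", "MA"),
    (3, "2022-08-28", "13:00", 30, 10, "Denver", "CO"),
    (4, "2022-08-28", "13:00", 30, 10, "Denver", "CO"),
    (5, "2022-08-29", "18:30", 20, 30, "Austin", "TX"),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE games (id INTEGER PRIMARY KEY, date TEXT NOT NULL,"
    " time TEXT NOT NULL, hometeam_id INTEGER NOT NULL,"
    " awayteam_id INTEGER NOT NULL, locationcity TEXT NOT NULL,"
    " locationstate TEXT NOT NULL)"
)
conn.executemany("INSERT INTO games VALUES (?, ?, ?, ?, ?, ?, ?)", ROWS)

# Keep one row per game (the smallest id), then swap the new table in.
conn.executescript("""
CREATE TABLE games_dedup AS
SELECT MIN(id) AS id, date, time, hometeam_id, awayteam_id,
       locationcity, locationstate
FROM games
GROUP BY date, time, hometeam_id, awayteam_id, locationcity, locationstate;
DROP TABLE games;
ALTER TABLE games_dedup RENAME TO games;
""")

remaining = [row[0] for row in conn.execute("SELECT id FROM games ORDER BY id")]
# PRAGMA table_info rows are (cid, name, type, notnull, dflt_value, pk).
pk_columns = [row[1] for row in conn.execute("PRAGMA table_info(games)") if row[5]]
print(remaining, pk_columns)
```

The empty pk_columns list shows why the primary key has to be re-added after the swap.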

#4


5  

To get a list of the duplicate entries matching on two fields:

select t.ID, t.field1, t.field2
from (
  select field1, field2
  from table_name
  group by field1, field2
  having count(*) > 1) x, table_name t
where x.field1 = t.field1 and x.field2 = t.field2
order by t.field1, t.field2
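A minimal sketch of this listing query, assuming Python's sqlite3 module and a hypothetical table_name with columns id, field1, field2. The sketch adds t.id to the ORDER BY (not in the original query) so rows within each duplicate group come out in a stable order.

```python
import sqlite3

# Hypothetical rows: (a, x) and (b, y) occur twice, the others only once.
ROWS = [(1, "a", "x"), (2, "a", "x"), (3, "b", "y"),
        (4, "a", "y"), (5, "b", "y"), (6, "c", "z")]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table_name (id INTEGER PRIMARY KEY, field1 TEXT, field2 TEXT)")
conn.executemany("INSERT INTO table_name VALUES (?, ?, ?)", ROWS)

# Group to find (field1, field2) pairs that occur more than once,
# then join back to the table to recover every row's id.
duplicates = conn.execute("""
SELECT t.id, t.field1, t.field2
FROM (SELECT field1, field2
      FROM table_name
      GROUP BY field1, field2
      HAVING COUNT(*) > 1) AS x
JOIN table_name AS t
  ON x.field1 = t.field1 AND x.field2 = t.field2
ORDER BY t.field1, t.field2, t.id
""").fetchall()

for row in duplicates:
    print(row)
```

Every member of a duplicate group is listed, including the one you would keep, which makes this useful for reviewing the data before deleting anything.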

And to delete only the duplicates, keeping the row with the smallest id in each group:

DELETE x 
FROM table_name x
JOIN table_name y
ON y.field1= x.field1
AND y.field2 = x.field2
AND y.id < x.id;

#5


4  

select orig.id,
       dupl.id
from   games   orig, 
       games   dupl
where  orig.date   =    dupl.date
and    orig.time   =    dupl.time
and    orig.hometeam_id = dupl.hometeam_id
and    orig.awayteam_id = dupl.awayteam_id
and    orig.locationcity = dupl.locationcity
and    orig.locationstate = dupl.locationstate
and    orig.id     <    dupl.id

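This gives you each duplicate pair, with orig.id as the earlier row and dupl.id as the later one; you can use the dupl.id values in a subquery to specify the IDs to delete. In MySQL, note that a DELETE cannot select from its own target table in a subquery (error 1093), so a join-based delete as in the other answers is needed there.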
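A minimal sketch of using the pair query as a delete subquery, assuming Python's sqlite3 module (which, unlike MySQL, accepts the subquery on the target table) and hypothetical sample rows. The shared FROM/WHERE text is reused for both the pair listing and the delete.

```python
import sqlite3

# Hypothetical rows: ids 1-3 are one game, 4-5 another, 6 appears once.
ROWS = [
    (1, "2022-08-27", "19:00", 10, 20, "Boston", "MA"),
    (2, "2022-08-27", "19:00", 10, 20, "Boston", "MA"),
    (3, "2022-08-27", "19:00", 10, 20, "Boston", "MA"),
    (4, "2022-08-28", "13:00", 30, 10, "Denver", "CO"),
    (5, "2022-08-28", "13:00", 30, 10, "Denver", "CO"),
    (6, "2022-08-29", "18:30", 20, 30, "Austin", "TX"),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE games (id INTEGER PRIMARY KEY, date TEXT NOT NULL,"
    " time TEXT NOT NULL, hometeam_id INTEGER NOT NULL,"
    " awayteam_id INTEGER NOT NULL, locationcity TEXT NOT NULL,"
    " locationstate TEXT NOT NULL)"
)
conn.executemany("INSERT INTO games VALUES (?, ?, ?, ?, ?, ?, ?)", ROWS)

# Self-join: every (earlier, later) pair of identical games.
PAIR_JOIN = """
FROM games AS orig, games AS dupl
WHERE orig.date = dupl.date
  AND orig.time = dupl.time
  AND orig.hometeam_id = dupl.hometeam_id
  AND orig.awayteam_id = dupl.awayteam_id
  AND orig.locationcity = dupl.locationcity
  AND orig.locationstate = dupl.locationstate
  AND orig.id < dupl.id
"""

pairs = conn.execute("SELECT orig.id, dupl.id" + PAIR_JOIN + "ORDER BY orig.id, dupl.id").fetchall()
print(pairs)

# Each dupl.id has an earlier twin, so deleting them keeps the smallest id per game.
conn.execute("DELETE FROM games WHERE id IN (SELECT dupl.id" + PAIR_JOIN + ")")
remaining = [row[0] for row in conn.execute("SELECT id FROM games ORDER BY id")]
print(remaining)
```

With three copies of a game the pair list contains id 3 twice (paired with 1 and with 2); IN treats the repeated value the same as a single one.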

#6


2  

As long as you do not select the table's id (primary key) and the other columns are exactly the same, you can use SELECT DISTINCT to avoid getting duplicate results. This only hides the duplicates in the query output; it does not delete them from the table.
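A minimal sketch of the SELECT DISTINCT approach, assuming Python's sqlite3 module and hypothetical sample rows. It shows that the result set has one row per game while the stored row count is unchanged.

```python
import sqlite3

# Hypothetical sample data: ids 1/2 and 3/4 are duplicate games, 5 is unique.
ROWS = [
    (1, "2022-08-27", "19:00", 10, 20, "Boston", "MA"),
    (2, "2022-08-27", "19:00", 10, 20, "Boston", "MA"),
    (3, "2022-08-28", "13:00", 30, 10, "Denver", "CO"),
    (4, "2022-08-28", "13:00", 30, 10, "Denver", "CO"),
    (5, "2022-08-29", "18:30", 20, 30, "Austin", "TX"),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE games (id INTEGER PRIMARY KEY, date TEXT NOT NULL,"
    " time TEXT NOT NULL, hometeam_id INTEGER NOT NULL,"
    " awayteam_id INTEGER NOT NULL, locationcity TEXT NOT NULL,"
    " locationstate TEXT NOT NULL)"
)
conn.executemany("INSERT INTO games VALUES (?, ?, ?, ?, ?, ?, ?)", ROWS)

# Leave id out of the select list, otherwise every row is distinct.
distinct_rows = conn.execute(
    "SELECT DISTINCT date, time, hometeam_id, awayteam_id, locationcity, locationstate"
    " FROM games ORDER BY date"
).fetchall()
total_rows = conn.execute("SELECT COUNT(*) FROM games").fetchone()[0]
print(len(distinct_rows), "distinct games;", total_rows, "rows still stored")
```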

#7


2  

delete from games 
   where id not in 
   (select max(id)  from games 
    group by date, time, hometeam_id, awayteam_id, locationcity, locationstate 
    );

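Workaround (MySQL rejects the query above with error 1093 because its subquery reads from the table being deleted from):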

create temporary table temp_table as
select max(id) id from games
    group by date, time, hometeam_id, awayteam_id, locationcity, locationstate;

delete from games where id not in (select id from temp_table);
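A minimal sketch of the temporary-table workaround, assuming Python's sqlite3 module as a stand-in for MySQL and hypothetical sample rows. The temporary table holds the ids to keep (the highest id of each game), so the delete removes every id that is NOT IN it. SQLite would also accept the single-statement version directly; the two-step form is shown to mirror the workaround.

```python
import sqlite3

# Hypothetical sample data: ids 1/2 and 3/4 are duplicate games, 5 is unique.
ROWS = [
    (1, "2022-08-27", "19:00", 10, 20, "Boston", "MA"),
    (2, "2022-08-27", "19:00", 10, 20, "Boston", "MA"),
    (3, "2022-08-28", "13:00", 30, 10, "Denver", "CO"),
    (4, "2022-08-28", "13:00", 30, 10, "Denver", "CO"),
    (5, "2022-08-29", "18:30", 20, 30, "Austin", "TX"),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE games (id INTEGER PRIMARY KEY, date TEXT NOT NULL,"
    " time TEXT NOT NULL, hometeam_id INTEGER NOT NULL,"
    " awayteam_id INTEGER NOT NULL, locationcity TEXT NOT NULL,"
    " locationstate TEXT NOT NULL)"
)
conn.executemany("INSERT INTO games VALUES (?, ?, ?, ?, ?, ?, ?)", ROWS)

# Step 1: remember the id to keep for every game (one per group, singles included).
# Step 2: delete everything else.
conn.executescript("""
CREATE TEMP TABLE temp_table AS
SELECT MAX(id) AS id FROM games
GROUP BY date, time, hometeam_id, awayteam_id, locationcity, locationstate;
DELETE FROM games WHERE id NOT IN (SELECT id FROM temp_table);
""")

remaining = [row[0] for row in conn.execute("SELECT id FROM games ORDER BY id")]
print(remaining)
```

Using MAX(id) keeps the most recently inserted copy of each game; swap in MIN(id) to keep the oldest instead.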

#8


1  

DELETE FROM table
WHERE id IN
    (SELECT t.id
     FROM table AS t
     JOIN table AS tj ON (t.date = tj.date
                          AND t.hometeam_id = tj.hometeam_id
                          AND t.awayteam_id = tj.awayteam_id
                          AND t.id > tj.id
                          ...))