Delete all duplicate rows except for one in MySQL? [duplicate]

Date: 2023-02-07 07:37:07

Possible Duplicate:
Remove duplicate rows in MySQL

How would I delete all duplicate data from a MySQL Table?

For example, with the following data:

SELECT * FROM names;

+----+--------+
| id | name   |
+----+--------+
| 1  | google |
| 2  | yahoo  |
| 3  | msn    |
| 4  | google |
| 5  | google |
| 6  | yahoo  |
+----+--------+

I would use SELECT DISTINCT name FROM names; if it were a SELECT query.

How would I do this with DELETE to only remove duplicates and keep just one record of each?

2 Answers

#1


833  

NB - You need to do this first on a test copy of your table!

When I did it, I found that unless I also included AND n1.id <> n2.id, it deleted every row in the table.
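
One way to rehearse on a copy is sketched below. It uses Python's sqlite3 module as a stand-in for MySQL (an assumption, so the example runs without a server), and names_test is a hypothetical table name. It copies the sample table and lists which names are duplicated before anything is deleted.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for MySQL to keep the sketch self-contained.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE names (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO names (id, name) VALUES (?, ?)",
    [(1, "google"), (2, "yahoo"), (3, "msn"),
     (4, "google"), (5, "google"), (6, "yahoo")],
)

# Work on a copy; names_test is a hypothetical name for the test table.
conn.execute("CREATE TABLE names_test AS SELECT * FROM names")

# List each name that occurs more than once, with its row count.
duplicates = conn.execute(
    "SELECT name, COUNT(*) FROM names_test "
    "GROUP BY name HAVING COUNT(*) > 1 ORDER BY name"
).fetchall()
print(duplicates)
```

Running the same GROUP BY ... HAVING query after the DELETE should return no rows if the cleanup worked.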

1) If you want to keep the row with the lowest id value:

DELETE n1 FROM names n1, names n2 WHERE n1.id > n2.id AND n1.name = n2.name

2) If you want to keep the row with the highest id value:

DELETE n1 FROM names n1, names n2 WHERE n1.id < n2.id AND n1.name = n2.name
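
Before running either DELETE, the same join condition can be run as a SELECT to see which rows it would remove. The sketch below does this for the keep-lowest-id variant, with Python's sqlite3 module standing in for MySQL (an assumption made so the example is self-contained). SQLite has no multi-table DELETE, so the final delete goes by the previewed ids instead.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for MySQL.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE names (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO names (id, name) VALUES (?, ?)",
    [(1, "google"), (2, "yahoo"), (3, "msn"),
     (4, "google"), (5, "google"), (6, "yahoo")],
)

# Same join condition as the keep-lowest-id DELETE, run as a SELECT:
# a row is selected when another row with the same name has a smaller id.
doomed = [row[0] for row in conn.execute(
    "SELECT DISTINCT n1.id FROM names n1, names n2 "
    "WHERE n1.id > n2.id AND n1.name = n2.name ORDER BY n1.id"
)]
print(doomed)

# SQLite lacks MySQL's "DELETE n1 FROM ..." form, so delete the previewed ids.
conn.executemany("DELETE FROM names WHERE id = ?", [(i,) for i in doomed])
remaining = conn.execute("SELECT id, name FROM names ORDER BY id").fetchall()
print(remaining)
```

Flipping the comparison to n1.id < n2.id previews the keep-highest-id variant instead.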

I used this method in MySQL 5.1; I'm not sure about other versions.

Update: Since people Googling for how to remove duplicates end up here: although the OP's question is about DELETE, please be advised that using INSERT with SELECT DISTINCT is much faster. For a database with 8 million rows, the query below took 13 minutes, while the DELETE ran for more than 2 hours and still had not completed.

INSERT INTO tempTableName(cellId,attributeId,entityRowId,value)
    SELECT DISTINCT cellId,attributeId,entityRowId,value
    FROM tableName;
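
Applied to the names table from the question, the copy-then-swap approach might look like the sketch below. It uses Python's sqlite3 module in place of MySQL (an assumption), and names_dedup is a hypothetical table name. Note that DISTINCT is taken over name only, so the copied rows receive fresh ids rather than keeping the original ones.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for MySQL.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE names (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO names (id, name) VALUES (?, ?)",
    [(1, "google"), (2, "yahoo"), (3, "msn"),
     (4, "google"), (5, "google"), (6, "yahoo")],
)

# names_dedup is a hypothetical name; its ids are assigned anew on insert.
conn.execute("CREATE TABLE names_dedup (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO names_dedup (name) SELECT DISTINCT name FROM names")

# Swap the deduplicated copy into place of the original table.
conn.execute("DROP TABLE names")
conn.execute("ALTER TABLE names_dedup RENAME TO names")

deduped = sorted(row[0] for row in conn.execute("SELECT name FROM names"))
print(deduped)
```

If other tables reference the original id values, this approach would break those references, so the DELETE-based variants may be the better fit in that case.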

#2


163  

If you want to keep the row with the lowest id value:

DELETE FROM NAMES
 WHERE id NOT IN (SELECT * 
                    FROM (SELECT MIN(n.id)
                            FROM NAMES n
                        GROUP BY n.name) x)

If you want to keep the row with the highest id value:

DELETE FROM NAMES
 WHERE id NOT IN (SELECT * 
                    FROM (SELECT MAX(n.id)
                            FROM NAMES n
                        GROUP BY n.name) x)

The subquery within a subquery is necessary in MySQL; without it you'll get error 1093, because MySQL does not let a DELETE select directly from the table it is deleting from.
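
As a quick check of the NOT IN approach on the sample data, here is a sketch using Python's sqlite3 module in place of MySQL (an assumption). The helper dedupe_keep and the column alias keep_id are hypothetical names. SQLite does not raise error 1093, so the derived table is kept here only to mirror the MySQL form.

```python
import sqlite3

ROWS = [(1, "google"), (2, "yahoo"), (3, "msn"),
        (4, "google"), (5, "google"), (6, "yahoo")]

def dedupe_keep(aggregate):
    """Return the ids left after keeping the MIN or MAX id per name."""
    if aggregate not in ("MIN", "MAX"):
        raise ValueError("aggregate must be 'MIN' or 'MAX'")
    # Assumption: in-memory SQLite stands in for MySQL.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE names (id INTEGER PRIMARY KEY, name TEXT)")
    conn.executemany("INSERT INTO names (id, name) VALUES (?, ?)", ROWS)
    # The inner SELECT picks one id per name; the outer derived table (x)
    # is the wrapper that MySQL needs to avoid error 1093.
    conn.execute(
        "DELETE FROM names WHERE id NOT IN ("
        "  SELECT keep_id FROM ("
        f"    SELECT {aggregate}(id) AS keep_id FROM names GROUP BY name"
        "  ) AS x"
        ")"
    )
    return [row[0] for row in conn.execute("SELECT id FROM names ORDER BY id")]

print(dedupe_keep("MIN"))
print(dedupe_keep("MAX"))
```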
