Query for duplicates in one column and return both the original and the duplicate rows

Time: 2022-09-23 12:10:52

I have a table that I use to store some systematically chosen "serial numbers" for each product that is bought...

The problem is, a CSV was uploaded that I believe contained some duplicate "serial numbers", which means that when the application tries to modify a row, it may not be modifying the correct one.

I need to be able to query the database and get every row whose serial_number value appears more than once. The result should look something like this:

ID, serial_number, meta1, meta2, meta3
3, 123456, 0, 2, 4
55, 123456, 0, 0, 0
6, 345678, 0, 1, 2
99, 345678, 0, 1, 2

So as you can see, I need to be able to see both the original row and the duplicate row, along with all of their columns of data. This is so I can compare them and determine what data is now inconsistent.

2 solutions

#1


Some versions of MySQL implement IN with a subquery very inefficiently. A safe alternative is a join:

SELECT t.*
FROM t JOIN
     (SELECT serial_number, COUNT(*) AS cnt
      FROM t
      GROUP BY serial_number
     ) tsum
     ON tsum.serial_number = t.serial_number AND tsum.cnt > 1
ORDER BY t.serial_number;

Another alternative is to use an EXISTS clause:

SELECT t.*
FROM t
WHERE EXISTS (SELECT * FROM t t2 WHERE t2.serial_number = t.serial_number AND t2.id <> t.id)
ORDER BY t.serial_number;

Both these queries (as well as the one proposed by @fthiella) are standard SQL. Both would benefit from an index on (serial_number, id).
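
To try the two queries above without touching the real database, here is a minimal sketch that runs them against the sample rows from the question. It uses Python's built-in sqlite3 module as a stand-in for MySQL, so it only checks the logic of the queries, not MySQL's performance. The table name t comes from the answer; the column types, the extra non-duplicated row (id 7), and the index name idx_t_serial_id are assumptions made for illustration.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (an assumption for this demo).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE t (id INTEGER PRIMARY KEY, serial_number INTEGER, "
    "meta1 INTEGER, meta2 INTEGER, meta3 INTEGER)"
)
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?, ?, ?)",
    [
        (3, 123456, 0, 2, 4),
        (55, 123456, 0, 0, 0),
        (6, 345678, 0, 1, 2),
        (99, 345678, 0, 1, 2),
        (7, 999999, 1, 1, 1),  # hypothetical unique serial; should not be returned
    ],
)

# The composite index suggested in the answer; the index name is made up.
conn.execute("CREATE INDEX idx_t_serial_id ON t (serial_number, id)")

# Join against a per-serial count and keep only serials seen more than once.
JOIN_QUERY = """
SELECT t.*
FROM t
JOIN (SELECT serial_number, COUNT(*) AS cnt
      FROM t
      GROUP BY serial_number) tsum
  ON tsum.serial_number = t.serial_number AND tsum.cnt > 1
ORDER BY t.serial_number, t.id
"""

# Keep a row if another row with the same serial but a different id exists.
EXISTS_QUERY = """
SELECT t.*
FROM t
WHERE EXISTS (SELECT 1
              FROM t t2
              WHERE t2.serial_number = t.serial_number
                AND t2.id <> t.id)
ORDER BY t.serial_number, t.id
"""

join_rows = conn.execute(JOIN_QUERY).fetchall()
exists_rows = conn.execute(EXISTS_QUERY).fetchall()

for row in join_rows:
    print(row)
```

Both queries should return the four duplicated rows (ids 3, 55, 6, 99) and skip the unique one. The extra sort on id is only there to make the output order deterministic for the comparison.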

#2


SELECT *
FROM
  yourtable
WHERE
  serial_number IN (SELECT serial_number
                    FROM yourtable
                    GROUP BY serial_number
                    HAVING COUNT(*)>1)
ORDER BY
  serial_number, id
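
Since the goal in the question is to compare the duplicates and spot inconsistent data, the query above can be combined with a small comparison step. The following is a rough sketch, again using Python's sqlite3 module in place of MySQL. The helper inconsistent_columns and the report dictionary are hypothetical names introduced here, and the column types are assumed.

```python
import sqlite3
from itertools import groupby

# In-memory SQLite database standing in for MySQL (an assumption for this demo).
conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row  # lets us read columns by name
conn.execute(
    "CREATE TABLE yourtable (id INTEGER PRIMARY KEY, serial_number INTEGER, "
    "meta1 INTEGER, meta2 INTEGER, meta3 INTEGER)"
)
conn.executemany(
    "INSERT INTO yourtable VALUES (?, ?, ?, ?, ?)",
    [
        (3, 123456, 0, 2, 4),
        (55, 123456, 0, 0, 0),
        (6, 345678, 0, 1, 2),
        (99, 345678, 0, 1, 2),
    ],
)

# Same query as in the answer: rows whose serial_number occurs more than once.
DUPLICATES_QUERY = """
SELECT *
FROM yourtable
WHERE serial_number IN (SELECT serial_number
                        FROM yourtable
                        GROUP BY serial_number
                        HAVING COUNT(*) > 1)
ORDER BY serial_number, id
"""

rows = conn.execute(DUPLICATES_QUERY).fetchall()


def inconsistent_columns(group):
    """Return non-key column names whose values differ within one serial group."""
    columns = [c for c in group[0].keys() if c not in ("id", "serial_number")]
    return [c for c in columns if len({row[c] for row in group}) > 1]


# groupby relies on the rows already being sorted by serial_number.
report = {}
for serial, group in groupby(rows, key=lambda r: r["serial_number"]):
    report[serial] = inconsistent_columns(list(group))

for serial, cols in report.items():
    print(serial, cols if cols else "consistent")
```

With the sample data from the question, serial 123456 differs in meta2 and meta3, while the two rows for 345678 agree on every meta column.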
