Comparing a SQL table with itself (self-join)

Time: 2022-02-11 14:50:17

I'm trying to find duplicate rows based on mixed columns. This is an example of what I have:

CREATE TABLE Test
(
   id INT PRIMARY KEY,
   test1 varchar(124),
   test2 varchar(124)
)

INSERT INTO TEST ( id, test1, test2 ) VALUES ( 1, 'A', 'B' )
INSERT INTO TEST ( id, test1, test2 ) VALUES ( 2, 'B', 'C' )

Now if I run this query:

SELECT [LEFT].[ID] 
FROM [TEST] AS [LEFT] 
   INNER JOIN [TEST] AS [RIGHT] 
   ON [LEFT].[ID] != [RIGHT].[ID] 
WHERE [LEFT].[TEST1] = [RIGHT].[TEST2]

I would expect to get back both ids (1 and 2), but I only ever get back one row.

My assumption was that the join compares every row with every other row, but I guess that is not correct? To fix this, I changed my query to:

SELECT [LEFT].[ID] 
FROM [TEST] AS [LEFT] 
   INNER JOIN [TEST] AS [RIGHT] 
   ON [LEFT].[ID] != [RIGHT].[ID] 
WHERE [LEFT].[TEST1] = [RIGHT].[TEST2] 
OR [LEFT].[TEST2] = [RIGHT].[TEST1]

This gives me both rows, but performance degrades extremely quickly as the number of rows grows.

The final solution I came up with, for both performance and results, was to use a union:

SELECT [LEFT].[ID] 
FROM [TEST] AS [LEFT] 
   INNER JOIN [TEST] AS [RIGHT] 
   ON [LEFT].[ID] != [RIGHT].[ID] 
WHERE [LEFT].[TEST1] = [RIGHT].[TEST2] 
UNION
SELECT [LEFT].[ID] 
FROM [TEST] AS [LEFT] 
   INNER JOIN [TEST] AS [RIGHT] 
   ON [LEFT].[ID] != [RIGHT].[ID] 
WHERE [LEFT].[TEST2] = [RIGHT].[TEST1]

But overall, I obviously don't understand why the first query does not work, which means that I'm probably doing something wrong. Could someone point me in the right direction?
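For anyone who wants to compare the OR and UNION variants from the question side by side, here is a minimal sketch. It assumes Python's built-in sqlite3 module as a stand-in for the original database engine, uses the short aliases l and r instead of [LEFT] and [RIGHT], and adds a hypothetical third row (3, 'C', 'A') that is not in the question, so that one id can match more than one partner.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Test (id INT PRIMARY KEY, test1 VARCHAR(124), test2 VARCHAR(124));
    INSERT INTO Test (id, test1, test2) VALUES (1, 'A', 'B');
    INSERT INTO Test (id, test1, test2) VALUES (2, 'B', 'C');
    -- Hypothetical extra row, not part of the original question
    INSERT INTO Test (id, test1, test2) VALUES (3, 'C', 'A');
""")

# OR variant: one output row per matching (l, r) pair, so ids can repeat.
or_ids = [row[0] for row in conn.execute("""
    SELECT l.id
    FROM Test AS l
    INNER JOIN Test AS r ON l.id <> r.id
    WHERE l.test1 = r.test2 OR l.test2 = r.test1
    ORDER BY l.id
""")]

# UNION variant: UNION removes duplicate rows, so each id appears once.
union_ids = [row[0] for row in conn.execute("""
    SELECT l.id FROM Test AS l INNER JOIN Test AS r ON l.id <> r.id
    WHERE l.test1 = r.test2
    UNION
    SELECT l.id FROM Test AS l INNER JOIN Test AS r ON l.id <> r.id
    WHERE l.test2 = r.test1
    ORDER BY 1
""")]

print("OR:   ", or_ids)
print("UNION:", union_ids)
```

The two variants agree on which ids match, but only UNION deduplicates them. If the OR form is preferred, adding DISTINCT to its SELECT list should give the same set of ids.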

4 Solutions

#1


10  

Do not JOIN on an inequality; it seems that the JOIN and WHERE conditions are inverted.

SELECT t1.id
FROM Test t1
INNER JOIN Test t2
ON ((t1.test1 = t2.test2) OR (t1.test2 = t2.test1))
WHERE t1.id <> t2.id

Should work fine.
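As a quick check of this query, here is a small sketch that assumes Python's built-in sqlite3 module in place of the original database engine. It runs the query above and, for comparison, the same predicates with the ON and WHERE clauses swapped. For an INNER JOIN both placements are logically equivalent, so the difference here is mainly one of readability and of what the optimizer of a given engine does with it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Test (id INT PRIMARY KEY, test1 VARCHAR(124), test2 VARCHAR(124));
    INSERT INTO Test (id, test1, test2) VALUES (1, 'A', 'B');
    INSERT INTO Test (id, test1, test2) VALUES (2, 'B', 'C');
""")

# Column comparison in ON, id inequality in WHERE (as in this answer).
on_first = conn.execute("""
    SELECT t1.id
    FROM Test t1
    INNER JOIN Test t2
        ON ((t1.test1 = t2.test2) OR (t1.test2 = t2.test1))
    WHERE t1.id <> t2.id
    ORDER BY t1.id
""").fetchall()

# Same predicates with the two clauses swapped (as in the question).
where_first = conn.execute("""
    SELECT t1.id
    FROM Test t1
    INNER JOIN Test t2
        ON t1.id <> t2.id
    WHERE (t1.test1 = t2.test2) OR (t1.test2 = t2.test1)
    ORDER BY t1.id
""").fetchall()

print(on_first)
print(where_first)
```

Both queries return ids 1 and 2 on the sample data. What makes the second id appear is the added OR condition, not the clause the predicates sit in.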

#2


5  

You only get back both ids if you select them:

SELECT [LEFT].[ID], [RIGHT].[ID] 
FROM [TEST] AS [LEFT] 
   INNER JOIN [TEST] AS [RIGHT] 
   ON [LEFT].[ID] != [RIGHT].[ID] 
WHERE [LEFT].[TEST1] = [RIGHT].[TEST2]

The reason you only get one row is that only one row (namely the row with id 2) has a TEST1 that is equal to another row's TEST2.
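To see this, it can help to print the matching pair together with the values that matched. The sketch below assumes Python's built-in sqlite3 module rather than the original engine, and uses the aliases l and r in place of [LEFT] and [RIGHT].

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Test (id INT PRIMARY KEY, test1 VARCHAR(124), test2 VARCHAR(124));
    INSERT INTO Test (id, test1, test2) VALUES (1, 'A', 'B');
    INSERT INTO Test (id, test1, test2) VALUES (2, 'B', 'C');
""")

# Select both sides of the join so the matching pair is visible.
pairs = conn.execute("""
    SELECT l.id, l.test1, r.id, r.test2
    FROM Test AS l
    INNER JOIN Test AS r ON l.id <> r.id
    WHERE l.test1 = r.test2
""").fetchall()

for left_id, left_test1, right_id, right_test2 in pairs:
    print(f"left id {left_id} (test1={left_test1}) matches "
          f"right id {right_id} (test2={right_test2})")
```

The only pair that survives is left id 2 with right id 1. The reverse pair (left id 1, right id 2) compares 'A' with 'C' and is filtered out, which is why id 1 never shows up in the left-hand column.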

#3


2  

It looks like you're working very quickly toward a Cartesian join. Normally, if you're looking to return duplicates, you need to run something like:

SELECT [LEFT].*
FROM [TEST]  AS [LEFT]
INNER JOIN [TEST] AS [RIGHT]
    ON [LEFT].[test1] = [RIGHT].[test1]
        AND [LEFT].[test2] = [RIGHT].[test2]
        AND [LEFT].[id] <> [RIGHT].[id]

If you need to mix the columns, mix the conditions as needed, with something like:

SELECT [LEFT].*
FROM [TEST] AS [LEFT]
INNER JOIN [TEST] AS [RIGHT]
    ON (
        [LEFT].[test1] = [RIGHT].[test2]
            OR [LEFT].[test2] = [RIGHT].[test1]
       )
        AND [LEFT].[id] <> [RIGHT].[id]

Using that, you compare the right to the left and the left to the right in each join, eliminating the need for the WHERE altogether.

However, the execution time of this style of query can grow roughly quadratically with the number of rows in the table (n rows give n × (n − 1) ordered pairs to check), since you're comparing each row to every other row.
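A rough way to see that growth is to count the ordered pairs that the id inequality alone leaves for comparison. The sketch below assumes Python's built-in sqlite3 module rather than the original engine; the helper name candidate_pairs and the generated values are hypothetical and only there for illustration. An engine with suitable indexes may avoid visiting every pair, so treat the count as a worst case rather than a measurement.

```python
import sqlite3

def candidate_pairs(n):
    """Count the ordered (l, r) row pairs with different ids in a table of n rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE Test (id INT PRIMARY KEY, test1 VARCHAR(124), test2 VARCHAR(124))"
    )
    # Hypothetical filler data; the values do not affect the pair count.
    conn.executemany(
        "INSERT INTO Test (id, test1, test2) VALUES (?, ?, ?)",
        [(i, f"v{i}", f"v{i + 1}") for i in range(1, n + 1)],
    )
    (count,) = conn.execute(
        "SELECT COUNT(*) FROM Test AS l INNER JOIN Test AS r ON l.id <> r.id"
    ).fetchone()
    conn.close()
    return count

for n in (10, 100, 1000):
    print(n, candidate_pairs(n))
```

Every tenfold increase in rows multiplies the number of candidate pairs by roughly a hundred, which is steep but polynomial rather than exponential.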

#4


0  

This can be done without an explicit INNER JOIN, if I am not mistaken. This is my first time answering a MySQL kind of question, but I am just answering to get more points here on *. The comma is very important so that MySQL does not complain.

SELECT [LEFT].[ID] FROM [TEST] AS [LEFT], [TEST] AS [RIGHT] 
WHERE [LEFT].[ID] != [RIGHT].[ID] 
AND [LEFT].[TEST1] = [RIGHT].[TEST2];
