Find all records using unique combinations of two columns

Time: 2021-01-19 04:27:04

I have this table, messages:

sender_id    recipient_id
1            2
1            3
1            3
2            1
3            1
2            3

I wish to select records such that:

  1. Either sender_id or recipient_id = current_user.id
  2. The other field should be unique

That is, I want the unique rows from the table where sender_id = 2 or recipient_id = 2. I need the result below and don't know how to get it:

sender_id    recipient_id
2            1
2            3

Why? Because I want to build a Facebook-like inbox in which sent and received messages are aggregated, and this query is the bottleneck so far.

I am using Rails 3.2 and Postgres 9.3.

2 solutions

#1


4  

SELECT DISTINCT sender_id, recipient_id
FROM   messages
WHERE  $current_user_id IN (sender_id, recipient_id)
AND    sender_id <= recipient_id;

The last condition is necessary to fold (2,1) and (1,2) into a single row.

If, contrary to your example, there can be (2,1) in your data without (1,2) also being there, it gets more complicated. A UNION query should be perfect:

WITH cte AS (
   SELECT sender_id, recipient_id
   FROM   messages
   WHERE  $current_user_id IN (sender_id, recipient_id)
   )
SELECT sender_id, recipient_id FROM cte
WHERE  sender_id <= recipient_id
UNION
SELECT recipient_id, sender_id FROM cte
WHERE  sender_id >  recipient_id;

The CTE should be faster by keeping it down to a single index scan instead of two.

UNION removes duplicates from the result making the DISTINCT step unnecessary.
You might want to add an ORDER BY clause at the end for sorted output.
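
To see the difference between the two queries on a concrete case, here is a minimal sketch that runs both against the question's sample rows. It rests on several assumptions: an in-memory SQLite database stands in for Postgres, the named parameter :uid replaces the $current_user_id placeholder, the extra row (4, 2) is hypothetical and exists only to create a pair with no reverse direction, and the helper name run is made up for illustration.

```python
import sqlite3

# Sample rows from the question, plus one hypothetical row (4, 2)
# whose reverse direction (2, 4) is NOT in the table.
ROWS = [(1, 2), (1, 3), (1, 3), (2, 1), (3, 1), (2, 3), (4, 2)]

# First query: keeps only rows that already have the smaller ID first.
SIMPLE = """
SELECT DISTINCT sender_id, recipient_id
FROM   messages
WHERE  :uid IN (sender_id, recipient_id)
AND    sender_id <= recipient_id
ORDER  BY 1, 2
"""

# UNION query: swaps the columns of rows with the larger ID first,
# so every pair appears once regardless of direction.
FOLDED = """
WITH cte AS (
   SELECT sender_id, recipient_id
   FROM   messages
   WHERE  :uid IN (sender_id, recipient_id)
)
SELECT sender_id, recipient_id FROM cte
WHERE  sender_id <= recipient_id
UNION
SELECT recipient_id, sender_id FROM cte
WHERE  sender_id > recipient_id
ORDER  BY 1, 2
"""


def run(query, user_id):
    """Load the sample rows into a fresh in-memory table and run one query."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE messages (sender_id INTEGER, recipient_id INTEGER)"
    )
    conn.executemany("INSERT INTO messages VALUES (?, ?)", ROWS)
    rows = conn.execute(query, {"uid": user_id}).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    print(run(SIMPLE, 2))  # [(1, 2), (2, 3)]
    print(run(FOLDED, 2))  # [(1, 2), (2, 3), (2, 4)]
```

For user 2 the first query misses the conversation with user 4, because the only row for that pair has the larger ID first. Note that both queries return each pair with the smaller ID in the first column, so the current user can appear in either column.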

You need an index on both columns for best performance. While individual indexes can be combined in a bitmap index scan with good performance, a multicolumn index on (sender_id, recipient_id) will still be faster:

CREATE INDEX foo_idx ON messages (sender_id, recipient_id);

As always, weigh cost and benefits for an index. If the query is a bottleneck, the index is probably a good idea.

#2


2  

With ANSI SQL:

SELECT DISTINCT sender_id, recipient_id
FROM messages
WHERE (sender_id = $current_user_id OR recipient_id = $current_user_id);
