SQL join: selecting the last record in a one-to-many relationship

Posted: 2020-12-03 12:28:10

Suppose I have a table of customers and a table of purchases. Each purchase belongs to one customer. I want to get a list of all customers along with their last purchase in one SELECT statement. What is the best practice? Any advice on building indexes?

Please use these table/column names in your answer:

  • customer: id, name
  • purchase: id, customer_id, item_id, date

And in more complicated situations, would it be (performance-wise) beneficial to denormalize the database by putting the last purchase into the customer table?

If the (purchase) id is guaranteed to be sorted by date, can the statements be simplified by using something like LIMIT 1?

9 Answers

#1


318  

This is an example of the greatest-n-per-group problem, which comes up regularly.

Here's how I usually recommend solving it:

SELECT c.*, p1.*
FROM customer c
JOIN purchase p1 ON (c.id = p1.customer_id)
LEFT OUTER JOIN purchase p2 ON (c.id = p2.customer_id AND 
    (p1.date < p2.date OR p1.date = p2.date AND p1.id < p2.id))
WHERE p2.id IS NULL;

Explanation: given a row p1, there should be no row p2 with the same customer and a later date (or in the case of ties, a later id). When we find that to be true, then p1 is the most recent purchase for that customer.
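
To see the tie-breaking in action, here is a minimal sketch that runs the query above against an in-memory SQLite database through Python's sqlite3 module. The sample rows and customer names are invented for illustration; Ann deliberately has two purchases on the same latest date.

```python
import sqlite3

# Hypothetical sample data: Ann has two purchases on her latest date, Cy has none.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE purchase (id INTEGER PRIMARY KEY, customer_id INTEGER,
                       item_id INTEGER, date TEXT);
INSERT INTO customer VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cy');
INSERT INTO purchase VALUES
    (1, 1, 100, '2020-01-01'),
    (2, 1, 101, '2020-03-01'),
    (3, 1, 102, '2020-03-01'),
    (4, 2, 100, '2020-02-01');
""")

# Keep p1 only when no p2 exists that is later (or same date with a higher id).
rows = conn.execute("""
SELECT c.name, p1.id, p1.date
FROM customer c
JOIN purchase p1 ON c.id = p1.customer_id
LEFT OUTER JOIN purchase p2
  ON c.id = p2.customer_id
 AND (p1.date < p2.date OR (p1.date = p2.date AND p1.id < p2.id))
WHERE p2.id IS NULL
ORDER BY c.id
""").fetchall()

print(rows)  # [('Ann', 3, '2020-03-01'), ('Bob', 4, '2020-02-01')]
```

Ann's tie is resolved in favour of the higher id, and Cy is absent because the first join is an inner join; switching it to a LEFT JOIN would be needed to list customers without purchases.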

Regarding indexes, I'd create a compound index in purchase over the columns (customer_id, date, id). That may allow the outer join to be done using a covering index. Be sure to test on your platform, because optimization is implementation-dependent. Use the features of your RDBMS to analyze the optimization plan. E.g. EXPLAIN on MySQL.
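
As a rough sketch of that advice, the snippet below creates the compound index and asks SQLite for its query plan via Python's sqlite3 module. The index name idx_purchase_customer_date_id is made up for this example, and the plan text differs between SQLite versions and other databases, so treat the output as something to inspect rather than rely on.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE purchase (id INTEGER PRIMARY KEY, customer_id INTEGER,
                       item_id INTEGER, date TEXT);
-- Hypothetical index name; the column order follows the answer's suggestion.
CREATE INDEX idx_purchase_customer_date_id ON purchase (customer_id, date, id);
""")

query = """
SELECT c.*, p1.*
FROM customer c
JOIN purchase p1 ON c.id = p1.customer_id
LEFT OUTER JOIN purchase p2
  ON c.id = p2.customer_id
 AND (p1.date < p2.date OR (p1.date = p2.date AND p1.id < p2.id))
WHERE p2.id IS NULL
"""

# EXPLAIN QUERY PLAN is SQLite's counterpart to EXPLAIN on MySQL.
plan = conn.execute("EXPLAIN QUERY PLAN " + query).fetchall()
for row in plan:
    print(row[-1])  # the last column holds the human-readable plan step

index_names = [r[0] for r in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'index'")]
```

Look for the index name in the printed steps to check whether the optimizer actually picked it up on your data.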

Some people use subqueries instead of the solution I show above, but I find my solution makes it easier to resolve ties.

#2


91  

You could also try doing this using a subselect:

SELECT  c.*, p.*
FROM    customer c INNER JOIN
        (
            SELECT  customer_id,
                    MAX(date) MaxDate
            FROM    purchase
            GROUP BY customer_id
        ) MaxDates ON c.id = MaxDates.customer_id INNER JOIN
        purchase p ON   MaxDates.customer_id = p.customer_id
                    AND MaxDates.MaxDate = p.date

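The subselect finds each customer's last purchase date, and the outer joins match it back to the customer and to the purchase row made on that date. If a customer has several purchases on that same date, all of them are returned.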
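
A small sketch of how this query behaves when two purchases share the latest date, run on SQLite through Python's sqlite3 module. The sample rows are invented for illustration.

```python
import sqlite3

# Hypothetical sample data: purchases 2 and 3 are both on Ann's latest date.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE purchase (id INTEGER PRIMARY KEY, customer_id INTEGER,
                       item_id INTEGER, date TEXT);
INSERT INTO customer VALUES (1, 'Ann'), (2, 'Bob');
INSERT INTO purchase VALUES
    (1, 1, 100, '2020-01-01'),
    (2, 1, 101, '2020-03-01'),
    (3, 1, 102, '2020-03-01'),
    (4, 2, 100, '2020-02-01');
""")

# Join each customer to the purchase rows that carry the customer's MAX(date).
rows = conn.execute("""
SELECT c.name, p.id
FROM customer c
INNER JOIN (
    SELECT customer_id, MAX(date) AS MaxDate
    FROM purchase
    GROUP BY customer_id
) MaxDates ON c.id = MaxDates.customer_id
INNER JOIN purchase p
   ON MaxDates.customer_id = p.customer_id
  AND MaxDates.MaxDate = p.date
ORDER BY c.id, p.id
""").fetchall()

print(rows)  # [('Ann', 2), ('Ann', 3), ('Bob', 4)]
```

Ann appears twice because both of her purchases match the maximum date; an extra tie-breaker (for example on id) is needed if exactly one row per customer is required.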

#3


21  

You haven't specified the database. If it is one that supports analytic (window) functions, this approach may be faster than the GROUP BY one (definitely faster in Oracle, most likely faster in later SQL Server editions; I don't know about others).

Syntax in SQL Server would be:

SELECT c.*, p.*
FROM customer c INNER JOIN 
     (SELECT RANK() OVER (PARTITION BY customer_id ORDER BY date DESC) r, *
             FROM purchase) p
ON (c.id = p.customer_id)
WHERE p.r = 1
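
The same idea can be tried on SQLite, which supports window functions from version 3.25 onward (an assumption about the SQLite build bundled with your Python). This sketch, with invented sample rows and a hypothetical helper named latest, contrasts RANK(), which keeps every row tied on the latest date, with ROW_NUMBER() plus an id tie-breaker, which keeps exactly one.

```python
import sqlite3

# Hypothetical sample data: purchases 2 and 3 are both on Ann's latest date.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE purchase (id INTEGER PRIMARY KEY, customer_id INTEGER,
                       item_id INTEGER, date TEXT);
INSERT INTO customer VALUES (1, 'Ann'), (2, 'Bob');
INSERT INTO purchase VALUES
    (1, 1, 100, '2020-01-01'),
    (2, 1, 101, '2020-03-01'),
    (3, 1, 102, '2020-03-01'),
    (4, 2, 100, '2020-02-01');
""")


def latest(ranking):
    """Return (name, purchase id) pairs whose ranking expression equals 1."""
    sql = f"""
    SELECT c.name, p.id
    FROM customer c
    JOIN (SELECT *, {ranking} AS r FROM purchase) p ON c.id = p.customer_id
    WHERE p.r = 1
    ORDER BY c.id, p.id
    """
    return conn.execute(sql).fetchall()


with_rank = latest(
    "RANK() OVER (PARTITION BY customer_id ORDER BY date DESC)")
with_row_number = latest(
    "ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY date DESC, id DESC)")

print(with_rank)        # [('Ann', 2), ('Ann', 3), ('Bob', 4)]
print(with_row_number)  # [('Ann', 3), ('Bob', 4)]
```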

#4


14  

Another approach would be to use a NOT EXISTS condition in your join condition to test for later purchases:

SELECT *
FROM customer c
LEFT JOIN purchase p ON (
       c.id = p.customer_id
   AND NOT EXISTS (
     SELECT 1 FROM purchase p1
     WHERE p1.customer_id = c.id
     AND p1.id > p.id
   )
)
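
A quick way to check this query is to run it on SQLite through Python's sqlite3 module, as sketched below with invented sample rows. Note that the query compares ids rather than dates, so it assumes that purchase ids increase over time.

```python
import sqlite3

# Hypothetical sample data: Cy has no purchases at all.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE purchase (id INTEGER PRIMARY KEY, customer_id INTEGER,
                       item_id INTEGER, date TEXT);
INSERT INTO customer VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cy');
INSERT INTO purchase VALUES
    (1, 1, 100, '2020-01-01'),
    (2, 1, 101, '2020-03-01'),
    (3, 1, 102, '2020-03-01'),
    (4, 2, 100, '2020-02-01');
""")

# A purchase joins only if the same customer has no purchase with a higher id.
rows = conn.execute("""
SELECT c.name, p.id
FROM customer c
LEFT JOIN purchase p ON (
       c.id = p.customer_id
   AND NOT EXISTS (
     SELECT 1 FROM purchase p1
     WHERE p1.customer_id = c.id
     AND p1.id > p.id
   )
)
ORDER BY c.id
""").fetchall()

print(rows)  # [('Ann', 3), ('Bob', 4), ('Cy', None)]
```

Because the condition sits in the ON clause of a LEFT JOIN, Cy is still listed, with NULL (None in Python) in place of a purchase.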

#5


6  

I found this thread while looking for a solution to my problem, but when I tried the queries above, performance was poor. Below is my suggestion for better performance:

WITH MaxDates AS (
    SELECT  customer_id,
            MAX(date) MaxDate
    FROM    purchase
    GROUP BY customer_id
)

SELECT  c.*, M.*
FROM    customer c INNER JOIN
        MaxDates AS M ON c.id = M.customer_id

Hope this will be helpful.

#6


2  

Tested on SQLite:

SELECT c.*, p.*, max(p.date)
FROM customer c
LEFT OUTER JOIN purchase p
ON c.id = p.customer_id
GROUP BY c.id

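The max() aggregate function makes sure that the latest purchase is selected from each group (this assumes the date column is stored in a format for which max() gives the latest value, which is normally the case). This relies on SQLite's handling of bare columns in an aggregate query: when a query contains a single max() or min(), the other selected columns are taken from the row that holds that extreme value. Be careful with purchases on the same date: max(p.date, p.id) does not help, because with two or more arguments SQLite's max() is a scalar function rather than an aggregate, so a tie-breaking approach such as the join on (date, id) in answer #1 is needed.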

In terms of indexes, I would use an index on purchase with (customer_id, date, [any other purchase columns you want to return in your select]).

The LEFT OUTER JOIN (as opposed to INNER JOIN) will make sure that customers that have never made a purchase are also included.
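
The following sketch runs the query through Python's sqlite3 module on invented sample rows. It also shows that max() with two arguments returns one value per input row, i.e. it behaves as a scalar function and not as an aggregate. Which of Ann's two tied purchases supplies p.id is left unspecified here, so the example does not claim a particular one.

```python
import sqlite3

# Hypothetical sample data: Ann has a tie on her latest date, Cy has no purchases.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE purchase (id INTEGER PRIMARY KEY, customer_id INTEGER,
                       item_id INTEGER, date TEXT);
INSERT INTO customer VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cy');
INSERT INTO purchase VALUES
    (1, 1, 100, '2020-01-01'),
    (2, 1, 101, '2020-03-01'),
    (3, 1, 102, '2020-03-01'),
    (4, 2, 100, '2020-02-01');
""")

# Bare column p.id is taken from a row that holds max(p.date) in each group.
grouped = conn.execute("""
SELECT c.name, p.id, max(p.date)
FROM customer c
LEFT OUTER JOIN purchase p ON c.id = p.customer_id
GROUP BY c.id
ORDER BY c.id
""").fetchall()
print(grouped)

# Two-argument max() is evaluated per row, so four purchases give four results.
scalar = conn.execute("SELECT max(date, id) FROM purchase").fetchall()
print(len(scalar))  # 4
```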

#7


1  

Please try this:

SELECT
    c.id,
    c.name,
    -- price is not among the columns listed in the question; substitute item_id or date as needed
    (SELECT pi.price FROM purchase pi WHERE pi.id = MAX(p.id)) AS [LastPurchasePrice]
FROM customer c
INNER JOIN purchase p ON c.id = p.customer_id
GROUP BY c.id, c.name;

#8


1  

If you're using PostgreSQL you can use DISTINCT ON to find the first row in a group.

SELECT customer.*, purchase.*
FROM customer
JOIN (
   SELECT DISTINCT ON (customer_id) *
   FROM purchase
   ORDER BY customer_id, date DESC
) purchase ON purchase.customer_id = customer.id

PostgreSQL Docs - Distinct On

Note that the DISTINCT ON field(s), here customer_id, must match the leftmost field(s) in the ORDER BY clause.

Caveat: This is a nonstandard clause.

#9


0  

Try this; it will help. I have used this in my project:

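SELECT *
FROM customer c
OUTER APPLY (
    SELECT TOP 1 *
    FROM purchase pi
    -- correlate each purchase with the outer customer row
    WHERE pi.customer_id = c.id
    -- assumes ids increase over time; order by pi.date DESC, pi.id DESC otherwise
    ORDER BY pi.id DESC
) AS LastPurchase;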
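
OUTER APPLY and TOP are SQL Server syntax. On databases without them, a correlated subquery with LIMIT 1 can play a similar role; this is a swapped-in technique, not the query above. The sketch below runs such a variant on SQLite through Python's sqlite3 module with invented sample rows, ordering by date and then id so that ties are resolved.

```python
import sqlite3

# Hypothetical sample data: Ann has a tie on her latest date, Cy has no purchases.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE purchase (id INTEGER PRIMARY KEY, customer_id INTEGER,
                       item_id INTEGER, date TEXT);
INSERT INTO customer VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cy');
INSERT INTO purchase VALUES
    (1, 1, 100, '2020-01-01'),
    (2, 1, 101, '2020-03-01'),
    (3, 1, 102, '2020-03-01'),
    (4, 2, 100, '2020-02-01');
""")

# For each customer, the subquery picks the id of the single latest purchase.
rows = conn.execute("""
SELECT c.name, p.id, p.date
FROM customer c
LEFT JOIN purchase p
  ON p.id = (
      SELECT p2.id
      FROM purchase p2
      WHERE p2.customer_id = c.id
      ORDER BY p2.date DESC, p2.id DESC
      LIMIT 1
  )
ORDER BY c.id
""").fetchall()

print(rows)
# [('Ann', 3, '2020-03-01'), ('Bob', 4, '2020-02-01'), ('Cy', None, None)]
```

If purchase ids are guaranteed to follow date order, the ORDER BY inside the subquery can be reduced to p2.id DESC.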
