在MySQL中使用GROUP BY选择最近的行

时间:2022-06-02 12:26:53

I'm trying to select each user with their most recent payment. The query I have now selects the users first payment. I.e. if a user has made two payments and the payment.ids are 10 and 11, the query selects the user with the info for payment id 10, not 11.

我试着用最近的付款来选择每个用户。我现在的查询选择用户的第一次付款。也就是说,如果用户已经支付了两笔款项并支付了这笔款项。id分别为10和11,查询选择的用户为支付id 10,而不是11。

  SELECT users.*, payments.method, payments.id AS payment_id 
    FROM `users` 
         LEFT JOIN `payments` ON users.id = payments.user_id 
GROUP BY users.id

I've added ORDER BY payments.id, but the query seems to ignore it and still selects the first payment.

我已按付款方式增加了订单。id,但是查询似乎忽略了它,仍然选择了第一次付款。

All help appreciated. Thanks.

感激所有帮助。谢谢。

6 个解决方案

#1


16  

You want the groupwise maximum; in essence, group the payments table to identify the maximal records, then join the result back with itself to fetch the other columns:

你想要集体最大;本质上,将支付表分组以识别最大记录,然后将结果与自身结合,获取其他列:

SELECT users.*, payments.method, payments.id AS payment_id
FROM   payments NATURAL JOIN (
  SELECT   user_id, MAX(id) AS id 
  FROM     payments
  GROUP BY user_id
) t RIGHT JOIN users ON users.id = t.user_id

Note that MAX(id) may not be the "most recent payment", depending on your application and schema: it's usually better to determine "most recent" based off TIMESTAMP than based off synthetic identifiers such as an AUTO_INCREMENT primary key column.

请注意,MAX(id)可能不是“最近的支付”,这取决于您的应用程序和模式:通常最好确定“最近的”基于时间戳的,而不是基于诸如AUTO_INCREMENT主键列这样的合成标识符。

#2


2  

I've just been dealing with pretty much exactly the same problem and found these answers helpful. My testing seems to suggest you can make it slightly simpler than the accepted answer, viz.:

我一直在处理几乎完全相同的问题,发现这些答案很有帮助。我的测试似乎表明你可以使它比公认的答案稍微简单一点。

SELECT u.*, p.method, p.id AS payment_id 
FROM `users` u, `payments` p
WHERE u.id = p.user_id 
    AND p.id = (SELECT MAX(p2.id) FROM payments p2
                    WHERE p2.user_id = u.id);

I've not performance tested the differences but the db I'm working on has over 50,000 Users and over 60,000 payments and the query runs in 0.024 seconds.

我没有测试这些差异,但是我正在开发的数据库有超过50000用户,超过6万的支付,查询运行时间是0.024秒。

#3


1  

My solution:

我的解决方案:

SELECT

u.codigo, 
u.nome,  
max(r.latitude),  
max(r.longitude),  
max(r.data_criacao) 

from TAB_REGISTRO_COORDENADAS  r

inner join TAB_USUARIO u

on u.codigo = r.cd_usuario

group by u.codigo

#4


0  

Taking this one step further, we can also use:

更进一步,我们也可以使用:

select payment_id, cust_id, amount, payment_method 
from my_table where payment_id in 
(
    select max(payment_id) from my_table group by cust_id
);

...but this query is also taking way too long in my context. The inner select is smoking fast, but the outer takes a while, and with only 124 results from the inner. Ideas?

…但是这个查询在我的背景下也太长了。内部选择是快速吸烟,但外部需要一段时间,并且只有124个结果来自内部。想法吗?

#5


0  

I have come across this before. Group by's are more intended for aggregate expressions or identical records. My research found it is best practice to do something like this:

我以前遇到过这种情况。Group by's更适合用于聚合表达式或相同的记录。我的研究发现,做这样的事情是最好的做法:

    SELECT  u.*, p.method, p.id AS payment_id
    FROM    (
        SELECT  DISTINCT users.id
        FROM    users
        ) ur
    JOIN    payments p
    ON      p.id =
        (
        SELECT  pt.id
        FROM    payments pt
        WHERE   pt.user_id = ur.id
        ORDER BY
                pt.id DESC
        LIMIT 1
        )

#6


0  

I read the following solution on SO long ago, but I can't find the link to credit, but here goes:

很久以前,我读过以下的解决方案,但是我找不到与信贷的联系,但我发现:

SELECT users.*, payments.method, payments.id AS payment_id, payments2.id
FROM users
JOIN payments
    ON users.id = payments.user_id 
LEFT JOIN payments2
    ON payments.user_id = payments2.user_id
    AND payments.id < payments2.id
WHERE payments2.id IS NULL

To understand how this works, just drop the WHERE payments2.id IS NULL and you'll see what is happening, for instance it could produce the following output (I haven't build the schema to test this, so it's pseudo-output). Assume there are the following records in payments:

要理解这是如何工作的,只需删除WHERE payments2。id为NULL,您将看到正在发生的情况,例如它可能产生以下输出(我还没有构建用于测试这个的模式,所以它是伪输出)。假设有以下付款记录:

id | user_id | method
1  | 1       | VISA
2  | 1       | VISA
3  | 1       | VISA
4  | 1       | VISA

And the above SQL (without the WHERE payments2.id IS NULL clause) should produce:

和上面的SQL(没有WHERE payments2)。id为空子句)应产生:

users.id | payments.method | payments.id | payments2.id
1        | VISA            | 1           | 2
1        | VISA            | 1           | 3
1        | VISA            | 1           | 4
1        | VISA            | 2           | 3
1        | VISA            | 2           | 4
1        | VISA            | 3           | 4
1        | VISA            | 4           | NULL

As you can see the the last line produces the desired result, and since there's no payments2.id > 4, the LEFT JOIN results in a payments2.id = NULL.

正如您所看到的,最后一行产生了所需的结果,因为没有payments2。id > 4,左侧连接结果为payments2。id = NULL。

I've found this solution to be much faster (from my early tests) than the accepted answer.

我发现这个解决方案比公认的答案要快得多(从我早期的测试来看)。

Using a different schema but a similar query, of 16095 records:

使用不同的模式,但使用相似的查询,记录16095:

select as1.*, as2.id
from allocation_status as1
left join allocation_status as2 
    on as1.allocation_id = as2.allocation_id
    and as1.id < as2.id
where as2.id is null;

16095 rows affected, taking 4.1ms

Compared to the accepted answer of MAX / subquery:

与MAX / subquery的公认答案相比:

SELECT as1.* 
FROM allocation_status as1
JOIN (
    SELECT max(id) as id
    FROM allocation_status
    group by allocation_id
) as_max on as1.id = as_max.id 

16095 rows affected, taking 14.8ms

#1


16  

You want the groupwise maximum; in essence, group the payments table to identify the maximal records, then join the result back with itself to fetch the other columns:

你想要集体最大;本质上,将支付表分组以识别最大记录,然后将结果与自身结合,获取其他列:

SELECT users.*, payments.method, payments.id AS payment_id
FROM   payments NATURAL JOIN (
  SELECT   user_id, MAX(id) AS id 
  FROM     payments
  GROUP BY user_id
) t RIGHT JOIN users ON users.id = t.user_id

Note that MAX(id) may not be the "most recent payment", depending on your application and schema: it's usually better to determine "most recent" based off TIMESTAMP than based off synthetic identifiers such as an AUTO_INCREMENT primary key column.

请注意,MAX(id)可能不是“最近的支付”,这取决于您的应用程序和模式:通常最好确定“最近的”基于时间戳的,而不是基于诸如AUTO_INCREMENT主键列这样的合成标识符。

#2


2  

I've just been dealing with pretty much exactly the same problem and found these answers helpful. My testing seems to suggest you can make it slightly simpler than the accepted answer, viz.:

我一直在处理几乎完全相同的问题,发现这些答案很有帮助。我的测试似乎表明你可以使它比公认的答案稍微简单一点。

SELECT u.*, p.method, p.id AS payment_id 
FROM `users` u, `payments` p
WHERE u.id = p.user_id 
    AND p.id = (SELECT MAX(p2.id) FROM payments p2
                    WHERE p2.user_id = u.id);

I've not performance tested the differences but the db I'm working on has over 50,000 Users and over 60,000 payments and the query runs in 0.024 seconds.

我没有测试这些差异,但是我正在开发的数据库有超过50000用户,超过6万的支付,查询运行时间是0.024秒。

#3


1  

My solution:

我的解决方案:

SELECT

u.codigo, 
u.nome,  
max(r.latitude),  
max(r.longitude),  
max(r.data_criacao) 

from TAB_REGISTRO_COORDENADAS  r

inner join TAB_USUARIO u

on u.codigo = r.cd_usuario

group by u.codigo

#4


0  

Taking this one step further, we can also use:

更进一步,我们也可以使用:

select payment_id, cust_id, amount, payment_method 
from my_table where payment_id in 
(
    select max(payment_id) from my_table group by cust_id
);

...but this query is also taking way too long in my context. The inner select is smoking fast, but the outer takes a while, and with only 124 results from the inner. Ideas?

…但是这个查询在我的背景下也太长了。内部选择是快速吸烟,但外部需要一段时间,并且只有124个结果来自内部。想法吗?

#5


0  

I have come across this before. Group by's are more intended for aggregate expressions or identical records. My research found it is best practice to do something like this:

我以前遇到过这种情况。Group by's更适合用于聚合表达式或相同的记录。我的研究发现,做这样的事情是最好的做法:

    SELECT  u.*, p.method, p.id AS payment_id
    FROM    (
        SELECT  DISTINCT users.id
        FROM    users
        ) ur
    JOIN    payments p
    ON      p.id =
        (
        SELECT  pt.id
        FROM    payments pt
        WHERE   pt.user_id = ur.id
        ORDER BY
                pt.id DESC
        LIMIT 1
        )

#6


0  

I read the following solution on SO long ago, but I can't find the link to credit, but here goes:

很久以前,我读过以下的解决方案,但是我找不到与信贷的联系,但我发现:

SELECT users.*, payments.method, payments.id AS payment_id, payments2.id
FROM users
JOIN payments
    ON users.id = payments.user_id 
LEFT JOIN payments2
    ON payments.user_id = payments2.user_id
    AND payments.id < payments2.id
WHERE payments2.id IS NULL

To understand how this works, just drop the WHERE payments2.id IS NULL and you'll see what is happening, for instance it could produce the following output (I haven't build the schema to test this, so it's pseudo-output). Assume there are the following records in payments:

要理解这是如何工作的,只需删除WHERE payments2。id为NULL,您将看到正在发生的情况,例如它可能产生以下输出(我还没有构建用于测试这个的模式,所以它是伪输出)。假设有以下付款记录:

id | user_id | method
1  | 1       | VISA
2  | 1       | VISA
3  | 1       | VISA
4  | 1       | VISA

And the above SQL (without the WHERE payments2.id IS NULL clause) should produce:

和上面的SQL(没有WHERE payments2)。id为空子句)应产生:

users.id | payments.method | payments.id | payments2.id
1        | VISA            | 1           | 2
1        | VISA            | 1           | 3
1        | VISA            | 1           | 4
1        | VISA            | 2           | 3
1        | VISA            | 2           | 4
1        | VISA            | 3           | 4
1        | VISA            | 4           | NULL

As you can see the the last line produces the desired result, and since there's no payments2.id > 4, the LEFT JOIN results in a payments2.id = NULL.

正如您所看到的,最后一行产生了所需的结果,因为没有payments2。id > 4,左侧连接结果为payments2。id = NULL。

I've found this solution to be much faster (from my early tests) than the accepted answer.

我发现这个解决方案比公认的答案要快得多(从我早期的测试来看)。

Using a different schema but a similar query, of 16095 records:

使用不同的模式,但使用相似的查询,记录16095:

select as1.*, as2.id
from allocation_status as1
left join allocation_status as2 
    on as1.allocation_id = as2.allocation_id
    and as1.id < as2.id
where as2.id is null;

16095 rows affected, taking 4.1ms

Compared to the accepted answer of MAX / subquery:

与MAX / subquery的公认答案相比:

SELECT as1.* 
FROM allocation_status as1
JOIN (
    SELECT max(id) as id
    FROM allocation_status
    group by allocation_id
) as_max on as1.id = as_max.id 

16095 rows affected, taking 14.8ms