How do I get a histogram of counts from a join table without using a subquery?

Time: 2022-11-07 11:10:25

I have a lot of tables that look like this: (id, user_id, object_id). I am often interested in the question "how many users have one object? how many have two? etc." and would like to see the distribution.

The obvious answer to this looks like:

select x.ucount, count(*) 
from (select count(*) as ucount from objects_users group by user_id) as x 
group by x.ucount 
order by x.ucount;

This produces results like:

ucount | count
-------|-------
1      | 15
2      | 17
3      | 23
4      | 104
5      | 76
7      | 12
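
To try the histogram query on a small dataset, here is a minimal sketch using Python's built-in sqlite3 module. The table layout follows the question; the sample rows and the `num_users` alias are made up for illustration, and other databases may differ slightly in syntax.

```python
import sqlite3

# In-memory database with the (id, user_id, object_id) layout from the question
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE objects_users ("
    "id INTEGER PRIMARY KEY, user_id INTEGER, object_id INTEGER)"
)

# Hypothetical sample data: users 1 and 2 have one object each,
# user 3 has two objects, user 4 has three
rows = [(1, 10), (2, 11), (3, 10), (3, 12), (4, 10), (4, 11), (4, 12)]
conn.executemany(
    "INSERT INTO objects_users (user_id, object_id) VALUES (?, ?)", rows
)

# Inner query: objects per user. Outer query: users per object count.
HISTOGRAM_SQL = """
SELECT x.ucount, COUNT(*) AS num_users
FROM (SELECT COUNT(*) AS ucount FROM objects_users GROUP BY user_id) AS x
GROUP BY x.ucount
ORDER BY x.ucount
"""

histogram = conn.execute(HISTOGRAM_SQL).fetchall()
print(histogram)  # [(1, 2), (2, 1), (3, 1)]
```

Each output row reads as (objects per user, number of users with that many objects).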

Using a subquery here feels inelegant to me, and I'd like to figure out how to produce the same result without one. Further, if the question you're asking is slightly more complicated, it gets messy passing more information out of the subquery. For example, if you want the data further grouped by the user's creation date:

select 
    x.ucount, 
    (select cdate from users where id = x.user_id) as cdate, 
    count(*) 
from (
    select user_id, count(*) as ucount 
    from objects_users group by user_id
) as x 
group by cdate, x.ucount
order by cdate, x.ucount;

Is there some way to avoid the explosion of subqueries? I suppose in the end my objection is aesthetic, but it makes the queries hard to read and hard to write.

1 Solution

#1

I think a subquery is exactly the appropriate way to do this, regardless of your RDBMS. Why would it be inelegant?

For the second query, just join the users table like this:

SELECT
 x.ucount,
 u.cdate,
 COUNT(*)
FROM (
 SELECT
  user_id,
  COUNT(*) AS ucount
 FROM objects_users
 GROUP BY user_id
) AS x
LEFT JOIN users AS u
 ON x.user_id = u.id
GROUP BY u.cdate, x.ucount
ORDER BY u.cdate, x.ucount
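
As a rough check of the join version, the sketch below runs it against a small in-memory SQLite database from Python. The `users` columns (`id`, `cdate`) are taken from the queries in this thread; the sample rows, dates, and the `num_users` alias are assumptions for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, cdate TEXT)")
conn.execute(
    "CREATE TABLE objects_users ("
    "id INTEGER PRIMARY KEY, user_id INTEGER, object_id INTEGER)"
)

# Hypothetical users: two created on 2022-01-01, two on 2022-01-02
conn.executemany(
    "INSERT INTO users (id, cdate) VALUES (?, ?)",
    [(1, "2022-01-01"), (2, "2022-01-01"), (3, "2022-01-02"), (4, "2022-01-02")],
)
# Users 1 and 2 have one object each, user 3 has two, user 4 has three
conn.executemany(
    "INSERT INTO objects_users (user_id, object_id) VALUES (?, ?)",
    [(1, 10), (2, 11), (3, 10), (3, 12), (4, 10), (4, 11), (4, 12)],
)

# Same shape as the answer: aggregate per user first, then join users once
JOINED_SQL = """
SELECT x.ucount, u.cdate, COUNT(*) AS num_users
FROM (
    SELECT user_id, COUNT(*) AS ucount
    FROM objects_users
    GROUP BY user_id
) AS x
LEFT JOIN users AS u ON x.user_id = u.id
GROUP BY u.cdate, x.ucount
ORDER BY u.cdate, x.ucount
"""

by_date = conn.execute(JOINED_SQL).fetchall()
for ucount, cdate, num_users in by_date:
    print(cdate, ucount, num_users)
```

With this data the loop prints one line per (creation date, object count) pair. Joining `users` once replaces the per-row scalar subquery from the question, so only the single aggregating subquery remains.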
