I'm confused about how distinct() works with Django queries

Time: 2022-10-04 14:10:49

I have this query:

checkins = CheckinAct.objects.filter(time__range=[start, end], location=checkin.location)

This works great for telling me how many checkins have happened in my date range for a specific location. But I want to know how many checkins were done by unique users. So I tried this:

checkins = CheckinAct.objects.filter(time__range=[start, end], location=checkin.location).values('user').distinct()

But that doesn't work: I get back an empty array. Any ideas why?

Here is my CheckinAct model:

class CheckinAct(models.Model):
    user = models.ForeignKey(User)
    location = models.ForeignKey(Location)
    time = models.DateTimeField()

----Update------ So now I have updated my query to look like this:

checkins = (
    CheckinAct.objects.values('user')
    .filter(time__range=[start, end], location=checkin.location)
    .annotate(dcount=Count('user'))
)

But I'm still getting multiple objects back that have the same user, like so:

 [{'user': 15521L}, {'user': 15521L}, {'user': 15521L}, {'user': 15521L}, {'user': 15521L}]

---- Update 2------ Here is something else I tried, but I'm still getting lots of identical user objects back when I log the checkins object.

checkins = CheckinAct.objects.filter(
    time__range=[start, end],
    location=checkin.location,
).annotate(dcount=Count('user')).values('user', 'dcount')
logger.info("checkins!!! : " + str(checkins))

Logs the following:

checkins!!! : [{'user': 15521L}, {'user': 15521L}, {'user': 15521L}]

Notice how there are 3 instances of the same user object. Is this working correctly or not? Is there a different way to read out what comes back in the dict object? I just need to know how many unique users checked into that specific location during the time range.

3 answers

#1


1  

The answer is actually right in the Django docs. Unfortunately, very little attention is drawn to the importance of the particular part you need, so it's understandably missed. (Read down a little to the part dealing with Items.)

For your use-case, the following should give you exactly what you want:

checkins = CheckinAct.objects.filter(time__range=[start,end], location=checkin.location).\
                              values('user').annotate(checkin_count=Count('pk')).order_by()
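The trailing order_by() is there because, in the Django aggregation docs, fields used for ordering also take part in the grouping of a values().annotate() query. The sketch below imitates that effect with the standard-library sqlite3 module rather than the ORM. The table and column names are hypothetical stand-ins for CheckinAct, and whether a stray ordering is really what caused the duplicates in the question is an assumption, since the model shown declares no default ordering.

```python
import sqlite3

# Hypothetical stand-in for the CheckinAct table; names are assumptions.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE checkin (id INTEGER PRIMARY KEY, user_id INTEGER, time TEXT)"
)
conn.executemany(
    "INSERT INTO checkin (user_id, time) VALUES (?, ?)",
    [
        (15521, "2022-01-01"),
        (15521, "2022-01-02"),
        (15521, "2022-01-03"),
        (42, "2022-01-01"),
    ],
)

# Grouping by user only: one row per user with that user's check-in count.
per_user = conn.execute(
    "SELECT user_id, COUNT(id) FROM checkin GROUP BY user_id ORDER BY user_id"
).fetchall()

# If an ordering column ends up in the GROUP BY, every (user, time) pair
# becomes its own group, so the same user appears once per check-in.
per_user_and_time = conn.execute(
    "SELECT user_id, COUNT(id) FROM checkin "
    "GROUP BY user_id, time ORDER BY user_id, time"
).fetchall()

print(per_user)
print(per_user_and_time)
```

The first result has one row per user; the second repeats user 15521 once per check-in, which resembles the output reported in the question.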

UPDATE

Based on your comment, I think the issue of what you wanted to achieve has been confused all along. What the query above gives you is a list of the number of times each user checked in at a location, without duplicate users in said list. It now seems what you really wanted was the number of unique users that checked in at one particular location. To get that, use the following (which is much simpler anyways):

User.objects.filter(checkinact__location=location).distinct().count()
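As a rough illustration of what "number of unique users" means at the SQL level, here is a minimal sketch using the standard-library sqlite3 module instead of the ORM. The table layout, the sample rows, and the date bounds are all hypothetical, and the time-range condition from the question is included as an assumption about what the final count should cover.

```python
import sqlite3

# Hypothetical stand-in for the CheckinAct table; names are assumptions.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE checkin ("
    "id INTEGER PRIMARY KEY, user_id INTEGER, location_id INTEGER, time TEXT)"
)
conn.executemany(
    "INSERT INTO checkin (user_id, location_id, time) VALUES (?, ?, ?)",
    [
        (15521, 1, "2022-01-01"),
        (15521, 1, "2022-01-02"),  # same user again, must not be counted twice
        (42, 1, "2022-01-03"),
        (42, 2, "2022-01-03"),     # different location, ignored
        (7, 1, "2023-06-01"),      # outside the time range, ignored
    ],
)

# Count each user once, restricted to one location and one time range.
unique_users = conn.execute(
    "SELECT COUNT(DISTINCT user_id) FROM checkin "
    "WHERE location_id = ? AND time BETWEEN ? AND ?",
    (1, "2022-01-01", "2022-12-31"),
).fetchone()[0]

print(unique_users)
```

Here only users 15521 and 42 qualify, so the count is 2 even though there are three matching check-in rows.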

UPDATE for non-rel support

checkin_users = [(c.user.pk, c.user) for c in CheckinAct.objects.filter(location=location)]
unique_checkins = len(dict(checkin_users))

This relies on the fact that dict keys are unique. So when you convert the list of tuples to a dict, you end up with one entry per unique user. However, this will generate 1+N queries, where N is the total number of checkins (one query for the checkins, plus one each time the user attribute is accessed). Normally, I'd do something like .select_related('user'), but that also requires a JOIN, which is apparently out. JOINs not being supported seems like a huge downside to non-rel, if true, but if that's the case, this is going to be your only option.
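The dict trick can be tried without a database. The following is a minimal sketch in which plain (pk, name) tuples stand in for the (c.user.pk, c.user) pairs; the sample values are made up for illustration.

```python
# Hypothetical (user_pk, user) pairs standing in for (c.user.pk, c.user).
checkin_users = [
    (15521, "alice"),
    (15521, "alice"),
    (42, "bob"),
    (15521, "alice"),
]

# Later pairs with the same key overwrite earlier ones, so each pk survives once.
unique_by_dict = dict(checkin_users)

# A set of the keys gives the same count without keeping the user objects.
unique_by_set = {pk for pk, _user in checkin_users}

unique_checkins = len(unique_by_dict)
print(unique_checkins, len(unique_by_set))
```

If only the count is needed, collecting the foreign key value that Django stores on each row (c.user_id) into a set should avoid loading the related user at all; whether that attribute behaves the same way on the non-rel backend is an assumption.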

#2


0  

You don't want DISTINCT. You actually want Django to do something that will end up giving you a GROUP BY clause. You are also correct that your final solution is to combine annotate() and values(), as discussed in the Django documentation.

What you want to do to get your results is to use annotate first, and then values, such as:

CheckinAct.objects.filter(
    time__range=[start, end],
    location=checkin.location,
).annotate(dcount=Count('user')).values('user', 'dcount')

The Django docs at the link I gave you above show a similarly constructed query (minus the filter aspect, which I added for your case in the proper location), and note that this will "now yield one unique result for each [checkin act]; however, only the [user] and the [dcount] annotation will be returned in the output data". (I edited the sentence to fit your case, but the principle is the same).
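To make the quoted sentence concrete, here is a small pure-Python sketch of the two groupings. It only imitates the grouping behaviour described in the docs and is not how Django implements it; the rows are hypothetical (pk, user id) pairs.

```python
from collections import Counter

# Hypothetical rows: (checkin pk, user id).
checkins = [(1, 15521), (2, 15521), (3, 15521)]

# annotate() before values(): the count is computed per check-in row,
# i.e. grouped by primary key, so every row reports a count of 1.
count_by_pk = Counter(pk for pk, _user in checkins)
annotate_then_values = [
    {"user": user, "dcount": count_by_pk[pk]} for pk, user in checkins
]

# values('user') before annotate(): rows are grouped by user first,
# so each user appears once with the total number of check-ins.
count_by_user = Counter(user for _pk, user in checkins)
values_then_annotate = [
    {"user": user, "dcount": n} for user, n in count_by_user.items()
]

print(annotate_then_values)
print(values_then_annotate)
```

The first list has one entry per check-in, which matches "one unique result for each [checkin act]"; the second has one entry per user.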

Hope that helps!

#3


0  

checkins = CheckinAct.objects.values('user').\
                        filter(time__range=[start, end], location=checkin.location).\
                        annotate(dcount=Count('user'))

If I am not mistaken, wouldn't the value you want be in the input as "dcount"? As a result, isn't that just being discarded when you decide to output the user value alone?

Can you tell me what happens when you try this?

checkins = CheckinAct.objects.values('user').\
                        filter(time__range=[start, end], location=checkin.location).\
                        annotate(Count('user')).order_by()

(The last order_by is to clear any built-in ordering that you may already have at the model level - not sure if you have anything like that, but doesn't hurt to ask...)
