Filtering a related field against another related model's M2M relationship in Django

Time: 2022-06-27 07:12:07

So I have a booking system. Agents (the people and organisations submitting bookings) are only allowed to make bookings in the categories we assign them. Many agents can be assigned to the same categories. It's a simple many-to-many. Here's an idea of what the models look like:

from django.db import models

class Category(models.Model):
    pass

class Agent(models.Model):
    categories = models.ManyToManyField('Category')

class Booking(models.Model):
    agent = models.ForeignKey('Agent')
    category = models.ForeignKey('Category')

So when a booking comes in, we dynamically allocate the category based on which are available to the agent. The agent usually doesn't specify.

Can I select Bookings where Booking.category isn't in Booking.agent.categories?
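
In plain SQL terms this is an anti-join: keep each booking for which no matching row exists in the agent-to-category link table. Here is a minimal sketch using Python's built-in sqlite3 module. The table and column names (booking, agent_categories) are assumptions made for illustration, not the names Django generates for a real app:

```python
import sqlite3

# Hypothetical, simplified schema: table and column names are assumptions
# for illustration, not the names Django generates for a real app.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE agent_categories (agent_id INTEGER, category_id INTEGER);
    CREATE TABLE booking (
        id INTEGER PRIMARY KEY, agent_id INTEGER, category_id INTEGER);

    -- agent 1 may book categories 10 and 11; agent 2 may book category 20
    INSERT INTO agent_categories VALUES (1, 10), (1, 11), (2, 20);

    -- bookings 1 and 3 match an assignment; 2 and 4 do not
    INSERT INTO booking VALUES (1, 1, 10), (2, 1, 20), (3, 2, 20), (4, 2, 11);
""")

# Correlated subquery: a booking is "bad" when its (agent, category) pair
# does not appear in the link table.
BAD_BOOKINGS_SQL = """
    SELECT b.id
    FROM booking AS b
    WHERE NOT EXISTS (
        SELECT 1
        FROM agent_categories AS ac
        WHERE ac.agent_id = b.agent_id
          AND ac.category_id = b.category_id
    )
    ORDER BY b.id
"""

bad_ids = [row[0] for row in conn.execute(BAD_BOOKINGS_SQL)]
print(bad_ids)
```

Django 1.11 and later ship `Exists` and `OuterRef` in `django.db.models`, which can express a correlated subquery of this shape through the ORM. They are not available in the 1.9 alpha used in this question.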

We have just noticed that, by the grace of a silly admin mistake, some agents were allowed to submit Bookings to any category. It has left us with thousands of bookings in the wrong place.

I can fix this, but I can only get it to work by nesting lookups:

for agent in Agent.objects.all():
    for booking in Booking.objects.filter(agent=agent):
        if booking.category not in agent.categories.all():
            # go through the automated allocation logic again
            pass

This works but it's super-slow. It's a lot of data flying between database and Django. This isn't a one-off either. I want to periodically audit new bookings to make sure they are in the correct place. It doesn't seem impossible that another admin issue will occur so after checking the Agent database, I want to query for Bookings that aren't in their agent's categories.

Again, nested queries will work, but as our datasets grow into millions (and beyond) I'd like to do this more efficiently.

I feel like it should be possible to do this with an F() lookup, something like this:

from django.db.models import F
bad = Booking.objects.exclude(category__in=F('agent__categories'))

But this doesn't work: TypeError: 'Col' object is not iterable

I've also tried .exclude(category=F('agent__categories')) and while it's happier with the syntax there, it doesn't exclude the "correct" bookings.

What's the secret formula for doing this sort of F() query on an M2M?


To help nail down exactly what I'm after, I've set up a GitHub repo with these models (and some data). Please use them to write the query. The current sole answer hits an issue I was seeing on my "real" data too.

git clone https://github.com/oliwarner/djangorelquerytest.git
cd djangorelquerytest
python3 -m venv venv
. ./venv/bin/activate
pip install ipython Django==1.9a1

./manage.py migrate
./manage.py shell

And in the shell, fire in:

from django.db.models import F
from querytest.models import Category, Agent, Booking
Booking.objects.exclude(agent__categories=F('category'))

Is that a bug? Is there a proper way to achieve this?

6 Answers

#1


6  

There is a chance that I might be wrong, but I think doing it in reverse should do the trick:

bad = Booking.objects.exclude(agent__categories=F('category'))

Edit

If the above doesn't work, here is another idea. I've tried similar logic on the setup I have and it seems to work. Try adding an intermediate model for the ManyToManyField:

class Category(models.Model):
    pass

class Agent(models.Model):
    categories = models.ManyToManyField('Category', through='AgentCategory')

class AgentCategory(models.Model):
    agent = models.ForeignKey(Agent, related_name='agent_category_set')
    category = models.ForeignKey(Category, related_name='agent_category_set')

class Booking(models.Model):
    agent = models.ForeignKey('Agent')
    category = models.ForeignKey('Category')

Then you can do a query:

bad = Booking.objects.exclude(agent__agent_category_set__category=F('category'))

Of course, specifying an intermediate model has its own implications, but I am sure you can handle them.

#2


1  

Usually when dealing with m2m relationships I take a hybrid approach. I would break the problem into two parts, a Python part and an SQL part. I find this speeds up the query a lot, and it doesn't require any complicated query.

The first thing you want to do is get the agent to categories mapping, then use that mapping to determine the category that is not in the assignment.

from collections import defaultdict

from django.http import HttpResponse


def get_agent_to_cats():
    # output { agent_id1: [ cat_id1, cat_id2, ], ... }
    # note: agents with no assignments have no rows in the "through" table,
    # so they never appear in this mapping and their bookings are not checked
    result = defaultdict(list)

    # get the relation using the "through" model, it is more efficient
    # this is the Agent.categories mapping
    for rel in Agent.categories.through.objects.all():
        result[rel.agent_id].append(rel.category_id)
    return result


def find_bad_bookings(request):
    agent_to_cats = get_agent_to_cats()

    for agent_id, cats in agent_to_cats.items():
        # bookings that do NOT belong to the agent's category assignments
        bad_bookings = (Booking.objects.filter(agent_id=agent_id)
                                       .exclude(category_id__in=cats))

        # at this point you can do whatever you want with the bad bookings;
        # wrong_cat is assumed to be a boolean field on Booking
        bad_bookings.update(wrong_cat=True)

    return HttpResponse('Bad Bookings: %s' % Booking.objects.filter(wrong_cat=True).count())

Here are some stats from when I ran the test on my server: 10,000 agents, 500 categories, 2,479,839 agent-to-category assignments and 5,000,000 bookings.

2,509,161 Bad Bookings. Total duration 149 seconds
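
The mapping idea can also be tried without a database. The following is a rough sketch in plain Python, not the code from this answer: the sample rows and both function names are made up for illustration, and it uses sets instead of lists so each membership test is a constant-time lookup.

```python
from collections import defaultdict

# Hypothetical sample data standing in for database rows.
# (agent_id, category_id) pairs, as read from the M2M "through" table
assignments = [(1, 10), (1, 11), (2, 20)]
# (booking_id, agent_id, category_id) triples, as read from the booking table
bookings = [(1, 1, 10), (2, 1, 20), (3, 2, 20), (4, 3, 10)]


def build_agent_to_cats(pairs):
    """Map each agent id to the set of category ids assigned to it."""
    mapping = defaultdict(set)
    for agent_id, category_id in pairs:
        mapping[agent_id].add(category_id)
    return mapping


def find_bad_booking_ids(booking_rows, agent_to_cats):
    """Return ids of bookings whose category is not assigned to their agent."""
    return [
        booking_id
        for booking_id, agent_id, category_id in booking_rows
        # .get() avoids inserting keys; agents without rows have no categories
        if category_id not in agent_to_cats.get(agent_id, set())
    ]


agent_to_cats = build_agent_to_cats(assignments)
bad_ids = find_bad_booking_ids(bookings, agent_to_cats)
print(bad_ids)
```

In this sketch agent 3 has no assignments at all, so its booking is reported as bad too. A loop that only visits agents present in the mapping would skip that case.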

#3


1  

Solution 1:

You can find the good bookings using this query

good = Booking.objects.filter(category=F('agent__categories'))

You can check the SQL generated for this:

print(Booking.objects.filter(category=F('agent__categories')).query)

So you can exclude the good bookings from all bookings. The solution is:

Booking.objects.exclude(id__in=Booking.objects.filter(category=F('agent__categories')).values('id'))

It will create a nested MySQL query, which is the most optimized MySQL query for this problem (as far as I know).

This MySQL query will be a little heavy because your database is huge, but it hits the database only once, unlike the loops in your first attempt, which issue at least one query per booking.

Also, if you store booking dates and know roughly when the wrong bookings started, you can shrink the dataset by filtering on date.

You can run the above query periodically to check for inconsistent bookings. But I would recommend overriding the admin form and checking at booking time whether the category is correct. You can also use some JavaScript in the admin form to offer only the categories available to the selected/logged-in agent.

Solution 2:

Use prefetch_related. This will reduce your time drastically because there are far fewer database hits.

Read about it here: https://docs.djangoproject.com/en/1.8/ref/models/querysets/

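# assumes Booking.agent declares related_name='bookings'
# (with the models in the question, the default reverse name is 'booking_set')
for agent in Agent.objects.all().prefetch_related('bookings', 'categories'):
    for booking in agent.bookings.all():
        if booking.category not in agent.categories.all():
            pass  # go through the automated allocation logic again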

#4


0  

This might speed it up ...

# assumes Booking.agent declares related_name='bookings'
for agent in Agent.objects.iterator():
    agent_categories = agent.categories.all()
    for booking in agent.bookings.iterator():
        if booking.category not in agent_categories:
            # go through the automated allocation logic again
            pass

#5


0  

This may not be what you're looking for, but you can use a raw query. I don't know if it can be done entirely within the ORM, but this works in your GitHub repo:

Booking.objects.raw("""
    SELECT id
    FROM querytest_booking AS booking
    WHERE category_id NOT IN (
        SELECT category_id
        FROM querytest_agent_categories AS agent_cats
        WHERE agent_cats.agent_id = booking.agent_id
    )
""")

I assume the table names will be different for you, unless your app is called querytest. But either way, this can be iterated over for you to plug your custom logic into.
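
To try the SQL outside Django, here is a small sketch using Python's built-in sqlite3 module. It reuses the table names from the query above, but the column layout and sample rows are assumptions for illustration. The LEFT JOIN variant is an alternative formulation added here for comparison and is not part of the original answer:

```python
import sqlite3

# In-memory stand-ins for the two tables; the columns and sample rows are
# assumptions, only the table names come from the raw query above.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE querytest_agent_categories (
        id INTEGER PRIMARY KEY, agent_id INTEGER, category_id INTEGER);
    CREATE TABLE querytest_booking (
        id INTEGER PRIMARY KEY, agent_id INTEGER, category_id INTEGER);
    INSERT INTO querytest_agent_categories (agent_id, category_id)
        VALUES (1, 1), (2, 2);
    INSERT INTO querytest_booking (id, agent_id, category_id)
        VALUES (1, 1, 1), (2, 2, 1), (3, 3, 2);
""")

# The correlated NOT IN subquery, with an ORDER BY added for stable output.
NOT_IN_SQL = """
    SELECT id
    FROM querytest_booking AS booking
    WHERE category_id NOT IN (
        SELECT category_id
        FROM querytest_agent_categories AS agent_cats
        WHERE agent_cats.agent_id = booking.agent_id)
    ORDER BY id
"""

# Alternative: LEFT JOIN on both columns and keep rows with no match.
LEFT_JOIN_SQL = """
    SELECT booking.id
    FROM querytest_booking AS booking
    LEFT JOIN querytest_agent_categories AS agent_cats
        ON agent_cats.agent_id = booking.agent_id
       AND agent_cats.category_id = booking.category_id
    WHERE agent_cats.id IS NULL
    ORDER BY booking.id
"""

not_in_ids = [row[0] for row in conn.execute(NOT_IN_SQL)]
left_join_ids = [row[0] for row in conn.execute(LEFT_JOIN_SQL)]
print(not_in_ids, left_join_ids)
```

Both forms flag booking 2 (agent 2 is not assigned category 1) and booking 3 (agent 3 has no assignments). Which one a given database plans better depends on the engine and its indexes, so it is worth checking the query plan on real data.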

#6


0  

You were almost there. First, let's create two booking elements:

# b1 has a "correct" agent
b1 = Booking.objects.create(agent=Agent.objects.create(), category=Category.objects.create())
b1.agent.categories.add(b1.category)

# b2 has an incorrect agent
b2 = Booking.objects.create(agent=Agent.objects.create(), category=Category.objects.create())

Here is the queryset of all incorrect bookings (i.e. [b2]):

# The following requires a single query because
# the Django ORM is pretty smart
[b.id for b in Booking.objects.exclude(
    id__in=Booking.objects.filter(
        category__in=F('agent__categories')
    )
)]
[2]

Note that in my experience the following query does not produce any error but for some unknown reason the result is not correct either:

Booking.objects.exclude(category__in=F('agent__categories'))
[]
