Django ORM:优化涉及多对多关系的查询

时间:2022-10-05 13:19:20

I have the following model structure:

我有以下模型结构:

class Container(models.Model):
    pass

class Generic(models.Model):
    name = models.CharacterField(unique=True)
    cont = models.ManyToManyField(Container, null=True)
    # It is possible to have a Generic object not associated with any container, 
    # thats why null=True

class Specific1(Generic):
    ...

class Specific2(Generic):
    ...

...

class SpecificN(Generic):
    ...

Say, I need to retrieve all Specific-type models, that have a relationship with a particular Container.

比如说,我需要检索与特定Container有关系的所有特定类型模型。

The SQL for that is more or less trivial, but that is not the question. Unfortunately, I am not very experienced at working with ORMs (Django's ORM in particular), so I might be missing a pattern here.

对此的SQL或多或少是微不足道的,但这不是问题。不幸的是,我在使用ORM(特别是Django的ORM)方面不是很有经验,所以我可能在这里错过了一个模式。

When done in a brute-force manner, -

当以蛮力的方式完成时, -

c = Container.objects.get(name='somename') # this gets me the container
items = c.generic_set.all() 
# this gets me all Generic objects, that are related to the container
# Now what? I need to get to the actual Specific objects, so I need to somehow
# get the type of the underlying Specific object and get it
for item in items:
    spec = getattr(item, item.get_my_specific_type())

this results in a ton of db hits (one for each Generic record, that relates to a Container), so this is obviously not the way to do it. Now, it could, perhaps, be done by getting the SpecificX objects directly:

这导致了大量的db命中(每个通用记录一个,与Container有关),所以这显然不是这样做的方法。现在,也许可以通过直接获取SpecificX对象来完成:

s = Specific1.objects.filter(cont__name='somename')
# This gets me all Specific1 objects for the specified container
...
# do it for every Specific type

that way the db will be hit once for each Specific type (acceptable, I guess).

这样,对于每种特定类型,db将被命中一次(可以接受,我猜)。

I know, that .select_related() doesn't work with m2m relationships, so it is not of much help here.

我知道,.select_related()不适用于m2m关系,因此在这里没有多大帮助。

To reiterate, the end result has to be a collection of SpecificX objects (not Generic).

重申一下,最终结果必须是SpecificX对象的集合(不是Generic)。

3 个解决方案

#1


2  

I think you've already outlined the two easy possibilities. Either you do a single filter query against Generic and then cast each item to its Specific subtype (results in n+1 queries, where n is the number of items returned), or you make a separate query against each Specific table (results in k queries, where k is the number of Specific types).

我想你已经概述了两个简单的可能性。要么对Generic执行单个过滤器查询,然后将每个项目转换为其特定子类型(导致n + 1个查询,其中n是返回的项目数),或者对每个特定表格进行单独查询(结果为k)查询,其中k是特定类型的数量)。

It's actually worth benchmarking to see which of these is faster in reality. The second seems better because it's (probably) fewer queries, but each one of those queries has to perform a join with the m2m intermediate table. In the former case you only do one join query, and then many simple ones. Some database backends perform better with lots of small queries than fewer, more complex ones.

实际上值得进行基准测试,看看其中哪些更快。第二个似乎更好,因为它(可能)更少的查询,但这些查询中的每一个都必须与m2m中间表执行连接。在前一种情况下,您只执行一个连接查询,然后执行许多简单查询。一些数据库后端在执行大量小查询时表现得比更少,更复杂的查询更好。

If the second is actually significantly faster for your use case, and you're willing to do some extra work to clean up your code, it should be possible to write a custom manager method for the Generic model that "pre-fetches" all the subtype data from the relevant Specific tables for a given queryset, using only one query per subtype table; similar to how this snippet optimizes generic foreign keys with a bulk prefetch. This would give you the same queries as your second option, with the DRYer syntax of your first option.

如果第二个实际上对你的用例来说明显更快,并且你愿意做一些额外的工作来清理你的代码,那么应该可以为Generic模型编写一个自定义管理器方法来“预取”所有的来自给定查询集的相关特定表的子类型数据,每个子类型表只使用一个查询;类似于此代码段如何使用批量预取优化通用外键。这将为您提供与第二个选项相同的查询,使用您的第一个选项的DRYer语法。

#2


1  

Not a complete answer but you can avoid a great number of hits by doing this

这不是一个完整的答案,但你可以避免大量的点击

items= list(items)
for item in items:
    spec = getattr(item, item.get_my_specific_type())

instead of this :

而不是这个:

for item in items:
    spec = getattr(item, item.get_my_specific_type())

Indeed, by forcing a cast to a python list, you force the django orm to load all elements in your queryset. It then does this in one query.

实际上,通过强制转换为python列表,可以强制django orm加载查询集中的所有元素。然后它在一个查询中执行此操作。

#3


0  

I accidentally stubmled upon the following post, which pretty much answers your question :

我不小心碰到了以下帖子,这几乎回答了你的问题:

http://lazypython.blogspot.com/2008/11/timeline-view-in-django.html

#1


2  

I think you've already outlined the two easy possibilities. Either you do a single filter query against Generic and then cast each item to its Specific subtype (results in n+1 queries, where n is the number of items returned), or you make a separate query against each Specific table (results in k queries, where k is the number of Specific types).

我想你已经概述了两个简单的可能性。要么对Generic执行单个过滤器查询,然后将每个项目转换为其特定子类型(导致n + 1个查询,其中n是返回的项目数),或者对每个特定表格进行单独查询(结果为k)查询,其中k是特定类型的数量)。

It's actually worth benchmarking to see which of these is faster in reality. The second seems better because it's (probably) fewer queries, but each one of those queries has to perform a join with the m2m intermediate table. In the former case you only do one join query, and then many simple ones. Some database backends perform better with lots of small queries than fewer, more complex ones.

实际上值得进行基准测试,看看其中哪些更快。第二个似乎更好,因为它(可能)更少的查询,但这些查询中的每一个都必须与m2m中间表执行连接。在前一种情况下,您只执行一个连接查询,然后执行许多简单查询。一些数据库后端在执行大量小查询时表现得比更少,更复杂的查询更好。

If the second is actually significantly faster for your use case, and you're willing to do some extra work to clean up your code, it should be possible to write a custom manager method for the Generic model that "pre-fetches" all the subtype data from the relevant Specific tables for a given queryset, using only one query per subtype table; similar to how this snippet optimizes generic foreign keys with a bulk prefetch. This would give you the same queries as your second option, with the DRYer syntax of your first option.

如果第二个实际上对你的用例来说明显更快,并且你愿意做一些额外的工作来清理你的代码,那么应该可以为Generic模型编写一个自定义管理器方法来“预取”所有的来自给定查询集的相关特定表的子类型数据,每个子类型表只使用一个查询;类似于此代码段如何使用批量预取优化通用外键。这将为您提供与第二个选项相同的查询,使用您的第一个选项的DRYer语法。

#2


1  

Not a complete answer but you can avoid a great number of hits by doing this

这不是一个完整的答案,但你可以避免大量的点击

items= list(items)
for item in items:
    spec = getattr(item, item.get_my_specific_type())

instead of this :

而不是这个:

for item in items:
    spec = getattr(item, item.get_my_specific_type())

Indeed, by forcing a cast to a python list, you force the django orm to load all elements in your queryset. It then does this in one query.

实际上,通过强制转换为python列表,可以强制django orm加载查询集中的所有元素。然后它在一个查询中执行此操作。

#3


0  

I accidentally stubmled upon the following post, which pretty much answers your question :

我不小心碰到了以下帖子,这几乎回答了你的问题:

http://lazypython.blogspot.com/2008/11/timeline-view-in-django.html