Query optimization for recursive many-to-many queries

Time: 2022-02-16 20:14:03

I'm using a small tree/graph package (django_dag) that gives my model a many-to-many children field which refers to itself. The basic structure can be shown with the following models:

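# models.py
from django.db import models

class Foo(FooBase):
    class Meta:
        abstract = True

    # Self-referential, non-symmetrical M2M routed through the Bar edge model
    children = models.ManyToManyField('self', symmetrical=False,
                                      through='Bar')

class Bar(models.Model):
    # on_delete is required from Django 2.0 on; distinct related_names
    # (names chosen here for illustration) avoid a reverse-accessor clash
    parent = models.ForeignKey(Foo, on_delete=models.CASCADE,
                               related_name='child_edges')
    child = models.ForeignKey(Foo, on_delete=models.CASCADE,
                              related_name='parent_edges')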

All is fine with the models and all the functionality of the package. FooBase adds a variety of functions to the model, including a way of recursively finding all children of a Foo and the children's children and so forth.

My concern is with the following function within FooBase:

def descendants_tree(self):
    tree = {}
    for f in self.children.all():
        tree[f] = f.descendants_tree()
    return tree

It outputs something like {Foo1:{}, Foo2: {Child_of_Foo2: {Child_of_Child_of_Foo2:{}}}} where the progeny are in a nested dictionary.

The alert reader may notice that this method calls a new query for each child. While these db hits are pretty quick, they can add up quickly when there might be 50+ children. And eventually, there will be tens of thousands of db entries. Right now, each query averages 0.6 msec with a row count of almost 2000.
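
To make the query count concrete, here is a minimal sketch that mimics the recursion without Django. The `CHILDREN` dictionary and `fetch_children` helper are hypothetical stand-ins for the database and for `self.children.all()`; each call to the helper counts as one round trip.

```python
# Hypothetical stand-in for the database: parent id -> list of child ids.
CHILDREN = {1: [2, 3], 2: [4], 3: [], 4: [5], 5: []}

query_count = 0


def fetch_children(node_id):
    """Simulate one `self.children.all()` round trip."""
    global query_count
    query_count += 1
    return CHILDREN[node_id]


def descendants_tree(node_id):
    # Same shape as the method above: one fetch per visited node.
    return {child: descendants_tree(child) for child in fetch_children(node_id)}


tree = descendants_tree(1)
print(tree)
print(query_count)  # one fetch for the root plus one per descendant
```

In this toy data the root has four descendants, so the recursion performs five fetches: the count grows with the number of nodes visited, not just with the number of direct children.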

Is there a more efficient way of doing this nested query?

In my mind, doing a select_related().all() beforehand would get it down to one query but that smells like trouble in the future. At what point is one large query better or worse than many small ones?

---Edit---

Here's what I'm using to test the select_related().all() option, but it still hits the database on every iteration:

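# Note: select_related() only follows ForeignKey/OneToOneField relations,
# so it does not preload the many-to-many `children`.
all_foo = Foo.objects.select_related('children').all()

def loop(baz):
    tree = {}
    # children.all() still runs a query on every call
    for f in all_foo.get(id=baz).children.all():
        tree[f] = loop(f.id)
    return tree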

I assume the children.all() is causing the hit. Is there another way to get all of Many-to-Many children without using the callable attribute?

1 Answer

#1


You'll have to test this in your own environment, under your own conditions. select_related is generally recommended, but when there are many levels of recursion, that one large query is generally slower than multiple small queries.
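
For comparison when testing, the "one large query" end of the spectrum can be sketched as follows: fetch every parent/child pair once, then nest them in memory. This is only a sketch under assumptions, not what django_dag does. The `build_tree` helper is hypothetical, and the edge list is hard-coded here; in Django it might come from something like `Bar.objects.values_list('parent_id', 'child_id')`. Keys are ids rather than model instances, and the graph is assumed to be acyclic.

```python
from collections import defaultdict


def build_tree(edges, root_id):
    """Nest the descendants of root_id from flat (parent_id, child_id) pairs."""
    children = defaultdict(list)
    for parent_id, child_id in edges:
        children[parent_id].append(child_id)

    def subtree(node_id):
        # .get() avoids creating empty entries for leaf nodes
        return {c: subtree(c) for c in children.get(node_id, [])}

    return subtree(root_id)


# Hypothetical rows standing in for the through table (assumption):
edges = [(1, 2), (1, 3), (2, 4), (4, 5)]
tree = build_tree(edges, 1)
print(tree)
```

The trade-off is that the whole edge table is loaded even when only a small subtree is needed, which is the kind of cost worth measuring against the per-node queries.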

The number of children doesn't really matter; the depth of recursion is what matters most. With three or so levels, select_related() might be better, but much more than that would likely cause a slowdown. The plugin author likely did it this way to allow for many, many levels of recursion, because the per-query approach only really hurts when there are just a few levels, and even then it costs only a few extra queries.
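
One possible middle ground, which neither this answer nor django_dag prescribes, is to fetch one whole level at a time so that the number of round trips tracks the depth rather than the number of nodes. The sketch below is pure Python under stated assumptions: `descendants_by_level` and the `fetch` callable are hypothetical, and in Django the batched lookup might look like `Bar.objects.filter(parent_id__in=ids).values_list('parent_id', 'child_id')`.

```python
def descendants_by_level(fetch_children_of, root_id):
    """Return (tree, fetches), issuing one batched lookup per level."""
    tree = {}
    # node id -> list of dicts waiting to receive that node's children
    slots = {root_id: [tree]}
    fetches = 0
    while slots:
        fetches += 1
        rows = fetch_children_of(list(slots))  # one lookup for the whole level
        next_slots = {}
        for parent_id, child_id in rows:
            for target in slots[parent_id]:
                sub = target[child_id] = {}
                next_slots.setdefault(child_id, []).append(sub)
        slots = next_slots
    return tree, fetches


# Hypothetical edge data and batched fetch function (assumptions):
EDGES = [(1, 2), (1, 3), (2, 4), (4, 5)]


def fetch(ids):
    return [(p, c) for p, c in EDGES if p in ids]


tree, fetches = descendants_by_level(fetch, 1)
print(tree, fetches)
```

Here the deepest descendant sits three levels below the root, so the loop performs four lookups (the last one finds no further children), regardless of how many siblings each level contains.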
