MySQL - 处理这种分层数据的最佳方法?

时间:2022-02-10 16:38:22

This is a followup to:
MySQL - Is it possible to get all sub-items in a hierarchy?

这是对以下内容的跟进:MySQL - 是否可以获取层次结构中的所有子项?

I have an arbitrary-depth adjacency list model table (I am at the point that I can convert it into a nested set model.

我有一个任意深度的邻接列表模型表(我可以将它转换为嵌套集模型。

I read the MySQL data on how to use a nested set model, though it seemed to get increasingly complex and very complex to do basic functions such as inserting, updating and deleting.

我阅读了有关如何使用嵌套集模型的MySQL数据,尽管它似乎变得越来越复杂,并且非常复杂,无法执行插入,更新和删除等基本功能。

Another blog showing how to use a trigger system with the adjacency list model to keep a table of ancestors that relates each object to its ancestors.

另一篇博客展示了如何使用具有邻接列表模型的触发器系统来保持将每个对象与其祖先相关联的祖先表。


Right now I need to be able to return a list of all children of a given node, to change or delete them. This hierarchical structure won't be changing all the time once created, but there will be a mass amount of the hierarchical structures.

现在我需要能够返回给定节点的所有子节点的列表,以更改或删除它们。这种层次结构一旦创建就不会一直在变化,但会有大量的层次结构。

The three methods I see are:

我看到的三种方法是:

  1. Created a Stored Procedure which would do a recursive query that returns all children.

    创建了一个存储过程,它将执行一个返回所有子节点的递归查询。

  2. Convert to Nested Set Model which would require to get into the complexities and possibly create a stored procedure to add, edit and delete in that.

    转换为嵌套集模型,这需要进入复杂性并可能创建一个存储过程来添加,编辑和删除。

  3. Create the Ancestor Table described above on insert/delete triggers to handle all of the data.

    在插入/删除触发器上创建上述Ancestor表以处理所有数据。

If there are other methods I'm not exploring, please let me know and I'll update this list.

如果还有其他方法我没有探索,请告诉我,我会更新此列表。

5 个解决方案

#1


4  

Quassnoi has run some performance tests on the nested sets model and the adjacency list model and documented the results and recommendations in his blog post Adjacency list vs. nested sets: MySQL. The executive summary is:

Quassnoi已经对嵌套集模型和邻接列表模型进行了一些性能测试,并在他的博客文章“邻接列表与嵌套集:MySQL”中记录了结果和建议。执行摘要是:

  • Nested sets is faster for fetching all child nodes or all parent nodes.
  • 对于获取所有子节点或所有父节点,嵌套集更快。
  • Nested sets is a bad idea if you frequently need to update the table.
  • 如果您经常需要更新表,嵌套集是个坏主意。

Here is the conclusion from his article:

以下是他的文章的结论:

In MySQL, the nested sets model should be preferred if the updates to the hierarhical structure are infrequent and it is affordable to lock the table for the duration of an update (which can take minutes on a long table).

在MySQL中,如果对hierarhical结构的更新很少,并且在更新期间锁定表(在长表上可能需要几分钟),则可以使用嵌套集模型。

This implies creating the table using MyISAM storage engine, creating the bounding box of a GEOMETRY type as described above, indexing it with a SPATIAL index and persisting the level in the table.

这意味着使用MyISAM存储引擎创建表,创建如上所述的GEOMETRY类型的边界框,使用SPATIAL索引对其进行索引并在表中保持级别。

If the updates to the table are frequent or it is inaffordable to lock the table for a long period of time implied by an update, then the adjacency list model should be used to store the hierarchical data.

如果对表的更新频繁或者在更新所暗示的长时间内锁定表是不合理的,那么应该使用邻接列表模型来存储分层数据。

This requires creating a function to query the table.

这需要创建一个查询表的函数。

The rest of the article shows how to define the table, implement the queries and gives performance measurements. The use of the spatial index is a clever idea to improve the performance of the nested set model that might be new to you.

本文的其余部分将介绍如何定义表,实现查询以及提供性能测量。使用空间索引是一个聪明的想法,可以提高可能对您来说不熟悉的嵌套集模型的性能。


If you're also considering approaches without MySQL then you might want to look at PostgreSQL which is another free and open-source database. PostgreSQL supports recursive queries in the form of recursive common table expressions which make querying heirarchical data easier than in MySQL and also give better performance. Quassnoi has also written an article Adjacency list vs. nested sets: PostgreSQL that shows the details.

如果您还在考虑没有MySQL的方法,那么您可能需要查看PostgreSQL,这是另一个免费的开源数据库。 PostgreSQL以递归公用表表达式的形式支持递归查询,这使得查询heirarchical数据比在MySQL中更容易,并且还提供更好的性能。 Quassnoi还撰写了一篇文章Adjacency list vs.嵌套集:PostgreSQL,它显示了详细信息。

While we are talking about looking at other approaches, Oracle's database is also worth a mention. Oracle also have a custom extension CONNECT BY which make querying heirarchical data very easy and fast. Quassnoi's article Adjacency list vs. nested sets: Oracle again covers the performance details. The query you need to get all children is extremely simple in this case:

虽然我们正在讨论其他方法,但Oracle的数据库也值得一提。 Oracle还有一个自定义扩展CONNECT BY,可以非常简单快速地查询分层数据。 Quassnoi的文章邻接列表与嵌套集:Oracle再次涵盖了性能细节。在这种情况下,获取所有孩子所需的查询非常简单:

SELECT *
FROM yourtable
START WITH id = 42
CONNECT BY parent = PRIOR id

#2


2  

I would always go with the Nested Set for shear simplicity and convienience. I always suggest this article. It shows excelent the queries that are needed for the work with such hierachrchical data. The only disadvantage I see here is that it can get slower with inserting/updateing new records when the hierachry reached a certain level of complexity, but the reading is faster than many other solutions I hae seen.

我会一直使用嵌套装置来实现剪切简单和方便。我总是建议这篇文章。它显示了使用这种层次数据工作所需的查询。我在这里看到的唯一缺点是,当层次结构达到一定程度的复杂性时,插入/更新新记录会变慢,但读数比我见过的许多其他解决方案更快。

Just to give you an example from the article above:

只是给你一个上面文章的例子:

SELECT t1.name AS lev1, t2.name as lev2, t3.name as lev3, t4.name as lev4
FROM category AS t1
LEFT JOIN category AS t2 ON t2.parent = t1.category_id
LEFT JOIN category AS t3 ON t3.parent = t2.category_id
LEFT JOIN category AS t4 ON t4.parent = t3.category_id
WHERE t1.name = 'ELECTRONICS';

+-------------+----------------------+--------------+-------+
| lev1        | lev2                 | lev3         | lev4  |
+-------------+----------------------+--------------+-------+
| ELECTRONICS | TELEVISIONS          | TUBE         | NULL  |
| ELECTRONICS | TELEVISIONS          | LCD          | NULL  |
| ELECTRONICS | TELEVISIONS          | PLASMA       | NULL  |
| ELECTRONICS | PORTABLE ELECTRONICS | MP3 PLAYERS  | FLASH |
| ELECTRONICS | PORTABLE ELECTRONICS | CD PLAYERS   | NULL  |
| ELECTRONICS | PORTABLE ELECTRONICS | 2 WAY RADIOS | NULL  |
+-------------+----------------------+--------------+-------+
6 rows in set (0.00 sec)

SQL wise, I don't think it can get any prettier and simpler ;)

SQL明智的,我不认为它可以变得更漂亮和更简单;)

I have no idea to the Stored Procedure way. But since it involces recursion (in your case), I don't know if it will be fast with many levels in the hierarchy. I assume you can give it a try.

我不知道存储过程的方式。但由于它涉及递归(在你的情况下),我不知道它是否会在层次结构中有很多级别。我想你可以尝试一下。

#3


1  

Maybe you should consider using document-oriented database like MongoDB. It could make your life a lot easier.

也许您应该考虑使用像MongoDB这样的面向文档的数据库。它可以让你的生活更轻松。

#4


1  

When dealing with hierarchical data sets I find it best to approach it with caching in mind. One of the main benefits to this way of dealing with this issue this way is it doesn't require de-normalizing you database into something that might be harder to mutate.

在处理分层数据集时,我发现最好在考虑缓存的情况下处理它。这种以这种方式处理这个问题的方法的主要好处之一是它不需要将数据库去规范化为可能更难变异的东西。

Since memory heaps' (memcache,redis,etc) lookups are much faster than SQL for simple id -> data resolutions, I would use them to cache a list of the ids of direct children for each node. This way you can get decent performance via a recursive algorithm to build a complete list for any node.

由于内存堆(memcache,redis等)查找比简单的id - >数据分辨率快得多,我会使用它们来缓存每个节点的直接子节点的id列表。这样,您可以通过递归算法获得不错的性能,从而为任何节点构建完整的列表。

To add/delete a new node, you will only need to invalidate its' direct parent cache O(1).

要添加/删除新节点,您只需要使其“直接父缓存O(1)无效”。

If that's not fast enough, you can add another layer of cache to a list of all child of a node at each node. In order for this to work with a decently mutable dataset, you should record the cache performance (ratio of fresh/cached hits) of each node and set a tolerance level for when to store the cache. This also can be stored in a memory heap since it's non-vital data.

如果这还不够快,您可以在每个节点的节点的所有子节点的列表中添加另一层缓存。为了使其能够使用体积可变的数据集,您应该记录每个节点的缓存性能(新鲜/缓存命中率),并设置何时存储缓存的容差级别。这也可以存储在内存堆中,因为它是非重要数据。

If you use this more advanced caching model, you will need to note these complete children node lists will need to be invalidated when any of it's children are changed O(log n).

如果使用此更高级的缓存模型,则需要注意这些完整的子节点列表在其任何子节点更改为O(log n)时需要失效。

Once you have your list of children id's you can use SQL's WHERE id IN( id1, id2, .... ) syntax to query for what you want.

获得子id列表后,可以使用SQL的WHERE id IN(id1,id2,....)语法来查询所需内容。

#5


-1  

I once had to store a complex hierarchical arbitrary-depth bill-of-material system in a SQL-like database manager that wasn't really up to the task, and it ended up forcing messy and tricky indicies, data definitions, queries, etc. After restarting from scratch, using the db manager to provide only an API for record reads and writes on simple indexed keys, and doing all of the actual input/manipulation/reporting in external code, the final result was quicker to implement, easier to understand, and simpler to maintain and enhance. The most complex query needed was essentially SELECT A FROM B.

我曾经不得不在一个类似于SQL的数据库管理器中存储一个复杂的分层任意深度的物料清单系统,这个系统并不真正完成任务,最终导致混乱和棘手的指示,数据定义,查询等从头开始重新启动后,使用数据库管理器仅提供用于在简单索引键上进行记录读取和写入的API,并在外部代码中执行所有实际输入/操作/报告,最终结果更快实现,更容易理解,并且更易于维护和增强。所需的最复杂查询基本上是SELECT A FROM B.

So, instead of embedding logic and operations inside the restrictions of MySQL, consider banging out code to do what you want, and relying on MySQL only for the lowest-level gets/puts.

因此,不要在MySQL的限制内嵌入逻辑和操作,而是考虑敲打代码来做你想做的事情,并且只依赖于MySQL来获得最低级别的获取/放置。

#1


4  

Quassnoi has run some performance tests on the nested sets model and the adjacency list model and documented the results and recommendations in his blog post Adjacency list vs. nested sets: MySQL. The executive summary is:

Quassnoi已经对嵌套集模型和邻接列表模型进行了一些性能测试,并在他的博客文章“邻接列表与嵌套集:MySQL”中记录了结果和建议。执行摘要是:

  • Nested sets is faster for fetching all child nodes or all parent nodes.
  • 对于获取所有子节点或所有父节点,嵌套集更快。
  • Nested sets is a bad idea if you frequently need to update the table.
  • 如果您经常需要更新表,嵌套集是个坏主意。

Here is the conclusion from his article:

以下是他的文章的结论:

In MySQL, the nested sets model should be preferred if the updates to the hierarhical structure are infrequent and it is affordable to lock the table for the duration of an update (which can take minutes on a long table).

在MySQL中,如果对hierarhical结构的更新很少,并且在更新期间锁定表(在长表上可能需要几分钟),则可以使用嵌套集模型。

This implies creating the table using MyISAM storage engine, creating the bounding box of a GEOMETRY type as described above, indexing it with a SPATIAL index and persisting the level in the table.

这意味着使用MyISAM存储引擎创建表,创建如上所述的GEOMETRY类型的边界框,使用SPATIAL索引对其进行索引并在表中保持级别。

If the updates to the table are frequent or it is inaffordable to lock the table for a long period of time implied by an update, then the adjacency list model should be used to store the hierarchical data.

如果对表的更新频繁或者在更新所暗示的长时间内锁定表是不合理的,那么应该使用邻接列表模型来存储分层数据。

This requires creating a function to query the table.

这需要创建一个查询表的函数。

The rest of the article shows how to define the table, implement the queries and gives performance measurements. The use of the spatial index is a clever idea to improve the performance of the nested set model that might be new to you.

本文的其余部分将介绍如何定义表,实现查询以及提供性能测量。使用空间索引是一个聪明的想法,可以提高可能对您来说不熟悉的嵌套集模型的性能。


If you're also considering approaches without MySQL then you might want to look at PostgreSQL which is another free and open-source database. PostgreSQL supports recursive queries in the form of recursive common table expressions which make querying heirarchical data easier than in MySQL and also give better performance. Quassnoi has also written an article Adjacency list vs. nested sets: PostgreSQL that shows the details.

如果您还在考虑没有MySQL的方法,那么您可能需要查看PostgreSQL,这是另一个免费的开源数据库。 PostgreSQL以递归公用表表达式的形式支持递归查询,这使得查询heirarchical数据比在MySQL中更容易,并且还提供更好的性能。 Quassnoi还撰写了一篇文章Adjacency list vs.嵌套集:PostgreSQL,它显示了详细信息。

While we are talking about looking at other approaches, Oracle's database is also worth a mention. Oracle also have a custom extension CONNECT BY which make querying heirarchical data very easy and fast. Quassnoi's article Adjacency list vs. nested sets: Oracle again covers the performance details. The query you need to get all children is extremely simple in this case:

虽然我们正在讨论其他方法,但Oracle的数据库也值得一提。 Oracle还有一个自定义扩展CONNECT BY,可以非常简单快速地查询分层数据。 Quassnoi的文章邻接列表与嵌套集:Oracle再次涵盖了性能细节。在这种情况下,获取所有孩子所需的查询非常简单:

SELECT *
FROM yourtable
START WITH id = 42
CONNECT BY parent = PRIOR id

#2


2  

I would always go with the Nested Set for shear simplicity and convienience. I always suggest this article. It shows excelent the queries that are needed for the work with such hierachrchical data. The only disadvantage I see here is that it can get slower with inserting/updateing new records when the hierachry reached a certain level of complexity, but the reading is faster than many other solutions I hae seen.

我会一直使用嵌套装置来实现剪切简单和方便。我总是建议这篇文章。它显示了使用这种层次数据工作所需的查询。我在这里看到的唯一缺点是,当层次结构达到一定程度的复杂性时,插入/更新新记录会变慢,但读数比我见过的许多其他解决方案更快。

Just to give you an example from the article above:

只是给你一个上面文章的例子:

SELECT t1.name AS lev1, t2.name as lev2, t3.name as lev3, t4.name as lev4
FROM category AS t1
LEFT JOIN category AS t2 ON t2.parent = t1.category_id
LEFT JOIN category AS t3 ON t3.parent = t2.category_id
LEFT JOIN category AS t4 ON t4.parent = t3.category_id
WHERE t1.name = 'ELECTRONICS';

+-------------+----------------------+--------------+-------+
| lev1        | lev2                 | lev3         | lev4  |
+-------------+----------------------+--------------+-------+
| ELECTRONICS | TELEVISIONS          | TUBE         | NULL  |
| ELECTRONICS | TELEVISIONS          | LCD          | NULL  |
| ELECTRONICS | TELEVISIONS          | PLASMA       | NULL  |
| ELECTRONICS | PORTABLE ELECTRONICS | MP3 PLAYERS  | FLASH |
| ELECTRONICS | PORTABLE ELECTRONICS | CD PLAYERS   | NULL  |
| ELECTRONICS | PORTABLE ELECTRONICS | 2 WAY RADIOS | NULL  |
+-------------+----------------------+--------------+-------+
6 rows in set (0.00 sec)

SQL wise, I don't think it can get any prettier and simpler ;)

SQL明智的,我不认为它可以变得更漂亮和更简单;)

I have no idea to the Stored Procedure way. But since it involces recursion (in your case), I don't know if it will be fast with many levels in the hierarchy. I assume you can give it a try.

我不知道存储过程的方式。但由于它涉及递归(在你的情况下),我不知道它是否会在层次结构中有很多级别。我想你可以尝试一下。

#3


1  

Maybe you should consider using document-oriented database like MongoDB. It could make your life a lot easier.

也许您应该考虑使用像MongoDB这样的面向文档的数据库。它可以让你的生活更轻松。

#4


1  

When dealing with hierarchical data sets I find it best to approach it with caching in mind. One of the main benefits to this way of dealing with this issue this way is it doesn't require de-normalizing you database into something that might be harder to mutate.

在处理分层数据集时,我发现最好在考虑缓存的情况下处理它。这种以这种方式处理这个问题的方法的主要好处之一是它不需要将数据库去规范化为可能更难变异的东西。

Since memory heaps' (memcache,redis,etc) lookups are much faster than SQL for simple id -> data resolutions, I would use them to cache a list of the ids of direct children for each node. This way you can get decent performance via a recursive algorithm to build a complete list for any node.

由于内存堆(memcache,redis等)查找比简单的id - >数据分辨率快得多,我会使用它们来缓存每个节点的直接子节点的id列表。这样,您可以通过递归算法获得不错的性能,从而为任何节点构建完整的列表。

To add/delete a new node, you will only need to invalidate its' direct parent cache O(1).

要添加/删除新节点,您只需要使其“直接父缓存O(1)无效”。

If that's not fast enough, you can add another layer of cache to a list of all child of a node at each node. In order for this to work with a decently mutable dataset, you should record the cache performance (ratio of fresh/cached hits) of each node and set a tolerance level for when to store the cache. This also can be stored in a memory heap since it's non-vital data.

如果这还不够快,您可以在每个节点的节点的所有子节点的列表中添加另一层缓存。为了使其能够使用体积可变的数据集,您应该记录每个节点的缓存性能(新鲜/缓存命中率),并设置何时存储缓存的容差级别。这也可以存储在内存堆中,因为它是非重要数据。

If you use this more advanced caching model, you will need to note these complete children node lists will need to be invalidated when any of it's children are changed O(log n).

如果使用此更高级的缓存模型,则需要注意这些完整的子节点列表在其任何子节点更改为O(log n)时需要失效。

Once you have your list of children id's you can use SQL's WHERE id IN( id1, id2, .... ) syntax to query for what you want.

获得子id列表后,可以使用SQL的WHERE id IN(id1,id2,....)语法来查询所需内容。

#5


-1  

I once had to store a complex hierarchical arbitrary-depth bill-of-material system in a SQL-like database manager that wasn't really up to the task, and it ended up forcing messy and tricky indicies, data definitions, queries, etc. After restarting from scratch, using the db manager to provide only an API for record reads and writes on simple indexed keys, and doing all of the actual input/manipulation/reporting in external code, the final result was quicker to implement, easier to understand, and simpler to maintain and enhance. The most complex query needed was essentially SELECT A FROM B.

我曾经不得不在一个类似于SQL的数据库管理器中存储一个复杂的分层任意深度的物料清单系统,这个系统并不真正完成任务,最终导致混乱和棘手的指示,数据定义,查询等从头开始重新启动后,使用数据库管理器仅提供用于在简单索引键上进行记录读取和写入的API,并在外部代码中执行所有实际输入/操作/报告,最终结果更快实现,更容易理解,并且更易于维护和增强。所需的最复杂查询基本上是SELECT A FROM B.

So, instead of embedding logic and operations inside the restrictions of MySQL, consider banging out code to do what you want, and relying on MySQL only for the lowest-level gets/puts.

因此,不要在MySQL的限制内嵌入逻辑和操作,而是考虑敲打代码来做你想做的事情,并且只依赖于MySQL来获得最低级别的获取/放置。