Efficient ways to store and query tree-like hierarchical data

Time: 2020-12-18 00:33:45

Please see the image here:

https://picasaweb.google.com/108987384888529766314/CS3217Project#5717590602842112850

So, as you can see from the image, we are trying to store hierarchical data in a database. One publisher has many articles, one article has many comments, and so on. If I use a relational database like SQL Server, I will have a publishers table, an articles table, and a comments table. But the comments table will grow very quickly and become very large.

So, is there any alternative that allows me to store and query such tree-like data efficiently? How about NoSQL (MongoDB)?

4 solutions

#1


4  

You can use adjacency lists for hierarchical data. They are efficient and easy to implement, and they also work with MySQL. Here is a link: http://mikehillyer.com/articles/managing-hierarchical-data-in-mysql/.
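To make the idea concrete, here is a minimal in-memory sketch of an adjacency list, not taken from the linked article. The row shape (`id`, `parentId`, `title`) and the helper names `childrenOf` and `pathTo` are assumptions made for illustration; in a real database each lookup would be a query on a parent column.

```javascript
// Hypothetical rows of an adjacency-list table: each row stores only its own id
// and the id of its parent (null for the root).
const rows = [
  { id: 1, parentId: null, title: "Publisher A" },
  { id: 2, parentId: 1, title: "Article 1" },
  { id: 3, parentId: 1, title: "Article 2" },
  { id: 4, parentId: 2, title: "Comment on Article 1" },
];

// Direct children of a node: the in-memory analogue of
// SELECT ... WHERE parent_id = ?
function childrenOf(rows, parentId) {
  return rows.filter((row) => row.parentId === parentId);
}

// Titles from the root down to the given node, found by following
// parent links upward until there is no parent left.
function pathTo(rows, id) {
  const byId = new Map(rows.map((row) => [row.id, row]));
  const path = [];
  for (let node = byId.get(id); node !== undefined; node = byId.get(node.parentId)) {
    path.unshift(node.title);
  }
  return path;
}

console.log(childrenOf(rows, 1).map((row) => row.title));
console.log(pathTo(rows, 4));
```

Adding a node only means inserting one row with the right parent id; no other rows change.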

#2


2  

Here is a good survey of 8 NoSQL distributed databases and the needs that they fill.

Do you anticipate you will write more than you read?
Do you anticipate needing low-latency data access and high concurrency support, with high availability as a requirement?
Do you need dynamic queries?
Do you prefer to define indexes, not map/reduce functions?
Is versioning important?
Do you anticipate you will accumulate occasionally changing data, on which pre-defined queries are to be run?
Do you anticipate you will have rapidly changing data with a foreseeable database size (should fit mostly in memory)?
Do you anticipate graph-style, rich or complex, interconnected data?
Do you anticipate you will need random, realtime read/write access to BigTable-like data?

#3


1  

Most NOSQL database design involves a mix of the following techniques:

  • Embedding - nesting of objects and arrays inside a document
  • Linking - references between documents

The schema you craft depends on various aspects of your data. One solution to your problem may be the following schema:

db.articles { _id: ARTICLE_ID, publisher: "publisher name", ... }
db.comments { _id: COMMENT_ID, article_id: ARTICLE_ID, ... }

Here the publisher is embedded in an article document. We can do this because it's unlikely the publisher name will change. It also saves us having to look up publisher details every time we need to access an article.

The comments are stored in their own documents, with each comment linking to an article. To find all comments associated with an article, you can run:

db.comments.find({article_id: "My Article ID"})

To speed this up, you can add an index on "article_id":

db.comments.createIndex({article_id: 1})
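The embed-and-link idea can be tried without a database. The sketch below is only an illustration under stated assumptions: the sample documents, the `text` and `title` fields, and the helper names `findCommentsByScan` and `buildArticleIndex` are all hypothetical, and the `Map` is a simplified stand-in for a real index, which is a different structure internally.

```javascript
// Hypothetical documents: the publisher name is embedded in each article,
// while each comment links to its article through article_id.
const articles = [
  { _id: "a1", publisher: "Publisher A", title: "First article" },
  { _id: "a2", publisher: "Publisher A", title: "Second article" },
];
const comments = [
  { _id: "c1", article_id: "a1", text: "Nice" },
  { _id: "c2", article_id: "a2", text: "Hmm" },
  { _id: "c3", article_id: "a1", text: "Agreed" },
];

// Without an index, every comment has to be examined.
function findCommentsByScan(comments, articleId) {
  return comments.filter((comment) => comment.article_id === articleId);
}

// With an index, comments are grouped by article_id once,
// and each later lookup goes straight to the matching group.
function buildArticleIndex(comments) {
  const index = new Map();
  for (const comment of comments) {
    if (!index.has(comment.article_id)) index.set(comment.article_id, []);
    index.get(comment.article_id).push(comment);
  }
  return index;
}

const index = buildArticleIndex(comments);
const viaIndex = index.get("a1") ?? [];
console.log(viaIndex.map((comment) => comment._id));
console.log(articles[0].publisher); // no extra lookup needed for the publisher
```

Both lookups return the same comments; the difference is how much data has to be touched as the comments collection grows.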

#4


1  

I found this SO post while searching for the same thing. The URL posted by Phpdevpad is a great read for understanding how the Adjacency List Model and the Nested Set Model work and how they compare. The article strongly favors the Nested Set Model and explains many drawbacks of the Adjacency List Model; however, I was greatly concerned about the mass updates that the nested set method would cause.
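The mass-update concern can be seen in a small in-memory sketch. This is an illustration under assumptions, not code from the article: the sample tree, the field names `lft` and `rgt`, and the helper `insertLastChild` are hypothetical. In a nested set every node stores a left and right bound, and inserting a node has to shift the bounds of everything to its right.

```javascript
// Hypothetical nested-set rows: a node's descendants are exactly the nodes
// whose (lft, rgt) interval lies inside its own interval.
const nodes = [
  { title: "Publisher", lft: 1, rgt: 8 },
  { title: "Article 1", lft: 2, rgt: 5 },
  { title: "Comment 1", lft: 3, rgt: 4 },
  { title: "Article 2", lft: 6, rgt: 7 },
];

// Insert a new leaf as the last child of the named parent.
// Returns how many EXISTING rows had to be rewritten to make room.
function insertLastChild(nodes, parentTitle, title) {
  const parent = nodes.find((node) => node.title === parentTitle);
  const boundary = parent.rgt; // the new leaf takes this position
  let touched = 0;
  for (const node of nodes) {
    let changed = false;
    if (node.rgt >= boundary) { node.rgt += 2; changed = true; }
    if (node.lft > boundary) { node.lft += 2; changed = true; }
    if (changed) touched += 1;
  }
  nodes.push({ title, lft: boundary, rgt: boundary + 1 });
  return touched;
}

const touched = insertLastChild(nodes, "Article 1", "Comment 2");
console.log(`existing rows rewritten: ${touched}`);
console.log(nodes);
```

With an adjacency list the same insert writes one new row and rewrites none, which is the trade-off weighed above: nested sets make subtree reads cheap and writes expensive.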

The main limitation of adjacency lists outlined in the article is that an additional self join is required for each level of depth. However, this limitation is easily overcome by using another language (such as PHP) and a recursive function for finding children, as outlined here: http://www.sitepoint.com/hierarchical-data-database/

Snippet from the URL above, using the Adjacency List Model:

<?php
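// Note: the mysql_* functions used below were removed in PHP 7.0; current code
// would use mysqli or PDO with prepared statements instead.
// $parent is the parent of the children we want to see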
// $level is increased when we go deeper into the tree,
//        used to display a nice indented tree 
function display_children($parent, $level) {

  // retrieve all children of $parent
  $result = mysql_query('SELECT title FROM tree WHERE parent="'.$parent.'";');

  // display each child
  while ($row = mysql_fetch_array($result)) {

    // indent and display the title of this child
    echo str_repeat('  ',$level).$row['title']."\n";

    // call this function again to display this child's children
    display_children($row['title'], $level+1);
  }
}

// $node is the name of the node we want the path of
function get_path($node) {

  // look up the parent of this node
  $result = mysql_query('SELECT parent FROM tree WHERE title="'.$node.'";');
  $row = mysql_fetch_array($result);

  // save the path in this array
  $path = array();

  // only continue if this $node isn't the root node
  // (that's the node with no parent)
  if ($row['parent']!='') {

    // the last part of the path to $node, is the name
    // of the parent of $node
    $path[] = $row['parent'];

    // we should add the path to the parent of this node
    // to the path
    $path = array_merge(get_path($row['parent']), $path);
  }

  // return the path
  return $path;
}
display_children('',0);
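The recursive approach above runs one query per node. A common variation, sketched here under assumptions rather than taken from the linked article, is to fetch all rows once and group them by parent in memory before recursing. The sample rows and the helper names `groupByParent` and `renderChildren` are hypothetical; the `title` and `parent` fields mirror the columns used in the snippet.

```javascript
// Hypothetical result of a single "SELECT title, parent FROM tree" query.
// The root's parent is the empty string, as in the snippet above.
const treeRows = [
  { title: "Food", parent: "" },
  { title: "Fruit", parent: "Food" },
  { title: "Red", parent: "Fruit" },
  { title: "Cherry", parent: "Red" },
  { title: "Yellow", parent: "Fruit" },
  { title: "Banana", parent: "Yellow" },
  { title: "Meat", parent: "Food" },
];

// Group child titles under their parent so each lookup is a Map access.
function groupByParent(rows) {
  const children = new Map();
  for (const { title, parent } of rows) {
    if (!children.has(parent)) children.set(parent, []);
    children.get(parent).push(title);
  }
  return children;
}

// Depth-first walk that collects one indented line per node.
function renderChildren(children, parent = "", level = 0, lines = []) {
  for (const title of children.get(parent) ?? []) {
    lines.push("  ".repeat(level) + title);
    renderChildren(children, title, level + 1, lines);
  }
  return lines;
}

const lines = renderChildren(groupByParent(treeRows));
console.log(lines.join("\n"));
```

This trades one query per node for a single query plus memory proportional to the tree, which suits trees that fit comfortably in memory.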

Conclusion

As a result I am now convinced that the Adjacency List Model will be far easier to use and manage moving forward.
