Would a relational DBMS scale as well as (or better than) its NoSQL counterpart if we drop the relationships?

Date: 2022-10-04 10:11:26

Disclaimer: This is a broad question, so it could be moved to a different source (if the admins find it appropriate).

All the cool kids seem to be dropping relational databases in favor of their NoSQL counterparts. Everyone will have their reasons, from scaling issues to simply being on the bleeding edge of tech, and I am not here to question their motives.

However, what I am interested in is whether any NoSQL transition has ever validated its performance (or maintenance) gains against a traditional RDBMS with the relationships dropped. Why would we want to use an RDBMS when the core reason it exists is dropped? A few reasons come to mind:

  1. 30+ years of academic and industry research in developing these systems
  2. A well-known language in Structured Query Language (SQL)
  3. Stable and mature ORM support across technologies (Hibernate, ActiveRecord)

Clearly, in the modern world where horizontal scaling is important, there is a need to make sure that shards are fault tolerant, updated within the time intervals required by the app, etc. However, those needs shouldn't necessarily be the responsibility of a system that stores data (case in point: ZooKeeper).

Also, I acknowledge that research should be dedicated to NoSQL and that time spent in this arena will clearly lead to better, more internet-worthy technologies. However, a comparison between NoSQL and traditional RDBMS offerings (minus relationships) would be useful in making business decisions.

UPDATE 1: When I refer to NoSQL databases, I am talking about data stores that may not require fixed table schemas and usually avoid join operations. Hence the emphasis in the question on dropping the relationships in a traditional SQL RDBMS.

5 answers

#1


13  

I don't find that inter-table relationships are the main limiter for scalability. I use queries with joins regularly and get good scalability if indexes are defined well.
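
To make the point about indexes concrete, here is a minimal sketch using SQLite, chosen only because it ships with Python; it is not the database the answer has in mind. The table names, column names, and the index name `idx_orders_customer` are invented for illustration. It asks SQLite for its query plan to check that a join is served by an index lookup rather than a full scan.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE orders (
        id INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customers(id),
        total REAL NOT NULL
    );
    -- Index on the join column, so the join can use an index lookup.
    CREATE INDEX idx_orders_customer ON orders(customer_id);
""")
conn.executemany("INSERT INTO customers (id, name) VALUES (?, ?)",
                 [(i, f"customer-{i}") for i in range(1, 101)])
conn.executemany("INSERT INTO orders (customer_id, total) VALUES (?, ?)",
                 [(1 + i % 100, float(i)) for i in range(1000)])

query = """
    SELECT c.name, o.total
    FROM customers AS c
    JOIN orders AS o ON o.customer_id = c.id
    WHERE c.id = ?
"""
# EXPLAIN QUERY PLAN reports how SQLite intends to run the statement;
# the last column of each row is a human-readable description of one step.
plan = [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + query, (42,))]
rows = conn.execute(query, (42,)).fetchall()

for step in plan:
    print(step)
print(len(rows), "orders found for customer 42")
```

The exact wording of the plan differs between SQLite versions, so treat the printed text as a diagnostic rather than a stable format.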

The greater limiter for scalability is the cost of synchronous I/O. Meeting the requirements of consistency and durability (that the DBMS actually and reliably saves data when it tells you it saved data) is expensive.

Several NoSQL products that are currently in vogue achieve great performance by weakening their consistency and durability guarantees in their default configuration. There are many reports of CouchDB or MongoDB losing data.

There are ways you can configure those NoSQL products to be more strict about durability, but then you sacrifice their impressive performance numbers.

Likewise, you can make an SQL database achieve high performance like the NoSQL products, by disabling the default features that ensure data safety. See RunningWithScissorsDB.
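
As a rough illustration of that trade-off, the sketch below uses SQLite's `synchronous` pragma, again only because SQLite ships with Python. With `FULL`, SQLite waits for the operating system to confirm that data reached the disk before a commit returns; with `OFF`, it hands the data to the operating system and moves on. The helper name `timed_inserts` and the row count are assumptions made for this example, and the timings depend heavily on the machine and filesystem.

```python
import os
import sqlite3
import tempfile
import time

def timed_inserts(synchronous, n=200):
    """Insert n rows, committing after each one; return (seconds, row count)."""
    with tempfile.TemporaryDirectory() as directory:
        conn = sqlite3.connect(os.path.join(directory, "demo.db"))
        # The pragma must be set while no transaction is open.
        conn.execute(f"PRAGMA synchronous = {synchronous}")
        conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, payload TEXT)")
        conn.commit()
        start = time.perf_counter()
        for i in range(n):
            conn.execute("INSERT INTO t (payload) VALUES (?)", (f"row-{i}",))
            conn.commit()  # one transaction per row: worst case for durable commits
        elapsed = time.perf_counter() - start
        count = conn.execute("SELECT COUNT(*) FROM t").fetchone()[0]
        conn.close()
    return elapsed, count

safe_seconds, safe_rows = timed_inserts("FULL")
fast_seconds, fast_rows = timed_inserts("OFF")
print(f"synchronous=FULL: {safe_seconds:.3f}s for {safe_rows} rows")
print(f"synchronous=OFF:  {fast_seconds:.3f}s for {fast_rows} rows")
```

Both runs store the same rows; what differs is what survives a power loss in the middle of the loop, which this sketch does not attempt to simulate.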

PS: If you think document-oriented databases are "cutting edge", I invite you to read about MUMPS. Everything old is new again. :-)

#2


3  

There seem to be at least two misconceptions that might be implied by this question. Firstly, "NoSQL" does not mean "non-relational"; it just means something other than SQL. So an RDBMS could be a NoSQL DBMS too.

Secondly, an RDBMS has nothing much to do with relationships* per se. Relationships are not part of the relational model and they can exist in non-relational databases as well (including NoSQL ones). The "relational" part of RDBMS refers specifically to relations, i.e. the data structure more commonly called a "table" (and never called a "relationship"). The question seems to be mixing up those two important and very different things: relation and relationship.

Since the existence or absence of relationships has nothing to do with whether a database is relational or not, I'm not sure what the question is really asking. If I've misunderstood something then maybe you could clarify the question a bit.

*A relationship is an "association among things" - or sometimes a database constraint that enforces a rule about such associations.

#3


3  

SQL generally has scaling issues because the guarantees it gives do not apply to just one "row" at a time. They span rows, which makes the load hard to distribute. Here are examples of RDBMSs giving guarantees that span more than one record:

  1. Indexes: Atomic update of two underlying tables at once (the index internally is a table)
  2. Foreign keys
  3. Materialized views

The problem with those features is that they don't lend themselves well to partitioning. In all 3 cases, a particular write might span multiple partitions, causing scaling issues.
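
A small sketch may help show why such a write spans partitions. It assumes a hypothetical cluster of four hash partitions, rows placed by user id, and a global secondary index on email whose entries are placed by email; none of these details come from a specific product.

```python
import hashlib

NUM_PARTITIONS = 4  # hypothetical cluster size

def partition_for(key):
    """Map a key to a partition using a stable hash of the key."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % NUM_PARTITIONS

def partitions_touched(user_id, email, with_email_index):
    """Return the set of partitions one user insert has to update atomically."""
    touched = {partition_for("user:" + user_id)}  # the row itself
    if with_email_index:
        # The index entry is keyed by email, so it is placed
        # independently of the row it points to.
        touched.add(partition_for("email_idx:" + email))
    return touched

users = [(f"u{i}", f"person{i}@example.com") for i in range(100)]
without_index = [partitions_touched(u, e, False) for u, e in users]
with_index = [partitions_touched(u, e, True) for u, e in users]

cross_partition = sum(1 for touched in with_index if len(touched) > 1)
print("multi-partition writes without the index:",
      sum(1 for touched in without_index if len(touched) > 1))
print("multi-partition writes with the index:   ",
      cross_partition, "of", len(users))
```

Every write that lands on two partitions needs some form of cross-node coordination to stay atomic, and that coordination is the part that is hard to scale.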

NoSQL generally "solves" this by just disallowing those features ;-)

The next issue holding back SQL is that it provides ACID semantics by default. This is not inherent in the relational model - it is an implementation detail.

So if you turn off the features that are hard to distribute or partition and disable ACID, you get NoSQL performance. In fact, look at how HandlerSocket does this with MySQL. It reaches NoSQL speeds even though it runs on InnoDB, and the server still provides its standard, full-featured SQL interface (HandlerSocket really is just a featureless bypass on a standard MySQL server).

No magic in NoSQL, just fewer features. Which is ok. It is a different trade-off.

#4


0  

I think the pros and cons of using an RDBMS or NoSQL really depend on the data and how you plan to use it. It is my understanding that transactions are actually represented quite well with a relational DB. My experience with NoSQL is with Infinite Graph and Neo4J. Forensics is a good use case for NoSQL: each person is a node/vertex, and an edge can represent different types of communication (email, phone, face-to-face meeting, carrier pigeon, etc.). You can then take a suspect/vertex and traverse the graph with specific criteria to find how two seemingly unconnected individuals are actually connected (probably with more efficiency than a traditional relational DB). Social graph data is another good example: every user is a node/vertex, and the relationship (friend) is an edge connecting two nodes. In short, is your data best represented and retrieved with tables or with nodes/edges?
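
The traversal described above can be sketched in plain Python with a breadth-first search. This is not how Infinite Graph or Neo4J expose traversals; the contact data, the function names, and the `allowed_channels` filter are all invented for illustration.

```python
from collections import deque

# Hypothetical contact graph: each edge records how two people communicated.
contacts = {
    ("alice", "bob"): "email",
    ("bob", "carol"): "phone",
    ("carol", "dave"): "meeting",
    ("alice", "erin"): "phone",
    ("erin", "frank"): "email",
}

def build_adjacency(edges):
    """Turn the undirected edge list into a neighbour lookup table."""
    adjacency = {}
    for (a, b), channel in edges.items():
        adjacency.setdefault(a, []).append((b, channel))
        adjacency.setdefault(b, []).append((a, channel))
    return adjacency

def shortest_connection(edges, start, goal, allowed_channels=None):
    """Breadth-first search for the shortest chain of contacts.

    Returns a list of (person, channel used to reach them), or None
    when the two people are not connected under the given criteria.
    """
    adjacency = build_adjacency(edges)
    queue = deque([start])
    came_from = {start: None}
    while queue:
        person = queue.popleft()
        if person == goal:
            break
        for neighbour, channel in adjacency.get(person, []):
            if allowed_channels is not None and channel not in allowed_channels:
                continue  # edge does not match the traversal criteria
            if neighbour not in came_from:
                came_from[neighbour] = (person, channel)
                queue.append(neighbour)
    if goal not in came_from:
        return None
    path = []
    node = goal
    while came_from[node] is not None:
        previous, channel = came_from[node]
        path.append((node, channel))
        node = previous
    path.append((start, None))
    path.reverse()
    return path

print(shortest_connection(contacts, "alice", "dave"))
print(shortest_connection(contacts, "alice", "dave", allowed_channels={"email"}))
```

In a relational schema the same question needs one self-join per hop or a recursive query, which is the efficiency difference the answer alludes to.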

#5


0  

Relationships are not a good criterion for comparing performance between RDBMS and NoSQL.

NoSQL has become very popular due to many factors:

  1. Horizontal scalability
  2. Support for unstructured and semi-structured data
  3. Read/write throughput
  4. Cheap hardware cost, etc.

Have a look at RDBMS Overheads

RDBMSs face challenges due to consistency requirements.

To support transactions, an RDBMS has to support the ACID properties (Atomicity, Consistency, Isolation, Durability). This is achieved with the following mechanisms, each of which adds overhead:

Logging: Assembling log records and tracking down all changes in database structures slows performance. Logging may not be necessary if recoverability is not a requirement or if recoverability is provided through other means (e.g., other sites on the network).

Locking: Traditional two-phase locking poses a sizeable overhead since all accesses to database structures are governed by a separate entity, the Lock Manager.

Latching: In a multi-threaded database, many data structures have to be latched before they can be accessed. Removing this feature and going to a single-threaded approach has a noticeable performance impact.

Buffer management: A main memory database system does not need to access pages through a buffer pool, eliminating a level of indirection on every record access.
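
Of the overheads listed above, logging is the easiest to sketch. The toy key-value store below is a hypothetical illustration, not the design of any real engine: it appends a record to a log and forces it to disk before applying a change, and it rebuilds its state from that log on restart. The class name `TinyStore` and its `durable` flag are made up for this example.

```python
import json
import os
import tempfile

class TinyStore:
    """Toy key-value store with a write-ahead log (illustration only)."""

    def __init__(self, log_path, durable=True):
        self.log_path = log_path
        self.durable = durable
        self.data = {}
        self._replay()
        self._log = open(log_path, "a", encoding="utf-8")

    def _replay(self):
        # Recovery: rebuild the in-memory state from the records on disk.
        if not os.path.exists(self.log_path):
            return
        with open(self.log_path, encoding="utf-8") as log:
            for line in log:
                record = json.loads(line)
                self.data[record["key"]] = record["value"]

    def put(self, key, value):
        # 1. Assemble and append the log record before changing any state.
        self._log.write(json.dumps({"key": key, "value": value}) + "\n")
        self._log.flush()
        if self.durable:
            # 2. Force the record to stable storage; this synchronous
            #    I/O is the expensive step.
            os.fsync(self._log.fileno())
        # 3. Only now apply the change in memory.
        self.data[key] = value

    def close(self):
        self._log.close()

with tempfile.TemporaryDirectory() as directory:
    path = os.path.join(directory, "store.log")
    store = TinyStore(path)
    store.put("balance", 100)
    store.put("balance", 75)
    store.close()

    recovered = TinyStore(path)  # simulates a restart
    recovered_balance = recovered.data["balance"]
    recovered.close()

print("balance after restart:", recovered_balance)
```

Passing `durable=False` skips the `fsync` call, which corresponds to the trade-off of giving up recoverability for speed. The sketch ignores torn writes, log truncation, and concurrency, all of which a real engine has to handle.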

In summary, an RDBMS struggles to scale because of the overheads above, which are necessary to support ACID transactions. Dropping relationships does not improve the performance of an RDBMS.
