是否存在将冗余数据存储在数据库中以提高速度的情况?

时间:2022-10-04 09:42:04

In the following situation:

在以下情况中:

user has many bags
bag has many items

users
-id

bags
-id
-user_id

items
-id
-bag_id

There are 2 ways to get access to a user's items.

有两种方法可以访问用户的项目。

1) An instance method can be added to a user, which iterates over each of the user's bags, and collects the bag's items into an array to return. In Ruby on Rails, something like:

1)可以将实例方法添加到用户,该用户迭代每个用户的行李,并将行李的项目收集到数组中以返回。在Ruby on Rails中,类似于:

#in user.rb
def items
    items = []
    bags.includes(:items).each { |bag| items += bag.items }
end

2) Add a user_id attribute directly to the items table, and add an additional relationship, so that user has many items. Then just do user.items.

2)将user_id属性直接添加到items表中,并添加其他关系,以便用户拥有多个项目。然后只做user.items。

The 2nd method would be faster, but involves storing redundant data. Are there situations where it makes sense to implement it?

第二种方法会更快,但涉及存储冗余数据。是否存在实施它的情况?

4 个解决方案

#1


4  

Yes, there are certain circumstances where it does make sense to introduce some controlled redundancy for the sake of performance. Generally, this should only be done when a database is unable to meet its performance requirements. This is called "denormalisation" and what you have to consider is that it:

是的,在某些情况下,为了性能而引入一些受控冗余是有意义的。通常,只有在数据库无法满足其性能要求时才能执行此操作。这被称为“非规范化”,您必须考虑的是:

  • Can make implementation more complex
  • 可以使实现更复杂

  • Often sacrifices flexibility 
  • 经常牺牲灵活性

  • May speed up retrievals but slow down updates (as you now have to update more than one location)
  • 可能会加快检索速度,但会降低更新速度(因为您现在必须更新多个位置)

So it's something to consider in cases where performance is unsatisfactory and a relation has a low update rate and high query rate.

因此,在性能不令人满意且关系具有较低的更新率和较高的查询率的情况下,需要考虑这一点。

There are also some denormalised database designs, such as the star schema, which are used in database warehousing.

还有一些非规范化的数据库设计,例如星型模式,用于数据库仓库。

#2


1  

Unless there's something you haven't told us, a table like this

除非有一些你没有告诉我们的东西,否则这样的桌子

create table bagged_items (
  user_id integer not null,
  bag_id integer not null,
  item_id integer not null,
  primary key (user_id, bag_id, item_id)
);

is in at least 5NF. It's all key. There's not a bit of redundant data there.

是至少5NF。这都是关键。那里没有一点冗余数据。

What you've done isn't normalization; normalization is based on identifying certain kinds of dependencies, and reducing their effects by projection. And what you've done isn't denormalization, either; denormalization is an undoing of normalization.

你所做的不是正常化;规范化基于识别某些类型的依赖关系,并通过投影减少它们的影响。而你所做的不是非规范化;非规范化是对规范化的一种解除。

You've simply split a primary key into pieces. I don't pretend to know what principle you followed in order to justify that. It looks a little like "no table may have more than one foreign key" normal form. (But, of course, there's no such thing.)

您只需将主键拆分为多个部分即可。我不会假装知道你遵循的原则是什么。它看起来有点像“没有表可能有多个外键”的正常形式。 (但是,当然,没有这样的事情。)

#3


1  

Yes. In particular, reporting databases, data marts and data warehouses often use design principles that knowingly depart from some of the normalization rules. The result is a database that has some redundancy in it, but is not only faster to query, but easier as well.

是。特别是,报告数据库,数据集市和数据仓库通常使用故意偏离某些规范化规则的设计原则。结果是数据库中有一些冗余,但不仅查询速度更快,而且更容易。

Ease of query is particularly important when there is an analytic GUI between the database and the database user. These analytic tools are quite a bit easier to master if certain design principles are followed in database design. Normalization isn't particularly helpful in this regard.

当数据库和数据库用户之间存在分析GUI时,易于查询尤为重要。如果在数据库设计中遵循某些设计原则,则可以更容易地掌握这些分析工具。在这方面,规范化并不是特别有用。

Unnormalized design need not mean undisciplined design. In particular, it's worth boning up on star schema and snowflake schema designs if you plan on building a reporting database, a data mart, or a data warehouse. The process by which a star or snowflake schema is kept up to date, sometimes called ETL (extract-transform-load), has to be carefully written so as to prevent controlled redundancy from resulting in self contradictory data.

非标准化设计不一定意味着无纪律的设计。特别是,如果您计划构建报表数据库,数据集市或数据仓库,则值得对星型模式和雪花模式设计进行剔除。必须仔细编写星形或雪花模式保持最新的过程,有时称为ETL(提取 - 变换 - 加载),以防止受控冗余导致自相矛盾的数据。

In transaction oriented databases, normalized is generally better, although many experts don't try to push it beyond Boyce-Codd normal form.

在面向事务的数据库中,规范化通常更好,尽管许多专家并不试图超越Boyce-Codd的正常形式。

#4


0  

For combining records from two SQL tables, databases implement efficient JOIN methods that can be used from Ruby on Rails. For almost all applications this is fast enough. That being said, for certain high performances stores, you might want to store redundant data as you suggested, but this comes at the cost of having to keep the data in sync on writes.

为了组合来自两个SQL表的记录,数据库实现了可以从Ruby on Rails使用的高效JOIN方法。几乎所有应用程序都足够快。话虽如此,对于某些高性能商店,您可能希望按照建议存储冗余数据,但这需要在写入时保持数据同步。

#1


4  

Yes, there are certain circumstances where it does make sense to introduce some controlled redundancy for the sake of performance. Generally, this should only be done when a database is unable to meet its performance requirements. This is called "denormalisation" and what you have to consider is that it:

是的,在某些情况下,为了性能而引入一些受控冗余是有意义的。通常,只有在数据库无法满足其性能要求时才能执行此操作。这被称为“非规范化”,您必须考虑的是:

  • Can make implementation more complex
  • 可以使实现更复杂

  • Often sacrifices flexibility 
  • 经常牺牲灵活性

  • May speed up retrievals but slow down updates (as you now have to update more than one location)
  • 可能会加快检索速度,但会降低更新速度(因为您现在必须更新多个位置)

So it's something to consider in cases where performance is unsatisfactory and a relation has a low update rate and high query rate.

因此,在性能不令人满意且关系具有较低的更新率和较高的查询率的情况下,需要考虑这一点。

There are also some denormalised database designs, such as the star schema, which are used in database warehousing.

还有一些非规范化的数据库设计,例如星型模式,用于数据库仓库。

#2


1  

Unless there's something you haven't told us, a table like this

除非有一些你没有告诉我们的东西,否则这样的桌子

create table bagged_items (
  user_id integer not null,
  bag_id integer not null,
  item_id integer not null,
  primary key (user_id, bag_id, item_id)
);

is in at least 5NF. It's all key. There's not a bit of redundant data there.

是至少5NF。这都是关键。那里没有一点冗余数据。

What you've done isn't normalization; normalization is based on identifying certain kinds of dependencies, and reducing their effects by projection. And what you've done isn't denormalization, either; denormalization is an undoing of normalization.

你所做的不是正常化;规范化基于识别某些类型的依赖关系,并通过投影减少它们的影响。而你所做的不是非规范化;非规范化是对规范化的一种解除。

You've simply split a primary key into pieces. I don't pretend to know what principle you followed in order to justify that. It looks a little like "no table may have more than one foreign key" normal form. (But, of course, there's no such thing.)

您只需将主键拆分为多个部分即可。我不会假装知道你遵循的原则是什么。它看起来有点像“没有表可能有多个外键”的正常形式。 (但是,当然,没有这样的事情。)

#3


1  

Yes. In particular, reporting databases, data marts and data warehouses often use design principles that knowingly depart from some of the normalization rules. The result is a database that has some redundancy in it, but is not only faster to query, but easier as well.

是。特别是,报告数据库,数据集市和数据仓库通常使用故意偏离某些规范化规则的设计原则。结果是数据库中有一些冗余,但不仅查询速度更快,而且更容易。

Ease of query is particularly important when there is an analytic GUI between the database and the database user. These analytic tools are quite a bit easier to master if certain design principles are followed in database design. Normalization isn't particularly helpful in this regard.

当数据库和数据库用户之间存在分析GUI时,易于查询尤为重要。如果在数据库设计中遵循某些设计原则,则可以更容易地掌握这些分析工具。在这方面,规范化并不是特别有用。

Unnormalized design need not mean undisciplined design. In particular, it's worth boning up on star schema and snowflake schema designs if you plan on building a reporting database, a data mart, or a data warehouse. The process by which a star or snowflake schema is kept up to date, sometimes called ETL (extract-transform-load), has to be carefully written so as to prevent controlled redundancy from resulting in self contradictory data.

非标准化设计不一定意味着无纪律的设计。特别是,如果您计划构建报表数据库,数据集市或数据仓库,则值得对星型模式和雪花模式设计进行剔除。必须仔细编写星形或雪花模式保持最新的过程,有时称为ETL(提取 - 变换 - 加载),以防止受控冗余导致自相矛盾的数据。

In transaction oriented databases, normalized is generally better, although many experts don't try to push it beyond Boyce-Codd normal form.

在面向事务的数据库中,规范化通常更好,尽管许多专家并不试图超越Boyce-Codd的正常形式。

#4


0  

For combining records from two SQL tables, databases implement efficient JOIN methods that can be used from Ruby on Rails. For almost all applications this is fast enough. That being said, for certain high performances stores, you might want to store redundant data as you suggested, but this comes at the cost of having to keep the data in sync on writes.

为了组合来自两个SQL表的记录,数据库实现了可以从Ruby on Rails使用的高效JOIN方法。几乎所有应用程序都足够快。话虽如此,对于某些高性能商店,您可能希望按照建议存储冗余数据,但这需要在写入时保持数据同步。