Table design: wide tables vs. columns as properties

Time: 2021-08-29 05:57:49

I'm part of a team architecting an Operational Data Store (ODS) database, using SQL Server 2012, that will be used by some of our analysts to do predictive modeling. The ODS will contain manufacturing production data for a single product we make.

We will have hundreds of tables in the ODS. However, a single core table will contain critical lifecycle information about each item manufactured (tens of millions each year). Our product is made in a manufacturing plant and spends roughly 2.5 hours moving through various processes along a production line. We want to store many individual pieces of manufacturing and post-manufacturing information in this core table. An example piece of data might be the time the product entered a particular oven.

We have a decision to make on how to architect this table. We can create a wide table (many columns) or a narrow table in which most of those columns become rows (stored as property values). I have never designed or worked with a very narrow table structure in which columns are treated as rows.
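
To make the two options concrete, here is a minimal sketch of both shapes. It uses Python's built-in sqlite3 module as a stand-in for SQL Server, and every table and column name (item_wide, item_property, oven_entry_time and so on) is a hypothetical placeholder rather than part of our actual design.

```python
import sqlite3

# SQLite stands in for SQL Server; all names below are hypothetical.
conn = sqlite3.connect(":memory:")

# Wide design: one row per manufactured item, one column per attribute.
conn.execute("""
    CREATE TABLE item_wide (
        item_id         INTEGER PRIMARY KEY,
        oven_entry_time TEXT,
        oven_exit_time  TEXT,
        line_number     INTEGER
    )
""")

# Narrow design: one row per (item, property) pair.
conn.execute("""
    CREATE TABLE item_property (
        item_id        INTEGER NOT NULL,
        property_name  TEXT    NOT NULL,
        property_value TEXT,
        PRIMARY KEY (item_id, property_name)
    )
""")

# The same item stored both ways.
conn.execute(
    "INSERT INTO item_wide VALUES (1, '2021-01-01 08:00', '2021-01-01 08:30', 3)"
)
conn.executemany(
    "INSERT INTO item_property VALUES (?, ?, ?)",
    [
        (1, "oven_entry_time", "2021-01-01 08:00"),
        (1, "oven_exit_time", "2021-01-01 08:30"),
        (1, "line_number", "3"),
    ],
)

wide_rows = conn.execute("SELECT COUNT(*) FROM item_wide").fetchone()[0]
narrow_rows = conn.execute("SELECT COUNT(*) FROM item_property").fetchone()[0]
print(wide_rows, narrow_rows)
```

In the narrow shape every value shares one column type (text here), and the row count grows with the number of properties tracked per item.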

I'd like some feedback on the pros and cons of a wide table vs. a narrow table. The following might be useful in helping with this discussion:

Number of products produced each year: Several million (each of these product instances will be a row in the core table)

Will this table be queried often: Yes, very often. It will be the parent to many child tables.

Potential number of columns (or row properties): 75 to 150+

If more information would be useful, I'd be glad to provide it.

2 solutions

#1


5  

Wide tables, static properties

You are tracking a single product through a well-defined manufacturing process. This data model sounds very static, and would lend itself to a wide table with many columns that are consistently populated with data.

Narrow tables, dynamic properties

If you had many, many products with lots of variation in the manufacturing process, it would be better suited for a narrow table, where you could easily add new properties for tracking.
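
As a rough illustration of that difference, the sketch below starts tracking a new property in each design. It assumes SQLite (via Python's sqlite3) in place of SQL Server, and the table names and the coating_batch property are made up for the example.

```python
import sqlite3

# SQLite stands in for SQL Server; all names below are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE item_wide (item_id INTEGER PRIMARY KEY, oven_entry_time TEXT)"
)
conn.execute("""
    CREATE TABLE item_property (
        item_id        INTEGER NOT NULL,
        property_name  TEXT    NOT NULL,
        property_value TEXT,
        PRIMARY KEY (item_id, property_name)
    )
""")
conn.execute("INSERT INTO item_wide VALUES (1, '2021-01-01 08:00')")
conn.execute(
    "INSERT INTO item_property VALUES (1, 'oven_entry_time', '2021-01-01 08:00')"
)

# Narrow table: a new property is only new data, no schema change.
conn.execute("INSERT INTO item_property VALUES (1, 'coating_batch', 'B-17')")

# Wide table: a new property requires a schema change before any data lands.
conn.execute("ALTER TABLE item_wide ADD COLUMN coating_batch TEXT")
conn.execute("UPDATE item_wide SET coating_batch = 'B-17' WHERE item_id = 1")

# PRAGMA table_info returns one row per column; index 1 is the column name.
wide_columns = [row[1] for row in conn.execute("PRAGMA table_info(item_wide)")]
narrow_properties = [
    row[0]
    for row in conn.execute(
        "SELECT property_name FROM item_property "
        "WHERE item_id = 1 ORDER BY property_name"
    )
]
print(wide_columns, narrow_properties)
```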

Difficult to query a narrow table

However, even simple querying of a narrow table can be extremely difficult. For example, what if you needed to sort the data by a certain property when that property is shuffled amongst 100+ other property rows? How would you get all the rows together to form a single "record" and then sort the record groups within your result set?
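
One possible answer, sketched with Python's sqlite3 standing in for SQL Server and with hypothetical table and property names, is to join each property row back to the row that holds the sort key. It works, but it is a lot of machinery for what would be a plain ORDER BY on a wide table.

```python
import sqlite3

# SQLite stands in for SQL Server; all names below are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE item_property (
        item_id        INTEGER NOT NULL,
        property_name  TEXT    NOT NULL,
        property_value TEXT,
        PRIMARY KEY (item_id, property_name)
    )
""")
conn.executemany(
    "INSERT INTO item_property VALUES (?, ?, ?)",
    [
        (1, "oven_entry_time", "2021-01-01 09:30"),
        (1, "line_number", "3"),
        (2, "oven_entry_time", "2021-01-01 08:15"),
        (2, "line_number", "1"),
        (3, "oven_entry_time", "2021-01-01 08:45"),
        (3, "line_number", "2"),
    ],
)

# Self-join: p supplies every property row, k supplies the sort key
# (the oven_entry_time row) for the same item.
query = """
    SELECT p.item_id, p.property_name, p.property_value
    FROM item_property AS p
    JOIN item_property AS k
      ON k.item_id = p.item_id
     AND k.property_name = 'oven_entry_time'
    ORDER BY k.property_value, p.item_id, p.property_name
"""
rows = conn.execute(query).fetchall()

# Collapse the ordered rows into the order in which items appear.
item_order = []
for item_id, _name, _value in rows:
    if item_id not in item_order:
        item_order.append(item_id)
print(item_order)
```

Note that the sort compares text, which only behaves here because the timestamps are in a sortable ISO-style format; a real narrow table would also need casts for numeric or date properties.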

Flat tables simpler to query

Depending on how you need to view and analyze the data, you may find yourself constantly using pivot or crosstab queries. If that's the case, then why not flatten out the storage table to begin with?
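
For reference, a crosstab over a narrow table might look like the sketch below, which uses conditional aggregation (MAX over CASE) because it is portable; SQL Server also offers a PIVOT operator for the same job. SQLite via Python's sqlite3 is used as a stand-in, and the table and property names are hypothetical.

```python
import sqlite3

# SQLite stands in for SQL Server; all names below are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE item_property (
        item_id        INTEGER NOT NULL,
        property_name  TEXT    NOT NULL,
        property_value TEXT,
        PRIMARY KEY (item_id, property_name)
    )
""")
conn.executemany(
    "INSERT INTO item_property VALUES (?, ?, ?)",
    [
        (1, "oven_entry_time", "2021-01-01 09:30"),
        (1, "line_number", "3"),
        (2, "oven_entry_time", "2021-01-01 08:15"),
        (2, "line_number", "1"),
    ],
)

# One CASE expression per property turns property rows back into columns.
# MAX ignores the NULLs produced for non-matching rows.
pivot_query = """
    SELECT item_id,
           MAX(CASE WHEN property_name = 'oven_entry_time'
                    THEN property_value END) AS oven_entry_time,
           MAX(CASE WHEN property_name = 'line_number'
                    THEN property_value END) AS line_number
    FROM item_property
    GROUP BY item_id
    ORDER BY item_id
"""
pivoted = conn.execute(pivot_query).fetchall()
print(pivoted)
```

With 75 to 150+ properties, this query needs one such expression per property every time it is written, which is the repetition a wide table avoids.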

Or do both

Another option is to do both: Store the data narrowly, and use a transformation process to flatten it out for ease of reporting. That way you can quickly begin tracking new properties (just by adding rows), and then you can work on getting your reporting tables and transformation process updated to utilize the new data.
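
A minimal sketch of that hybrid follows, assuming SQLite (via Python's sqlite3) in place of SQL Server. The rebuild_report helper, the item_report table, and the property names are all hypothetical; a production transformation would more likely be an incremental ETL job than a full rebuild.

```python
import sqlite3


def rebuild_report(conn, properties):
    """Rebuild the flat reporting table from the narrow store (hypothetical helper)."""
    if not properties:
        raise ValueError("at least one property is required")
    for name in properties:
        # Property names become column names, so only allow plain identifiers.
        if not name.isidentifier():
            raise ValueError(f"unsafe property name: {name!r}")
    columns = ",\n".join(
        f"MAX(CASE WHEN property_name = '{name}' "
        f"THEN property_value END) AS {name}"
        for name in properties
    )
    conn.execute("DROP TABLE IF EXISTS item_report")
    conn.execute(
        "CREATE TABLE item_report AS "
        f"SELECT item_id, {columns} FROM item_property GROUP BY item_id"
    )


conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE item_property (
        item_id        INTEGER NOT NULL,
        property_name  TEXT    NOT NULL,
        property_value TEXT,
        PRIMARY KEY (item_id, property_name)
    )
""")
conn.executemany(
    "INSERT INTO item_property VALUES (?, ?, ?)",
    [
        (1, "oven_entry_time", "2021-01-01 08:00"),
        (2, "oven_entry_time", "2021-01-01 08:05"),
    ],
)
rebuild_report(conn, ["oven_entry_time"])

# A new property starts arriving as rows; reporting catches up afterwards.
conn.execute("INSERT INTO item_property VALUES (2, 'coating_batch', 'B-17')")
rebuild_report(conn, ["oven_entry_time", "coating_batch"])

report = conn.execute(
    "SELECT item_id, oven_entry_time, coating_batch "
    "FROM item_report ORDER BY item_id"
).fetchall()
print(report)
```

Items that never recorded the new property simply show NULL in the flattened table.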

#2


0  

How wide is too wide? Well, there can be several problems with wide tables.

One problem is that wide tables tend to deviate from the rules for normalizing data. This in turn can result in tricky update problems where you have to be careful to prevent the database from entering a self-contradictory state. There's no particular answer to how wide is too wide here. Just apply the normalization rules, and you'll end up decomposing the table.

However, some databases are not built with normalization as the guiding principle. In particular, consider fact tables in star schemas. There are times when some of the columns are determined by some subset of the foreign keys (FKs), and this can violate 3NF or even 2NF. Keeping fact tables skinny is still important in star schemas, but it's for a different reason, namely speed. Sometimes, a fact table can be made skinnier by pushing data out to one of the dimension tables. Sometimes, you can decompose a star into two or more related stars.
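
To show what pushing data out to a dimension table can look like, here is a small sketch using Python's sqlite3 as a stand-in for SQL Server. The tables (fact_production_wide, fact_production, dim_plant) and their columns are invented for illustration: plant_region depends only on plant_id, so it does not need to be repeated on every fact row.

```python
import sqlite3

# SQLite stands in for SQL Server; all names below are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE fact_production_wide (
        item_id      INTEGER PRIMARY KEY,
        plant_id     INTEGER,
        shift_id     INTEGER,
        plant_region TEXT,     -- determined by plant_id alone
        units        INTEGER
    );
    INSERT INTO fact_production_wide VALUES
        (1, 10, 1, 'North', 5),
        (2, 10, 2, 'North', 7),
        (3, 20, 1, 'South', 4);

    -- Move the plant-dependent column into a dimension table.
    CREATE TABLE dim_plant AS
        SELECT DISTINCT plant_id, plant_region
        FROM fact_production_wide;

    -- The skinnier fact table keeps only keys and measures.
    CREATE TABLE fact_production AS
        SELECT item_id, plant_id, shift_id, units
        FROM fact_production_wide;
""")

plant_count = conn.execute("SELECT COUNT(*) FROM dim_plant").fetchone()[0]

# The region is still available through a join when a query needs it.
by_region = conn.execute("""
    SELECT d.plant_region, SUM(f.units)
    FROM fact_production AS f
    JOIN dim_plant AS d ON d.plant_id = f.plant_id
    GROUP BY d.plant_region
    ORDER BY d.plant_region
""").fetchall()
print(plant_count, by_region)
```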

Your case sounds like the second reason given above, even though your design probably isn't a star schema. Still, star schema design principles might help you improve your design.
