Should I use table relationships if they aren't needed?

Time: 2022-11-26 10:43:37

I'm working with an existing client-side legacy database that we're converting to MySQL for online use.

It's effectively one giant table, and no relationships exist.

For each record, there are several contact points (first name, last name, title, street, city, state, zip, etc.), repeated for several entities. My initial thought was to separate each of these entities into its own table with the above-mentioned columns, and use FKs to link them up with traditional joins, etc.

But, after going through the entire dataset and talking with the original author, it turns out that none of these contact points will ever repeat (each will be unique to each record), nor is any other information related to these contact points.

So, AFAICT, there's no real 'use' for relationship tables, except possibly semantics or transparency. The dataset isn't huge but it's not small either (between 50,000 and 100,000 records), so I wonder if in fact it might be more efficient to just keep the single-table structure intact and skip joins altogether.

Is there any reason to use separate tables in a situation like this?

tyia

4 solutions

#1


2  

Mainframes have used flat file formats for decades quite effectively so I think you can certainly get away with leaving the table as is. That being said, I would consider the following questions:

  • Do many queries use '*' to retrieve every column from the table, or are the queries mature enough to leave out the columns they don't need? If the former, you may want to move those columns to a separate table for performance reasons.
  • Will there ever be a future requirement for multiple 'contact' entries per record? You might save a few headaches by doing the conversion now rather than later.
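
To make the second question concrete, here is a minimal sketch of what "multiple contact entries per record" would look like once the contacts live in their own table. It uses Python's built-in sqlite3 with an in-memory database purely as a stand-in for MySQL, and the `record` and `contact` tables, their columns, and the sample rows are all hypothetical, not taken from the asker's schema.

```python
import sqlite3

# In-memory SQLite is only a stand-in for MySQL; every name below is hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE record (
        id      INTEGER PRIMARY KEY,
        summary TEXT
    );
    -- One row per contact, so a record can have any number of them.
    CREATE TABLE contact (
        id        INTEGER PRIMARY KEY,
        record_id INTEGER NOT NULL REFERENCES record(id),
        role      TEXT,
        last_name TEXT
    );
    INSERT INTO record VALUES (1, 'Roof repair');
    INSERT INTO contact (record_id, role, last_name) VALUES
        (1, 'owner', 'Smith'),
        (1, 'owner', 'Jones');
""")

# Two owners for the same record: no new columns were needed to support this.
owners = conn.execute(
    "SELECT last_name FROM contact "
    "WHERE record_id = 1 AND role = 'owner' ORDER BY id"
).fetchall()
```

In the flat layout the same change would mean adding another set of owner columns, which is the headache the question is getting at.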

I suspect it's just one big flat file and probably will fit as is with no real need to normalize. If you would wind up with a 1 to 1 relationship to the other table and you aren't pulling all columns with every query, flatfile wins.
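
As a rough illustration of the 1-to-1 case, the sketch below builds both layouts and reads the same record from each. It runs on Python's built-in sqlite3 (in-memory) rather than MySQL, and the table names, column names, and sample row are invented for the example.

```python
import sqlite3

# In-memory SQLite is only a stand-in for MySQL; every name below is hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Option A: keep the flat layout.
    CREATE TABLE record_flat (
        id               INTEGER PRIMARY KEY,
        summary          TEXT,
        owner_first_name TEXT,
        owner_last_name  TEXT,
        owner_city       TEXT
    );
    -- Option B: move the contact columns into a 1-to-1 side table.
    CREATE TABLE record (
        id      INTEGER PRIMARY KEY,
        summary TEXT
    );
    CREATE TABLE owner_contact (
        record_id  INTEGER PRIMARY KEY REFERENCES record(id),
        first_name TEXT,
        last_name  TEXT,
        city       TEXT
    );
    INSERT INTO record_flat VALUES (1, 'Roof repair', 'Ada', 'Smith', 'Springfield');
    INSERT INTO record VALUES (1, 'Roof repair');
    INSERT INTO owner_contact VALUES (1, 'Ada', 'Smith', 'Springfield');
""")

# Flat table: one lookup, no join.
flat_row = conn.execute(
    "SELECT id, summary, owner_first_name, owner_last_name, owner_city "
    "FROM record_flat WHERE id = 1"
).fetchone()

# Split tables: the same data, but every full read needs a join.
split_row = conn.execute(
    "SELECT r.id, r.summary, c.first_name, c.last_name, c.city "
    "FROM record AS r JOIN owner_contact AS c ON c.record_id = r.id "
    "WHERE r.id = 1"
).fetchone()

# Naming only the needed columns keeps the flat table cheap to read.
summary_only = conn.execute(
    "SELECT summary FROM record_flat WHERE id = 1"
).fetchone()
```

Both queries return the same row, so with a strict 1-to-1 relationship the split buys nothing in terms of data. It only changes how the query is written.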

#2


2  

Absolutely, even if only for preventing technical debt.

Huge tables are always more expensive to maintain -- they have a higher learning curve (so they cost more to train a new developer) and they are not as easy to read "instantly" (meaning that it costs more to even look at the table).

It should be a goal to make code as immediately apparent as possible. A "USER_DATA" table which includes the contact info is about as intuitive as possible. That pattern exists everywhere and everyone has seen it. It requires and invites almost no thought because it is so obvious.

The pattern you're describing above makes an experienced developer pause and wonder why it was done that way. That developer might then seek out the original author so that he can understand why it was done that way and not the more intuitive way...

#3


0  

In this case it may not be very useful, but keeping so many columns in a single table with a large number of records is not recommended. Better to split the table and store the basic columns, like name, password, etc., in one table and the other descriptive information in another table.

#4


0  

There is no benefit to splitting out first name, last name, title, street, city, state, zip. The only good reason to do so would be to add value to each of those fields. For example, you could define 'city' in terms of 'state' because they have a relationship, but then you would need an ID column to disambiguate 'Springfield, Ill' from 'Springfield, Mass', queries would get more complicated, and performance would be marginally worse. So leaving it all in one table in a 'de-normalised' form in this instance seems like good sense to me.
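
For what it's worth, here is a small sketch of the city/state normalisation described above, showing where the ID column comes in. It uses Python's built-in sqlite3 (in-memory) as a stand-in for MySQL, and the `state`, `city`, and `contact` tables and the sample rows are made up for illustration.

```python
import sqlite3

# In-memory SQLite is only a stand-in for MySQL; every name below is hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE state (code TEXT PRIMARY KEY);
    -- The surrogate id is what tells the two Springfields apart.
    CREATE TABLE city (
        id         INTEGER PRIMARY KEY,
        name       TEXT NOT NULL,
        state_code TEXT NOT NULL REFERENCES state(code)
    );
    CREATE TABLE contact (
        id        INTEGER PRIMARY KEY,
        last_name TEXT,
        city_id   INTEGER REFERENCES city(id)
    );
    INSERT INTO state VALUES ('IL'), ('MA');
    INSERT INTO city VALUES (1, 'Springfield', 'IL'), (2, 'Springfield', 'MA');
    INSERT INTO contact VALUES (10, 'Smith', 1), (11, 'Jones', 2);
""")

# Getting a contact's city and state back now takes a join instead of a plain SELECT.
rows = conn.execute(
    "SELECT ct.last_name, c.name, c.state_code "
    "FROM contact AS ct JOIN city AS c ON c.id = ct.city_id "
    "ORDER BY ct.id"
).fetchall()
```

The join is cheap at this size, but it is extra machinery for data that, per the question, never repeats across records.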
