Is it good or bad to use multiple foreign keys in a single table when the other tables can be reached with joins?

Time: 2022-10-03 15:25:02

Let's say I wanted to make a database that could be used to keep track of bank accounts and transactions for a user. A database that can be used in a Checkbook application.

If I have a user table with the following properties:

  1. user_id
  2. email
  3. password

And then I create an account table, which can be linked to a certain user:

  1. account_id
  2. account_description
  3. account_balance
  4. user_id

And, going one step further, I create a transaction table:

  1. transaction_id
  2. transaction_description
  3. is_withdrawal
  4. account_id // The account to which this transaction belongs
  5. user_id // The user to which this transaction belongs
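
For reference, the three tables described above could be sketched as follows. This is only a minimal sketch using Python's built-in sqlite3 module; the column types, the NOT NULL/UNIQUE/CHECK constraints, and the `create_schema` helper are assumptions made for illustration, since the question only lists column names.

```python
import sqlite3

# Hypothetical DDL for the three tables in the question.
# Types and constraints are assumptions; only the column names come from the post.
SCHEMA = """
CREATE TABLE users (
    user_id  INTEGER PRIMARY KEY,
    email    TEXT NOT NULL UNIQUE,
    password TEXT NOT NULL
);
CREATE TABLE accounts (
    account_id          INTEGER PRIMARY KEY,
    account_description TEXT,
    account_balance     NUMERIC NOT NULL DEFAULT 0,
    user_id             INTEGER NOT NULL REFERENCES users(user_id)
);
CREATE TABLE transactions (
    transaction_id          INTEGER PRIMARY KEY,
    transaction_description TEXT,
    is_withdrawal           INTEGER NOT NULL CHECK (is_withdrawal IN (0, 1)),
    account_id              INTEGER NOT NULL REFERENCES accounts(account_id),
    user_id                 INTEGER NOT NULL REFERENCES users(user_id)  -- the column in question
);
"""


def create_schema() -> sqlite3.Connection:
    """Create the schema in a throwaway in-memory database."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces FKs when asked
    conn.executescript(SCHEMA)
    return conn


conn = create_schema()
tables = [row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
print(tables)  # ['accounts', 'transactions', 'users']
```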

Is having the user_id in the transaction table a good option? It would make the query cleaner if I wanted to get all the transactions for each user, such as:

SELECT * FROM transactions
JOIN users ON users.user_id = transactions.user_id

Or I could just trace back to the users table through the accounts table:

SELECT * FROM transactions
JOIN accounts ON accounts.account_id = transactions.account_id
JOIN users ON users.user_id = accounts.user_id

I know the first query is much cleaner, but is that the best way to go?
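
To make the comparison concrete, here is a small sketch (Python with sqlite3, made-up sample rows, trimmed-down columns) that runs both queries against the same data. As long as the redundant user_id in transactions agrees with the account's user_id, the two queries should return the same rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (user_id INTEGER PRIMARY KEY, email TEXT);
CREATE TABLE accounts (account_id INTEGER PRIMARY KEY, account_description TEXT,
                       user_id INTEGER REFERENCES users(user_id));
CREATE TABLE transactions (transaction_id INTEGER PRIMARY KEY, transaction_description TEXT,
                           account_id INTEGER REFERENCES accounts(account_id),
                           user_id INTEGER REFERENCES users(user_id));
-- Sample data invented for this example.
INSERT INTO users VALUES (1, 'a@example.com'), (2, 'b@example.com');
INSERT INTO accounts VALUES (10, 'checking', 1), (20, 'savings', 2);
INSERT INTO transactions VALUES (100, 'rent', 10, 1), (101, 'salary', 10, 1), (102, 'gift', 20, 2);
""")

# First form: join straight to users through the redundant column.
direct = conn.execute("""
    SELECT users.email, transactions.transaction_id
    FROM transactions
    JOIN users ON users.user_id = transactions.user_id
    ORDER BY transactions.transaction_id
""").fetchall()

# Second form: reach users through accounts.
via_accounts = conn.execute("""
    SELECT users.email, transactions.transaction_id
    FROM transactions
    JOIN accounts ON accounts.account_id = transactions.account_id
    JOIN users ON users.user_id = accounts.user_id
    ORDER BY transactions.transaction_id
""").fetchall()

print(direct == via_accounts)  # True
```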

My concern is that by having this extra (redundant) column in the transaction table, I'm wasting space, when I can achieve the same result without said column.

5 Answers

#1


3  

Let's look at it from a different angle. From where will the query or series of queries start? If you have customer info, you can get account info and then transaction info or just transactions-per-customer. You need all three tables for meaningful information. If you have account info, you can get transaction info and a pointer to customer. But to get any customer info, you need to go to the customer table so you still need all three tables. If you have transaction info, you could get account info but that is meaningless without customer info or you could get customer info without account info but transactions-per-customer is useless noise without account data.

Either way you slice it, the information you need for any conceivable use is split up between three tables and you will have to access all three to get meaningful information instead of just a data dump.

Having the customer FK in the transaction table may provide you with a way to make a "clean" query, but the result of that query is of doubtful usefulness. So you've really gained nothing. I've worked writing Anti-Money Laundering (AML) scanners for an international credit card company, so I'm not being hypothetical. You're always going to need all three tables anyway.
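
If the main appeal is a shorter query, one option that this answer does not itself propose is to wrap the three-table join in a view. The sketch below (Python with sqlite3) assumes a hypothetical view name `user_transactions` and invented sample rows; the transactions table carries no user_id column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (user_id INTEGER PRIMARY KEY, email TEXT);
CREATE TABLE accounts (account_id INTEGER PRIMARY KEY, account_description TEXT,
                       user_id INTEGER REFERENCES users(user_id));
CREATE TABLE transactions (transaction_id INTEGER PRIMARY KEY, transaction_description TEXT,
                           is_withdrawal INTEGER,
                           account_id INTEGER REFERENCES accounts(account_id));

-- Hypothetical view: the join is written once and reused.
CREATE VIEW user_transactions AS
SELECT u.user_id, u.email, a.account_id, a.account_description,
       t.transaction_id, t.transaction_description, t.is_withdrawal
FROM transactions AS t
JOIN accounts AS a ON a.account_id = t.account_id
JOIN users AS u ON u.user_id = a.user_id;

-- Sample data invented for this example.
INSERT INTO users VALUES (1, 'a@example.com'), (2, 'b@example.com');
INSERT INTO accounts VALUES (10, 'checking', 1), (20, 'savings', 2);
INSERT INTO transactions VALUES (100, 'rent', 1, 10), (101, 'salary', 0, 10), (102, 'gift', 0, 20);
""")

# Callers get the short query while the data stays normalized.
rows = conn.execute("""
    SELECT email, account_description, transaction_description
    FROM user_transactions
    WHERE user_id = ?
    ORDER BY transaction_id
""", (1,)).fetchall()
print(rows)
```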

Btw, the fact that there are FKs in the first place tells me the question concerns an OLTP environment. An OLAP environment (data warehouse) doesn't need FKs or any other data integrity checks, as warehouse data is static. The data originates from an OLTP environment where the data integrity checks have already been made. So there you can denormalize to your heart's content. Let's not give answers applicable to an OLAP environment to a question concerning an OLTP environment.

#2


2  

You should not use two foreign keys in the same table. This is not a good database design.

A user makes transactions through an account. That is how it is logically done; therefore, this is how the DB should be designed.

Using joins is how this should be done. You should not use the user_id key as it is already in the account table.

The wasted space is unnecessary and is a bad database design.

#3


2  

Denormalizing is usually a bad idea. In the first place, it is often not faster from a performance standpoint. What it does is put data integrity at risk, and it can create massive problems if you end up changing from a 1-1 relationship to a 1-many.

For instance, what is to say that each account will have only one user? In your table design that is all you would get, which is something I find suspicious right off the bat. Accounts in my system can have thousands of users. So that is the first place I question your model. Did you actually think in terms of whether the relationships would be 1-1 or 1-many? Or did you just make an assumption? Data models are NOT easy to adjust after you have millions of records. You need to do far more planning for the future in database design, and far more thinking about the data needs over time, than you do in application design.

But suppose you have a one-to-one relationship now, and three months after you go live you get a new account that needs to have 3 users. Now you have to remember all the places you denormalized in order to properly fix the data. This can create much confusion, as inevitably you will forget some of them.

Further, even if you never need to move to a more robust model, how are you going to maintain this if the user_id changes, as it is going to do often? Now, in order to keep the data integrity, you need a trigger to maintain the data as it changes. Worse, if the data can be changed from either table, you could get conflicting changes. How do you handle those?
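
The maintenance problem can be shown in a few lines. The sketch below (Python with sqlite3, invented sample rows) assumes the change in question is an account being reassigned to another user. With no trigger in place, the copies of user_id in transactions are left pointing at the old user, and the database raises no error.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (user_id INTEGER PRIMARY KEY, email TEXT);
CREATE TABLE accounts (account_id INTEGER PRIMARY KEY,
                       user_id INTEGER REFERENCES users(user_id));
CREATE TABLE transactions (transaction_id INTEGER PRIMARY KEY,
                           account_id INTEGER REFERENCES accounts(account_id),
                           user_id INTEGER REFERENCES users(user_id));  -- redundant copy
-- Sample data invented for this example.
INSERT INTO users VALUES (1, 'a@example.com'), (2, 'b@example.com');
INSERT INTO accounts VALUES (10, 1);
INSERT INTO transactions VALUES (100, 10, 1), (101, 10, 1);
""")

# Transactions whose stored user_id disagrees with their account's user_id.
MISMATCH_SQL = """
    SELECT t.transaction_id
    FROM transactions AS t
    JOIN accounts AS a ON a.account_id = t.account_id
    WHERE t.user_id <> a.user_id
    ORDER BY t.transaction_id
"""

before = [row[0] for row in conn.execute(MISMATCH_SQL)]

# Reassign the account; nothing updates the copies in transactions.
conn.execute("UPDATE accounts SET user_id = 2 WHERE account_id = 10")

after = [row[0] for row in conn.execute(MISMATCH_SQL)]
print(before, after)  # [] [100, 101]
```

Both foreign keys remain individually valid after the update, which is why an ordinary FK check does not catch the inconsistency.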

So you have created a maintenance mess and possibly risked your data integrity, all to write "cleaner" code and save yourself all of ten seconds writing a join? You gain nothing in terms of things that are important in database development, such as performance, security, or data integrity, and you risk a lot. How short-sighted is that?

You need to stop thinking in terms of "cleaner code" when developing for databases. Often the best code for a query is the most complex-looking, as it is the most performant, and that is critical for databases. Don't project object-oriented coding techniques onto database development; they are two very different things with very different needs. You need to start thinking in terms of how this will play out as the data changes, which you clearly are not doing or you would not even consider doing such a thing. You need to think more about the meaning of the data and less about the "principles of software development", which are taught as if they apply to everything but in reality do not apply well to databases.

#4


1  

In my opinion, if you have a simple Many-To-Many relation, just use a composite primary key made of the two foreign keys, and that's all.

Otherwise, if you have a Many-To-Many relation with extra columns, use one primary key and two foreign keys. It's easier to manage this table as a single Entity, just like Doctrine does. Generally speaking, simple Many-To-Many relations are rare, and they are useful just for linking two tables.
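
A many-to-many link table with an extra column might look like the sketch below (Python with sqlite3). The table name `account_users`, the surrogate key `account_user_id`, and the `role` column are hypothetical; the UNIQUE constraint is an added assumption so that the same user cannot be linked to the same account twice.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces FKs when asked
conn.executescript("""
CREATE TABLE users (user_id INTEGER PRIMARY KEY, email TEXT);
CREATE TABLE accounts (account_id INTEGER PRIMARY KEY, account_description TEXT);

-- Hypothetical link table: one surrogate primary key, two foreign keys, one extra column.
CREATE TABLE account_users (
    account_user_id INTEGER PRIMARY KEY,
    account_id      INTEGER NOT NULL REFERENCES accounts(account_id),
    user_id         INTEGER NOT NULL REFERENCES users(user_id),
    role            TEXT NOT NULL,
    UNIQUE (account_id, user_id)
);

-- Sample data invented for this example.
INSERT INTO users VALUES (1, 'a@example.com'), (2, 'b@example.com');
INSERT INTO accounts VALUES (10, 'joint checking');
INSERT INTO account_users (account_id, user_id, role)
VALUES (10, 1, 'owner'), (10, 2, 'viewer');
""")

# A second link between the same account and user violates the UNIQUE constraint.
try:
    conn.execute(
        "INSERT INTO account_users (account_id, user_id, role) VALUES (10, 1, 'viewer')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

members = conn.execute(
    "SELECT user_id, role FROM account_users WHERE account_id = 10 ORDER BY user_id"
).fetchall()
print(duplicate_rejected, members)  # True [(1, 'owner'), (2, 'viewer')]
```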

#5


1  

It depends. If you can get the data fast enough, use the normalized version (where user_id is NOT in the transaction table). If you are worried about performance, go ahead and include user_id. It will use up more space in the database by storing redundant information, but you will be able to return the data faster.
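
Before denormalizing for speed, it may be worth checking whether indexes on the join columns are enough. The sketch below (Python with sqlite3) keeps the normalized layout, adds two indexes with hypothetical names, and prints the query plan so it can be inspected. The plan text varies between SQLite versions, so no particular output is assumed here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (user_id INTEGER PRIMARY KEY, email TEXT);
CREATE TABLE accounts (account_id INTEGER PRIMARY KEY,
                       user_id INTEGER REFERENCES users(user_id));
CREATE TABLE transactions (transaction_id INTEGER PRIMARY KEY,
                           account_id INTEGER REFERENCES accounts(account_id));

-- Hypothetical index names; they cover the two columns used to walk user -> account -> transaction.
CREATE INDEX idx_accounts_user_id ON accounts(user_id);
CREATE INDEX idx_transactions_account_id ON transactions(account_id);

-- Sample data invented for this example.
INSERT INTO users VALUES (1, 'a@example.com'), (2, 'b@example.com');
INSERT INTO accounts VALUES (10, 1), (20, 2);
INSERT INTO transactions VALUES (100, 10), (101, 10), (102, 20);
""")

QUERY = """
    SELECT t.transaction_id
    FROM accounts AS a
    JOIN transactions AS t ON t.account_id = a.account_id
    WHERE a.user_id = ?
    ORDER BY t.transaction_id
"""

rows = [row[0] for row in conn.execute(QUERY, (1,))]
print(rows)  # [100, 101]

# Print the planner's steps; the last column of each row is the readable description.
for step in conn.execute("EXPLAIN QUERY PLAN " + QUERY, (1,)):
    print(step[-1])
```

Measuring on realistic data volumes is still needed before deciding either way.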

EDIT

There are several factors to consider when deciding whether or not to denormalize a data structure. Each situation needs to be considered uniquely; no answer is sufficient without looking at the specific situation (hence the "It depends" that begins this answer). For the simple case above, denormalization would probably not be an optimal solution.
