How do you like your primary keys?

Time: 2022-10-03 22:28:30

A fairly animated discussion on my team got me wondering what most people like as primary keys. We had the following groups:

  1. Int/BigInt columns that autoincrement are good enough primary keys.

  2. There should be at least 3 columns that make up the primary key.

  3. Id, GUID and human-readable row identifiers should all be treated differently.

What's the best approach for PKs? It would be awesome if you could justify your opinion. Is there a better approach than the above?

EDIT: Does anyone have a simple sample/algorithm for generating human-readable row identifiers that scales well?

26 answers

#1


If you're going to be doing any syncing between databases with occasionally connected apps, then you should be using GUIDs for your primary keys. It is kind of a pain for debugging, so apart from that case I tend to stick to ints that autoincrement.

Autoincrement ints should be your default, and not using them should be justified.

#2


I don't see an answer which points out (what I regard as) the really fundamental point - namely, that a primary key is what guarantees that you won't get two entries in the table for the same real-world entity (as modelled in the database). This observation helps establish what are good and what are bad choices for primary key.

For example, in a table of (US) state names and codes, either the name or the code could be the primary key - they constitute two different candidate keys, and one of them (normally the shorter - the code) is chosen as the primary key. In the theory of functional dependencies (and join dependencies), 1NF through 5NF, it is the candidate keys that are crucial rather than a primary key.

For a counter-example, human names generally make a bad choice for primary key. There are many people who go by the name "John Smith" or some other similar names; even taking middle names into account (remember: not everyone has one - for example, I don't), there is plenty of scope for duplication. Consequently, people do not use names as primary keys. They invent artificial keys such as the Social Security Number (SSN) or Employee Number and use them to designate the individual.

An ideal primary key is short, unique, memorable, and natural. Of these characteristics, uniqueness is mandatory; the rest have to flex given the constraints of real world data.

When it comes to determining the primary key of a given table, therefore, you have to look at what that table represents. What set or sets of column values in the table uniquely identify each row in the table? Those are the candidate keys. Now, if each candidate key consists of 4 or 5 columns, then you might decide that those are too clumsy to make a good primary key (primarily on grounds of shortness). In those circumstances, you might introduce a surrogate key - an artificially generated number. Very often (but not always) a simple 32-bit integer is sufficient for the surrogate key. You then designate this surrogate key as the primary key.

However, you must still ensure that the other candidate keys (for the surrogate key is a candidate key too, as well as the chosen primary key) are all maintained as unique identifiers - normally by placing a unique constraint on those sets of columns.
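
To make the surrogate-plus-unique-constraint idea concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table and column names (state, state_id, code, name) are made up for illustration, and SQLite stands in for whatever database you actually use.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE state (
        state_id INTEGER PRIMARY KEY,   -- surrogate key (hypothetical column name)
        code     TEXT NOT NULL UNIQUE,  -- candidate key 1, still enforced
        name     TEXT NOT NULL UNIQUE   -- candidate key 2, still enforced
    )
    """
)
conn.execute("INSERT INTO state (code, name) VALUES ('CA', 'California')")


def try_insert(code, name):
    """Return True if the row was accepted, False if a unique constraint rejected it."""
    try:
        conn.execute("INSERT INTO state (code, name) VALUES (?, ?)", (code, name))
        return True
    except sqlite3.IntegrityError:
        return False


# Same code again: it would get a fresh surrogate id, but the UNIQUE constraint blocks it.
duplicate_accepted = try_insert("CA", "Calif.")
new_row_accepted = try_insert("TX", "Texas")
row_count = conn.execute("SELECT COUNT(*) FROM state").fetchone()[0]
```

Without the two UNIQUE constraints, the surrogate key alone would happily accept a second 'CA' row.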

Sometimes, people find it difficult to identify what makes a row unique, but there should be something to do that, because simply repeating a piece of information doesn't make it any more true. And if you're not careful and do get two (or more) rows purporting to store the same information, and you then need to update the information, there is a danger (especially if you use cursors) that you will update just one row rather than every row, so the rows are out of synchrony and no-one knows which row contains the correct information.

This is a pretty hard-line view, in some respects.

I've no particular problem with using a GUID when they are needed, but they tend to be big (as in 16-64 bytes), and they are used too often. Very often a perfectly good 4-byte value would suffice. Using a GUID where a 4-byte value would suffice wastes disk space, and slows up even indexed access to the data since there are fewer values per index page, so the index will be deeper and more pages have to be read to get to the information.
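
A rough back-of-the-envelope sketch of the size argument, in Python. The 8192-byte page size is an assumption picked for illustration (real engines vary), and per-entry overhead is ignored, so treat the numbers as upper bounds rather than measurements.

```python
import struct
import uuid

PAGE_SIZE = 8192  # assumed index page size in bytes; real engines vary

guid = uuid.uuid4()
sizes = {
    "int32": struct.calcsize("<i"),    # 4-byte integer
    "guid_binary": len(guid.bytes),    # GUID stored as raw bytes
    "guid_text": len(str(guid)),       # GUID stored as hyphenated text
}

# Rough upper bound on keys per page, ignoring per-entry overhead.
keys_per_page = {kind: PAGE_SIZE // size for kind, size in sizes.items()}
```

Fewer keys per page means more pages and, eventually, a deeper index for the same number of rows.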

#3


This is only a religious issue because people seek a universal right answer. The fact that both your team and this SO thread show so much disagreement should be a clue that there are good reasons to use all the solutions you describe, in different circumstances.

  • Surrogate keys are useful when no other attribute or set of attributes in the table is suitable to identify rows uniquely.

  • Natural keys are preferred, when possible, to make the table more human-readable. Natural keys also allow the foreign key in a dependent table to contain a real value instead of a surrogate id. E.g. when you need to store state (CA, TX, NY) you might as well use a char(2) natural key instead of an int.

  • Use compound primary keys where appropriate. Do not add an "id" surrogate key unnecessarily when a perfectly good compound key exists (this is especially true in many-to-many tables). A mandate for a three-column key in every table is absolute nonsense.

  • GUIDs are a solution when you need to preserve uniqueness over multiple sites. They are also handy if you need values in the primary key to be unique, but not ordered or consecutive.

  • INT vs. BIGINT: it's not common that a table requires a 64-bit range for primary keys, but with the increasing availability of 64-bit hardware it shouldn't be a burden, and gives more assurance that you won't overflow. INT is of course smaller, so if space is at a premium it can give a slight advantage.
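
As an illustration of the compound-key point for many-to-many tables, here is a small sketch with Python's sqlite3 module. The student/course/enrollment schema is hypothetical and only meant to show the shape of the idea.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE student (student_id INTEGER PRIMARY KEY, name  TEXT NOT NULL);
    CREATE TABLE course  (course_id  INTEGER PRIMARY KEY, title TEXT NOT NULL);
    CREATE TABLE enrollment (
        student_id INTEGER NOT NULL REFERENCES student(student_id),
        course_id  INTEGER NOT NULL REFERENCES course(course_id),
        PRIMARY KEY (student_id, course_id)  -- compound key, no extra "id" column
    );
    INSERT INTO student VALUES (1, 'Ann');
    INSERT INTO course  VALUES (10, 'Databases');
    INSERT INTO enrollment VALUES (1, 10);
    """
)

try:
    conn.execute("INSERT INTO enrollment VALUES (1, 10)")  # same pair again
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True
```

The compound key already says "one row per (student, course) pair", which is exactly what an added surrogate id would fail to guarantee on its own.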

#4


I like The Database Programmer blog as a source for this kind of info.

3 columns for a primary key? I would say that columns should have appropriate unique constraints as the business rules demand, but I'd still have a separate surrogate key. Compound keys mean business logic enters into the key. If the logic changes, your whole schema is screwed.

#5


I like mine unique.

#6


I always go with the surrogate key. A surrogate key (usually an identity column, autoincrement, or GUID) is one in which the key is not present in the data itself. A natural key, on the other hand, is one that, on its own, uniquely identifies the row. As near as I can tell in life, there are hardly any real natural keys. Not even something like the SSN in the United States is a natural key. Composite primary keys are a disaster waiting to happen. You can't edit any of that data (which is the major drawback of any natural key, composite or not), but worse is that with a composite key, you now have to propagate that key data into every related table. What a giant waste.

Now, for selection of the surrogate key, I stick with identity columns (I work mostly in MS SQL Server). GUIDs are too large and Microsoft recommends against using them as a PK. If you have multiple servers, all you need to do is make the increment 10 or 20, or whatever you think is the maximum number of servers you'll ever need to sync/expand to, and bump the seed for each table on each subsequent server, and you'll never have a data collision.

Of course, because of the increment, I make the identity column a BigInt (otherwise known as a long [64 bits]).

Doing a bit of math, even if you make the increment 100, you can still have 92,233,720,368,547,758 (> 92 quadrillion) rows in your table.
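
The seed/increment scheme and the arithmetic can be checked with a few lines of Python. This only simulates how an identity column hands out values; the helper name identity_values is made up for the sketch.

```python
import itertools

BIGINT_MAX = 2**63 - 1  # largest signed 64-bit value


def identity_values(seed, increment):
    """Yield seed, seed + increment, seed + 2*increment, ... like an identity column."""
    return itertools.count(seed, increment)


INCREMENT = 100  # assumed maximum number of servers

# Each server gets its own seed, so the sequences never overlap.
server_1 = list(itertools.islice(identity_values(1, INCREMENT), 3))
server_2 = list(itertools.islice(identity_values(2, INCREMENT), 3))

# How many rows one server can number before running out of BIGINT range.
rows_per_table = BIGINT_MAX // INCREMENT
```

Here server_1 hands out 1, 101, 201 and server_2 hands out 2, 102, 202, and rows_per_table works out to the 92,233,720,368,547,758 quoted above.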

#7


I think the use of the word "Primary" in the phrase "Primary Key" is, in a real sense, misleading.

First, use the definition that a "key" is an attribute or set of attributes that must be unique within the table.

Then, having any key serves several often mutually inconsistent purposes.

  1. To use as join conditions to one or many records in child tables which have a relationship to this parent table. (Explicitly or implicitly defining a Foreign Key in those child tables.)

  2. (Related) To ensure that child records must have a parent record in the parent table. (The child table FK must exist as a Key in the parent table.)

  3. To increase performance of queries that need to rapidly locate a specific record/row in the table.

  4. To ensure data consistency by preventing duplicate rows which represent the same logical entity from being inserted into the table. (This is often called a "natural" key, and should consist of table (entity) attributes which are relatively invariant.)

Clearly, any non-meaningful, non-natural key (like a GUID or an auto-generated integer) is totally incapable of satisfying #4.

But often, with many (most) tables, a totally natural key which can provide #4 will consist of multiple attributes and be excessively wide, or so wide that using it for purposes #1, #2, or #3 will cause unacceptable performance consequences.

The answer is simple. Use both. Use a simple auto-generating integral key for all joins and FKs in other child tables, but ensure that every table that requires data consistency (very few tables don't) has an alternate natural unique key that will prevent inserts of inconsistent data rows... Plus, if you always have both, then all the objections against using a natural key (what if it changes? I have to change every place it is referenced as an FK) become moot, as you are not using it for that... You are only using it in the one table where it is a PK, to avoid inconsistent duplicate data...

As to GUIDs, be very careful using them, as using GUIDs in an index can hose the index with fragmentation. The most common algorithms used to create them put the "random" portion of the GUID in the most significant bit positions... This increases the requirement for regular index defragmentation / reindexing as new rows are added.

#8


Slightly off-topic, but I feel compelled to chime in with...

If your primary key is a GUID, do not make it a clustered index. Since GUIDs are non-sequential, the data will be re-arranged on disk during almost every insert. (Yuck.) If using GUIDs as primary keys, they should be nonclustered indexes.
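
A toy simulation of why random keys hurt a clustered index, in Python. A sorted list stands in for the on-disk key order here, which is a big simplification (no pages, no B-tree), and the GUIDs are built from a seeded random generator just so the run is repeatable.

```python
import bisect
import random
import uuid


def appended_fraction(keys):
    """Fraction of inserts that land at the end of the sorted key order."""
    ordered = []
    appended = 0
    for key in keys:
        position = bisect.bisect_left(ordered, key)
        if position == len(ordered):
            appended += 1  # new row goes after everything already stored
        ordered.insert(position, key)
    return appended / len(ordered)


rng = random.Random(0)  # fixed seed so the run is repeatable
sequential_keys = list(range(1000))
random_guids = [uuid.UUID(int=rng.getrandbits(128)) for _ in range(1000)]

sequential_fraction = appended_fraction(sequential_keys)
guid_fraction = appended_fraction(random_guids)
```

Every sequential key is a plain append, while almost every random GUID has to be squeezed in somewhere in the middle, which is the re-arranging described above.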

#9


One thing you should never do is use a smart key. That is a key where information about the record is coded in the key itself, and it will eventually bite you.

I worked at one place where the primary key was the account ID, which was a combination of letters and numbers. I don't remember any specifics, but, for example, accounts of a certain type would be in the 600 range, and those of another type started with 400. That was great, until a customer decided to ask for both types of work. Or changed the type of work they did.

Another place used the location in the tree as the primary key for records. So there would be records like the following.

Cat1.subcatA.record1
Cat1.subcatA.record2
Cat1.subcatB.record1
Cat2.subcatA.record1

Of course, the first thing the customers wanted was a way to move items in the tree around. The entire set of software died before that happened.

Please, please, please, if you're writing code that I ever have to maintain, please don't use a smart key!

#10


I'm a fan of the auto-increment as primary key. I know deep in my heart that this is a cop-out, but it does make it so easy to sort data by when it was added (ORDER BY ID DESC, f'r instance).

3 columns sounds awfully harsh to humanly parse.

And that's the trade-off -- how much of the relational capability do you need, versus making THIS TABLE RIGHT HERE understandable to a human interrogating it (versus the stored-procedure or programmatic interface).

auto-increment is for us humans. :-(

#11


Generally, it depends.

Personally, I like autoincrement ints.

But, one thing I can tell you is to never trust data from other sources as your key. I swear, every time I've done that it comes back to bite me. Well, never again!

#12


There should be at least 3 columns that make up the primary key.

I don't understand this.

Are you talking about a "natural key", e.g. "name and date of birth"? A natural key might be ideal if it exists, but most candidates for a natural key are either not unique (several people with the same name) or not constant (someone can change their name).

Int/BigInt which autoincrement are good enough primary keys.

I prefer Guid. A potential problem with autoincrement is that the value (e.g. "order id") is assigned by the database instance (e.g. by the "sales database") ... which won't entirely work (instead you start to need compound keys) if you ever need to merge data created by more than one database instance (e.g. from several sales offices each with their own database).
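
Here is a small sketch of that merge problem using Python's sqlite3 and uuid modules. Two in-memory databases play the role of two sales offices; the orders table and its columns are invented for the example.

```python
import sqlite3
import uuid


def make_office_db(use_guid):
    """Create one office's database with a single order in it."""
    conn = sqlite3.connect(":memory:")
    if use_guid:
        conn.execute("CREATE TABLE orders (order_id TEXT PRIMARY KEY, item TEXT)")
        conn.execute("INSERT INTO orders VALUES (?, ?)", (str(uuid.uuid4()), "widget"))
    else:
        conn.execute(
            "CREATE TABLE orders (order_id INTEGER PRIMARY KEY AUTOINCREMENT, item TEXT)"
        )
        conn.execute("INSERT INTO orders (item) VALUES ('widget')")
    return conn


def merged_ids_collide(use_guid):
    """Return True if two independent offices produced the same order_id."""
    office_a = make_office_db(use_guid)
    office_b = make_office_db(use_guid)
    ids_a = {row[0] for row in office_a.execute("SELECT order_id FROM orders")}
    ids_b = {row[0] for row in office_b.execute("SELECT order_id FROM orders")}
    return bool(ids_a & ids_b)


autoincrement_collides = merged_ids_collide(use_guid=False)
guid_collides = merged_ids_collide(use_guid=True)
```

Both autoincrement databases start counting at 1, so their first orders clash as soon as you try to merge; the GUID versions do not.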

#13


Re: GUIDs

Watch out if this is going to be a really Really REALLY REALLY big database, with lots of load and fast access.

At my last job, where we had databases of 100 to 500 million records, our database guys strongly argued against GUIDs, and for an appropriately sized decimal number. They felt that (under Oracle) the size difference in the internal storage for a string GUID vs. a decimal value would make a very noticeable difference in lookups. (Bigger keys = deeper trees to traverse.)

The random nature of GUIDs also reduces the fill-factor for index pages significantly - this dramatically increases tearing and disk I/O.

#14


Auto increment columns. I am able to make my code work seamlessly with SQL Server or Oracle, one using identity, the other using sequences, through my DAL, and I couldn't be happier. I agree, GUIDs are sometimes necessary if you are doing replication or sending data away to receive it later on after processing.

#15


I've always used a surrogate key - an autoincrementing integer called 'id'. I can see plenty of reasons to do this even when another option is obvious:

  • Consistency
  • Data independent (unique, not destroyed by changes to format)
  • Human-readable

...and no sensible reason not to:

  • Ambiguity in joins? - Aliasing tables is a better practice, IMHO

  • Optimum tables? - Removing one byte per entry is premature optimisation, IMHO

  • Per-table decision? - No longer consistent

  • Scaling problems? - Eh? Why?

  • Hierarchical data structure? - That's denormalising, a whole other subject of religion. Suffice it to say I'm a fan in a few circumstances in theory, but never in practice :)

Sensible reasons against that I haven't thought of or come across yet are always welcome...

#16


This is a classic "it depends". There's no one right answer for every project. I like different things for different situations. It depends on whether I'm using an ORM and what it supports. It depends on the overall architecture (distributed or not, etc). Just pick one that you think will work and move on to arguing over tabs and spaces.

#17


I tend to use option #1 or #3 depending on the size, the number of people connecting, and whether it is a multiple database server situation or not.

Option #2 doesn't make much sense to me. If any one of the three is not enough to identify a unique record, then it's possible (without going through extra machinations) to have two records show up with the same values in all three columns. If you want to enforce uniqueness on any combination of the three, then just add a unique index for them.

#18


I've only used an auto-increment int or a GUID. 99% of the time I've used an auto-increment int. It's just what I was taught to use when I first learned about databases, and I have never run into a reason not to use them (although I know of reasons why a GUID would be better).

I like auto increment ints because it helps with readability. For example I can say "take a look at record 129383" and it's pretty easy for someone to go in and find it. With a GUID that's nearly impossible to do.

#19


Past a basic definitional answer, what constitutes a good primary key is left largely to religion and break room arguments. If you have something that does, and always will, map uniquely to an individual row, then it will work fine as a primary key. Past that point, there are other considerations:

  • Is the primary key definition not overly complex? Does it avoid introducing unnecessary complexity for the sake of following a "best-practice"?

  • Is there a better possible primary key that would require less overhead for the database to handle (i.e. INTEGER vs. VARCHAR, etc)?

  • Am I ABSOLUTELY certain that the uniqueness and defined-ness invariant of my primary key will not change?

This last one is likely what draws most people to use things like GUIDs or self-incrementing integer columns, because relying on things like addresses, phone numbers, first/last names, etc., just doesn't cut it. The only invariant about people I can think of is SSNs, but then I'm not even 100% certain about those remaining forever unique.

Hopefully this helps add some clarity...

#20


The way I approach primary keys (and I feel is the best) is to avoid having a "default" approach. This means instead of just slapping on an auto-incrementing integer and calling it a day, I look at the problem and say "is there a column or group of columns that will always be unique and won't change?" If the answer is yes then I take that approach.

#21


Almost always integers.

They have other good reasons besides being smaller/faster to process. Which would you rather write down - "404040" or "3463b5a2-a02b-4fd4-aa0f-1d3c0450026c"?

#22


Only slightly relevant, but one thing I've started doing recently when I have small classification tables (essentially those that would represent ENUMs in code) is that I'll make the primary key a char(3) or char(4). Then I make those primary keys representative of the lookup value.

For example, I have a quoting system for our internal Sales Agents. We have "Cost Categories" that every quote line item is assigned one of... So I have a type lookup table called 'tCostCategories', where the primary key is 'MTL', 'SVC', 'TRV', 'TAX', 'ODC'. Other columns in the lookup table store more details, such as the normal English meanings of the codes: "Material", "Service", "Travel", "Taxes", "Other Direct Costs", and so forth.

This is really nice because it doesn't use any more space than an int, and when you are looking at the source data, you don't have to link the lookup table to know what the heck the value is. For example, a quote row might look like:

1 PartNumber $40 MTL
2 OtherPartNumber $29.99 SVC
3 PartNumber2 $150 TRV

It's much easier than using an int to represent the categories and then linking 1, 2, 3 on all the lines - you have the data right there in front of you, and the performance doesn't seem affected at all (not that I've truly tested).
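
For anyone who wants to see the lookup-code idea in action, here is a rough sketch with Python's sqlite3 module. Only 'tCostCategories' and the codes come from the description above; the quote_line table and all column names are guesses made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE tCostCategories (
        code        CHAR(3) PRIMARY KEY,
        description TEXT NOT NULL
    );
    CREATE TABLE quote_line (            -- hypothetical table and column names
        line_no     INTEGER PRIMARY KEY,
        part_number TEXT NOT NULL,
        price       REAL NOT NULL,
        category    CHAR(3) NOT NULL REFERENCES tCostCategories(code)
    );
    INSERT INTO tCostCategories VALUES
        ('MTL', 'Material'), ('SVC', 'Service'), ('TRV', 'Travel'),
        ('TAX', 'Taxes'), ('ODC', 'Other Direct Costs');
    INSERT INTO quote_line VALUES
        (1, 'PartNumber', 40, 'MTL'),
        (2, 'OtherPartNumber', 29.99, 'SVC'),
        (3, 'PartNumber2', 150, 'TRV');
    """
)

# No join needed: the category code is readable on the line itself.
lines = conn.execute(
    "SELECT line_no, part_number, category FROM quote_line ORDER BY line_no"
).fetchall()
```

The join to tCostCategories is still there when you want the full description, but day-to-day poking at quote_line doesn't need it.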

As far as the real question goes... I like RowGUID uniqueidentifiers. I'm not 100% on this, but don't all rows have internal RowGuids anyway? If so, then using the RowGuid would actually take less space than ints (or anything else for that matter). All I know is that if it's good enough for M$ to use in GreatPlains then it's good enough for me. (Should I duck??)

#23


Oh, one more reason I use GUIDs - I use a hierarchical data structure. That is, I have a table 'Company' and a table 'Vendor' for which the Primary Keys match up. But I also have a table 'Manufacturer' that also 'inherits' from Company. The fields that are common to Vendors and Manufacturers don't appear in those tables - they appear in Company. In this setup, using ints is much more painful than GUIDs. At the very least, you can't use identity primary keys.

#24


I like natural keys, whenever I can trust them. I'm willing to pay a small performance price in order to use keys that make sense to the subject matter experts.

For tables that describe entities, there should be a simple natural key that identifies individual instances the same way the subject matter people do. If the subject matter does not have trustworthy identifiers for one of the entities, then I'll resort to a surrogate key.

For tables that describe relationships, I use a compound key, where each component references an entity that participates in the relationship, and therefore a row in an entity table. Again, the performance hit for using a compound key is generally minimal.

As others have pointed out, the term "primary key" is a little misleading. In the Relational Data Model, the term that's used is "candidate keys". There could be several candidate keys for a single table. Logically, each one is just as good as another. Choosing one of them as "primary" and making all references via that key is simply a choice the designer can make.

#25


GUIDs. Period.

In the event that you need to scale out or you need to assign the primary key by alternate means they will be your friend. You can add indexes for everything else.


Update to clarify my statement.

I've worked on a lot of different kinds of sites. From small single server deals to large ones backed with multiple DB and web servers. There have certainly been apps that would have been just fine with auto incrementing ints as primary keys. However, those don't fit the model of how I do things.

When using a GUID you can generate the ID anywhere. It could be generated by a remote server, your web app, within the database itself or even within multiple databases in a multimaster situation.

On the other hand, an auto incremented INT can only be safely generated within the primary database. Again, this might be okay if you have an application that will be intimately tied to that one backing DB server and scaling out is not something you are concerned with.

Sure, usage of GUIDs means you have to have nightly reindexing processes. However, if you are using anything other than an auto incremented INT you should do that anyway. Heck, even with an INT as the primary it's likely you have other indexes that need to be regenerated to deal with fragmentation. Therefore, using GUIDs doesn't exactly add another problem because those tasks need to be performed regardless.

If you take a look at the larger apps out there you will notice something important: they all use Base64 encoded GUIDs as the keys. The reason for this is simple, usage of GUIDs enables you to scale out easily whereas there can be a lot of hoops to jump through when attempting to scale out INTs.
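
For reference, a minimal sketch of what a Base64-encoded GUID looks like, using Python's base64 and uuid modules. The helper names are made up, and stripping the '=' padding and using the URL-safe alphabet are assumptions about how such keys are usually shortened, not something any particular app is confirmed to do.

```python
import base64
import uuid


def guid_to_base64(guid):
    """Encode a GUID's 16 raw bytes as URL-safe Base64 without '=' padding."""
    return base64.urlsafe_b64encode(guid.bytes).rstrip(b"=").decode("ascii")


def base64_to_guid(text):
    """Reverse guid_to_base64 by restoring the two stripped padding characters."""
    return uuid.UUID(bytes=base64.urlsafe_b64decode(text + "=="))


guid = uuid.UUID("3463b5a2-a02b-4fd4-aa0f-1d3c0450026c")
encoded = guid_to_base64(guid)
```

The encoded form is 22 characters instead of the 36-character hyphenated string, and it round-trips back to the same GUID.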

Our latest app goes through a period of heavy inserts that lasts for about a month. After that 90+% of the queries are all selects for reporting. To increase capacity I can bring up additional DB servers during this large insert period; and later easily merge those into a single DB for reporting. Attempting to do that with INTs would be an absolute nightmare.

Quite frankly, any time you cluster a database or set up replication, the DB server is going to demand that you have GUIDs on the table anyway. So, if you think that your system might need to grow then pick the one that's good.

#26


This is a complex subject whether you realized it or not. Might fall under the section on this * FAQ.

What kind of questions should I not ask here?

Avoid asking questions that are subjective, argumentative, or require extended discussion. This is a place for questions that can be answered!

This has been debated for years and will continue to be debated for years. The only hint of consensus I have seen is that the answers are somewhat predictable depending on whether you are asking an OO guy (GUIDs are the only way to go!), a data modeler (Natural keys are the only way to go!), or a performance oriented DBA (INTs are the only way to go!).


但是,您仍必须确保其他候选键(对于代理键也是候选键,以及所选主键)都保持为唯一标识符 - 通常通过在这些列集上放置唯一约束。

Sometimes, people find it difficult to identify what makes a row unique, but there should be something to do that, because simply repeating a piece of information doesn't make it any more true. And if you're not careful and do get two (or more) rows purporting to store the same information, and you then need to update the information, there is a danger (especially if you use cursors) that you will update just one row rather than every row, so the rows are out of synchrony and no-one knows which row contains the correct information.

有时,人们发现难以确定哪一行是独特的,但应该有一些事情要做,因为简单地重复一条信息并不会使它更加真实。如果你不小心并且确实得到两行(或更多)声称存储相同的信息,然后你需要更新信息,那么就有危险(特别是如果你使用游标),你只会更新一行而不是每一行,所以行不同步,没有人知道哪一行包含正确的信息。

This is a pretty hard-line view, in some respects.

在某些方面,这是一个非常强硬的观点。

I've no particular problem with using a GUID when they are needed, but they tend to be big (as in 16-64 bytes), and they are used too often. Very often a perfectly good 4-byte value would suffice. Using a GUID where a 4-byte value would suffice wastes disk space, and slows up even indexed access to the data since there are fewer values per index page, so the index will be deeper and more pages have to be read to get to the information.

在需要时使用GUID我没有特别的问题,但它们往往很大(如16-64字节),并且它们经常被使用。通常,一个非常好的4字节值就足够了。使用GUID,其中4字节值足以浪费磁盘空间,并且甚至减慢了对数据的索引访问速度,因为每个索引页的值更少,因此索引将更深,并且必须读取更多页面才能到达信息。

#3


This is only a religious issue because people seek a universal right answer. The fact that both your team and this SO thread shows so much disagreement should be a clue that there are good reasons to use all the solutions you describe, in different circumstances.

这只是一个宗教问题,因为人们寻求普遍的正确答案。你的团队和这个SO线程显示出如此多的分歧的事实应该是一个线索,在不同的情况下有充分的理由使用你描述的所有解决方案。

  • Surrogate keys are useful when no other attribute or set of attributes in the table is suitable to identify rows uniquely.
  • 当表中没有其他属性或属性集适合于唯一地标识行时,代理键很有用。

  • Natural keys are preferred, when possible, to make the table more human-readable. Natural keys also allow the foreign key in a dependent table to contain a real value instead of a surrogate id. E.g. when you need to store state (CA, TX, NY) you might as well use a char(2) natural key instead of an int.
  • 在可能的情况下,自然键是首选,以使表更易于阅读。自然键还允许从属表中的外键包含实际值而不是代理ID。例如。当你需要存储状态(CA,TX,NY)时,你也可以使用char(2)自然键而不是int。

  • Use compound primary keys where appropriate. Do not add an "id" surrogate key unnecessarily when a perfectly good compound key exists (this is especially true in many-to-many tables). A mandate for a three-column key in every table is absolute nonsense.
  • 在适当的地方使用复合主键。当存在非常好的复合键时,不要不必要地添加“id”代理键(在多对多表中尤其如此)。每张表中三列密钥的授权绝对是无稽之谈。

  • GUIDs are a solution when you need to preserve uniqueness over multiple sites. They are also handy if you need values in the primary key to be unique, but not ordered or consecutive.
  • 当您需要在多个站点上保留唯一性时,GUID是一种解决方案。如果您需要主键中的值是唯一的,但不是有序的或连续的,它们也很方便。

  • INT vs. BIGINT: it's not common that a table requires a 64-bit range for primary keys, but with the increasing availability of 64-bit hardware it shouldn't be a burden, and gives more assurance that you won't overflow. INT is of course smaller, so if space is at a premium it can give a slight advantage.
  • INT与BIGINT:表对于主键需要64位范围并不常见,但随着64位硬件可用性的增加,它不应成为负担,并且更能保证您不会溢出。 INT当然是较小的,所以如果空间非常宝贵,它可以带来轻微的优势。

#4


I like The Database Programmer blog as a source for this kind of info.

我喜欢The Database Programmer博客作为此类信息的来源。

3 columns for a primary key? I would say that columns should have appropriate unique constraints as the business rules demand, but I'd still have a separate surrogate key. Compound keys mean business logic enters into the key. If the logic changes, your whole schema is screwed.

主键有3列?我会说列应该有业务规则要求的适当的唯一约束,但我仍然有一个单独的代理键。复合键意味着业务逻辑进入密钥。如果逻辑发生变化,那么整个模式都会被搞砸。

#5


I like mine unique.

我喜欢我的独特之处。

#6


I always go with the surrogate key. A surrogate key (usually an identity column, autoincrement, or GUID) is one in which the key is not present in the data itself. A natural key, on the other hand, is one that, on its own, uniquely identifies the row. As near as I can tell in life, there are hardly any real natural keys. Not even things like SSN in the United States is a natural key. Composite primary keys are a disaster waiting to happen. You can't edit any of that data (which is the major drawback of any natural key, composite or not), but worse is that with a composite key, now you have to perpetuate that key data into every related table. What a giant waste.

我总是使用代理键。代理键(通常是标识列,自动增量或GUID)是数据本身不存在密钥的代理键。另一方面,自然键是一个唯一标识行的键。就像我在生活中所说的那样,几乎没有任何真正的自然键。甚至像美国的SSN这样的东西也不是天生的关键。复合主键是一种等待发生的灾难。您无法编辑任何数据(这是任何自然键的复合或无复合的主要缺点),但更糟糕的是使用复合键,现在您必须将该关键数据保存到每个相关表中。多么巨大的浪费。

Now, for selection of the surrogate key, I stick with identity columns (I work mostly in MS SQL Server). GUID's are too large and Microsoft recommends against using them as a PK. If you have multiple servers, all you need to do is make the increment 10 or 20 or whatever you think the maximum number of servers you'll ever need to sync/expand to, and just inc the seed for each table on each subsequent server, and you'll never have a data collision.

现在,为了选择代理键,我坚持使用标识列(我主要在MS SQL Server中工作)。 GUID太大,Microsoft建议不要将它们用作PK。如果您有多台服务器,那么您需要做的就是使增量10或20或您认为需要同步/扩展的最大服务器数量,并在每个后续服务器上添加每个表的种子,你永远不会有数据冲突。

Of course, because of the increment, I make the identity column a BigInt (otherwise known as a long [64 bits]).

当然,由于增量,我将标识列设为BigInt(也称为long [64位])。

Doing a bit of math, even if you make the increment 100, you can still have 92,233,720,368,547,758 (> 92 quadrillion) rows in your table.

做一些数学运算,即使你使增量为100,你的表中仍然可以有92,233,720,368,547,758(> 92千万亿)行。

#7


I think the use of the word "Primary", in the phrase "Primary" Key is in a real sense, misleading.

我认为在短语“Primary”Key中使用“Primary”这个词是真正意义上的,具有误导性。

First, use the definition that a "key" is an attribute or set of attributes that must be unique within the table,

首先,使用“key”是表中必须唯一的属性或属性集的定义,

Then, having any key serves several often mutually inconsistent purposes.

然后,任何密钥都有几个通常相互矛盾的目的。

  1. To use as joins conditions to one or many records in child tables which have a relationship to this parent table. (Explicitly or implicitly defining a Foreign Key in those child tables)
  2. 将连接条件用作子表中与该父表有关系的一个或多个记录。 (在这些子表中显式或隐式定义外键)

  3. (related) Ensuring that child records must have a parent record in the parent tab;e (The child table FK must exist as Key in the parent table)
  4. (相关)确保子记录必须在父选项卡中具有父记录; e(子表FK必须作为父表中的键存在)

  5. To increase perforamce of queries that need to rapidly locate a specific record/row in the table.

    增加需要快速查找表中特定记录/行的查询的性能。

  6. To ensure data consistency by preventing duplicate rows which represent the same logical entity from being inserted itno the table. (This is often called a "natural" key, and should consist of table (entity) attributes which are relatively invariant.)

    通过防止表示相同逻辑实体的重复行插入表来确保数据一致性。 (这通常称为“自然”键,应该包含相对不变的表(实体)属性。)

Clearly, any non-meaningfull, non-natural key (like a GUID or an auto-generated integer is totally incapable of satisfying #4.

显然,任何非有意义的非自然键(如GUID或自动生成的整数)完全无法满足#4。

But often, with many (most) tables, a totally natural key which can provide #4 will often consist of multiple attributes and be excessively wide, or so wide that using it for purposes #1, #2, or #3 will cause unacceptable performance consequencecs.

但通常,对于许多(大多数)表,一个可以提供#4的完全自然的键通常由多个属性组成,并且过宽,或者如此宽,以至于将其用于#1,#2或#3目的将导致不可接受性能后果。

The answer is simple. Use both. Use a simple auto-Generating integral key for all Joins and FKs in other child tables, but ensure that every table that requires data consistency (very few tables don't) have an alternate natural unique key that will prevent inserts of inconsistent data rows... Plus, if you always have both, then all the objections against using a natural key (what if it changes? I have to change every place it is referenced as a FK) become moot, as you are not using it for that... You are only using it in the one table where it is a PK, to avoid inconsistent duplciate data...

答案很简单。两者兼用。对其他子表中的所有联接和FK使用简单的自动生成整数键,但要确保每个需要数据一致性的表(很少有表没有)具有备用的自然唯一键,以防止插入不一致的数据行。 ..另外,如果你总是同时使用两者,那么所有反对使用自然键的反对意见(如果它改变了怎么办?我必须改变它被引用为FK的每个地方)都没有实际意义,因为你没有使用它。 ..你只是在一个表中使用它,它是一个PK,以避免不一致的duplciate数据...

As to GUIDs, be very careful using them, as using guids in an index can hose index fragmentation. The most common algorithms used to create them puts the "random" portion of the guid in the most significant bit positions... This increases the requirement for regular index defragmentation / Reindexing as new rows are added.

对于GUID,要非常小心地使用它们,因为在索引中使用guid可以软管索引碎片。用于创建它们的最常用算法将guid的“随机”部分放在最重要的位位置......这增加了对添加新行时常规索引碎片整理/重新索引的要求。

#8


Slightly off-topic, but I feel compelled to chime in with...

稍微偏离主题,但我觉得有必要加入......

If your primary key is a GUID, do not make it a clustered index. Since GUIDs are non-sequential, the data will be re-arranged on disk during almost every insert. (Yuck.) If using GUIDs as primary keys, they should be nonclustered indexes.

如果主键是GUID,请不要使其成为聚簇索引。由于GUID是非顺序的,因此几乎每次插入时数据都将重新排列在磁盘上。 (哎呀。)如果使用GUID作为主键,它们应该是非聚簇索引。

#9


One thing you should never do is use a smart key. That is a key where information about the record is coded in the key itself, and it will eventually bite you.

你不应该做的一件事是使用智能钥匙。这是一个关键,有关记录的信息在密钥本身编码,它最终会咬你。

I worked one place, where the primary key was the account ID, which was a combination of letters and numbers. I don't remember any specifics, but, for example, those accounts that were of a certain type, would be in the 600 range, and of another type, started with 400. That was great, until that customer decided to ask for both types of work. Or changed the type of work they did.

我在一个地方工作,其中主键是帐户ID,它是字母和数字的组合。我不记得任何细节,但是,例如,那些特定类型的帐户将在600范围内,而另一种类型,从400开始。这很好,直到该客户决定要求两者工作类型。或者改变了他们所做的工作类型。

Another place, used the location in the tree as the primary key for records. So there would be records like the following.

另一个地方,使用树中的位置作为记录的主键。所以会有如下记录。

Cat1.subcatA.record1
Cat1.subcatA.record2
Cat1.subcatB.record1
Cat2.subcatA.record1

Of course, the first thing the customers wanted was a way to move items in the tree around. The entire set of software died before that happened.

当然,客户想要的第一件事就是在树中移动物品。整套软件在此之前就已经死了。

Please, please, please, if you're writing code that I ever have to maintain, please don't use a smart key!

拜托,如果您正在编写我需要维护的代码,请不要使用智能密钥!

#10


I'm a fan of the auto-increment as primary key. I know deep in my heart that this is a cop-out, but it does make it so easy to sort data by when it was added (ORDER BY ID DESC, f'r instance).

我是自动增量作为主键的粉丝。我深深地知道这是一个警察,但它确实使得在添加数据时对数据进行排序非常容易(ORDER BY ID DESC,f'r实例)。

3 columns sounds awfully harsh to humanly parse.

3列听起来非常严厉,人性化解析。

And that's the trade-off -- how much of the relational capability do you need, versus making THIS TABLE RIGHT HERE understandable to a human interrogating it (versus the stored-procedure or programmatic interface).

这就是权衡 - 你需要多少关系能力,而不是让人们在这里理解这个表格(对于存储过程或程序化界面)。

auto-increment is for us humans. :-(

自动增量是给我们人类的。 :-(

#11


Generally, it depends.

一般来说,这取决于。

Personally, I like autoincrement ints.

就个人而言,我喜欢自动增量。

But, one thing I can tell you is to never trust data from other sources as your key. I swear, every time I've done that it comes back to bite me. Well, never again!

但是,我可以告诉你的一件事是永远不要相信来自其他来源的数据作为你的关键。我发誓,每次我做完,它都会回来咬我。好吧,再也不会!

#12


There should be atleast 3 columns that make up the primary key.

应该至少有3列组成主键。

I don't understand this.

我不明白这一点。

Are you talking about a "natural key", e.g. "name and date of birth"? A natural key might be ideal if it exists, but most candidates for a natural key are either not unique (several people with the same name) or not constant (someone can change their name).

你在谈论一个“自然键”,例如“姓名和出生日期”?如果存在,自然键可能是理想的,但是自然键的大多数候选者要么不是唯一的(几个具有相同名称的人),要么不是常量(有人可以更改其名称)。

Int/ BigInt which autoincrement are good enough primary keys.

Int / BigInt哪个自动增量是足够好的主键。

I prefer Guid. A potential problem with autoincrement is that the value (e.g. "order id") is assigned by the database instance (e.g. by the "sales database") ... which won't entirely work (instead you start to need compound keys) if you ever need to merge data created by more than one database instance (e.g. from several sales offices each with their own database).

我更喜欢Guid。自动增量的一个潜在问题是值(例如“订单ID”)由数据库实例(例如“销售数据库”)分配......这不会完全起作用(相反,您开始需要复合键)您需要合并由多个数据库实例创建的数据(例如,来自多个销售办事处,每个数据库实例都有自己的数据库)。

#13


RE GUID's

Watch out if this is going to be a really Really REALLY REALLY big database, lots of load, and fast access.

注意这是否真的是一个真正非常棒的大数据库,大量的负载和快速访问。

At my last job, where we had databases of 100 to 500 million records, our database guys strongly argued against GUIDs, and for an appropriately sized decimal number. They felt that (under Oracle) the size difference in the internal storage for a string Guid - vs- a decimal value would make a very noticeable difference in lookups. ( Bigger keys = deeper trees to traverse)

在我上一份工作中,我们拥有1亿到5亿条记录的数据库,我们的数据库人员强烈反对GUID,以及适当大小的十进制数。他们认为(在Oracle下)字符串内部存储的大小差异Guid - vs- a十进制值会在查找中产生非常显着的差异。 (更大的键=更深的树遍历)

The random nature of GUIDs also reduces the fill-factor for index pages significantly - this dramatically increases tearing and disk I/O.

GUID的随机性也会显着降低索引页的填充因子 - 这会大大增加撕裂和磁盘I / O.

#14


Auto increment columns. I am able to make my code work seamlessly with SQL Server or Oracle, one using identity the other using sequences through my DAL, and I couldn't be happier. I agree, GUIDs sometimes are necessary if you are doing replication or sending data away to receive it later on afer processing.

自动增量列。我能够使我的代码与SQL Server或Oracle无缝协作,一个使用身份,另一个使用序列通过我的DAL,我不能更快乐。我同意,如果您正在进行复制或发送数据以便稍后在处理时接收它,则GUID有时是必要的。

#15


I've always used a surrogate key - an autoincrementing integer called 'id'. I can see plenty of reasons to do this even when another option is obvious:

我一直使用代理键 - 一个称为'id'的自动增量整数。即使另一个选项很明显,我也可以看到很多理由:

  • Consistency
  • Data independent (unique, not destroyed by changes to format)
  • 数据独立(唯一,不会因格式更改而被破坏)

  • Human-readable

...and no sensible reason not to:

......没有明智的理由不:

  • Ambiguity in joins? - Aliasing tables is a better practice, IMHO
  • 连接中的歧义? - 混淆表是一种更好的做法,恕我直言

  • Optimum tables? - Removing one byte per entry is premature optimisation, IMHO
  • 最佳表? - 每个条目删除一个字节是不成熟的优化,恕我直言

  • Per-table decision? - No longer consistent
  • 每桌决定? - 不再一致

  • Scaling problems? - Eh? Why?
  • 缩放问题? - 呃?为什么?

  • Hierarchical data structure? - That's denormalising, a whole other subject of religion. Suffice it to say I'm a fan in a few circumstances in theory, but never in practice :)
  • 分层数据结构? - 这是非正规化,是宗教的另一个主题。我只想说我在理论上的一些情况下是粉丝,但从未在实践中:)

sensible reasons against that I haven't thought of or come across yet are always welcomed...

我没有想到或遇到的明显理由仍然受到欢迎......

#16


This is a classic "it depends". There's no one right answer for every project. I like different things for different situations. It depends on whether I'm using an ORM and what it supports. It depends on the overall architecture (distributed or not, etc). Just pick one that you think will work and move on to arguing over tabs and spaces.

这是一个经典的“它取决于”。每个项目都没有正确答案。我喜欢不同的情况。这取决于我是否使用ORM以及它支持的内容。它取决于整体架构(分布式或非分布式)等。只需选择一个您认为可行的方法,然后继续争论标签和空格。

#17


I tend to use option #1 or #3 depending on the size, the number of people connecting, and whether it is a multiple database server situation or not.

我倾向于使用选项#1或#3,具体取决于大小,连接人数以及是否是多数据库服务器情况。

Option #2 doesn't make much sense to me. If any one of the three is not enough to identify a unique record, then it's possible (without going through extra machinations) two have two records show up with the same values in all three columns. If you want to enforce uniqueness on any combination of the three, then just add an index for them.

选项#2对我来说没什么意义。如果三者中的任何一个不足以识别唯一记录,则可能(不经过额外的阴谋)两个记录在所有三列中显示具有相同值的两个记录。如果要对三者的任意组合强制实施唯一性,则只需为它们添加索引即可。

#18


I've only use an auto-increment int or a GUID. 99% of the time I've use auto-increment int. It's just what I was taught to use when I first learned about databases and have never run into a reason not to use them (although I know of reasons why a GUID would be better).

我只使用自动增量int或GUID。 99%的时间我使用自动增量int。这就是我第一次学习数据库时所学到的东西,并且从未遇到过不使用它们的原因(虽然我知道为什么GUID会更好)。

I like auto increment ints because it helps with readability. For example I can say "take a look at record 129383" and it's pretty easy for someone to go in and find it. With a GUID that's nearly impossible to do.

我喜欢自动增量int,因为它有助于提高可读性。例如,我可以说“看看记录129383”,并且很容易让某人进去找到它。使用GUID几乎不可能做到。

#19


Past a basic definitional answer, what constitutes a good primary key is left largely to religion and break room arguments. If you have something that is, and will always, map uniquely to an individual row, then it will work fine as a primary key. Past that point, there are other considerations:

过去一个基本的定义答案,什么构成一个好的主键主要是宗教和打破房间的论点。如果您拥有的东西,并且将始终唯一地映射到单个行,那么它将作为主键正常工作。过去,还有其他一些考虑因素:

  • Is the primary key definition not overly complex? Does it avoid introducing unnecessary complexity for the sake of following a "best-practice"?
  • 主键定义是不是过于复杂?是否为了遵循“最佳实践”而避免引入不必要的复杂性?

  • Is there a better possible primary key that would require less overhead for the database to handle (i.e. INTEGER vs. VARCHAR, etc)?
  • 是否有更好的主键需要更少的数据库处理开销(即INTEGER与VARCHAR等)?

  • Am I ABSOLUTELY certain that the uniqueness and defined-ness invariant of my primary key will not change?
  • 我绝对肯定我的主键的唯一性和定义不变量不会改变吗?

This last one is likely what draws most people to use things like GUIDs or self-incrementing integer columns, because relying on things like addresses, phone numbers, first/last names, etc, just don't cut it. The only invariant about people I can think of is SSNs, but then I'm not even 100% certain about those remaining forever unique.

最后一个可能是吸引大多数人使用诸如GUID或自增量整数列之类的东西,因为依赖于诸如地址,电话号码,名字/姓氏之类的东西,只是不要削减它。关于我能想到的人的唯一不变量是SSN,但是我甚至不能100%肯定那些永远独特的人。

Hopefully this helps add some clarity...

希望这有助于增加一些清晰度......

#20


The way I approach primary keys (and I feel is the best) is to avoid having a "default" approach. This means instead of just slapping on an auto-incrementing integer and calling it a day I look at the problem and say "is there a column or group of columns that will always be unqiue and won't change?" If the answer is yes then I take that approach.

我接近主键的方式(我觉得最好)是避免使用“默认”方法。这意味着不是只是单击一个自动递增的整数并调用它一天,我会查看问题并说“是否有一列或一组列始终是unqiue且不会更改?”如果答案是肯定的,那么我采取这种方法。

#21


Almost always integers.

几乎总是整数。

They have other good reasons besides being smaller/faster to process. Which would you rather write down - "404040" or "3463b5a2-a02b-4fd4-aa0f-1d3c0450026c"?

除了更小/更快的处理之外,它们还有其他好的理由。你宁愿写下哪一个 - “404040”或“3463b5a2-a02b-4fd4-aa0f-1d3c0450026c”?

#22


Only slightly relevant, but one thing I've started doing recently when I have small classification tables (essentially those that would represent ENUMs in code) is that I'll make the primary key a char(3) or char(4). Then I make those primary keys representative of the lookup value.

只是略微相关,但有一件事我最近开始做的时候我有小分类表(基本上代表代码中的ENUM)是我将主键设为char(3)或char(4)。然后我创建代表查找值的主键。

For example, I have a quoting system for our internal Sales Agents. We have "Cost Categories" that every quote line item is assigned one of... So I have a type lookup table called 'tCostCategories', where primary key is 'MTL', 'SVC', 'TRV', 'TAX', 'ODC'. Other columns in the lookup table store more details, such as the normal english meanings of the codes, "Material", "Service", "Travel", "Taxes", "Other Direct Costs", and so forth.

例如,我有一个内部销售代理的报价系统。我们有“成本类别”,每个报价行项目都分配了一个...所以我有一个名为'tCostCategories'的类型查找表,其中主键是'MTL','SVC','TRV','TAX', 'ODC'。查找表中的其他列存储更多详细信息,例如代码的正常英语含义,“材料”,“服务”,“旅行”,“税收”,“其他直接成本”等。

This is really nice because it doesn't use any more space than an int, and when you are looking at the source data, you don't have to link the lookup table to know what the heck the value is. For example, a quote row might look like:

这非常好,因为它不使用任何空间而不是int,当您查看源数据时,您不必链接查找表以了解该值是什么。例如,引用行可能如下所示:

1 PartNumber $40 MTL
2 OtherPartNumber $29.99 SVC
3 PartNumber2 $150 TRV

1 PartNumber $ 40 MTL 2 OtherPartNumber $ 29.99 SVC 3 PartNumber2 $ 150 TRV

It's much easier that using an int to represent the categories and then linking 1, 2, 3 on all the lines - you have the data right there in front of you, and the performance doesn't seem affected at all (not that I've truly tested.)

使用int来表示类别然后在所有行上链接1,2,3要容易得多 - 你的数据就在你面前,并且性能似乎根本没有受到影响(不是我'真正经过考验。)

As far as the real question goes... I like RowGUID uniqueidentifiers. I'm not 100% on this, but don't all rows have internal RowGuid's anyway?? If so, then using the RowGuid would actually take less space than ints (or anything else for that matter.) All I know is that if it's good enough for M$ to use in GreatPlains then it's good enough for me. (Should I duck??)

就真正的问题而言......我喜欢RowGUID uniqueidentifiers。我不是百分之百,但不是所有的行都有内部的RowGuid吗?如果是这样,那么使用RowGuid实际上会占用比int更少的空间(或者其他任何东西。)我所知道的是,如果它足以让M $在GreatPlains中使用那么它对我来说已经足够了。 (我应该躲?)

#23


Oh one more reason I use GUIDs - I use a hierarchical data structure. That is, I have a table 'Company' and a table 'Vendor' for which the Primary Keys match up. But I also have a table 'Manufacturer' that also 'inherits' from Company. The fields that are common to Vendors and Manufacturers don't appear in those tables - they appear in Company. In this setup, using int's is much more painful than Guids. In the very least, you can't use identity primary keys.

哦,我使用GUID的另一个原因 - 我使用分层数据结构。也就是说,我有一个表'Company'和一个表'Vendor',其中主键匹配。但我也有一个'制造商'表,也'继承'公司。供应商和制造商共有的字段不会出现在这些表中 - 它们出现在公司中。在这个设置中,使用int比Guids更痛苦。至少,您不能使用身份主键。

#24


I like natural keys, whenever I can trust them. I'm willing to pay a small performance price price in order to use keys that make sense to the subject matter experts.

每当我信任他们时,我都喜欢自然键。我愿意支付一个小的性价格,以便使用对主题专家有意义的密钥。

For tables that describe entities, there should be a simple natural key that identifies individual instances the same way the subject matter people do. If the subject matter does not have trustworthy identifiers for one of the entities, then I'll resort to a surrogate key.

对于描述实体的表,应该有一个简单的自然键,以与主题人员相同的方式识别各个实例。如果主题没有其中一个实体的可信标识符,那么我将使用代理键。

For tables that describe relationships, I use a compound key, where each component references an entity that participates in the relationship, and therefore a row in an entity table. Again, the performance hit for using a compound key is generally minimal.

对于描述关系的表,我使用复合键,其中每个组件引用参与关系的实体,因此引用实体表中的行。同样,使用复合键的性能损失通常很小。

As others have pointed out, the term "primary key" is a little misleading. In the Relational Data Model, the term that's used is "candidate keys". There could be several candidate keys for a single table. Logically, each one is just as good as another. Choosing one of them as "primary" and making all references via that key is simply a choice the designer can make.

正如其他人所指出的那样,“主键”一词有点误导。在关系数据模型中,使用的术语是“候选键”。单个表可能有几个候选键。从逻辑上讲,每一个都和另一个一样好。选择其中一个作为“主要”并通过该键进行所有引用只是设计师可以做出的选择。

#25


Guids.period.

In the event that you need to scale out or you need to assign the primary key by alternate means they will be your friend. You can add indexes for everything else.

如果您需要扩展或需要通过其他方式分配主键,他们将成为您的朋友。您可以为其他所有内容添加索引。


update to clarify my statement.

更新以澄清我的陈述。

I've worked on a lot of different kinds of sites. From small single server deals to large ones backed with multiple DB and web servers. There have certainly been apps that would have been just fine with auto incrementing ints as primary keys. However, those don't fit the model of how I do things.

我曾经在很多不同类型的网站上工作过。从小型单服务器交易到支持多个数据库和Web服务器的大型服务器。肯定有一些应用程序可以自动增加整数作为主键。然而,那些不符合我如何做事的模型。

When using a GUID you can generate the ID anywhere. It could be generated by a remote server, your web app, within the database itself or even within multiple databases in a multimaster situation.

使用GUID时,您可以在任何地方生成ID。它可以由远程服务器,您的Web应用程序,在数据库本身内生成,甚至可以在多主机情况下的多个数据库中生成。

On the other hand, an auto incremented INT can only be safely generated within the primary database. Again, this might be okay if you have an application that will be intimately tied to that one backing DB server and scaling out is not something you are concerned with.

另一方面,只能在主数据库中安全地生成自动递增的INT。同样,如果您的应用程序与该备份数据库服务器密切相关,并且扩展不是您关心的问题,那么这可能没问题。

Sure, usage of GUIDs mean you have to have nightly reindexing processes. However, if you are using anything other than an auto incremented INT you should do that anyway. Heck, even with an INT as the primary it's likely you have other indexes that need regenerated to deal with fragmentation. Therefore, using GUIDs doesn't exactly add another problem because those tasks need to be performed regardless.

当然,使用GUID意味着您必须每晚重建索引过程。但是,如果您使用的是除自动增量INT之外的任何其他内容,则无论如何都应该这样做。哎呀,即使将INT作为主要内容,您可能还需要重新生成其他索引来处理碎片。因此,使用GUID并不会完全添加另一个问题,因为无论如何都需要执行这些任务。

If you take a look at the larger apps out there you will notice something important: they all use Base64 encoded GUIDs as the keys. The reason for this is simple, usage of GUIDs enables you to scale out easily whereas there can be a lot of hoops to jump through when attempting to scale out INTs.

如果您看一下较大的应用程序,您会发现一些重要的事情:它们都使用Base64编码的GUID作为密钥。原因很简单,GUID的使用使您可以轻松地扩展,而在尝试扩展INT时可能会有很多跳跃。

Our latest app goes through a period of heavy inserts that lasts for about a month. After that 90+% of the queries are all selects for reporting. To increase capacity I can bring up additional DB servers during this large insert period; and later easily merge those into a single DB for reporting. Attempting to do that with INTs would be an absolute nightmare.

我们最新的应用程序经历了一段时间的重插入,持续了大约一个月。之后,90%以上的查询都是报告选择。为了增加容量,我可以在这个大插入期间启动额外的数据库服务器;然后很容易将它们合并到一个DB中进行报告。试图用INTs做这件事绝对是一场噩梦。

Quite frankly, any time you cluster a database or setup replication the DB server is going to demand that you have GUIDs on the table anyway. So, if you think that your system might need to grow then pick the one that's good.

坦率地说,无论何时集群数据库或设置复制,数据库服务器都会要求您在表上拥有GUID。所以,如果你认为你的系统可能需要增长,那么选择一个好的系统。

#26


This is a complex subject whether you realized it or not. Might fall under the section on this * FAQ.

无论你是否意识到这一点,这都是一个复杂的主题。可能属于*常见问题解答部分。

What kind of questions should I not ask here?

我不应该在这里问什么样的问题?

Avoid asking questions that are subjective, argumentative, or require extended discussion. This is a place for questions that can be answered!

避免提出主观,议论或需要进行深入讨论的问题。这是一个可以回答问题的地方!

This has been debated for years and will continue to be debated for years. The only hints of consensus I have seen is that the answers are somewhat predictable depending on if you are asking a OO guy (GUIDs are the only way to go!), a data modeler (Natural keys are the only way to go!), or a performance oriented DBA (INTs are the only way to go!).

多年来一直争论不休,并将继续争论多年。我见过的唯一一致的暗示是,答案在某种程度上是可以预测的,这取决于你是否要求OO人(GUID是唯一的方法!),数据建模者(自然键是唯一的方法!),或者以性能为导向的DBA(INT是唯一的方法!)。