Unique identifier (GUID) as a primary key in database design

Date: 2021-01-30 11:11:03

Our data resides in a SQL Server 2008 database, and there will be a lot of queries and joins between tables. We are having an argument inside the team: some argue that an integer identity is better for performance, others argue for a GUID (uniqueidentifier).

Does performance really suffer that badly when a GUID is used as a primary key?

6 answers

#1


31  

A 128-bit GUID (uniqueidentifier) key is of course four times the size of a 32-bit int key. However, there are a few key advantages:

  • No IDENTITY_INSERT issue when merging content.
  • If you use a COMB value instead of NEWSEQUENTIALID(), you get a "free" INSERT timestamp. You can even SELECT from the primary key based on a date/time range, if you want, with a few fancy CAST() calls.
  • They are globally unique, which turns out to be pretty handy now and then.
  • Since there's no need to track high-water marks, your business logic layer can assign the value rather than SQL Server, thus eliminating the step of SELECT scope_identity() to get the primary key after an insert.
  • If it's even remotely possible that you could have more than 2 billion records, you'll need to use bigint (64 bits) instead of int. Once you do that, uniqueidentifier is only twice as big as a bigint.
  • Using randomly generated GUIDs makes it safer to expose keys in URLs, etc. without exposing yourself to "guess-the-ID" attacks.
  • Given how SQL Server loads pages from disk and that processors are now mostly 64-bit, a number being 128 bits instead of 32 doesn't mean it takes 4x longer to compare. The last test I saw showed that GUIDs are nearly as fast.
  • Index size depends on how many columns are included. Even though the GUIDs themselves are larger, the extra 8 or 12 bytes (compared with a bigint or an int, respectively) may be insignificant next to the other columns in the index.
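
To make the COMB idea from the list above more concrete, here is a minimal Python sketch. It is not the original COMB algorithm: it assumes a simplified layout in which the last 6 bytes of the GUID hold a millisecond Unix timestamp and the first 10 bytes are random. It also assumes, as I understand it, that SQL Server compares uniqueidentifier values starting from the last 6 bytes, which is why the timestamp goes there. The names `new_comb` and `comb_timestamp` are made up for this illustration.

```python
import os
import uuid
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def new_comb(now=None):
    """Return a GUID with 10 random bytes followed by a 6-byte millisecond timestamp."""
    if now is None:
        now = datetime.now(timezone.utc)
    delta = now - EPOCH
    # Integer arithmetic avoids floating-point rounding of the milliseconds.
    millis = delta.days * 86_400_000 + delta.seconds * 1000 + delta.microseconds // 1000
    return uuid.UUID(bytes=os.urandom(10) + millis.to_bytes(6, "big"))


def comb_timestamp(comb):
    """Recover the creation time (millisecond precision) stored in the last 6 bytes."""
    millis = int.from_bytes(comb.bytes[10:], "big")
    return EPOCH + timedelta(milliseconds=millis)


if __name__ == "__main__":
    key = new_comb()
    print(key, "created at", comb_timestamp(key))
```

Because the timestamp sits in the last group of the GUID, keys generated later carry a larger trailing value, which is the property that gives both the roughly increasing insert order and the "free" timestamp.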

In the end, squeezing out a small performance advantage by using integers may not be worth losing the advantages of a GUID. Test it empirically and decide for yourself.

Personally, I still use both, depending on the situation, but the deciding factor has never really come down to performance in my case.

#2


20  

I personally use INT IDENTITY for most of my primary and clustering keys.

You need to keep two things apart: the primary key and the clustering key. The primary key is a logical construct: it uniquely identifies your rows, so it has to be unique, stable, and NOT NULL. A GUID works well for a primary key, too, since it is unique for all practical purposes. A GUID as your primary key is a good choice if you use SQL Server replication, since in that case you need a uniquely identifying GUID column anyway.

The clustering key in SQL Server is a physical construct: it is used for the physical ordering of the data, and it is a lot more difficult to get right. Kimberly Tripp, the Queen of Indexing on SQL Server, also requires a good clustering key to be unique, stable, as narrow as possible, and ideally ever-increasing (all of which an INT IDENTITY is).

See her articles on indexing, and also see Jimmy Nilsson's "The Cost of GUIDs as Primary Key".

A GUID is a horribly bad choice for a clustering key, since it's wide and totally random, and thus leads to bad index fragmentation and poor performance. Also, the clustering key column(s) are stored in each and every entry of each and every non-clustered (additional) index, so you really want to keep the key small. A GUID is 16 bytes vs. 4 bytes for an INT, and with several non-clustered indexes and several million rows, this makes a HUGE difference.
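
As a back-of-the-envelope check of that size argument, here is a small Python sketch. The row count and the number of non-clustered indexes are made-up figures, and the calculation only counts the raw key bytes (one copy in the table itself plus one copy per non-clustered index entry); it ignores page and row overhead, fill factor, and compression.

```python
def clustering_key_bytes(rows, nonclustered_indexes, key_bytes):
    """Raw bytes spent on the clustering key: one copy per row in the table
    plus one copy per row in every non-clustered index."""
    copies_per_row = 1 + nonclustered_indexes
    return rows * copies_per_row * key_bytes


if __name__ == "__main__":
    rows, indexes = 5_000_000, 4  # hypothetical table
    int_cost = clustering_key_bytes(rows, indexes, key_bytes=4)
    guid_cost = clustering_key_bytes(rows, indexes, key_bytes=16)
    extra_mb = (guid_cost - int_cost) / 1_000_000
    print(f"INT key:  {int_cost / 1_000_000:.0f} MB")
    print(f"GUID key: {guid_cost / 1_000_000:.0f} MB")
    print(f"Extra for GUID: {extra_mb:.0f} MB")
```

For these assumed numbers the GUID clustering key costs about 300 MB more than the INT key, before any fragmentation effects are considered.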

In SQL Server, your primary key is by default your clustering key, but it doesn't have to be. You can easily use a GUID as your NON-clustered primary key and an INT IDENTITY as your clustering key. You just need to be aware of the distinction.
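
The split between a narrow, ever-increasing physical key and a GUID as the logical identifier can be tried out in a few lines of Python. Note the assumption: this sketch uses SQLite as a rough stand-in, not SQL Server. In SQLite the table is ordered by the integer rowid and the GUID lives in a separate unique index; on SQL Server the equivalent would be a PRIMARY KEY NONCLUSTERED on the GUID column plus a clustered index on the identity column. The table and function names are invented for the example.

```python
import sqlite3
import uuid

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE customer (
        row_id        INTEGER PRIMARY KEY AUTOINCREMENT,  -- narrow, ever-increasing physical key
        customer_guid BLOB NOT NULL UNIQUE,               -- logical key, kept in its own index
        name          TEXT NOT NULL
    )
    """
)


def add_customer(name):
    """Assign the GUID in application code, so no round trip is needed to learn the key."""
    guid = uuid.uuid4()
    conn.execute(
        "INSERT INTO customer (customer_guid, name) VALUES (?, ?)",
        (guid.bytes, name),
    )
    return guid


def find_customer(guid):
    """Look a row up by its GUID; returns (row_id, name) or None."""
    return conn.execute(
        "SELECT row_id, name FROM customer WHERE customer_guid = ?",
        (guid.bytes,),
    ).fetchone()


if __name__ == "__main__":
    key = add_customer("Ada")
    print(key, find_customer(key))
```

Callers only ever see the GUID, while the integer key stays an internal detail that keeps the physical order append-only.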

#3


4  

Great article on this that I have in my bookmarks: http://blogs.msdn.com/b/sqlserverfaq/archive/2010/05/27/guid-vs-int-debate.aspx

#4


3  

The big problem with GUIDs as primary keys is that they cause massive table fragmentation, which can be a big performance issue (the larger the table, the larger the issue). Even as the key of a nonclustered index, they will cause index fragmentation.

You can partly mitigate the problem by setting an appropriate fill factor, but it will still be an issue.
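
The fragmentation effect described above can be illustrated with a toy simulation in Python. This is only a sketch under simplifying assumptions, not a model of SQL Server's storage engine: pages hold a fixed number of keys, an insert at the very end of the index starts a new page, and any other overflow splits the page in half. The function name and the page capacity are arbitrary.

```python
import bisect
import random


def simulate_inserts(keys, page_capacity=100):
    """Insert keys into a toy index leaf level.

    Returns (mid_page_splits, average_page_fill), where fill is between 0 and 1.
    """
    pages = []   # each page is a sorted list of keys; pages are kept in key order
    lows = []    # lows[i] is the smallest key on pages[i]
    splits = 0
    for key in keys:
        if not pages:
            pages.append([key])
            lows.append(key)
            continue
        # Find the page whose key range should contain this key.
        i = max(bisect.bisect_right(lows, key) - 1, 0)
        page = pages[i]
        bisect.insort(page, key)
        lows[i] = page[0]
        if len(page) <= page_capacity:
            continue
        if i == len(pages) - 1 and page[-1] == key:
            # Appending past the end of the index: start a fresh page, no split.
            page.pop()
            pages.append([key])
            lows.append(key)
        else:
            # Overflow in the middle of the index: split the page in half.
            mid = len(page) // 2
            right = page[mid:]
            del page[mid:]
            pages.insert(i + 1, right)
            lows.insert(i + 1, right[0])
            splits += 1
    fill = sum(len(p) for p in pages) / (len(pages) * page_capacity)
    return splits, fill


if __name__ == "__main__":
    rng = random.Random(0)
    sequential = range(10_000)                              # IDENTITY-like keys
    scattered = [rng.getrandbits(128) for _ in range(10_000)]  # random GUID-like keys
    print("sequential:", simulate_inserts(sequential))
    print("random:    ", simulate_inserts(scattered))
```

In this model, ever-increasing keys never split a page and leave every page full, while random keys keep splitting pages and leave them partly empty, which is the space and I/O cost that a fill factor can only partly offset.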

The size difference doesn't bother me that much, except on tables with otherwise narrow rows where table scans are also required. In those cases, being able to fit more rows per DB page is a performance advantage.
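
A rough rows-per-page estimate shows why narrow rows are the case where key size matters. The figures here are assumptions for illustration: about 8,096 usable bytes on an 8 KB page, roughly 9 bytes of per-row overhead, and a hypothetical 20-byte payload next to the key. Real row layouts vary, so treat the result as an order-of-magnitude sketch.

```python
USABLE_PAGE_BYTES = 8096   # assumed usable space on an 8 KB page
ROW_OVERHEAD_BYTES = 9     # assumed per-row overhead (row header plus slot entry)


def rows_per_page(key_bytes, payload_bytes):
    """Estimate how many fixed-width rows fit on one data page."""
    row_bytes = key_bytes + payload_bytes + ROW_OVERHEAD_BYTES
    return USABLE_PAGE_BYTES // row_bytes


if __name__ == "__main__":
    payload = 20  # hypothetical narrow row: 20 bytes of non-key columns
    print("INT key: ", rows_per_page(4, payload), "rows per page")
    print("GUID key:", rows_per_page(16, payload), "rows per page")
```

With these assumptions a page holds 245 rows with an INT key but only 179 with a GUID key, so a full scan of the narrow table reads roughly a third more pages. With wide rows the same 12 extra bytes barely change the count.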

There can be good reasons to use GUIDs, but there is also a cost. I generally prefer INT IDENTITY for primary keys, but I don't avoid GUIDs when they are the better solution.

#5


0  

The major advantage of using GUIDs is that they are unique across all space and time.

The main disadvantage of using GUIDs as key values is that they are BIG. At 16 bytes a pop, they are one of the largest datatypes in SQL Server. Indexes built on GUIDs are going to be larger and slower than indexes built on IDENTITY columns, which are usually ints (4 bytes).

So they are a good solution for the cases where you need to merge data from several sources.

Source: http://www.sqlteam.com/article/uniqueidentifier-vs-identity

#6


-1  

If the table can grow to millions of records, I think it is not a good idea to use a GUID as the primary key.
