What are the best practices for using a GUID as a primary key, specifically regarding performance?

Date: 2021-10-01 20:08:35

I have an application that uses a GUID as the primary key in almost all tables, and I have read that using a GUID as the primary key can cause performance problems. Honestly, I haven't seen any problems, but I'm about to start a new application. I still want to use GUIDs as the primary keys, but I was thinking of using a composite primary key (the GUID and maybe another field).

I'm using GUIDs because they are easy to manage when you have different environments such as "production", "test" and "dev" databases, and also when migrating data between databases.

I will use Entity Framework 4.3 and I want to assign the Guid in the application code, before inserting it in the database. (i.e. I don't want to let SQL generate the Guid).

What is the best practice for creating GUID-based Primary Keys, in order to avoid the supposed performance hits associated with this approach?

5 Answers

#1


387  

GUIDs may seem to be a natural choice for your primary key - and if you really must, you could probably argue for using one as the PRIMARY KEY of the table. What I'd strongly recommend against is using the GUID column as the clustering key, which SQL Server does by default unless you specifically tell it not to.

You really need to keep two issues apart:

  1. the primary key is a logical construct - one of the candidate keys that uniquely and reliably identifies every row in your table. This can be anything, really - an INT, a GUID, a string - pick what makes most sense for your scenario.

  2. the clustering key (the column or columns that define the "clustered index" on the table) - this is a physical storage-related thing, and here, a small, stable, ever-increasing data type is your best pick - INT or BIGINT as your default option.

By default, the primary key on a SQL Server table is also used as the clustering key - but it doesn't need to be that way! I've personally seen massive performance gains when breaking up a previous GUID-based primary / clustered key into two separate keys - the primary (logical) key on the GUID, and the clustering (ordering) key on a separate INT IDENTITY(1,1) column.

As Kimberly Tripp - the Queen of Indexing - and others have stated a great many times - a GUID as the clustering key isn't optimal, since due to its randomness, it will lead to massive page and index fragmentation and to generally bad performance.

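To make the randomness argument concrete, here is a rough simulation in Python (not T-SQL). It only counts how many new keys sort before a key that was already inserted, which is the situation where a row cannot simply be appended to the end of the clustered index. The helper names are made up for this sketch, and Python compares UUIDs as 128-bit integers, whereas SQL Server orders UNIQUEIDENTIFIER values by a different byte grouping; for random values I would expect the same picture, but treat that as an assumption.

```python
import random
import uuid


def count_mid_index_inserts(keys):
    """Count keys that sort before an already-inserted key.

    In an index ordered by key, such a row cannot be appended at the end;
    it has to go onto an existing page, which is what leads to page splits
    once that page is full.
    """
    mid_inserts = 0
    current_max = None
    for key in keys:
        if current_max is not None and key < current_max:
            mid_inserts += 1
        else:
            current_max = key
    return mid_inserts


def random_guids(n, seed=42):
    """Generate n random version-4 GUIDs from a seeded RNG (repeatable)."""
    rng = random.Random(seed)
    return [uuid.UUID(int=rng.getrandbits(128), version=4) for _ in range(n)]


n = 10_000
identity_keys = list(range(1, n + 1))  # stands in for INT IDENTITY(1,1)
guid_keys = random_guids(n)            # stands in for randomly generated GUIDs

print("identity:", count_mid_index_inserts(identity_keys), "of", n)
print("guid:    ", count_mid_index_inserts(guid_keys), "of", n)
```

The ever-increasing keys never land in the middle, while almost every random GUID does. This says nothing about actual page-level behavior in SQL Server; it only shows why the insert pattern differs.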

Yes, I know - there's newsequentialid() in SQL Server 2005 and up - but even that is not truly and fully sequential and thus also suffers from the same problems as the GUID - just a bit less prominently so.

Then there's another issue to consider: the clustering key on a table will be added to each and every entry on each and every non-clustered index on your table as well - thus you really want to make sure it's as small as possible. Typically, an INT with 2+ billion rows should be sufficient for the vast majority of tables - and compared to a GUID as the clustering key, you can save yourself hundreds of megabytes of storage on disk and in server memory.

Quick calculation - using INT vs. GUID as Primary and Clustering Key:

  • Base Table with 1'000'000 rows (3.8 MB vs. 15.26 MB)
  • 6 nonclustered indexes (22.89 MB vs. 91.55 MB)

TOTAL: roughly 27 MB vs. 107 MB - and that's just on a single table!

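The per-line figures in the quick calculation can be reproduced if you assume they count only the key bytes (4 per INT, 16 per UNIQUEIDENTIFIER) and take 1 MB as 1,048,576 bytes. That reading is my assumption, and it ignores row and page overhead. A small Python sketch of the arithmetic, with hypothetical names:

```python
ROWS = 1_000_000
NONCLUSTERED_INDEXES = 6
KEY_BYTES = {"INT": 4, "GUID": 16}  # storage size of one key value
MB = 1024 * 1024


def key_storage_mb(key_bytes, copies):
    """Megabytes used by `copies` structures that each hold one key per row."""
    return ROWS * key_bytes * copies / MB


for name, size in KEY_BYTES.items():
    base = key_storage_mb(size, 1)
    indexes = key_storage_mb(size, NONCLUSTERED_INDEXES)
    print(f"{name}: base {base:.2f} MB, indexes {indexes:.2f} MB, "
          f"total {base + indexes:.2f} MB")
```

Under these assumptions the components add up to about 26.7 MB for INT and 106.8 MB for GUID, a factor of four that follows directly from 16 bytes versus 4 bytes per key.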

Some more food for thought - excellent stuff by Kimberly Tripp - read it, read it again, digest it! It's the SQL Server indexing gospel, really.

PS: of course, if you're dealing with just a few hundred or a few thousand rows, most of these arguments won't have much of an impact on you. However, once you get into the tens or hundreds of thousands of rows, or start counting in millions, these points become crucial to understand.

Update: if you want to have your PKGUID column as your primary key (but not your clustering key), and another column MYINT (INT IDENTITY) as your clustering key - use this:

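CREATE TABLE dbo.MyTable
(PKGUID UNIQUEIDENTIFIER NOT NULL,
 MyINT INT IDENTITY(1,1) NOT NULL
 -- add more columns as needed, each preceded by a comma
);

-- The primary key is explicitly NONCLUSTERED, so it does not become the clustering key
ALTER TABLE dbo.MyTable
ADD CONSTRAINT PK_MyTable
PRIMARY KEY NONCLUSTERED (PKGUID);

-- The clustered index goes on the small, ever-increasing INT column
CREATE UNIQUE CLUSTERED INDEX CIX_MyTable ON dbo.MyTable(MyINT);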

Basically: you just have to explicitly tell the PRIMARY KEY constraint that it's NONCLUSTERED (otherwise it's created as your clustered index, by default) - and then you create a second index that's defined as CLUSTERED.

This will work - and it's a valid option if you have an existing system that needs to be "re-engineered" for performance. For a new system, if you start from scratch, and you're not in a replication scenario, then I'd always pick ID INT IDENTITY(1,1) as my clustered primary key - much more efficient than anything else!

#2


33  

I've been using GUIDs as PKs since 2005. In this distributed database world, it is absolutely the best way to merge distributed data. You can fire-and-forget merge tables without all the worry of ints matching across joined tables. GUID joins can be copied without any worry.

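A toy illustration of the merge argument, sketched in Python with plain dictionaries standing in for tables. The table contents and the `merge_tables` helper are invented for this example: two databases that each number rows from 1 collide on merge, while independently generated GUIDs do not.

```python
import uuid


def merge_tables(*tables):
    """Merge row dictionaries keyed by primary key and report key collisions."""
    merged, collisions = {}, []
    for table in tables:
        for key, row in table.items():
            if key in merged:
                collisions.append(key)
            else:
                merged[key] = row
    return merged, collisions


# Two databases that each hand out INT IDENTITY values starting at 1.
dev_int = {1: "alice@example.com", 2: "bob@example.com"}
test_int = {1: "carol@example.com", 2: "dave@example.com"}

# The same rows keyed by GUIDs generated independently on each side.
dev_guid = {uuid.uuid4(): email for email in dev_int.values()}
test_guid = {uuid.uuid4(): email for email in test_int.values()}

int_merged, int_collisions = merge_tables(dev_int, test_int)
guid_merged, guid_collisions = merge_tables(dev_guid, test_guid)

print("int keys: ", len(int_merged), "rows kept,", len(int_collisions), "collisions")
print("guid keys:", len(guid_merged), "rows kept,", len(guid_collisions), "collisions")
```

With int keys, the colliding rows would have to be renumbered, along with every foreign key that points at them; with GUIDs, the rows can be copied as they are.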

This is my setup for using GUIDs:

  1. PK = GUID. GUIDs are indexed similarly to strings, so tables with very high row counts (over 50 million records) may need table partitioning or other performance techniques. SQL Server is getting extremely efficient, so performance concerns are less and less applicable.

  2. The PK Guid is a nonclustered index. Never put the clustered index on a GUID unless it is generated by NewSequentialID. But even then, a server reboot will cause major breaks in ordering.

  3. Add ClusterID Int to every table. This is your CLUSTERED Index...that orders your table.

  4. Joining on ClusterIDs (int) is more efficient, but I work with 20-30 million record tables, so joining on GUIDs doesn't visibly affect performance. If you want max performance, use the ClusterID concept as your primary key & join on ClusterID.

Here is my Email table...

CREATE TABLE [Core].[Email] (
    [EmailID]      UNIQUEIDENTIFIER CONSTRAINT [DF_Email_EmailID] DEFAULT (newsequentialid()) NOT NULL,
    [EmailAddress] NVARCHAR (50)    CONSTRAINT [DF_Email_EmailAddress] DEFAULT ('') NOT NULL,
    [CreatedDate]  DATETIME         CONSTRAINT [DF_Email_CreatedDate] DEFAULT (getutcdate()) NOT NULL,
    [ClusterID]    INT              NOT NULL IDENTITY,
    CONSTRAINT [PK_Email] PRIMARY KEY NONCLUSTERED ([EmailID] ASC)
);
GO

CREATE UNIQUE CLUSTERED INDEX [IX_Email_ClusterID] ON [Core].[Email] ([ClusterID])
GO

CREATE UNIQUE NONCLUSTERED INDEX [IX_Email_EmailAddress] ON [Core].[Email] ([EmailAddress] ASC)

#3


3  

If you use a GUID as the primary key and create a clustered index on it, then I suggest using NEWSEQUENTIALID() as the column's default value.

#4


3  

This link says it better than I could and helped in my decision making. I usually opt for an int as a primary key unless I have a specific need not to, and I also let SQL Server auto-generate/maintain this field unless I have some specific reason not to. In reality, performance concerns need to be determined based on your specific app. There are many factors at play here, including but not limited to expected db size, proper indexing, efficient querying, and more. Although people may disagree, I think in many scenarios you will not notice a difference with either option, and you should choose what is more appropriate for your app and what allows you to develop more easily, quickly, and effectively (if you never complete the app, what difference does the rest make? :).

https://web.archive.org/web/20120812080710/http://databases.aspfaq.com/database/what-should-i-choose-for-my-primary-key.html

P.S. I'm not sure why you would use a Composite PK or what benefit you believe that would give you.

#5


2  

I am currently developing a web application with EF Core, and here is the pattern I use:

All my classes (tables) have an int PK and FK. I have an additional column of type Guid (generated by the C# constructor) with a nonclustered index on it.

All joins between tables within EF are managed through the int keys, while all access from outside (controllers) is done with the Guids.

This solution avoids exposing the int keys in URLs while keeping the model tidy and fast.

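For readers who want to see the shape of this pattern without an EF Core project, here is a rough sketch in Python, with the built-in sqlite3 module standing in for SQL Server. It is not EF Core code, and the `customer`/`purchase` tables, the `public_id` column, and the helper functions are all hypothetical. The GUID is assigned in application code when the object is created, lookups from outside go through the GUID, and joins use the int keys.

```python
import sqlite3
import uuid
from dataclasses import dataclass, field


@dataclass
class Customer:
    name: str
    # Assigned in application code when the object is created, not by the database.
    public_id: uuid.UUID = field(default_factory=uuid.uuid4)


def create_schema(conn):
    conn.executescript(
        """
        CREATE TABLE customer (
            id        INTEGER PRIMARY KEY,   -- int key used for joins
            public_id TEXT NOT NULL UNIQUE,  -- GUID shown to callers, indexed separately
            name      TEXT NOT NULL
        );
        CREATE TABLE purchase (
            id          INTEGER PRIMARY KEY,
            customer_id INTEGER NOT NULL REFERENCES customer(id),
            item        TEXT NOT NULL
        );
        """
    )


def add_customer(conn, customer):
    """Insert a customer and return the int key the database assigned."""
    cur = conn.execute(
        "INSERT INTO customer (public_id, name) VALUES (?, ?)",
        (str(customer.public_id), customer.name),
    )
    return cur.lastrowid


def purchases_for(conn, public_id):
    """Look up by GUID at the boundary, then join on the int keys."""
    rows = conn.execute(
        """
        SELECT p.item
        FROM customer AS c
        JOIN purchase AS p ON p.customer_id = c.id
        WHERE c.public_id = ?
        ORDER BY p.id
        """,
        (str(public_id),),
    )
    return [item for (item,) in rows]


conn = sqlite3.connect(":memory:")
create_schema(conn)
alice = Customer("Alice")
alice_id = add_customer(conn, alice)
conn.execute("INSERT INTO purchase (customer_id, item) VALUES (?, ?)", (alice_id, "book"))
conn.execute("INSERT INTO purchase (customer_id, item) VALUES (?, ?)", (alice_id, "pen"))
print(purchases_for(conn, alice.public_id))
```

The int key never leaves the data layer in this sketch; a URL would carry only `public_id`.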
