Key datatype for high-volume SQL Server 2008?

Time: 2022-09-16 13:09:27

I am in the process of designing a database for high-volume data, and I was wondering what datatype to use for the primary keys.

There will be table partitioning, and the database will ultimately be clustered, with hot failover to alternative datacentres.

EDIT


Tables: think of a chat system covering multiple time periods and multiple things to chat about, with multiple users chatting about each time period and thing.

Exponential growth is what I am thinking about: something could generate billions of rows in a short time period, i.e. before we could change the database or the DBA could do DBA things.

Mark: I share your concern about GUIDs. I don't like coding with GUIDs flying about.

4 answers

#1


With just the little bit of info you've provided, I would recommend using a BigInt, which would take you up to 9,223,372,036,854,775,807, a number you're not likely ever to exceed. (Don't start with an INT and think you can easily change it to a BigInt when you exceed 2 billion rows. It's possible (I've done it), but it can take an extremely long time and involve significant system disruption.)
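
To put the INT vs. BIGINT ceilings in perspective, here is a back-of-the-envelope sketch. The insert rates are hypothetical examples, not figures from the question, and it assumes an IDENTITY seeded at 1 with increment 1, so only the positive range is used.

```python
# Upper limits of SQL Server's signed INT and BIGINT types (two's complement).
INT_MAX = 2**31 - 1     # 2,147,483,647
BIGINT_MAX = 2**63 - 1  # 9,223,372,036,854,775,807

SECONDS_PER_DAY = 24 * 60 * 60
SECONDS_PER_YEAR = 365 * SECONDS_PER_DAY  # ignores leap years


def seconds_until_exhausted(max_value: int, rows_per_second: int) -> float:
    """Seconds until an IDENTITY(1,1) column reaches max_value at a steady insert rate."""
    return max_value / rows_per_second


if __name__ == "__main__":
    for rate in (1_000, 10_000, 100_000):  # hypothetical sustained insert rates
        int_days = seconds_until_exhausted(INT_MAX, rate) / SECONDS_PER_DAY
        bigint_years = seconds_until_exhausted(BIGINT_MAX, rate) / SECONDS_PER_YEAR
        print(f"{rate:>7,} rows/s: INT lasts {int_days:6.2f} days, "
              f"BIGINT lasts {bigint_years:,.0f} years")
```

At a sustained 100,000 rows per second, INT runs out in roughly six hours, while BIGINT would last on the order of 2.9 million years.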

Kimberly Tripp has an excellent series of blog articles ("GUIDs as PRIMARY KEYs and/or the clustering key" and "The Clustered Index Debate Continues") on creating clustered indexes and choosing the primary key (related issues, but not always exactly the same). Her recommendation is that a clustered index/primary key should be:

  1. Unique (otherwise useless as a key)

  2. Narrow (the key is used in all non-clustered indexes, and in foreign-key relationships)

  3. Static (you don't want to have to change all related records)

  4. Always increasing (so new records always get added to the end of the table, and don't have to be inserted in the middle)
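
The "narrow" requirement compounds because SQL Server stores the clustering key in every nonclustered index row, where it serves as the row locator. A rough sketch of the bytes spent on key values alone; the row count and index count are hypothetical, and row headers, page overhead, compression and foreign-key columns in other tables are ignored.

```python
# Storage size in bytes of each candidate key type.
KEY_BYTES = {"int": 4, "bigint": 8, "uniqueidentifier": 16}


def clustering_key_bytes(key_type: str, rows: int, nonclustered_indexes: int) -> int:
    """Bytes spent on clustering-key values: one copy in the clustered index
    plus one copy per nonclustered index (used there as the row locator)."""
    copies = 1 + nonclustered_indexes
    return KEY_BYTES[key_type] * rows * copies


if __name__ == "__main__":
    rows = 1_000_000_000  # hypothetical: one billion chat messages
    for key_type in KEY_BYTES:
        gib = clustering_key_bytes(key_type, rows, nonclustered_indexes=4) / 2**30
        print(f"{key_type:>16}: {gib:6.1f} GiB of key values")
```

Under these assumptions that is about 18.6 GiB of key values for INT, 37.3 GiB for BIGINT and 74.5 GiB for uniqueidentifier, all of which competes for the same buffer-pool memory.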

If you use an increasing BigInt identity as both your primary key and your clustered index, that should satisfy all four of these requirements.

Edit: the Kimberly article I mentioned above ("GUIDs as PRIMARY KEYs and/or the clustering key") explains why a client-generated GUID is a bad choice for a clustering key:

But, a GUID that is not sequential - like one that has its values generated in the client (using .NET) OR generated by the newid() function (in SQL Server) - can be a horribly bad choice, primarily because of the fragmentation that it creates in the base table but also because of its size. It's unnecessarily wide (it's 4 times wider than an int-based identity, which can give you 2 billion (really, 4 billion) unique rows). And, if you need more than 2 billion, you can always go with a bigint (8-byte int) and get 2^63 - 1 rows.
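
The fragmentation argument can be illustrated with a toy model of an index's leaf level: fixed-capacity pages, a fresh page when the largest key so far overflows the last page, and a 50/50 split whenever an insert overflows a page elsewhere. This is a simplification (real pages are 8 KB rather than a fixed row count, and SQL Server's split logic is more nuanced), and random 128-bit integers stand in for NEWID() values.

```python
import bisect
import random

PAGE_CAPACITY = 100  # hypothetical rows per leaf page


def simulate(keys, capacity=PAGE_CAPACITY):
    """Insert keys into a toy ordered leaf level; return (pages, mid_page_splits)."""
    pages = [[]]  # each page holds a sorted run of keys; pages are in key order
    splits = 0
    for key in keys:
        # Route the key to the last page whose lowest key is <= key.
        lows = [page[0] for page in pages[1:]]
        i = bisect.bisect_right(lows, key)
        page = pages[i]
        bisect.insort(page, key)
        if len(page) <= capacity:
            continue
        if i == len(pages) - 1 and page[-1] == key:
            # Ever-increasing key overflowing the last page: start a fresh page.
            pages.append([page.pop()])
        else:
            # Insert landed mid-index: move the upper half to a new page.
            mid = len(page) // 2
            pages.insert(i + 1, page[mid:])
            del page[mid:]
            splits += 1
    return len(pages), splits


if __name__ == "__main__":
    n = 10_000
    rng = random.Random(42)
    sequential = list(range(1, n + 1))                      # like IDENTITY
    random_keys = [rng.getrandbits(128) for _ in range(n)]  # stand-in for NEWID()
    for name, keys in (("sequential", sequential), ("random", random_keys)):
        pages, splits = simulate(keys)
        fill = n / (pages * PAGE_CAPACITY)
        print(f"{name:>10}: {pages} pages, {splits} mid-page splits, {fill:.0%} average fill")
```

In this model the sequential keys fill every page completely with zero mid-page splits, while the random keys cause many splits and leave pages noticeably emptier, which is the effect the quote describes.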

SQL Server has a function called NEWSEQUENTIALID() that generates sequential GUIDs, which avoid the fragmentation issue, but they still have the problem of being unnecessarily wide.

#2


You can always go for int, but taking your partitioning/clustering into account, I'd suggest you look into uniqueidentifier, which will generate globally unique keys.
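
uniqueidentifier values generated independently in each datacentre rely on randomness rather than coordination to stay unique. Assuming version-4 (random) GUIDs, which carry 122 random bits, the birthday approximation p ≈ 1 - exp(-n(n-1)/2^123) gives a feel for the collision risk. Treating NEWID() output as version-4 is an assumption here, and the key counts are hypothetical.

```python
import math

RANDOM_BITS = 122  # a version-4 UUID fixes 6 of its 128 bits (version + variant)


def collision_probability(n: int, bits: int = RANDOM_BITS) -> float:
    """Birthday-bound probability that n random `bits`-bit values contain a duplicate."""
    space = 2.0 ** bits
    # 1 - exp(-n(n-1)/(2N)); expm1 keeps precision when the result is tiny.
    return -math.expm1(-n * (n - 1) / (2 * space))


if __name__ == "__main__":
    for n in (10**6, 10**9, 10**12):  # hypothetical total keys across all sites
        print(f"{n:>17,} keys: p(collision) ~ {collision_probability(n):.1e}")
```

Even at a billion keys the estimate is below 1e-19, so the practical costs of GUIDs are width and insert order rather than uniqueness.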

#3


int tends to be the norm unless you need a massive volume of data, and it has the advantage of working with IDENTITY etc. A GUID has some advantages if you want the numbers to be unguessable or exportable, but if you use a GUID (unless you generate it yourself as "combed"), you should ensure it is non-clustered (the index, that is; not the farm), as it won't be incremental.
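
A "combed" (COMB) GUID replaces part of a random GUID with a timestamp so that values generated later sort later. A minimal sketch, assuming SQL Server orders uniqueidentifier values by their last six bytes first (the final group of the string form), which is why the timestamp goes there. The millisecond resolution and the `make_comb_guid` name are illustrative choices, not a standard API.

```python
import os
import time
import uuid


def make_comb_guid(timestamp_ms=None):
    """Return a UUID whose first 10 bytes are random and whose last 6 bytes hold
    a big-endian millisecond timestamp (the bytes assumed to be compared first)."""
    if timestamp_ms is None:
        timestamp_ms = time.time_ns() // 1_000_000
    random_part = os.urandom(10)
    # 48 bits of milliseconds cover several thousand years.
    time_part = (timestamp_ms & 0xFFFF_FFFF_FFFF).to_bytes(6, "big")
    return uuid.UUID(bytes=random_part + time_part)


if __name__ == "__main__":
    for _ in range(3):
        print(make_comb_guid())  # final group grows with the clock
```

Values created within the same millisecond are not ordered relative to each other, so this yields mostly increasing keys rather than strictly increasing ones.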

#4


I think that int will be very good for this.

The range of INTEGER is -2,147,483,648 to 2,147,483,647.
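
The range follows from 32-bit two's complement, and it explains the "2 billion (really, 4 billion)" remark in Kimberly's quote: an IDENTITY seeded at the minimum value instead of 1 roughly doubles the values available. A small sketch of the arithmetic; the helper names are illustrative only.

```python
def signed_range(bits: int) -> tuple:
    """Minimum and maximum of a two's-complement signed integer of the given width."""
    return -(2 ** (bits - 1)), 2 ** (bits - 1) - 1


def identity_capacity(bits: int, seed: int) -> int:
    """How many values an IDENTITY(seed, 1) column can produce before overflowing."""
    _, maximum = signed_range(bits)
    return maximum - seed + 1


INT_MIN, INT_MAX = signed_range(32)
print(INT_MIN, INT_MAX)                # -2147483648 2147483647
print(identity_capacity(32, 1))        # 2147483647
print(identity_capacity(32, INT_MIN))  # 4294967296
```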

You can also use UniqueIdentifier (GUID), but in that case consider:

  • the table row size limit in MSSQL (8,060 bytes per row): a 16-byte GUID uses four times as much of it as a 4-byte INT

  • storage + memory: imagine you have tables with 10,000,000 rows and growing

  • flexibility: INT supports arithmetic as well as comparison operators (>, <, =, etc.), whereas a GUID supports only comparisons

  • GUID is not optimized for ORDER BY/GROUP BY queries, or for range queries in general
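
The range-query point is easiest to see by comparing what key order means for each type. With an ever-increasing identity, key order is arrival order, so "everything after id X" is a simple range; random GUIDs carry no such meaning. A toy sketch in which the message list and helper names are hypothetical:

```python
import random
import uuid

rng = random.Random(7)

# Hypothetical chat messages in arrival order: (identity key, random GUID key, text).
messages = [
    (i, uuid.UUID(int=rng.getrandbits(128), version=4), f"message {i}")
    for i in range(1, 1001)
]


def newer_than_identity(last_seen_id: int):
    """Range query on an increasing key, like WHERE id > @last_seen_id ORDER BY id."""
    return [text for ident, _, text in messages if ident > last_seen_id]


def latest_by_guid_order(count: int):
    """'Last N by key' on random GUIDs: returns arbitrary messages, not the newest.
    Python orders UUIDs by integer value, unlike SQL Server's uniqueidentifier
    ordering, but for random GUIDs neither order follows arrival time."""
    by_guid = sorted(messages, key=lambda m: m[1])
    return [text for _, _, text in by_guid[-count:]]


if __name__ == "__main__":
    print(newer_than_identity(997))  # the three newest messages
    print(latest_by_guid_order(3))   # three arbitrary messages
```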
