I need a status column that will have about a dozen possible values. Is there any reason to choose int (StatusID) over char(4) (StatusCode)? Since SQL Server doesn't support named constants, a char value is far more descriptive than an int when used as a constant in stored procedures and views. To clarify, I would use a lookup table either way, since I will need more descriptive text for the UI. So this decision is only to help me as the developer when I'm maintaining the stored procedures and views.
Right now I'm leaning toward char(4), especially since designing views in SQL Server Management Studio prevents me from adding comments (I know it's possible to add them in the script editor, but realistically I will use the View Designer far more often, especially if the view is trivial). StateCODE = 'NEW' is much more readable than StateID = 1000. I guess the question is whether there are cases where char(4) is problematic. Since the database is pretty small, I'm not too concerned about a slight performance hit (like using tinyint versus int); I'm more afraid of code maintenance problems.
8 solutions
#1
Database purists will say a key should have no meaning in the business domain, and that you should create a status table where you look up the description and other meanings of the status.
But for operators and end users, having a descriptive status code can be a blessing. And it doesn't even have to be char(4); you can make it varchar(20). This allows them to query without joins and to inspect the database more easily.
In the end, I think the char(20) organization will run more smoothly, and go home earlier on Friday. But the int organization has a better abstraction of the database, and they can enjoy meta programming on Friday evening (or boasting on forums).
(All of this assumes that you're writing business support software. One of the more successful business support systems, SAP, makes successful use of meaningful keys.)
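As a rough illustration of the meaningful-key approach described in this answer, here is a minimal T-SQL sketch. The table and column names (OrderStatus, Orders, StatusCode, Description) and the sample codes are hypothetical and not taken from the question; the point is only that the code can serve as the key itself while a foreign key still limits it to known values.

```sql
-- Hypothetical names throughout; a sketch, not a recommended schema.
CREATE TABLE dbo.OrderStatus (
    StatusCode  varchar(20)   NOT NULL PRIMARY KEY,  -- meaningful key, e.g. 'NEW'
    Description nvarchar(100) NOT NULL               -- longer text for the UI
);

CREATE TABLE dbo.Orders (
    OrderID    int IDENTITY(1,1) NOT NULL PRIMARY KEY,
    StatusCode varchar(20) NOT NULL
        REFERENCES dbo.OrderStatus (StatusCode)      -- still restricted to known codes
);

INSERT INTO dbo.OrderStatus (StatusCode, Description)
VALUES ('NEW', N'Newly created order'),
       ('SHIPPED', N'Order has left the warehouse');

INSERT INTO dbo.Orders (StatusCode)
VALUES ('NEW'), ('SHIPPED'), ('NEW');

-- Readable without a join and without a comment explaining a magic number.
SELECT OrderID, StatusCode
FROM dbo.Orders
WHERE StatusCode = 'NEW';
```

The lookup table is only joined when the longer description is actually needed.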
#2
There are many pros and cons to each method. I'm sure other arguments will come up in favour of using a char(4). My reasons for choosing an int over a char include:
- I always use lookup tables. They allow for an audit trail of the value to be retained and easily examined. For example, if one of your status codes is 'MING' and a business decision is made to change it from 'MING' to 'MONG' from a certain date, my lookup table handles this.
- Smaller index: if you need to index this column, the index will be thinner.
- Extendability: OK, I made that word up, but if you need to go from 4 chars to 5 chars, for example, a lookup table would be a blessing.
- Descriptions: We use a lot of TLAs here, which is great once you know what they are, but if I gave a business user a report that said "GDA's 2007 1001", they wouldn't necessarily twig that GDA = Good Dead on Arrival. With a lookup table, I can add this description.
- Best practice: I can't find the link to hand, but it might be something I read in a K. Tripp article. Aim to make your clustered primary key an incrementing integer to optimise the index.
Of course, if you are absolutely positive that you will never need more than a handful of 4-character codes, there is no reason not to bang it in the table.
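To make the points above a little more concrete, here is a minimal T-SQL sketch of an integer-keyed lookup table. All names (ItemStatus, Item, StatusID, StatusCode, Description) are assumptions for illustration only. It shows just the rename and widening cases; it does not try to model the date-based audit trail mentioned above, which would need extra columns or a history table.

```sql
-- Hypothetical schema; names are illustrative only.
CREATE TABLE dbo.ItemStatus (
    StatusID    int IDENTITY(1,1) NOT NULL PRIMARY KEY CLUSTERED, -- incrementing integer key
    StatusCode  varchar(5)    NOT NULL UNIQUE,  -- short code, free to change or be widened later
    Description nvarchar(100) NOT NULL          -- e.g. the expansion of a TLA
);

CREATE TABLE dbo.Item (
    ItemID   int IDENTITY(1,1) NOT NULL PRIMARY KEY,
    StatusID int NOT NULL REFERENCES dbo.ItemStatus (StatusID)
);

INSERT INTO dbo.ItemStatus (StatusCode, Description)
VALUES ('MING', N'Example status');

INSERT INTO dbo.Item (StatusID)
SELECT StatusID FROM dbo.ItemStatus WHERE StatusCode = 'MING';

-- Renaming the code touches one lookup row; rows in dbo.Item are unchanged.
UPDATE dbo.ItemStatus SET StatusCode = 'MONG' WHERE StatusCode = 'MING';

SELECT i.ItemID, s.StatusCode, s.Description
FROM dbo.Item AS i
JOIN dbo.ItemStatus AS s ON s.StatusID = i.StatusID;
```

The trade-off is that any query that needs the code or description has to join to the lookup table.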
#3
The best approach is a lookup table with the defined values, related to the original table that uses that enumeration.
#4
Collation ambiguities are one reason to say no to char(4): Does ABcD = abCD = äBCd?
If you have 12 possible values, why not tinyint/byte and a Status table? If you have to store the status for 10 million rows, the 3-byte difference per row and the collation/string comparisons add up.
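The collation question can be checked directly. The sketch below assumes the Latin1_General_CI_AI and Latin1_General_BIN2 collations purely as examples; the actual result in a given database depends on whichever collation the column or database uses.

```sql
-- Case-insensitive, accent-insensitive collation: the spellings compare equal.
SELECT CASE WHEN N'ABcD' = N'abCD' COLLATE Latin1_General_CI_AI
            THEN 'equal' ELSE 'different' END AS case_only,
       CASE WHEN N'ABcD' = N'äBCd' COLLATE Latin1_General_CI_AI
            THEN 'equal' ELSE 'different' END AS case_and_accent;

-- Binary collation: the same strings are different.
SELECT CASE WHEN N'ABcD' = N'abCD' COLLATE Latin1_General_BIN2
            THEN 'equal' ELSE 'different' END AS binary_compare;

-- Storage per value: tinyint takes 1 byte, char(4) takes 4 bytes.
SELECT DATALENGTH(CAST(1 AS tinyint))      AS tinyint_bytes,
       DATALENGTH(CAST('NEW' AS char(4)))  AS char4_bytes;
```

Under a case-insensitive collation, 'NEW' and 'new' would be treated as the same status code, which may or may not be what you want.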
#5
The place where I've run into this use case is columns that would map onto things that I would typically use an Enum for when programming. Do you store the integer value of the Enum or the name of the Enum in the database column? Honestly, I've done it both ways. Usually, I ask myself if the database will be used outside the application I'm building. If so, I will store the human-readable format in the database. If not, I'll choose the integer value, as it saves a little time when reconstituting the Enum in code (it's just a cast instead of a parse operation).
#6
You could also use a tinyint instead of an int.
#7
I always choose ints simply because they are easier to map to enums in code.
#8
If you're dealing with huge amounts of data and high throughput then a smallint or tinyint can give better performance and a smaller footprint on the hard disk. If the data in your application is often viewed directly through applications like Access or Cognos then your business people will probably appreciate the descriptive values. I know that when I'm analyzing data as part of my Database Developer role I get tired of joining a lot of lookup tables because I can't remember if 1 = Foo and 2 = Bar or 1 = Bar and 2 = Foo.
Also, although performance will improve when you look up rows by these codes, since the indexes can be smaller, it can also be hurt (in a minor way) by the joins needed when you often look up rows regardless of the code but have to include the text value. In most applications that's not an issue, though, and it would probably only come into play in large data warehousing/reporting environments.
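One possible way to ease the "is 1 = Foo or 2 = Foo?" problem while keeping a small integer key is to wrap the join in a view, so ad hoc queries can filter on the readable value. This is only a sketch under assumed names (TicketStatus, Ticket, TicketWithStatus); the join still happens, it is just written once. GO is the batch separator used by SSMS and sqlcmd, needed here because CREATE VIEW must be the first statement in its batch.

```sql
-- Hypothetical tables; names are illustrative only.
CREATE TABLE dbo.TicketStatus (
    StatusID   tinyint     NOT NULL PRIMARY KEY,
    StatusCode varchar(20) NOT NULL UNIQUE
);

CREATE TABLE dbo.Ticket (
    TicketID int IDENTITY(1,1) NOT NULL PRIMARY KEY,
    StatusID tinyint NOT NULL REFERENCES dbo.TicketStatus (StatusID)
);

INSERT INTO dbo.TicketStatus (StatusID, StatusCode)
VALUES (1, 'Foo'), (2, 'Bar');

INSERT INTO dbo.Ticket (StatusID) VALUES (1), (2), (1);
GO

-- The view carries both the compact key and the readable code.
CREATE VIEW dbo.TicketWithStatus
AS
SELECT t.TicketID, t.StatusID, s.StatusCode
FROM dbo.Ticket AS t
JOIN dbo.TicketStatus AS s ON s.StatusID = t.StatusID;
GO

-- Analysts can filter on the readable value without remembering the numbers.
SELECT TicketID
FROM dbo.TicketWithStatus
WHERE StatusCode = 'Foo';
```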