Best practices for creating a data model

Date: 2021-04-08 16:54:37

For a current project I'm creating a data model. Are there any sources where I can find "best practices" for a good data model? Good means flexible, efficient, with good performance, style, ... Some example questions would be "naming of columns", "what data should be normalized", or "which attributes should be moved into their own table". The source should be a book :-)

3 answers

#1


8  

Personally, I think you should read a book on performance tuning before beginning to model a database. The right design can make a world of difference. If you are not an expert in performance tuning, you aren't qualified to design a database.

These books are database-specific; here is one for SQL Server: http://www.amazon.com/Server-Performance-Tuning-Distilled-Experts/dp/1430219025/ref=sr_1_1?s=books&ie=UTF8&qid=1313603282&sr=1-1

Another book that you should read before starting to design is about antipatterns. Always good to know what you should avoid doing. http://www.amazon.com/SQL-Antipatterns-Programming-Pragmatic-Programmers/dp/1934356557/ref=sr_1_1?s=books&ie=UTF8&qid=1313603622&sr=1-1

Do not get stuck in the trap of designing for flexibility. People use that as a way to get out of doing the work to design correctly, and flexible databases almost always perform badly. If more than 5% of your database design depends on flexibility, you haven't modeled correctly, in my opinion. All the worst COTS products I've had to work with were designed for flexibility first.

Any decent database book will discuss normalization. You can also find that information easily on the web. Be sure to actually create FK/PK relationships.
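As a rough illustration of what "actually create FK/PK relationships" can look like, here is a minimal sketch using Python's built-in sqlite3 module. The customer and customer_order tables and their columns are made up for this example and are not from the answer; other database engines enforce foreign keys without the pragma shown here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when this pragma is switched on per connection.
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE customer (
        customer_id   INTEGER PRIMARY KEY,
        customer_name TEXT NOT NULL
    );
    CREATE TABLE customer_order (
        order_id    INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customer (customer_id),
        order_date  TEXT NOT NULL
    );
""")
conn.execute("INSERT INTO customer (customer_id, customer_name) VALUES (1, 'Ada')")
conn.execute(
    "INSERT INTO customer_order (order_id, customer_id, order_date) "
    "VALUES (10, 1, '2021-04-08')"
)


def try_orphan_order(connection):
    """Return True if the database rejects an order for a customer that does not exist."""
    try:
        connection.execute(
            "INSERT INTO customer_order (order_id, customer_id, order_date) "
            "VALUES (11, 999, '2021-04-09')"
        )
    except sqlite3.IntegrityError:
        return True
    return False


orphan_rejected = try_orphan_order(conn)
print("orphan order rejected:", orphan_rejected)
```

With the relationship declared, the database itself refuses the orphaned row, whichever application or script tries to insert it.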

As for naming columns, pick a standard and stick with it consistently. Consistency is more important than the actual standard. Don't name columns ID (see the SQL Antipatterns book). Use the same name and datatype if a column is going to appear in several different tables. What you are going for is not having to use functions in joins because of datatype mismatches.
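One way to keep an eye on this consistency is a small schema check. The sketch below is only an illustration: the function name find_type_mismatches and the sample tables are hypothetical, and it relies on SQLite's PRAGMA table_info, so another engine would need its own catalog query (for example against information_schema).

```python
import sqlite3
from collections import defaultdict


def find_type_mismatches(connection):
    """Map each column name to its declared types when tables disagree on the type."""
    declared = defaultdict(set)
    tables = connection.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table'"
    ).fetchall()
    for (table_name,) in tables:
        # PRAGMA table_info yields (cid, name, type, notnull, dflt_value, pk).
        for _cid, column_name, column_type, *_rest in connection.execute(
            f'PRAGMA table_info("{table_name}")'
        ):
            declared[column_name].add(column_type.upper())
    return {name: sorted(types) for name, types in declared.items() if len(types) > 1}


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (
        customer_id INTEGER PRIMARY KEY,
        postal_code VARCHAR(20)
    );
    -- customer_id is declared as TEXT here on purpose, to show a mismatch.
    CREATE TABLE invoice (
        invoice_id  INTEGER PRIMARY KEY,
        customer_id TEXT,
        postal_code VARCHAR(20)
    );
""")
mismatches = find_type_mismatches(conn)
print(mismatches)
```

A check like this could run as part of a build or review step, so a mismatch shows up before someone has to write a cast into a join.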

Always remember that databases can (and will) be changed outside the application. Anything that is needed for data integrity must be in the database, not in the application code. The data will be there long after the application has been replaced.
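To make the point about integrity rules living in the database concrete, here is a small sketch with Python's sqlite3 module. The product table, its columns, and the helper is_rejected are assumptions made for the example; the idea is only that NOT NULL, UNIQUE and CHECK constraints keep applying no matter which program writes the data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE product (
        product_id INTEGER PRIMARY KEY,
        sku        TEXT NOT NULL UNIQUE,
        unit_price NUMERIC NOT NULL CHECK (unit_price >= 0)
    );
""")
conn.execute("INSERT INTO product (sku, unit_price) VALUES ('A-100', 9.99)")


def is_rejected(connection, sql):
    """Return True if the database refuses the statement on integrity grounds."""
    try:
        connection.execute(sql)
    except sqlite3.IntegrityError:
        return True
    return False


# Each of these simulates a write coming from outside the application.
negative_price_rejected = is_rejected(
    conn, "INSERT INTO product (sku, unit_price) VALUES ('B-200', -1)"
)
duplicate_sku_rejected = is_rejected(
    conn, "INSERT INTO product (sku, unit_price) VALUES ('A-100', 5)"
)
missing_price_rejected = is_rejected(
    conn, "INSERT INTO product (sku) VALUES ('C-300')"
)
print(negative_price_rejected, duplicate_sku_rejected, missing_price_rejected)
```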

The most important things for database design:

  • Thorough definition of the data needed (including correct datatypes) and the relationships between pieces of data (including correct normalization)
  • data integrity
  • performance
  • security
  • consistency (of datatypes, naming standards etc.)

#2


2  

The best book I've read on the design of database systems was "An Introduction to Database Systems". Joe Celko's "SQL for Smarties" books are also worth reading. Assuming you're building an application and not just a database, and assuming you're using an object-oriented language, "Applying UML and Patterns" by Craig Larman has a good discussion of mapping databases to objects.

In terms of defining "good", in my experience "maintainable" is probably top of the list. Maintainability in database design means many things, such as sticking to conventions - I often recommend http://justinsomnia.org/2003/04/essential-database-naming-conventions-and-style/. Normalization is another obvious maintainability strategy. I often recommend being generous with column types - it's hard to change an application if you find out that postal codes in different countries are longer than in the US. I often recommend using views to abstract complex data relations away for less experienced developers.
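The suggestion about views might look roughly like the sketch below, again using Python's sqlite3 module. All table, column and view names (customer, customer_order, order_line, customer_order_total) are invented for illustration. The view hides a three-table join and an aggregate behind a single name that less experienced developers can query directly.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (
        customer_id   INTEGER PRIMARY KEY,
        customer_name TEXT NOT NULL
    );
    CREATE TABLE customer_order (
        order_id    INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customer (customer_id)
    );
    CREATE TABLE order_line (
        order_id   INTEGER NOT NULL REFERENCES customer_order (order_id),
        quantity   INTEGER NOT NULL,
        unit_price NUMERIC NOT NULL
    );

    -- The view packages the joins and the aggregate in one place.
    CREATE VIEW customer_order_total AS
        SELECT c.customer_name,
               o.order_id,
               SUM(l.quantity * l.unit_price) AS order_total
        FROM customer AS c
        JOIN customer_order AS o ON o.customer_id = c.customer_id
        JOIN order_line AS l ON l.order_id = o.order_id
        GROUP BY c.customer_name, o.order_id;

    INSERT INTO customer VALUES (1, 'Ada');
    INSERT INTO customer_order VALUES (10, 1);
    INSERT INTO order_line VALUES (10, 2, 5), (10, 1, 3);
""")

# Callers query the view like a table and never see the underlying joins.
rows = conn.execute(
    "SELECT customer_name, order_id, order_total FROM customer_order_total"
).fetchall()
print(rows)
```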

A key thing with maintainability is the ability to test and deploy. It's worth reading up about Continuous Database Integration (http://www.codeproject.com/KB/architecture/Database_CI.aspx) - whilst not strictly associated with the design of the database schema, it's important context.

As for performance - I believe you should design for maintainability first, and only design for performance if you know you have a problem. Sometimes, you know in advance that performance will be a major problem - designing a database for Facebook (or Stack Exchange), designing a database with huge amounts of data (terabytes and up), or huge numbers of users. Most systems don't fall into that camp - so I recommend regular performance tests, with representative data, to find if you have a problem, and only tune when you can prove you have to. Many performance optimizations are at the expense of maintainability - denormalization, for instance.
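A very small version of "test with representative data, then tune only when you can prove you have to" could look like the sketch below. Everything in it is an assumption made for the example: the event table, the row count, the random data and the index name idx_event_user_id. Real tests would use data shaped like production data and the queries the application actually runs. The exact wording of the query plan text also differs between SQLite versions.

```python
import random
import sqlite3
import time

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE event ("
    "event_id INTEGER PRIMARY KEY, user_id INTEGER NOT NULL, payload TEXT)"
)
rng = random.Random(42)  # fixed seed so the generated test data is repeatable
conn.executemany(
    "INSERT INTO event (user_id, payload) VALUES (?, ?)",
    ((rng.randrange(10_000), "x" * 50) for _ in range(50_000)),
)
conn.commit()

QUERY = "SELECT COUNT(*) FROM event WHERE user_id = ?"


def query_plan(connection):
    """Return SQLite's plan description for QUERY as one string."""
    rows = connection.execute("EXPLAIN QUERY PLAN " + QUERY, (1234,)).fetchall()
    return " | ".join(row[-1] for row in rows)  # the detail text is the last column


def time_query(connection, repeats=20):
    """Return the average wall-clock seconds per execution of QUERY."""
    start = time.perf_counter()
    for _ in range(repeats):
        connection.execute(QUERY, (1234,)).fetchone()
    return (time.perf_counter() - start) / repeats


# Measure first; change the schema only once the numbers show a problem.
plan_before, seconds_before = query_plan(conn), time_query(conn)
conn.execute("CREATE INDEX idx_event_user_id ON event (user_id)")
plan_after, seconds_after = query_plan(conn), time_query(conn)

print(f"before: {plan_before} ({seconds_before:.6f} s per query)")
print(f"after:  {plan_after} ({seconds_after:.6f} s per query)")
```

The timings depend on the machine, so the plan text is the more reliable signal here: a full scan before the index, an index search after it.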

Oh, and in general, avoid triggers and stored procedures if you can. That's just my opinion, though...

#3


1  

Even though it is not a book, I recommend reading "Query Evaluation Techniques for Large Databases". It gives background on query processing, which largely influences your schema design, especially for data-intensive (e.g., analytical) workloads. It is less hands-on, but I believe every database designer should read it at least once :-).
