Which database should I use to store and manipulate large amounts of data?

Date: 2021-08-11 16:23:19

I have to decide which database server to use for my next project. The simple choice of MySQL, which I used for almost all my previous projects, is harder now, because I expect a very large number of records.

The database will store a user list, some other irrelevant tables, and finally some user-collected data. Let's say I have 6000 users answering a quiz about each other. Simple math shows that if each user completes the quiz about everyone else (and in my project that is 99% certain to happen), I'll end up with 35.99 million records (users exclude themselves, so in this particular case the count is 6000 * 5999). Unfortunately, 6000 may be a small number; the real one grows day by day.
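The 6000 * 5999 figure generalizes to n * (n - 1) ordered pairs, so the row count grows quadratically with the number of users. A quick Python sketch of the arithmetic (the function name is illustrative only):

```python
def quiz_answer_rows(users: int) -> int:
    """Rows produced if every user answers the quiz about every other user.

    Each user skips themself, so there are users * (users - 1) ordered pairs.
    """
    return users * (users - 1)


for n in (6_000, 10_000, 50_000):
    print(f"{n:>6} users -> {quiz_answer_rows(n):>13,} rows")
```

For 6,000 users this gives 35,994,000 rows (the question's 35.99 million); at 50,000 users it is 2,499,950,000, about 2.5 billion. Doubling the user count roughly quadruples the table.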

What should I choose? MySQL, and then, if things go well and the project grows, expand it into a cluster? PostgreSQL, MSSQL? Oracle?

I've read about all of them; each one has its pros and cons, but I still don't know what to choose. The advantage of MySQL and PostgreSQL is, of course, the starting price of $0, which is pretty nice for a typical self-funded startup.

Any opinions or advice? If you have encountered this situation in your experience as a developer, I'd love to hear from you.

9 answers

#1


1  

Most of the truly large-scale web properties use a distributed key-value store. That said, 35 million is large, but not that large. With most modern databases, your two main scaling worries should be throughput and what happens when no single box can hold your entire database anymore. Both of these problems can be solved to some degree whichever database you choose (caching, replication, sharding, etc.).
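Of the techniques listed, sharding is the one aimed at "no single box can hold the database". Below is a minimal Python sketch of application-level sharding, assuming quiz rows shaped as (respondent_id, subject_id, answer) and a simple modulo scheme; `shard_for` and `route_answers` are hypothetical helpers, not part of any database's API:

```python
from collections import defaultdict


def shard_for(user_id: int, num_shards: int) -> int:
    """Pick the shard that stores rows for this user (simple modulo scheme)."""
    if num_shards <= 0:
        raise ValueError("num_shards must be positive")
    return user_id % num_shards


def route_answers(answers, num_shards):
    """Group (respondent_id, subject_id, answer) rows by the respondent's shard.

    Keying on respondent_id keeps every answer written by one user together,
    so "show me everything I answered" touches a single shard.
    """
    shards = defaultdict(list)
    for respondent_id, subject_id, answer in answers:
        shards[shard_for(respondent_id, num_shards)].append(
            (respondent_id, subject_id, answer)
        )
    return dict(shards)


rows = [(1, 2, "a"), (1, 3, "b"), (2, 1, "c"), (4, 1, "d")]
print(route_answers(rows, num_shards=2))
```

Modulo routing is the simplest scheme, but changing `num_shards` remaps most keys, so larger setups often use consistent hashing or a lookup directory instead.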

Use MySQL until you can't anymore. By that point you ought to be rolling in dough anyway, and you'll have a very desirable problem.

#2


4  

These days, being free isn't what differentiates databases anymore. Both Oracle and SQL Server have free versions; their limitations are on resources: a 4 GB database size, limited RAM, and single-CPU utilization. Millions of records are not a concern; what matters is the data types you're using.
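To make the data-type point concrete, here is a rough Python estimate of raw column bytes for the question's 35,994,000 rows under a few hypothetical row layouts, compared against a 4 GB cap (treated here as 4 GiB). It deliberately ignores row headers, page overhead and indexes, which real engines add on top, so the numbers are only a lower bound:

```python
ROWS = 6_000 * 5_999  # 35,994,000 answer rows from the question's example
GIB = 1024**3
LIMIT_BYTES = 4 * GIB  # the 4 GB free-edition cap, treated as 4 GiB

# Hypothetical row layouts: (respondent id, subject id, 1-byte answer).
# Only raw column sizes are counted; per-row headers, page overhead and
# indexes are ignored, so real on-disk sizes will be larger.
LAYOUTS = {
    "INT ids (4 bytes each)": 4 + 4 + 1,
    "BIGINT ids (8 bytes each)": 8 + 8 + 1,
    "GUID ids (16 bytes each)": 16 + 16 + 1,
}


def raw_size_bytes(rows: int, row_bytes: int) -> int:
    """Lower-bound table size: row count times raw bytes per row."""
    return rows * row_bytes


for name, row_bytes in LAYOUTS.items():
    size = raw_size_bytes(ROWS, row_bytes)
    print(f"{name:<26} {size / GIB:5.2f} GiB  ({size / LIMIT_BYTES:.0%} of the cap)")
```

With INT keys the raw data is about 0.30 GiB; with 16-byte GUID keys it is about 1.1 GiB before any index exists, so key and column types can decide whether a free edition's size cap is reached.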

I saw the OP's comment about not liking MS software. That's your prerogative, but using the free version of either Oracle or SQL Server does give you a seamless transition to the upscale versions of the respective database.

Personally, my choice would be either Oracle or SQL Server because of what are, IMHO, real feature considerations: hierarchical query support, subquery factoring/CTEs, packages (long before I get concerned with functions/procedures), full-text search, XML support, etc.
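As an illustration of the hierarchical-query and CTE features mentioned, here is a recursive CTE run through Python's built-in `sqlite3` module so it is self-contained. SQLite is only a stand-in: PostgreSQL accepts the same `WITH RECURSIVE` form, SQL Server writes it without the `RECURSIVE` keyword, and Oracle also has its own `CONNECT BY` syntax. The `category` table is hypothetical:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE category (
        id        INTEGER PRIMARY KEY,
        parent_id INTEGER REFERENCES category(id),
        name      TEXT NOT NULL
    );
    INSERT INTO category (id, parent_id, name) VALUES
        (1, NULL, 'root'),
        (2, 1, 'databases'),
        (3, 2, 'relational'),
        (4, 3, 'PostgreSQL'),
        (5, 1, 'hardware');
    """
)

# Walk the tree starting from the root row, tracking each node's depth.
rows = conn.execute(
    """
    WITH RECURSIVE tree(id, name, depth) AS (
        SELECT id, name, 0 FROM category WHERE parent_id IS NULL
        UNION ALL
        SELECT c.id, c.name, t.depth + 1
        FROM category AS c
        JOIN tree AS t ON c.parent_id = t.id
    )
    SELECT name, depth FROM tree ORDER BY depth, name
    """
).fetchall()

for name, depth in rows:
    print(f"depth {depth}: {name}")
```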

#3


3  

MySQL will handle 35 million records, no problem. Worry about scalability when you get there. You can easily add RAID hard disks backing your database tables, and if you really start getting big you can get a Compellent SAN that will scream... Worry less about the DB engine than about the underlying hardware. MySQL rocks for us with millions of records.

#4


2  

I've had no problems handling tables as large as 36,000,000 rows on MySQL and Oracle.

Just be sure that you index the proper columns, run EXPLAINs for your queries, and maintain proper design principles.
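A small demonstration of the index-then-EXPLAIN habit, using Python's built-in `sqlite3` module as a self-contained stand-in. SQLite's `EXPLAIN QUERY PLAN` output differs from MySQL's `EXPLAIN` or Oracle's `EXPLAIN PLAN`, but the idea is the same: confirm that the planner switches from a full scan to an index search. The `answers` table and index name are hypothetical:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE answers (respondent_id INTEGER, subject_id INTEGER, answer INTEGER)"
)


def plan(sql: str) -> list[str]:
    """Return the 'detail' column of SQLite's EXPLAIN QUERY PLAN output."""
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]


query = "SELECT answer FROM answers WHERE subject_id = 42"

before = plan(query)  # no index yet: the planner must scan the whole table
conn.execute("CREATE INDEX idx_answers_subject ON answers (subject_id)")
after = plan(query)   # now the planner can search the index instead

print("before:", before)
print("after: ", after)
```

On a recent SQLite the plan changes from something like `SCAN answers` to `SEARCH answers USING INDEX idx_answers_subject (subject_id=?)`; the exact wording varies between SQLite versions.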

#5


1  

Use MySQL as it's free and you have experience with it.

Besides, in my opinion, how you design the tables matters more than which database you use.
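One way to apply this to the question's quiz data, sketched with Python's built-in `sqlite3` module. The table and column names are hypothetical; the point is that a composite primary key and a CHECK constraint let the schema itself enforce "one answer per pair" and "users skip themselves":

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce REFERENCES by default
conn.executescript(
    """
    CREATE TABLE users (
        id   INTEGER PRIMARY KEY,
        name TEXT NOT NULL
    );

    -- One row per (respondent, subject) pair: the composite primary key stops
    -- duplicate answers, and the CHECK enforces that users skip themselves.
    CREATE TABLE quiz_answers (
        respondent_id INTEGER NOT NULL REFERENCES users(id),
        subject_id    INTEGER NOT NULL REFERENCES users(id),
        answer        INTEGER NOT NULL,
        PRIMARY KEY (respondent_id, subject_id),
        CHECK (respondent_id <> subject_id)
    );

    -- Secondary index for "what did everyone say about user X?" lookups.
    CREATE INDEX idx_quiz_answers_subject ON quiz_answers (subject_id);
    """
)

conn.executemany("INSERT INTO users (id, name) VALUES (?, ?)", [(1, "ann"), (2, "bob")])
conn.execute("INSERT INTO quiz_answers VALUES (1, 2, 5)")  # ann answers about bob

rejected = False
try:
    conn.execute("INSERT INTO quiz_answers VALUES (1, 1, 3)")  # ann answers about herself
except sqlite3.IntegrityError as exc:
    rejected = True
    print("rejected:", exc)
```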

#6


0  

35 million records can be easily handled by MS SQL Server (assuming proper database design, indices, etc.). You can start with the free SQL Server Express edition and later, if you need, you can upgrade to the full version which supports clustering, etc.

SQL Server Express does have some limitations - single CPU, 1 GB memory, max 4 GB database size and a few other things. I'm not sure how quickly these limitations will become a problem but you can always move to the full version when you run into them.

#7


0  

MySQL(i) & PostgreSQL

  • $0 cost
  • large community
  • many tutorials
  • well documented

MSSQL

  • You can get "money" from MS if you promote that you are using MSSQL (inside information from some companies I worked for)
  • MS tools work very well
  • A complete tool set, from the C# IDE through the .NET libraries to Windows Server 2003

Oracle

  • Professional, commercial vendor
  • Used by many large companies (I have also heard that Blizzard, maker of World of Warcraft, uses Oracle)
  • Downside: expensive

The final decision depends on the specific requirements of your project. Make yourself a quick list of the things that are important for your project (e.g. fast queries) and see which database's strengths best match your requirements.

Everything is about design. SQL databases are like cars: you just have to know which component goes where. Make a clear design and you won't struggle with any of them.

#8


0  

Maybe you could test Firebird.

A blog post about a big Firebird database is here.

The MySQL licence is here (not always free).

PostgreSQL and Firebird are free.

#9


0  

First of all, don't think about performance. Premature optimization is the root of all evil, and all that. You can always throw more hardware and/or tuning at it later.

All of the databases mentioned should perform nicely if tuned and maintained correctly. I'd focus on manageability and familiarity. IMHO, open source databases excel at manageability (perhaps not the best GUIs, but the CLI has been my home for a long, long time).

And if the database becomes the bottleneck, why limit yourself to these choices? How about a distributed key-value database? Or perhaps serializing data directly to disk? Storing data outside an RDBMS, while often frowned upon, might be the correct path. Or simply take the common route of denormalization.
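A minimal sketch of the denormalization route, again using Python's built-in `sqlite3` so it runs as-is: a summary table holds a running count and sum per subject, kept current by a trigger, so reading a user's average touches one row instead of aggregating every answer about them. Table and trigger names are hypothetical, and only inserts are handled here:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE quiz_answers (
        respondent_id INTEGER NOT NULL,
        subject_id    INTEGER NOT NULL,
        answer        INTEGER NOT NULL,
        PRIMARY KEY (respondent_id, subject_id)
    );

    -- Denormalized summary: one row per subject with a running count and sum.
    CREATE TABLE subject_stats (
        subject_id   INTEGER PRIMARY KEY,
        answer_count INTEGER NOT NULL DEFAULT 0,
        answer_sum   INTEGER NOT NULL DEFAULT 0
    );

    -- Keep the summary in sync on every insert into quiz_answers.
    CREATE TRIGGER quiz_answers_after_insert AFTER INSERT ON quiz_answers
    BEGIN
        INSERT OR IGNORE INTO subject_stats (subject_id) VALUES (NEW.subject_id);
        UPDATE subject_stats
           SET answer_count = answer_count + 1,
               answer_sum   = answer_sum + NEW.answer
         WHERE subject_id = NEW.subject_id;
    END;
    """
)

conn.executemany(
    "INSERT INTO quiz_answers VALUES (?, ?, ?)",
    [(1, 3, 4), (2, 3, 2), (1, 2, 5)],
)

# Reading the average is a single-row lookup, not an aggregate over answers.
count, total = conn.execute(
    "SELECT answer_count, answer_sum FROM subject_stats WHERE subject_id = 3"
).fetchone()
print(f"user 3: {count} answers, average {total / count:.1f}")
```

The cost is an extra write on every insert, and updates or deletes on `quiz_answers` would need their own triggers to keep the summary correct.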

Always remember not to optimize prematurely.

As far as opinions go (since you specifically asked for it) I favor open source databases, specifically PostgreSQL. It's rock solid, fast and very well-featured. And even with (relatively) large datasets it has performed superbly on mediocre hardware (some tuning involved, of course, but you can't skip that step no matter which db you end up choosing).