Using MySQL Sharding to Split a Multi-Terabyte Table

Date: 2022-01-24 16:34:12

I know that with horizontal partitioning you can create many tables.

I've seen that in application-based sharding you have the same database structure on multiple database servers, but each server holds different data.

So for example:

Users 1 - 10000: server A
Users 10001 - 20000: server B
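
The range split above can be sketched in a few lines of Python. This is only an illustration of the idea, not any particular tool's behavior: the name `server_for_user` and the `SHARD_RANGES` table are made up for the example, and the two ranges are copied from the lines above.

```python
from bisect import bisect_left

# Hypothetical shard map: (highest user ID on the shard, server name).
# The two ranges mirror the example above; entries must stay sorted by upper bound.
SHARD_RANGES = [
    (10000, "server A"),
    (20000, "server B"),
]


def server_for_user(user_id: int) -> str:
    """Return the server that stores the given user ID."""
    if user_id < 1:
        raise ValueError(f"user ID must be positive, got {user_id}")
    upper_bounds = [upper for upper, _ in SHARD_RANGES]
    # Index of the first range whose upper bound is >= user_id.
    index = bisect_left(upper_bounds, user_id)
    if index == len(SHARD_RANGES):
        raise KeyError(f"no shard configured for user ID {user_id}")
    return SHARD_RANGES[index][1]
```

Adding a third server would then mean appending one more entry to the map, although moving existing rows between ranges is a separate problem.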

One technique used for sharding is MySQL Proxy, or a tool built on it such as SpockProxy. We can also shard manually. That requires a master (lookup) table, e.g.:

-------------------
| userA | server1 |
| userB | server2 |
| userC | server1 |
-------------------
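
As a rough sketch of how such a master table gets used, here it is as a Python dictionary. The names `USER_DIRECTORY`, `lookup_server` and `move_user` are hypothetical; in a real setup the mapping would presumably live in a small database table that every application server can read.

```python
# Hypothetical in-memory copy of the master table: user -> server.
USER_DIRECTORY = {
    "userA": "server1",
    "userB": "server2",
    "userC": "server1",
}


def lookup_server(user: str) -> str:
    """Return the server holding this user's rows, according to the directory."""
    try:
        return USER_DIRECTORY[user]
    except KeyError:
        raise KeyError(f"user {user!r} is not in the directory") from None


def move_user(user: str, new_server: str) -> None:
    """Repoint a user once their rows have been copied to another server."""
    lookup_server(user)  # raises KeyError if the user is unknown
    USER_DIRECTORY[user] = new_server
```

The appeal of a directory over fixed ranges is that a single user can be moved by changing one row, at the cost of an extra lookup on every request.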

But the techniques above work at the application level. I want to solve this at the DB server level.

Can we do this transparently across multiple servers? That would allow MySQL tables to scale.

That is, create X tables on X servers, while the end user gets the data with a simple query to a single DB server?

In short, I want to insert 16 terabytes of data into a single table, but I don't have that much space on a single machine, so I want to set up two servers with 8 terabytes of capacity each. The user should query a single DB and get results, while sharding may be used at the back end.

I am also opening this discussion to other good solutions, e.g. MySQL clustering.

Would anyone care to explain, or point to a good step-by-step beginner's tutorial on how to partition across multiple servers?

1 answer

#1


You need to adjust your thinking before you go forward. I don't think there is an easy way to do this on MySQL. I am sure you can do it, if you put in the effort, using FEDERATED tables and views. However, RDBMS sharding is never easy, even at best.

Sharding, however, is very hard. Sharding tables is almost always the wrong way to look at it. Instead, you really need to shard data sets, because joins across nodes are expensive.

So I highly recommend going back to the drawing board on this. If you really have no need for joins, look at other databases, like Cassandra, which support this sort of thing out of the box. If you do need joins, however, you need to look at every table in your database and find good partition criteria, then partition on those, so that every shard has the same DB schema but holds different data.
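
To make "partition on the same criteria" concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the `users` and `orders` tables, `user_id` as the partition key, and CRC32 modulo the shard count as the placement rule. The point it shows is that when every table is split on the same key, a user's orders land on the same shard as the user, so that join never crosses nodes.

```python
import zlib

SHARD_COUNT = 2  # hypothetical; one shard per 8 TB server in the question


def shard_for(partition_key) -> int:
    """Map a partition key to a shard index with a hash that is stable across runs."""
    digest = zlib.crc32(str(partition_key).encode("utf-8"))
    return digest % SHARD_COUNT


def place_rows(rows, key_column):
    """Group rows (dicts) by the shard that their partition key maps to."""
    placement = {shard: [] for shard in range(SHARD_COUNT)}
    for row in rows:
        placement[shard_for(row[key_column])].append(row)
    return placement


# Hypothetical tables that share user_id as the partition key.
users = [{"user_id": 1, "name": "alice"}, {"user_id": 2, "name": "bob"}]
orders = [
    {"order_id": 10, "user_id": 1},
    {"order_id": 11, "user_id": 2},
    {"order_id": 12, "user_id": 1},
]

users_by_shard = place_rows(users, "user_id")
orders_by_shard = place_rows(orders, "user_id")
```

A table that has no natural link to the partition key is the hard case; it usually has to be either copied to every shard or queried across shards.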

Once you have that in place, put a proxy in front of your databases to route queries appropriately. See https://github.com/flike/kingshard as one possibility (though, as a disclaimer, I have not worked with these on MySQL). With the proxy, your app sees what appears to be a single DB, and as I read your question, that is really what you are aiming for.
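
For a feel of what the proxy does on your behalf, here is a toy router in Python. It is not how kingshard works internally, just a sketch under loud assumptions: in-memory SQLite databases stand in for the MySQL servers, the `ShardRouter` class and its methods are invented for the example, and rows are placed by user ID modulo the shard count.

```python
import sqlite3


class ShardRouter:
    """Toy stand-in for a sharding proxy: one entry point, several databases."""

    def __init__(self, shard_count: int) -> None:
        # Each in-memory SQLite database plays the role of one MySQL server.
        self.shards = [sqlite3.connect(":memory:") for _ in range(shard_count)]
        for conn in self.shards:
            # Same schema on every shard; only the rows differ.
            conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")

    def _shard_for(self, user_id: int) -> sqlite3.Connection:
        # Placement rule (an assumption for brevity): ID modulo shard count.
        return self.shards[user_id % len(self.shards)]

    def insert_user(self, user_id: int, name: str) -> None:
        conn = self._shard_for(user_id)
        conn.execute("INSERT INTO users (id, name) VALUES (?, ?)", (user_id, name))
        conn.commit()

    def get_user(self, user_id: int):
        # A query on the shard key touches exactly one shard.
        cursor = self._shard_for(user_id).execute(
            "SELECT id, name FROM users WHERE id = ?", (user_id,)
        )
        return cursor.fetchone()

    def all_users(self):
        # A query without the shard key must visit every shard and merge.
        rows = []
        for conn in self.shards:
            rows.extend(conn.execute("SELECT id, name FROM users").fetchall())
        return sorted(rows)
```

The caller only ever talks to `ShardRouter`, which is the "appearance of a single DB" part. Note how `all_users` has to visit every shard; that fan-out is why queries that do not include the shard key get expensive.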
