Database replication over a Raspberry Pi network

Time: 2021-05-30 07:25:07

Does anyone have a good suggestion as to what database I should use to achieve replication across a variable number of targets? I have a mesh network of Raspberry Pi servers, each of which can contain a database. I want the contents of each database to be replicated across the network, but I can't guarantee which nodes are available at any point in time.


Most NoSQL databases (CouchDB and Cassandra, for example) appear to only support replication targets defined in the configuration.


So (assuming NoSQL is the best database option): is there a NoSQL database that can replicate to a variable number of targets?


5 Answers

#1


4  

For this scenario I would recommend the Hadoop Distributed File System (HDFS).


Features that make HDFS attractive to your scenario:


  • It is a distributed file system with a variable replication factor (the default is 3, which makes it very hard to lose data).
  • It can scale up to thousands of different machines.
  • It does not depend on high availability of individual nodes -- it automatically handles node failure and re-replicates any data that lived on downed nodes.

As for the actual database... HBase is the natural fit here, since it runs directly on top of HDFS and lets HDFS take care of all of the replication for you. (Note that Mongo and Cassandra manage their own replication rather than relying on HDFS.)

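The variable replication factor mentioned above is set cluster-wide in `hdfs-site.xml`; the value below is just an illustrative choice for a small Pi cluster, not a recommendation:

```xml
<!-- hdfs-site.xml: default replication factor applied to new files -->
<property>
  <name>dfs.replication</name>
  <value>2</value>
</property>
```

Existing files can also be changed after the fact with `hdfs dfs -setrep -w 2 /some/path`, which re-replicates the data to the new factor.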

#2


3  

According to this SO response:

https://*.com/a/8787999/2020565

and after checking their website, maybe you should look at Elliptics: http://www.ioremap.net/projects/elliptics/

The network does not use dedicated servers to maintain metadata information, and it supports redundant object storage. Small- to medium-sized write benchmarks can be found on the eblob page.


#3


3  

In my experience Elasticsearch has great, easy-to-use cluster management. It supports nice features out of the box such as node autodiscovery, data replication, auto-rebalancing etc.; have a look at the docs. Usually it is used to replicate data from another database to make it searchable, but I don't see why it couldn't be used in this context as well.

Basically, when you create a "table" (called an "index" in ES) you get to decide into how many "partitions" (called "shards") the data should be split, and you can set on the fly how many replicas of that table you want to have. (This doesn't 100% match the correct terminology, since an "index" can consist of multiple "types", but I think it is the best analogy.)
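As a sketch, an index with its shard and replica counts can be declared at creation time through ES's REST API (the index name and counts here are made up for illustration):

```json
PUT /sensor_readings
{
  "settings": {
    "number_of_shards": 3,
    "number_of_replicas": 2
  }
}
```

`number_of_replicas` can be changed later on a live index, which fits the "variable number of targets" requirement; `number_of_shards` is fixed when the index is created.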

An example project with three Pis is here.


I have read a bit about Cassandra as well and I imagine it would have similar features, for example partitions and replicas are mentioned here.


#4


2  

I'd recommend taking a look at Hazelcast. It does a pretty good job of in-memory replication across a cluster whose membership may change. You'd have to write a custom client to store the data into a local database of your choice if you want disk-backed persistence, but Hazelcast can take care of in-memory replication across a cluster and has a lot of flexibility.

#5


0  

  1. You should consider the Erlang OTP platform and the Mnesia database.

  2. If you prefer the C language, you can consider an in-memory SQLite database together with the nanomsg framework.
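The second suggestion amounts to each node holding its own in-memory SQLite copy and applying writes broadcast over a messaging layer. A minimal sketch of the database side, written here in Python's stdlib `sqlite3` for brevity (the answer suggests C with nanomsg; the table schema and function names are made up for illustration, and the transport is omitted):

```python
import sqlite3

def make_node():
    # Each node keeps its own in-memory SQLite copy of the data.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE kv (key TEXT PRIMARY KEY, value TEXT)")
    return conn

def apply_write(conn, key, value):
    # Idempotent upsert, so a write message replayed to a node that
    # already saw it is harmless. A real system would broadcast these
    # writes over something like nanomsg to every reachable node.
    conn.execute(
        "INSERT INTO kv (key, value) VALUES (?, ?) "
        "ON CONFLICT(key) DO UPDATE SET value = excluded.value",
        (key, value),
    )

# Simulate broadcasting one write to two nodes.
nodes = [make_node(), make_node()]
for n in nodes:
    apply_write(n, "sensor1", "42")
```

This sidesteps the hard parts (ordering concurrent writes, catching up nodes that were offline), which is exactly what the purpose-built systems in the other answers handle for you.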
