References:
Paxos vs. Viewstamped Replication vs. Zab
Zab: High-performance broadcast for primary-backup systems
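ZooKeeper uses Zab (ZooKeeper Atomic Broadcast).
Adding machines to a Zab cluster lowers write performance somewhat, while read performance scales horizontally. Reads served directly from a Follower are not guaranteed to return the latest data, but they will eventually see it. For ZooKeeper's main use cases, such as configuration management and distributed transactions, this already amounts to strong consistency from the application's point of view.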
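A rough sketch of why writes get slower while reads scale. It assumes a simple majority quorum, and the function name is illustrative only, not part of ZooKeeper:

```python
def quorum_size(ensemble_size: int) -> int:
    """Smallest strict majority of the ensemble (assumed quorum rule)."""
    return ensemble_size // 2 + 1


if __name__ == "__main__":
    for n in (3, 5, 7, 9):
        # A write needs acknowledgements from a quorum before it commits,
        # so the number of servers on the write path grows with n.
        # A read is answered locally by whichever server the client uses,
        # so every added server adds read capacity.
        print(f"servers={n} acks_needed_per_write={quorum_size(n)} read_servers={n}")
```

Going from 3 to 9 servers raises the acknowledgements needed per write from 2 to 5, while tripling the number of servers that can answer reads.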
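Why Zab rather than Paxos?
Judging from the API ZooKeeper exposes, a write first obtains a Txid, and on a write conflict the application layer obtains a new Txid and retries. In other words, every operation is implicitly transactional, and having transactions opens up more use cases for ZooKeeper. This in turn requires the replication protocol to guarantee causal ordering. Paxos cannot guarantee causal order among multiple writes; the only way to get it is to execute writes serially, which is too slow to be practical.
It is of course possible to build ZooKeeper-style transactions on top of Paxos at the application level, using several operations per transaction, but Zab's design is far more efficient.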
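The "read an id, write, retry on conflict" pattern described above can be sketched with an in-memory store. This is a toy under stated assumptions: `VersionedStore`, `BadVersion` and `increment_with_retry` are hypothetical names, and a per-key version number stands in for the Txid mentioned in the text (similar in spirit to the version argument of ZooKeeper's `setData`):

```python
class BadVersion(Exception):
    """Raised when the caller's expected version is stale."""


class VersionedStore:
    """Toy key-value store with conditional writes (not a ZooKeeper client)."""

    def __init__(self):
        self._data = {}  # key -> (value, version)

    def get(self, key):
        # Missing keys read as (None, 0).
        return self._data.get(key, (None, 0))

    def set(self, key, value, expected_version):
        _, current = self.get(key)
        if expected_version != current:
            raise BadVersion(f"expected {expected_version}, found {current}")
        self._data[key] = (value, current + 1)
        return current + 1


def increment_with_retry(store, key, max_attempts=5):
    """Optimistic update: re-read and retry whenever another writer won."""
    for _ in range(max_attempts):
        value, version = store.get(key)
        new_value = (value or 0) + 1
        try:
            store.set(key, new_value, version)
            return new_value
        except BadVersion:
            continue  # someone else wrote first; fetch the new version and retry
    raise RuntimeError("too many conflicts")
```

The write only succeeds if nothing changed since the read, which is why the replication layer underneath has to deliver updates in a consistent order.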
The description below makes this clearer:
Zab is a different protocol than Paxos, although it shares with it some key aspects, as for example:
- A leader proposes values to the followers
- Leaders wait for acknowledgements from a quorum of followers before considering a proposal committed (learned)
- Proposals include epoch numbers, which are similar to ballot numbers in Paxos
The main conceptual difference between Zab and Paxos is that Zab is primarily designed for primary-backup systems, like ZooKeeper, rather than for state machine replication.
Paxos can be used for primary-backup replication by letting the primary be the leader. The problem with Paxos is that, if a primary concurrently proposes multiple state updates and fails, the new primary may apply uncommitted updates in an incorrect order. An example is presented in our DSN 2011 paper (Figure 1). In the example, a replica should only apply the state update B after applying A. The example shows that, using Paxos, a new primary and its followers may apply B after C, reaching an incorrect state that has not been reached by any of the previous primaries.
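A toy simulation of this kind of reordering. It is a simplified illustration, not a reproduction of Figure 1: the acceptor names, slot numbers and ballot numbers are assumptions, and only the recovery rule of a new primary is modelled.

```python
# State of each acceptor: slot -> (ballot, value) for accepted, uncommitted proposals.
# Assumed scenario: the old primary proposed A (slot 27) and B (slot 28) concurrently
# and crashed after each message had reached only one acceptor.
acceptors = {
    "a1": {27: (1, "A")},
    "a2": {28: (1, "B")},
    "a3": {},
}


def recover(quorum, new_update, slots):
    """Per-slot recovery by a new primary (simplified Paxos phase 1 + 2)."""
    log = {}
    for slot in slots:
        accepted = [acceptors[a][slot] for a in quorum if slot in acceptors[a]]
        if accepted:
            # A value was seen for this slot: re-propose the one with the highest ballot.
            log[slot] = max(accepted, key=lambda bv: bv[0])[1]
        else:
            # Nothing seen in the quorum: the slot looks free, so the new
            # primary may fill it with its own update.
            log[slot] = new_update
    return [log[s] for s in sorted(log)]


if __name__ == "__main__":
    # The new primary happens to hear from a2 and a3 only.
    print(recover(quorum=["a2", "a3"], new_update="C", slots=[27, 28]))  # ['C', 'B']
```

Because each slot is decided independently, B survives in slot 28 while A is replaced by C in slot 27, so B ends up applied on top of C instead of A.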
A workaround to this problem using Paxos is to sequentially agree on state updates: a primary proposes a state update only after it commits all previous state updates. Since there is at most one uncommitted update at a time, a new primary cannot incorrectly reorder updates. This approach, however, results in poor performance.
Zab does not need this workaround. Zab replicas can concurrently agree on the order of multiple state updates without harming correctness. This is achieved by adding one more synchronization phase during recovery compared to Paxos, and by using a different numbering of instances based on zxids.
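As background on the zxid numbering: in ZooKeeper a zxid is a 64-bit number whose high 32 bits hold the epoch and whose low 32 bits hold a counter within that epoch. The helper names below are hypothetical and only illustrate how that layout yields a single total order:

```python
COUNTER_BITS = 32
COUNTER_MASK = (1 << COUNTER_BITS) - 1


def make_zxid(epoch: int, counter: int) -> int:
    """Pack (epoch, counter) into one integer: epoch in the high bits."""
    return (epoch << COUNTER_BITS) | (counter & COUNTER_MASK)


def split_zxid(zxid: int):
    """Unpack a zxid into (epoch, counter)."""
    return zxid >> COUNTER_BITS, zxid & COUNTER_MASK


if __name__ == "__main__":
    proposals = [make_zxid(2, 1), make_zxid(1, 3), make_zxid(1, 2)]
    # Plain integer order sorts by epoch first, then by counter, so every
    # proposal of a newer epoch follows all proposals of older epochs.
    print([split_zxid(z) for z in sorted(proposals)])  # [(1, 2), (1, 3), (2, 1)]
```

Comparing zxids as integers is enough to order updates across leader changes, which is what lets replicas agree on the order of several outstanding updates at once.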
Chubby vs. ZooKeeper:
ZooKeeper provides stronger data consistency (causal ordering) than Chubby, at the cost of somewhat lower write performance.