
Time: 2024-04-07 12:04:18


1.1 becomeActiveMaster(startupStatus);

1.2 finishInitialization

1.3 loop()

// We are either the active master or we were asked to shutdown
if (!this.stopped) {
  finishInitialization(startupStatus, false);
  loop();
}


2.1.1 First, an ActiveMasterManager is created. It watches events on ZooKeeper, mainly nodeCreated() and nodeDeleted().

  private boolean becomeActiveMaster(MonitoredTask startupStatus)
  throws InterruptedException {
    // TODO: This is wrong!!!! Should have new servername if we restart ourselves,
    // if we come back to life.
    this.activeMasterManager = new ActiveMasterManager(zooKeeper, this.serverName,
        this);
    // Register the manager so it receives the ZooKeeper events
    this.zooKeeper.registerListener(activeMasterManager);
    stallIfBackupMaster(this.conf, this.activeMasterManager);

    // The ClusterStatusTracker is setup before the other
    // ZKBasedSystemTrackers because it's needed by the activeMasterManager
    // to check if the cluster should be shutdown.
    this.clusterStatusTracker = new ClusterStatusTracker(getZooKeeper(), this);
    this.clusterStatusTracker.start();
    return this.activeMasterManager.blockUntilBecomingActiveMaster(startupStatus);
  }

2.1.2 Event handling in ActiveMasterManager. Node creation and node deletion are handled the same way, in nodeCreated() and nodeDeleted():

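  @Override
  public void nodeCreated(String path) {
    handle(path);
  }

  @Override
  public void nodeDeleted(String path) {
    if (path.equals(watcher.clusterStateZNode) && !master.isStopped()) {
      clusterShutDown.set(true);
    }
    handle(path);
  }

  void handle(final String path) {
    if (path.equals(watcher.getMasterAddressZNode()) && !master.isStopped()) {
      handleMasterNodeChange();
    }
  }

Below is the final handler for node creation and deletion. Both events end up in this same function. If the /hbase/master node exists, there is already an active master. Also pay attention to clusterHasActiveMaster.notifyAll(): it is used later, when a master blocks while waiting to become active.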

  private void handleMasterNodeChange() {
    // Watch the node and check if it exists.
    try {
      synchronized(clusterHasActiveMaster) {
        if (ZKUtil.watchAndCheckExists(watcher, watcher.getMasterAddressZNode())) {
          // A master node exists, there is an active master
          LOG.debug("A master is now available");
          clusterHasActiveMaster.set(true);
        } else {
          // Node is no longer there, cluster does not have an active master
          LOG.debug("No master available. Notifying waiting threads");
          clusterHasActiveMaster.set(false);
          // Notify any thread waiting to become the active master
          clusterHasActiveMaster.notifyAll();
        }
      }
    } catch (KeeperException ke) {
      master.abort("Received an unexpected KeeperException, aborting", ke);
    }
  }


  boolean blockUntilBecomingActiveMaster(MonitoredTask startupStatus) {
    while (true) {
      startupStatus.setStatus("Trying to register in ZK as active master");
      // Try to become the active master, watch if there is another master.
      // Write out our ServerName as versioned bytes.
      try {
        // backupZNode --> /hbase/backup-masters/sn (hostname,port,startcode)
        String backupZNode =
            ZKUtil.joinZNode(this.watcher.backupMasterAddressesZNode, this.sn.toString());
        // watcher.getMasterAddressZNode() --> /hbase/master
        if (MasterAddressTracker.setMasterAddress(this.watcher,
            this.watcher.getMasterAddressZNode(), this.sn)) {

          // If we were a backup master before, delete our ZNode from the backup
          // master directory since we are the active now)
          if (ZKUtil.checkExists(this.watcher, backupZNode) != -1) {
            LOG.info("Deleting ZNode for " + backupZNode + " from backup master directory");
            ZKUtil.deleteNodeFailSilent(this.watcher, backupZNode);
          }
          // Save the znode in a file, this will allow to check if we crash in the launch scripts
          ZNodeClearer.writeMyEphemeralNodeOnDisk(this.sn.toString());

          // We are the master, return
          startupStatus.setStatus("Successfully registered as active master.");
          LOG.info("Registered Active Master=" + this.sn);
          return true;
        }

        // There is another active master running elsewhere or this is a restart
        // and the master ephemeral node has not expired yet.
        this.clusterHasActiveMaster.set(true);

        /*
         * Add a ZNode for ourselves in the backup master directory since we are
         * not the active master.
         * If we become the active master later, ActiveMasterManager will delete
         * this node explicitly. If we crash before then, ZooKeeper will delete
         * this node for us since it is ephemeral.
         */
        LOG.info("Adding ZNode for " + backupZNode + " in backup master directory");
        MasterAddressTracker.setMasterAddress(this.watcher, backupZNode, this.sn);

        String msg;
        byte[] bytes =
            ZKUtil.getDataAndWatch(this.watcher, this.watcher.getMasterAddressZNode());
        if (bytes == null) {
          msg = ("A master was detected, but went down before its address " +
              "could be read. Attempting to become the next active master");
        } else {
          ServerName currentMaster;
          try {
            currentMaster = ServerName.parseFrom(bytes);
          } catch (DeserializationException e) {
            LOG.warn("Failed parse", e);
            // Hopefully next time around we won't fail the parse. Dangerous.
            continue;
          }
          if (ServerName.isSameHostnameAndPort(currentMaster, this.sn)) {
            msg = ("Current master has this master's address, " +
                currentMaster + "; master was restarted? Deleting node.");
            // Hurry along the expiration of the znode.
            ZKUtil.deleteNode(this.watcher, this.watcher.getMasterAddressZNode());

            // We may have failed to delete the znode at the previous step, but
            // we delete the file anyway: a second attempt to delete the znode is likely to fail again.
            ZNodeClearer.deleteMyEphemeralNodeOnDisk();
          } else {
            msg = "Another master is the active master, " + currentMaster +
                "; waiting to become the next active master";
          }
        }
        LOG.info(msg);
        startupStatus.setStatus(msg);
      } catch (KeeperException ke) {
        master.abort("Received an unexpected KeeperException, aborting", ke);
        return false;
      }
      synchronized (this.clusterHasActiveMaster) {
        while (this.clusterHasActiveMaster.get() && !this.master.isStopped()) {
          try {
            this.clusterHasActiveMaster.wait();
          } catch (InterruptedException e) {
            // We expect to be interrupted when a master dies,
            // will fall out if so
            LOG.debug("Interrupted waiting for master to die", e);
          }
        }
        if (clusterShutDown.get()) {
          this.master.stop(
              "Cluster went down before this master became active");
        }
        if (this.master.isStopped()) {
          return false;
        }
        // there is no active master so we can try to become active master again
      }
    }
  }

2.2.1 Create the ephemeral node /hbase/master. The call to look at here is

MasterAddressTracker.setMasterAddress(this.watcher,
    this.watcher.getMasterAddressZNode(), this.sn)

  public static boolean setMasterAddress(final ZooKeeperWatcher zkw,
      final String znode, final ServerName master)
  throws KeeperException {
    return ZKUtil.createEphemeralNodeAndWatch(zkw, znode, toByteArray(master));
  }

  public static boolean createEphemeralNodeAndWatch(ZooKeeperWatcher zkw,
      String znode, byte [] data)
  throws KeeperException {
    try {
      zkw.getRecoverableZooKeeper().create(znode, data, createACL(zkw, znode),
          CreateMode.EPHEMERAL);
    } catch (KeeperException.NodeExistsException nee) {
      // Someone else already created the node: set a watch on it instead
      if (!watchAndCheckExists(zkw, znode)) {
        // It did exist but now it doesn't, try again
        return createEphemeralNodeAndWatch(zkw, znode, data);
      }
      return false;
    } catch (InterruptedException e) {
      LOG.info("Interrupted", e);
      Thread.currentThread().interrupt();
    }
    return true;
  }
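The create-or-watch behaviour above can be tried out without a ZooKeeper cluster. The following is a minimal sketch only: `FakeZk`, `createEphemeral` and `tryBecomeActive` are hypothetical names invented for illustration, and an in-memory map stands in for ZooKeeper's atomic node creation. It is not the HBase implementation, and it leaves out watches entirely.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

/** In-memory stand-in for the election step; every name here is hypothetical. */
final class ElectionSketch {
    static final String MASTER_ZNODE = "/hbase/master";
    static final String BACKUP_DIR = "/hbase/backup-masters";

    /** Maps a znode path to its data. Assumption: putIfAbsent models atomic create. */
    static final class FakeZk {
        private final ConcurrentMap<String, String> nodes = new ConcurrentHashMap<>();

        /** Returns true if this call created the node, false if it already existed. */
        boolean createEphemeral(String path, String data) {
            return nodes.putIfAbsent(path, data) == null;
        }

        /** Simulates explicit deletion or the owner's session expiring. */
        void delete(String path) {
            nodes.remove(path);
        }

        String get(String path) {
            return nodes.get(path);
        }
    }

    /** One election round: returns true if serverName became the active master. */
    static boolean tryBecomeActive(FakeZk zk, String serverName) {
        String backupZNode = BACKUP_DIR + "/" + serverName;
        if (zk.createEphemeral(MASTER_ZNODE, serverName)) {
            zk.delete(backupZNode); // we are active now, so drop our backup entry
            return true;
        }
        zk.createEphemeral(backupZNode, serverName); // register as a backup master
        return false;
    }

    public static void main(String[] args) {
        FakeZk zk = new FakeZk();
        System.out.println(tryBecomeActive(zk, "host1,60000,1")); // first caller wins
        System.out.println(tryBecomeActive(zk, "host2,60000,2")); // second becomes backup
        zk.delete(MASTER_ZNODE);                                  // active master goes away
        System.out.println(tryBecomeActive(zk, "host2,60000,2")); // backup takes over
    }
}
```

The point of the sketch is that a single atomic "create if absent" call decides the election: whoever creates the node is active, and everyone else registers as a backup and retries later.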


If the creation fails, this node is not the active master, so it adds a backup node:

LOG.info("Adding ZNode for " + backupZNode + " in backup master directory");
MasterAddressTracker.setMasterAddress(this.watcher, backupZNode, this.sn);

Next, the master's server address is read from ZooKeeper again and compared with our own. If they match, this master has been restarted:

    byte[] bytes =
        ZKUtil.getDataAndWatch(this.watcher, this.watcher.getMasterAddressZNode());
    if (bytes == null) {
      msg = ("A master was detected, but went down before its address " +
          "could be read. Attempting to become the next active master");
    } else {
      ServerName currentMaster;
      try {
        currentMaster = ServerName.parseFrom(bytes);
      } catch (DeserializationException e) {
        LOG.warn("Failed parse", e);
        // Hopefully next time around we won't fail the parse. Dangerous.
        continue;
      }
      if (ServerName.isSameHostnameAndPort(currentMaster, this.sn)) {
        msg = ("Current master has this master's address, " +
            currentMaster + "; master was restarted? Deleting node.");
        // Hurry along the expiration of the znode.
        ZKUtil.deleteNode(this.watcher, this.watcher.getMasterAddressZNode());

        // We may have failed to delete the znode at the previous step, but
        // we delete the file anyway: a second attempt to delete the znode is likely to fail again.
        ZNodeClearer.deleteMyEphemeralNodeOnDisk();
      } else {
        msg = "Another master is the active master, " + currentMaster +
            "; waiting to become the next active master";
      }
    }

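The restart check relies on the server name having the form hostname,port,startcode: two names with the same host and port but different start codes belong to the same master process location at different launch times. A minimal sketch of that comparison follows. `ServerNameSketch` and `sameHostnameAndPort` are hypothetical helpers working on plain strings, not the real ServerName class, and the exact string comparison on the hostname is an assumption.

```java
/** String-based illustration of the hostname/port comparison; names are hypothetical. */
final class ServerNameSketch {

    /** Compares two "hostname,port,startcode" strings, ignoring the start code. */
    static boolean sameHostnameAndPort(String left, String right) {
        String[] l = left.split(",");
        String[] r = right.split(",");
        if (l.length != 3 || r.length != 3) {
            throw new IllegalArgumentException("expected hostname,port,startcode");
        }
        // Assumption: hostnames are compared exactly; only the start code is ignored.
        return l[0].equals(r[0]) && l[1].equals(r[1]);
    }

    public static void main(String[] args) {
        // Same host and port, different start code: looks like a restarted master.
        System.out.println(sameHostnameAndPort("host1,60000,100", "host1,60000,200"));
        // Different host: another master is active.
        System.out.println(sameHostnameAndPort("host1,60000,100", "host2,60000,100"));
    }
}
```

When the comparison is true, the stale /hbase/master node was left by an earlier run of this same master, so deleting it immediately avoids waiting for the old session to expire.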
2.2 Block on clusterHasActiveMaster, waiting here until notified. The notification is triggered by the ActiveMasterManager (see 2.1.2).

    synchronized (this.clusterHasActiveMaster) {
      while (this.clusterHasActiveMaster.get() && !this.master.isStopped()) {
        try {
          this.clusterHasActiveMaster.wait();
        } catch (InterruptedException e) {
          // We expect to be interrupted when a master dies,
          // will fall out if so
          LOG.debug("Interrupted waiting for master to die", e);
        }
      }
      if (clusterShutDown.get()) {
        this.master.stop(
            "Cluster went down before this master became active");
      }
      if (this.master.isStopped()) {
        return false;
      }
      // there is no active master so we can try to become active master again
    }
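The handshake between the event handler in 2.1.2 and this wait loop can be isolated into a small runnable sketch. It assumes, as the excerpts suggest, that clusterHasActiveMaster is an AtomicBoolean that also serves as the monitor object. The class and method names (`ActiveMasterWaitSketch`, `masterNodeDeleted`, `waitUntilNoActiveMaster`, `demo`) are hypothetical and exist only for this illustration.

```java
import java.util.concurrent.atomic.AtomicBoolean;

/** Sketch of the wait/notifyAll handshake on clusterHasActiveMaster; names are illustrative. */
final class ActiveMasterWaitSketch {
    // true while some master holds the (simulated) /hbase/master node
    final AtomicBoolean clusterHasActiveMaster = new AtomicBoolean(true);
    volatile boolean stopped = false;

    /** Plays the role of the event handler once the master node has disappeared. */
    void masterNodeDeleted() {
        synchronized (clusterHasActiveMaster) {
            clusterHasActiveMaster.set(false);
            clusterHasActiveMaster.notifyAll(); // wake every waiting backup master
        }
    }

    /** Plays the role of the wait loop: returns true if another election attempt may start. */
    boolean waitUntilNoActiveMaster() {
        synchronized (clusterHasActiveMaster) {
            while (clusterHasActiveMaster.get() && !stopped) {
                try {
                    clusterHasActiveMaster.wait();
                } catch (InterruptedException e) {
                    // as in the loop above: fall through and re-check the condition
                }
            }
            return !stopped;
        }
    }

    /** Runs one waiting thread and one notifier, and returns what the waiter observed. */
    static boolean demo() {
        ActiveMasterWaitSketch sketch = new ActiveMasterWaitSketch();
        boolean[] result = new boolean[1];
        Thread backup = new Thread(() -> result[0] = sketch.waitUntilNoActiveMaster());
        backup.start();
        // Safe in either order: the flag is always checked while holding the lock.
        sketch.masterNodeDeleted();
        try {
            backup.join();
        } catch (InterruptedException e) {
            throw new IllegalStateException("demo interrupted", e);
        }
        return result[0];
    }

    public static void main(String[] args) {
        System.out.println("may retry election: " + demo());
    }
}
```

Checking the flag in a while loop under the same lock that guards notifyAll() is what makes the pattern robust: a notification sent before the waiter arrives is not lost, because the waiter sees the flag already cleared and never calls wait().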