Setting Up a MongoDB Replica Set

Date: 2022-06-13 01:26:06

A replica set is the high-availability solution provided by MongoDB. Unlike the older master-slave replication, a replica set automatically detects that the Primary has gone down and promotes one of the Secondaries to Primary.

The whole process is transparent to the application and greatly reduces operational overhead.

The architecture diagram is as follows:

[Architecture diagram of a MongoDB replica set]

MongoDB Replica Set Roles

1. Primary

By default, all reads and writes go to the Primary.

2. Secondary

A Secondary replays all operations from the Primary via the oplog and holds a complete copy of the Primary's data.

By default, a Secondary accepts neither writes nor reads.
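
If you do want to read from a Secondary, reads have to be enabled explicitly. A minimal sketch, assuming you are connected to a Secondary with the mongo shell (rs.slaveOk() is also demonstrated later in this article):

myapp:SECONDARY> rs.slaveOk()                                      // allow reads on this connection
myapp:SECONDARY> db.getMongo().setReadPref("secondaryPreferred")   // or set a read preference for the session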

Depending on your requirements, a Secondary can also be configured in the following forms:

1> Priority 0 Replica Set Members

Members with priority 0 will never be elected Primary.

In a MongoDB replica set, different members can be assigned different priorities.

Priority ranges from 0 to 1000, may be a floating-point value, and defaults to 1.

The member with the highest priority is preferred in Primary elections.

For example, suppose a member node3:27020 with priority 2 is added to a replica set whose other members have priority 1. As long as node3:27020 has the most recent data, the current Primary will automatically step down and node3:27020 will be elected the new Primary. If node3:27020's data is not up to date, the current Primary stays in place until node3:27020 has caught up.
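
As an illustration only (assuming node3:27020 is members[2]; the index is hypothetical), a member's priority can be changed through rs.reconfig():

myapp:PRIMARY> cfg = rs.conf()               // fetch the current configuration
myapp:PRIMARY> cfg.members[2].priority = 2   // raise the priority of that member
myapp:PRIMARY> rs.reconfig(cfg)              // apply the new configuration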

2> Hidden Replica Set Members

A hidden member also has priority 0, and it is invisible to client applications.

Hidden members are visible in rs.status() and rs.config(), but not in db.isMaster(). When a client connects to the replica set, it runs db.isMaster() to discover which members are available.

As a result, hidden members never receive read requests from clients.

Hidden members are commonly used for dedicated tasks such as reporting and backups.
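
A rough sketch of configuring a hidden member (the member index below is hypothetical); note that a hidden member must also have priority 0:

myapp:PRIMARY> cfg = rs.conf()
myapp:PRIMARY> cfg.members[2].priority = 0   // hidden members must have priority 0
myapp:PRIMARY> cfg.members[2].hidden = true
myapp:PRIMARY> rs.reconfig(cfg)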

3> Delayed Replica Set Members

A delayed member lags behind the Primary by a specified amount of time (set via the slaveDelay parameter).

A delayed member must also be a hidden member.
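
A sketch of configuring a delayed member (the member index and the 3600-second delay are example values); it must be hidden and have priority 0:

myapp:PRIMARY> cfg = rs.conf()
myapp:PRIMARY> cfg.members[2].priority = 0
myapp:PRIMARY> cfg.members[2].hidden = true
myapp:PRIMARY> cfg.members[2].slaveDelay = 3600   // lag one hour behind the primary
myapp:PRIMARY> rs.reconfig(cfg)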

3. Arbiter

An Arbiter participates only in voting; its vote always carries a weight of 1, it replicates no data, and it can never be promoted to Primary.

Arbiters are typically used in replica sets that have an even number of members.

Recommendation: deploy the Arbiter on an application server; never deploy it on the same server as the Primary or a Secondary.

Note: a replica set can have at most 50 members, of which at most 7 may vote.
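
In sets with more than 7 voting members, the extra members have to be made non-voting. A sketch (the member index is hypothetical):

myapp:PRIMARY> cfg = rs.conf()
myapp:PRIMARY> cfg.members[7].votes = 0      // non-voting member
myapp:PRIMARY> cfg.members[7].priority = 0   // a non-voting member must have priority 0
myapp:PRIMARY> rs.reconfig(cfg)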

Setting Up the Replica Set

Create the data directories

# mkdir -p /data/27017

# mkdir -p /data/27018

# mkdir -p /data/27019

To make it easier to inspect the logs while the instances are running, give each instance its own log file; first create the log directory:

# mkdir -p /var/log/mongodb/

Start the mongod instances

# mongod --replSet myapp --dbpath /data/27017 --port 27017 --logpath /var/log/mongodb/27017.log --fork

# mongod --replSet myapp --dbpath /data/27018 --port 27018 --logpath /var/log/mongodb/27018.log --fork

# mongod --replSet myapp --dbpath /data/27019 --port 27019 --logpath /var/log/mongodb/27019.log --fork

Taking the instance on port 27017 as an example, its log output is as follows:

--02T14::22.745+ I CONTROL  [initandlisten] MongoDB starting : pid= port= dbpath=/data/ -bit host=node3
--02T14::22.745+ I CONTROL [initandlisten] db version v3.4.2
--02T14::22.745+ I CONTROL [initandlisten] git version: 3f76e40c105fc223b3e5aac3e20dcd026b83b38b
--02T14::22.745+ I CONTROL [initandlisten] OpenSSL version: OpenSSL 1.0.1e-fips Feb
--02T14::22.745+ I CONTROL [initandlisten] allocator: tcmalloc
--02T14::22.745+ I CONTROL [initandlisten] modules: none
--02T14::22.745+ I CONTROL [initandlisten] build environment:
--02T14::22.745+ I CONTROL [initandlisten] distmod: rhel62
--02T14::22.745+ I CONTROL [initandlisten] distarch: x86_64
--02T14::22.745+ I CONTROL [initandlisten] target_arch: x86_64
--02T14::22.745+ I CONTROL [initandlisten] options: { net: { port: }, processManagement: { fork: true }, replication: { replSet: "myapp" }, storage: { dbPath: "/data/27017" }, systemLog: { destination: "file", path: "/var/log/mongodb/27017.log" } }
--02T14::22.768+ I -        [initandlisten]
--02T14::22.768+ I STORAGE  [initandlisten] ** WARNING: Using the XFS filesystem is strongly recommended with the WiredTiger storage engine
--02T14::22.768+ I STORAGE  [initandlisten] **          See http://dochub.mongodb.org/core/prodnotes-filesystem
--02T14::22.769+ I STORAGE [initandlisten] wiredtiger_open config: create,cache_size=256M,session_max=,eviction=(threads_max=),config_base=false,statistics=(fast),log=(enabled=true,archive=true,path=journal,compressor=snappy),file_manager=(close_idle_time=),checkpoint=(wait=,log_size=2GB),statistics_log=(wait=),
--02T14::24.450+ I CONTROL [initandlisten]
--02T14::24.482+ I CONTROL [initandlisten] ** WARNING: Access control is not enabled for the database.
--02T14::24.482+ I CONTROL [initandlisten] ** Read and write access to data and configuration is unrestricted.
--02T14::24.482+ I CONTROL [initandlisten] ** WARNING: You are running this process as the root user, which is not recommended.
--02T14::24.482+ I CONTROL [initandlisten]
--02T14::24.516+ I FTDC [initandlisten] Initializing full-time diagnostic data capture with directory '/data/27017/diagnostic.data'
--02T14::24.517+ I REPL [initandlisten] Did not find local voted for document at startup.
--02T14::24.517+ I REPL [initandlisten] Did not find local replica set configuration document at startup; NoMatchingDocument: Did not find replica set configuration document in local.system.replset
--02T14::24.519+ I NETWORK [thread1] waiting for connections on port

Connect to any member of the replica set with the mongo shell; here, we connect to the instance on port 27017:

# mongo

Initialize the replica set

> rs.initiate()
{
"info2" : "no configuration specified. Using a default configuration for the set",
"me" : "node3:27017",
"ok" :
}

The current replica set configuration can be viewed with rs.conf():

myapp:PRIMARY> rs.conf()
{
"_id" : "myapp",
"version" : ,
"protocolVersion" : NumberLong(),
"members" : [
{
"_id" : ,
"host" : "node3:27017",
"arbiterOnly" : false,
"buildIndexes" : true,
"hidden" : false,
"priority" : ,
"tags" : { },
"slaveDelay" : NumberLong(),
"votes" :
}
],
"settings" : {
"chainingAllowed" : true,
"heartbeatIntervalMillis" : ,
"heartbeatTimeoutSecs" : ,
"electionTimeoutMillis" : ,
"catchUpTimeoutMillis" : ,
"getLastErrorModes" : { },
"getLastErrorDefaults" : {
"w" : ,
"wtimeout" :
},
"replicaSetId" : ObjectId("59082229517dd35bb9fd0d2a")
}
}

The options under settings are explained below:

chainingAllowed: whether chained (cascading) replication is allowed

heartbeatIntervalMillis: the heartbeat interval, 2s by default

heartbeatTimeoutSecs: the heartbeat timeout, 10s by default; if no heartbeat is received from a member within 10s, that member is considered unreachable (HostUnreachable). This applies to both the Primary and Secondaries.
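
These settings can also be adjusted with rs.reconfig(). A minimal sketch (the 15-second value is only an example):

myapp:PRIMARY> cfg = rs.conf()
myapp:PRIMARY> cfg.settings.heartbeatTimeoutSecs = 15   // example: tolerate slower heartbeats
myapp:PRIMARY> rs.reconfig(cfg)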

The corresponding log output is as follows:

# vim /var/log/mongodb/27017.log

--02T14::47.361+ I NETWORK  [thread1] connection accepted from 127.0.0.1: # ( connection now open)
--02T14::47.361+ I NETWORK [conn1] received client metadata from 127.0.0.1: conn1: { application: { name: "MongoDB Shell" }, driver: { name: "MongoDB Internal Client", version: "3.4.2" }, os: { type: "Linux", name: "Red Hat Enterprise Linux Server release 6.7 (Santiago)", architecture: "x86_64", version: "Kernel 2.6.32-573.el6.x86_64" } }
--02T14::36.737+ I COMMAND [conn1] initiate : no configuration specified. Using a default configuration for the set
--02T14::36.737+ I COMMAND [conn1] created this configuration for initiation : { _id: "myapp", version: , members: [ { _id: , host: "node3:27017" } ] }
--02T14::36.900+ I REPL [conn1] replSetInitiate admin command received from client
--02T14::37.391+ I REPL [conn1] replSetInitiate config object with members parses ok
--02T14::37.410+ I REPL [conn1] ******
--02T14::37.410+ I REPL [conn1] creating replication oplog of size: 990MB...
--02T14::37.439+ I STORAGE [conn1] Starting WiredTigerRecordStoreThread local.oplog.rs
--02T14::37.440+ I STORAGE [conn1] The size storer reports that the oplog contains records totaling to bytes
--02T14::37.440+ I STORAGE [conn1] Scanning the oplog to determine where to place markers for truncation
--02T14::37.472+ I REPL [conn1] ******
--02T14::37.568+ I INDEX [conn1] build index on: admin.system.version properties: { v: , key: { version: }, name: "incompatible_with_version_32", ns: "admin.system.version" }
--02T14::37.568+ I INDEX [conn1] building index using bulk method; build may temporarily use up to megabytes of RAM
--02T14::37.581+ I INDEX [conn1] build index done. scanned total records. secs
--02T14::37.591+ I COMMAND [conn1] setting featureCompatibilityVersion to 3.4
--02T14::37.601+ I REPL [conn1] New replica set config in use: { _id: "myapp", version: , protocolVersion: , members: [ { _id: , host: "node3:27017", arbiterOnly: false, buildIndexes: true, hidden: false, priority: 1.0, tags: {}, slaveDelay: , votes: } ], settings: { chainingAllowed: true, heartbeatIntervalMillis: , heartbeatTimeoutSecs: , electionTimeoutMillis: , catchUpTimeoutMillis: , getLastErrorModes: {}, getLastErrorDefaults: { w: , wtimeout: }, replicaSetId: ObjectId('59082229517dd35bb9fd0d2a') } }
--02T14::37.601+ I REPL [conn1] This node is node3: in the config
--02T14::37.601+ I REPL [conn1] transition to STARTUP2
--02T14::37.601+ I REPL [conn1] Starting replication storage threads
--02T14::37.603+ I REPL [conn1] Starting replication fetcher thread
--02T14::37.617+ I REPL [conn1] Starting replication applier thread
--02T14::37.617+ I REPL [conn1] Starting replication reporter thread
--02T14::37.617+ I REPL [rsSync] transition to RECOVERING
--02T14::37.628+ I REPL [rsSync] transition to SECONDARY
--02T14::37.635+ I COMMAND [conn1] command local.replset.minvalid appName: "MongoDB Shell" command: replSetInitiate { v: , key: { version: }, ns: "admin.system.version", name: "incompatible_with_version_32" } numYields: reslen: locks:{ Global: { acquireCount: { r: , w: , W: }, acquireWaitCount: { W: }, timeAcquiringMicros: { W: } }, Database: { acquireCount: { r: , w: , W: } }, Collection: { acquireCount: { r: , w: } }, Metadata: { acquireCount: { w: } }, oplog: { acquireCount: { w: } } } protocol:op_command 941ms
--02T14::37.646+ I REPL [rsSync] conducting a dry run election to see if we could be elected
--02T14::37.646+ I REPL [ReplicationExecutor] dry election run succeeded, running for election
--02T14::37.675+ I REPL [ReplicationExecutor] election succeeded, assuming primary role in term
--02T14::37.675+ I REPL [ReplicationExecutor] transition to PRIMARY
--02T14::37.675+ I REPL [ReplicationExecutor] Could not access any nodes within timeout when checking for additional ops to apply before finishing transition to primary. Will move forward with becoming primary anyway.
--02T14::38.687+ I REPL [rsSync] transition to primary complete; database writes are now permitted

Add a member

myapp:PRIMARY> rs.add("node3:27018")
{ "ok" : }

The log output of the instance on port 27017 is as follows:

--02T15::44.737+ I COMMAND  [conn1] command local.system.replset appName: "MongoDB Shell" command: count { count: "system.replset", query: {}, fields: {} } planSummary: COUNT keysExamined: docsExamined: numYields: reslen: locks:{ Global: { acquireCount: { r:  } }, Database: { acquireCount: { r:  } }, Collection: { acquireCount: { r:  } } } protocol:op_command 135ms
--02T15::44.765+ I REPL [conn1] replSetReconfig admin command received from client
--02T15::44.808+ I REPL [conn1] replSetReconfig config object with members parses ok
--02T15::44.928+ I ASIO [NetworkInterfaceASIO-Replication-] Connecting to node3:
--02T15::44.979+ I ASIO [NetworkInterfaceASIO-Replication-] Successfully connected to node3:
--02T15::44.994+ I NETWORK [thread1] connection accepted from 192.168.244.30: # ( connections now open)
--02T15::45.007+ I NETWORK [conn3] received client metadata from 192.168.244.30: conn3: { driver: { name: "NetworkInterfaceASIO-Replication", version: "3.4.2" }, os: { type: "Linux", name: "Red Hat Enterprise Linux Server release 6.7 (Santiago)", architecture: "x86_64", version: "Kernel 2.6.32-573.el6.x86_64" } }
--02T15::45.009+ I NETWORK [thread1] connection accepted from 192.168.244.30: # ( connections now open)
--02T15::45.010+ I - [conn4] end connection 192.168.244.30: ( connections now open)
--02T15::45.105+ I REPL [ReplicationExecutor] New replica set config in use: { _id: "myapp", version: , protocolVersion: , members: [ { _id: , host: "node3:27017", arbiterOnly: false, buildIndexes: true, hidden: false, priority: 1.0, tags: {}, slaveDelay: , votes: }, { _id: , host: "node3:27018", arbiterOnly: false, buildIndexes: true, hidden: false, priority: 1.0, tags: {}, slaveDelay: , votes: } ], settings: { chainingAllowed: true, heartbeatIntervalMillis: , heartbeatTimeoutSecs: , electionTimeoutMillis: , catchUpTimeoutMillis: , getLastErrorModes: {}, getLastErrorDefaults: { w: , wtimeout: }, replicaSetId: ObjectId('59082229517dd35bb9fd0d2a') } }
--02T15::45.105+ I REPL [ReplicationExecutor] This node is node3: in the config
--02T15::45.155+ I REPL [ReplicationExecutor] Member node3: is now in state STARTUP
--02T15::45.155+ I COMMAND [conn1] command local.system.replset appName: "MongoDB Shell" command: replSetReconfig { replSetReconfig: { _id: "myapp", version: , protocolVersion: , members: [ { _id: , host: "node3:27017", arbiterOnly: false, buildIndexes: true, hidden: false, priority: 1.0, tags: {}, slaveDelay: , votes: }, { _id: 1.0, host: "node3:27018" } ], settings: { chainingAllowed: true, heartbeatIntervalMillis: , heartbeatTimeoutSecs: , electionTimeoutMillis: , catchUpTimeoutMillis: , getLastErrorModes: {}, getLastErrorDefaults: { w: , wtimeout: }, replicaSetId: ObjectId('59082229517dd35bb9fd0d2a') } } } numYields: reslen: locks:{ Global: { acquireCount: { r: , w: , W: } }, Database: { acquireCount: { w: , W: } }, Metadata: { acquireCount: { w: } }, oplog: { acquireCount: { w: } } } protocol:op_command 403ms
--02T15::47.010+ I NETWORK [thread1] connection accepted from 192.168.244.30: # ( connections now open)
--02T15::47.011+ I - [conn5] end connection 192.168.244.30: ( connections now open)
--02T15::47.940+ I NETWORK [thread1] connection accepted from 192.168.244.30: # ( connections now open)
--02T15::47.941+ I NETWORK [conn6] received client metadata from 192.168.244.30: conn6: { driver: { name: "NetworkInterfaceASIO-RS", version: "3.4.2" }, os: { type: "Linux", name: "Red Hat Enterprise Linux Server release 6.7 (Santiago)", architecture: "x86_64", version: "Kernel 2.6.32-573.el6.x86_64" } }
--02T15::48.010+ I NETWORK [thread1] connection accepted from 192.168.244.30: # ( connections now open)
--02T15::48.011+ I NETWORK [conn7] received client metadata from 192.168.244.30: conn7: { driver: { name: "NetworkInterfaceASIO-RS", version: "3.4.2" }, os: { type: "Linux", name: "Red Hat Enterprise Linux Server release 6.7 (Santiago)", architecture: "x86_64", version: "Kernel 2.6.32-573.el6.x86_64" } }
--02T15::49.159+ I REPL [ReplicationExecutor] Member node3: is now in state SECONDARY
--02T15::49.160+ I - [conn6] end connection 192.168.244.30: ( connections now open)
--02T15::03.401+ I NETWORK [thread1] connection accepted from 192.168.244.30: # ( connections now open)
--02T15::03.403+ I NETWORK [conn8] received client metadata from 192.168.244.30: conn8: { driver: { name: "NetworkInterfaceASIO-RS", version: "3.4.2" }, os: { type: "Linux", name: "Red Hat Enterprise Linux Server release 6.7 (Santiago)", architecture: "x86_64", version: "Kernel 2.6.32-573.el6.x86_64" } }

The log output of the instance on port 27018 is as follows:

--02T15::44.796+ I NETWORK  [thread1] connection accepted from 192.168.244.30: # ( connection now open)
--02T15::44.922+ I - [conn2] end connection 192.168.244.30: ( connection now open)
--02T15::44.965+ I NETWORK [thread1] connection accepted from 192.168.244.30: # ( connection now open)
--02T15::44.978+ I NETWORK [conn3] received client metadata from 192.168.244.30: conn3: { driver: { name: "NetworkInterfaceASIO-Replication", version: "3.4.2" }, os: { type: "Linux", name: "Red Hat Enterprise Linux Server release 6.7 (Santiago)", architecture: "x86_64", version: "Kernel 2.6.32-573.el6.x86_64" } }
--02T15::44.991+ I ASIO [NetworkInterfaceASIO-Replication-] Connecting to node3:
--02T15::45.008+ I ASIO [NetworkInterfaceASIO-Replication-] Successfully connected to node3:
--02T15::47.101+ I REPL [replExecDBWorker-] Starting replication storage threads
--02T15::47.174+ I REPL [replication-] Starting initial sync (attempt of )
--02T15::47.174+ I REPL [ReplicationExecutor] New replica set config in use: { _id: "myapp", version: , protocolVersion: , members: [ { _id: , host: "node3:27017", arbiterOnly: false, buildIndexes: true, hidden: false, priority: 1.0, tags: {}, slaveDelay: , votes: }, { _id: , host: "node3:27018", arbiterOnly: false, buildIndexes: true, hidden: false, priority: 1.0, tags: {}, slaveDelay: , votes: } ], settings: { chainingAllowed: true, heartbeatIntervalMillis: , heartbeatTimeoutSecs: , electionTimeoutMillis: , catchUpTimeoutMillis: , getLastErrorModes: {}, getLastErrorDefaults: { w: , wtimeout: }, replicaSetId: ObjectId('59082229517dd35bb9fd0d2a') } }
--02T15::47.174+ I REPL [ReplicationExecutor] This node is node3: in the config
--02T15::47.174+ I REPL [ReplicationExecutor] transition to STARTUP2
--02T15::47.175+ I REPL [ReplicationExecutor] Member node3: is now in state PRIMARY
--02T15::47.217+ I REPL [replication-] sync source candidate: node3:
--02T15::47.217+ I STORAGE [replication-] dropAllDatabasesExceptLocal
--02T15::47.217+ I REPL [replication-] ******
--02T15::47.217+ I REPL [replication-] creating replication oplog of size: 990MB...
--02T15::47.232+ I STORAGE [replication-] Starting WiredTigerRecordStoreThread local.oplog.rs
--02T15::47.232+ I STORAGE [replication-] The size storer reports that the oplog contains records totaling to bytes
--02T15::47.232+ I STORAGE [replication-] Scanning the oplog to determine where to place markers for truncation
--02T15::47.938+ I REPL [replication-] ******
--02T15::47.939+ I ASIO [NetworkInterfaceASIO-RS-] Connecting to node3:
--02T15::47.941+ I ASIO [NetworkInterfaceASIO-RS-] Successfully connected to node3:
--02T15::48.010+ I ASIO [NetworkInterfaceASIO-RS-] Connecting to node3:
--02T15::48.011+ I ASIO [NetworkInterfaceASIO-RS-] Successfully connected to node3:
--02T15::48.046+ I REPL [replication-] CollectionCloner::start called, on ns:admin.system.version
--02T15::48.150+ I INDEX [InitialSyncInserters-admin.system.version0] build index on: admin.system.version properties: { v: , key: { version: }, name: "incompatible_with_version_32", ns: "admin.system.version" }
--02T15::48.150+ I INDEX [InitialSyncInserters-admin.system.version0] building index using bulk method; build may temporarily use up to megabytes of RAM
--02T15::48.154+ I INDEX [InitialSyncInserters-admin.system.version0] build index on: admin.system.version properties: { v: , key: { _id: }, name: "_id_", ns: "admin.system.version" }
--02T15::48.155+ I INDEX [InitialSyncInserters-admin.system.version0] building index using bulk method; build may temporarily use up to megabytes of RAM
--02T15::48.177+ I COMMAND [InitialSyncInserters-admin.system.version0] setting featureCompatibilityVersion to 3.4
--02T15::48.221+ I REPL [replication-] CollectionCloner::start called, on ns:test.blog
--02T15::48.264+ I INDEX [InitialSyncInserters-test.blog0] build index on: test.blog properties: { v: , key: { _id: }, name: "_id_", ns: "test.blog" }
--02T15::48.264+ I INDEX [InitialSyncInserters-test.blog0] building index using bulk method; build may temporarily use up to megabytes of RAM
--02T15::48.271+ I REPL [replication-] No need to apply operations. (currently at { : Timestamp | })
--02T15::48.271+ I REPL [replication-] Finished fetching oplog during initial sync: CallbackCanceled: Callback canceled. Last fetched optime and hash: { ts: Timestamp |, t: }[]
--02T15::48.271+ I REPL [replication-] Initial sync attempt finishing up.
--02T15::48.271+ I REPL [replication-] Initial Sync Attempt Statistics: { failedInitialSyncAttempts: , maxFailedInitialSyncAttempts: , initialSyncStart: new Date(), initialSyncAttempts: [], fetchedMissingDocs: , appliedOps: , initialSyncOplogStart: Timestamp |, initialSyncOplogEnd: Timestamp |, databases: { databasesCloned: , admin: { collections: , clonedCollections: , start: new Date(), end: new Date(), elapsedMillis: , admin.system.version: { documentsToCopy: , documentsCopied: , indexes: , fetchedBatches: , start: new Date(), end: new Date(), elapsedMillis: } }, test: { collections: , clonedCollections: , start: new Date(), end: new Date(), elapsedMillis: , test.blog: { documentsToCopy: , documentsCopied: , indexes: , fetchedBatches: , start: new Date(), end: new Date(), elapsedMillis: } } } }
--02T15::48.352+ I REPL [replication-] initial sync done; took 1s.
--02T15::48.352+ I REPL [replication-] Starting replication fetcher thread
--02T15::48.352+ I REPL [replication-] Starting replication applier thread
--02T15::48.352+ I REPL [replication-] Starting replication reporter thread
--02T15::48.352+ I REPL [rsSync] transition to RECOVERING
--02T15::48.366+ I REPL [rsBackgroundSync] could not find member to sync from
--02T15::48.367+ I REPL [rsSync] transition to SECONDARY
--02T15::03.392+ I REPL [rsBackgroundSync] sync source candidate: node3:
--02T15::03.396+ I ASIO [NetworkInterfaceASIO-RS-] Connecting to node3:
--02T15::03.404+ I ASIO [NetworkInterfaceASIO-RS-] Successfully connected to node3:

Add an arbiter

myapp:PRIMARY> rs.addArb("node3:27019")
{ "ok" : }

The log output of the instance on port 27017 is as follows:

--02T16::59.098+ I REPL     [conn1] replSetReconfig admin command received from client
--02T16::59.116+ I REPL [conn1] replSetReconfig config object with members parses ok
--02T16::59.116+ I ASIO [NetworkInterfaceASIO-Replication-] Connecting to node3:
--02T16::59.123+ I REPL [ReplicationExecutor] New replica set config in use: { _id: "myapp", version: , protocolVersion: , members: [ { _id: , host: "node3:27017", arbiterOnly: false, buildIndexes: true, hidden: false, priority: 1.0, tags: {}, slaveDelay: , votes: }, { _id: , host: "node3:27018", arbiterOnly: false, buildIndexes: true, hidden: false, priority: 1.0, tags: {}, slaveDelay: , votes: }, { _id: , host: "node3:27019", arbiterOnly: true, buildIndexes: true, hidden: false, priority: 1.0, tags: {}, slaveDelay: , votes: } ], settings: { chainingAllowed: true, heartbeatIntervalMillis: , heartbeatTimeoutSecs: , electionTimeoutMillis: , catchUpTimeoutMillis: , getLastErrorModes: {}, getLastErrorDefaults: { w: , wtimeout: }, replicaSetId: ObjectId('59082229517dd35bb9fd0d2a') } }
--02T16::59.123+ I REPL [ReplicationExecutor] This node is node3: in the config
--02T16::59.124+ I ASIO [NetworkInterfaceASIO-Replication-] Connecting to node3:
--02T16::59.124+ I ASIO [NetworkInterfaceASIO-Replication-] Successfully connected to node3:
--02T16::59.125+ I NETWORK [thread1] connection accepted from 192.168.244.30: # ( connections now open)
--02T16::59.127+ I - [conn9] end connection 192.168.244.30: ( connections now open)
--02T16::59.131+ I REPL [ReplicationExecutor] Member node3: is now in state STARTUP
--02T16::59.137+ I ASIO [NetworkInterfaceASIO-Replication-] Successfully connected to node3:
--02T16::59.223+ I NETWORK [thread1] connection accepted from 192.168.244.30: # ( connections now open)
--02T16::59.225+ I NETWORK [conn10] received client metadata from 192.168.244.30: conn10: { driver: { name: "NetworkInterfaceASIO-Replication", version: "3.4.2" }, os: { type: "Linux", name: "Red Hat Enterprise Linux Server release 6.7 (Santiago)", architecture: "x86_64", version: "Kernel 2.6.32-573.el6.x86_64" } }
--02T16::59.231+ I NETWORK [thread1] connection accepted from 192.168.244.30: # ( connections now open)
--02T16::59.232+ I - [conn11] end connection 192.168.244.30: ( connections now open)
--02T16::01.132+ I REPL [ReplicationExecutor] Member node3: is now in state ARBITER

The log output of the instance on port 27019 is as follows:

--02T16::59.115+ I NETWORK  [thread1] connection accepted from 192.168.244.30: # ( connection now open)
--02T16::59.117+ I - [conn1] end connection 192.168.244.30: ( connection now open)
--02T16::59.117+ I NETWORK [thread1] connection accepted from 192.168.244.30: # ( connection now open)
--02T16::59.122+ I NETWORK [conn2] received client metadata from 192.168.244.30: conn2: { driver: { name: "NetworkInterfaceASIO-Replication", version: "3.4.2" }, os: { type: "Linux", name: "Red Hat Enterprise Linux Server release 6.7 (Santiago)", architecture: "x86_64", version: "Kernel 2.6.32-573.el6.x86_64" } }
--02T16::59.125+ I NETWORK [thread1] connection accepted from 192.168.244.30: # ( connections now open)
--02T16::59.127+ I NETWORK [thread1] connection accepted from 192.168.244.30: # ( connections now open)
--02T16::59.128+ I ASIO [NetworkInterfaceASIO-Replication-] Connecting to node3:
--02T16::59.135+ I - [conn4] end connection 192.168.244.30: ( connections now open)
--02T16::59.136+ I NETWORK [conn3] received client metadata from 192.168.244.30: conn3: { driver: { name: "NetworkInterfaceASIO-Replication", version: "3.4.2" }, os: { type: "Linux", name: "Red Hat Enterprise Linux Server release 6.7 (Santiago)", architecture: "x86_64", version: "Kernel 2.6.32-573.el6.x86_64" } }
--02T16::59.214+ I NETWORK [thread1] connection accepted from 192.168.244.30: # ( connections now open)
--02T16::59.216+ I NETWORK [conn5] received client metadata from 192.168.244.30: conn5: { driver: { name: "NetworkInterfaceASIO-Replication", version: "3.4.2" }, os: { type: "Linux", name: "Red Hat Enterprise Linux Server release 6.7 (Santiago)", architecture: "x86_64", version: "Kernel 2.6.32-573.el6.x86_64" } }
--02T16::59.219+ I ASIO [NetworkInterfaceASIO-Replication-] Connecting to node3:
--02T16::59.227+ I ASIO [NetworkInterfaceASIO-Replication-] Successfully connected to node3:
--02T16::59.227+ I ASIO [NetworkInterfaceASIO-Replication-] Successfully connected to node3:
--02T16::59.295+ I REPL [ReplicationExecutor] New replica set config in use: { _id: "myapp", version: , protocolVersion: , members: [ { _id: , host: "node3:27017", arbiterOnly: false, buildIndexes: true, hidden: false, priority: 1.0, tags: {}, slaveDelay: , votes: }, { _id: , host: "node3:27018", arbiterOnly: false, buildIndexes: true, hidden: false, priority: 1.0, tags: {}, slaveDelay: , votes: }, { _id: , host: "node3:27019", arbiterOnly: true, buildIndexes: true, hidden: false, priority: 1.0, tags: {}, slaveDelay: , votes: } ], settings: { chainingAllowed: true, heartbeatIntervalMillis: , heartbeatTimeoutSecs: , electionTimeoutMillis: , catchUpTimeoutMillis: , getLastErrorModes: {}, getLastErrorDefaults: { w: , wtimeout: }, replicaSetId: ObjectId('59082229517dd35bb9fd0d2a') } }
--02T16::59.295+ I REPL [ReplicationExecutor] This node is node3: in the config
--02T16::59.295+ I REPL [ReplicationExecutor] transition to ARBITER
--02T16::59.297+ I REPL [ReplicationExecutor] Member node3: is now in state PRIMARY
--02T16::59.297+ I REPL [ReplicationExecutor] Member node3: is now in state SECONDARY
--02T16::59.132+ I - [conn2] end connection 192.168.244.30: ( connections now open)

Check the replica set status

myapp:PRIMARY> rs.status()
{
"set" : "myapp",
"date" : ISODate("2017-05-02T08:10:59.174Z"),
"myState" : ,
"term" : NumberLong(),
"heartbeatIntervalMillis" : NumberLong(),
"optimes" : {
"lastCommittedOpTime" : {
"ts" : Timestamp(, ),
"t" : NumberLong()
},
"appliedOpTime" : {
"ts" : Timestamp(, ),
"t" : NumberLong()
},
"durableOpTime" : {
"ts" : Timestamp(, ),
"t" : NumberLong()
}
},
"members" : [
{
"_id" : ,
"name" : "node3:27017",
"health" : ,
"state" : ,
"stateStr" : "PRIMARY",
"uptime" : ,
"optime" : {
"ts" : Timestamp(, ),
"t" : NumberLong()
},
"optimeDate" : ISODate("2017-05-02T08:10:49Z"),
"electionTime" : Timestamp(, ),
"electionDate" : ISODate("2017-05-02T06:07:37Z"),
"configVersion" : ,
"self" : true
},
{
"_id" : ,
"name" : "node3:27018",
"health" : ,
"state" : ,
"stateStr" : "SECONDARY",
"uptime" : ,
"optime" : {
"ts" : Timestamp(, ),
"t" : NumberLong()
},
"optimeDurable" : {
"ts" : Timestamp(, ),
"t" : NumberLong()
},
"optimeDate" : ISODate("2017-05-02T08:10:49Z"),
"optimeDurableDate" : ISODate("2017-05-02T08:10:49Z"),
"lastHeartbeat" : ISODate("2017-05-02T08:10:57.606Z"),
"lastHeartbeatRecv" : ISODate("2017-05-02T08:10:58.224Z"),
"pingMs" : NumberLong(),
"syncingTo" : "node3:27017",
"configVersion" :
},
{
"_id" : ,
"name" : "node3:27019",
"health" : ,
"state" : ,
"stateStr" : "ARBITER",
"uptime" : ,
"lastHeartbeat" : ISODate("2017-05-02T08:10:57.607Z"),
"lastHeartbeatRecv" : ISODate("2017-05-02T08:10:54.391Z"),
"pingMs" : NumberLong(),
"configVersion" :
}
],
"ok" :
}
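
Besides rs.status(), the shell helpers below give a quick view of the oplog window and each Secondary's replication lag:

myapp:PRIMARY> rs.printReplicationInfo()          // oplog size and time range
myapp:PRIMARY> rs.printSlaveReplicationInfo()     // how far each secondary lags behind the primary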

A replica set can also be created by passing a configuration document to rs.initiate():

> cfg = {
... "_id" : "myapp",
... "members" : [
... { "_id" : 0, "host" : "node3:27017" },
... { "_id" : 1, "host" : "node3:27018" },
... { "_id" : 2, "host" : "node3:27019", "arbiterOnly" : true }
... ] }
> rs.initiate(cfg)

Verify that the replica set works

On the Primary, create a collection and insert a document as a test:

# mongo
myapp:PRIMARY> show dbs;
admin .000GB
local .000GB
myapp:PRIMARY> use test
switched to db test
myapp:PRIMARY> db.blog.insert({"title":"My Blog Post"})
WriteResult({ "nInserted" : })
myapp:PRIMARY> db.blog.find();
{ "_id" : ObjectId("59082731008c534e0763e90a"), "title" : "My Blog Post" }
myapp:PRIMARY> quit()
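
To make sure a write has actually replicated to a majority of members before it is acknowledged, a write concern can be passed along with the insert. A sketch (the document and the 5000 ms timeout are examples):

myapp:PRIMARY> db.blog.insert({ "title": "Another Post" }, { writeConcern: { w: "majority", wtimeout: 5000 } })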

Verify on a Secondary:

# mongo --port
myapp:SECONDARY> use test
switched to db test
myapp:SECONDARY> db.blog.find()
Error: error: {
"ok" : ,
"errmsg" : "not master and slaveOk=false",
"code" : ,
"codeName" : "NotMasterNoSlaveOk"
}
myapp:SECONDARY> rs.slaveOk()
myapp:SECONDARY> db.blog.find()
{ "_id" : ObjectId("59082731008c534e0763e90a"), "title" : "My Blog Post" }
myapp:SECONDARY> quit()

Because the arbiter does not actually store any data, the document we just inserted cannot be read by connecting to the arbiter:

# mongo --port
myapp:ARBITER> use test
switched to db test
myapp:ARBITER> db.blog.find();
Error: error: {
"ok" : ,
"errmsg" : "not master and slaveOk=false",
"code" : ,
"codeName" : "NotMasterNoSlaveOk"
}
myapp:ARBITER> rs.slaveOk()
myapp:ARBITER> db.blog.find()
Error: error: {
"ok" : ,
"errmsg" : "node is not in primary or recovering state",
"code" : ,
"codeName" : "NotMasterOrSecondary"
}
myapp:ARBITER> quit()

Simulate a Primary failure and the replica set's automatic failover

# ps -ef |grep mongodb
root : ? :: mongod --replSet myapp --dbpath /data/ --port --logpath /var/log/mongodb/.log --fork
root : ? :: mongod --replSet myapp --dbpath /data/ --port --logpath /var/log/mongodb/.log --fork
root : ? :: mongod --replSet myapp --dbpath /data/ --port --logpath /var/log/mongodb/.log --fork
root : pts/ :: vim /var/log/mongodb/.log
root : pts/ :: tailf /var/log/mongodb/.log
root : pts/ :: tailf /var/log/mongodb/.log
root : pts/ :: grep mongodb
# kill -

Check the replica set status

Here, connect to the instance on port 27018:

# mongo --port 27018

myapp:PRIMARY> db.isMaster()
{
"hosts" : [
"node3:27017",
"node3:27018"
],
"arbiters" : [
"node3:27019"
],
"setName" : "myapp",
"setVersion" : ,
"ismaster" : true,
"secondary" : false,
"primary" : "node3:27018",
"me" : "node3:27018",
"electionId" : ObjectId("7fffffff0000000000000002"),
"lastWrite" : {
"opTime" : {
"ts" : Timestamp(, ),
"t" : NumberLong()
},
"lastWriteDate" : ISODate("2017-05-02T09:19:02Z")
},
"maxBsonObjectSize" : ,
"maxMessageSizeBytes" : ,
"maxWriteBatchSize" : ,
"localTime" : ISODate("2017-05-02T09:19:04.870Z"),
"maxWireVersion" : ,
"minWireVersion" : ,
"readOnly" : false,
"ok" :
}

As you can see, the Primary has switched over to the instance on port 27018.

Correspondingly, the log output of the instance on port 27018 is as follows:

--02T17::51.853+ I -        [conn3] end connection 192.168.244.30: ( connections now open)
--02T17::51.853+ I REPL [replication-] Restarting oplog query due to error: HostUnreachable: End of file. Last fetched optime (with hash): { ts: Timestamp |, t: }[-]. Restarts remaining:
--02T17::51.878+ I ASIO [replication-] dropping unhealthy pooled connection to node3:
--02T17::51.878+ I ASIO [replication-] after drop, pool was empty, going to spawn some connections
--02T17::51.879+ I REPL [replication-] Scheduled new oplog query Fetcher source: node3: database: local query: { find: "oplog.rs", filter: { ts: { $gte: Timestamp | } }, tailable: true, oplogReplay: true, awaitData: true, maxTimeMS: , term: } query metadata: { $replData: , $ssm: { $secondaryOk: true } } active: timeout: 10000ms shutting down?: first: firstCommandScheduler: RemoteCommandRetryScheduler request: RemoteCommand -- target:node3: db:local cmd:{ find: "oplog.rs", filter: { ts: { $gte: Timestamp | } }, tailable: true, oplogReplay: true, awaitData: true, maxTimeMS: , term: } active: callbackHandle.valid: callbackHandle.cancelled: attempt: retryPolicy: RetryPolicyImpl maxAttempts: maxTimeMillis: -1ms
--02T17::51.879+ I ASIO [NetworkInterfaceASIO-RS-] Connecting to node3:
--02T17::51.879+ I ASIO [NetworkInterfaceASIO-RS-] Failed to connect to node3: - HostUnreachable: Connection refused
--02T17::51.880+ I REPL [replication-] Restarting oplog query due to error: HostUnreachable: Connection refused. Last fetched optime (with hash): { ts: Timestamp |, t: }[-]. Restarts remaining:
--02T17::51.880+ I REPL [replication-] Scheduled new oplog query Fetcher source: node3: database: local query: { find: "oplog.rs", filter: { ts: { $gte: Timestamp | } }, tailable: true, oplogReplay: true, awaitData: true, maxTimeMS: , term: } query metadata: { $replData: , $ssm: { $secondaryOk: true } } active: timeout: 10000ms shutting down?: first: firstCommandScheduler: RemoteCommandRetryScheduler request: RemoteCommand -- target:node3: db:local cmd:{ find: "oplog.rs", filter: { ts: { $gte: Timestamp | } }, tailable: true, oplogReplay: true, awaitData: true, maxTimeMS: , term: } active: callbackHandle.valid: callbackHandle.cancelled: attempt: retryPolicy: RetryPolicyImpl maxAttempts: maxTimeMillis: -1ms
--02T17::51.880+ I ASIO [NetworkInterfaceASIO-RS-] Connecting to node3:
--02T17::51.880+ I ASIO [NetworkInterfaceASIO-RS-] Failed to connect to node3: - HostUnreachable: Connection refused
--02T17::51.880+ I REPL [replication-] Restarting oplog query due to error: HostUnreachable: Connection refused. Last fetched optime (with hash): { ts: Timestamp |, t: }[-]. Restarts remaining:
--02T17::51.880+ I REPL [replication-] Scheduled new oplog query Fetcher source: node3: database: local query: { find: "oplog.rs", filter: { ts: { $gte: Timestamp | } }, tailable: true, oplogReplay: true, awaitData: true, maxTimeMS: , term: } query metadata: { $replData: , $ssm: { $secondaryOk: true } } active: timeout: 10000ms shutting down?: first: firstCommandScheduler: RemoteCommandRetryScheduler request: RemoteCommand -- target:node3: db:local cmd:{ find: "oplog.rs", filter: { ts: { $gte: Timestamp | } }, tailable: true, oplogReplay: true, awaitData: true, maxTimeMS: , term: } active: callbackHandle.valid: callbackHandle.cancelled: attempt: retryPolicy: RetryPolicyImpl maxAttempts: maxTimeMillis: -1ms
--02T17::51.880+ I ASIO [NetworkInterfaceASIO-RS-] Connecting to node3:
--02T17::51.883+ I ASIO [NetworkInterfaceASIO-RS-] Failed to connect to node3: - HostUnreachable: Connection refused
--02T17::51.884+ I REPL [replication-] Error returned from oplog query (no more query restarts left): HostUnreachable: Connection refused
--02T17::51.884+ W REPL [rsBackgroundSync] Fetcher stopped querying remote oplog with error: HostUnreachable: Connection refused
--02T17::51.884+ I REPL [rsBackgroundSync] could not find member to sync from
--02T17::51.884+ I ASIO [ReplicationExecutor] dropping unhealthy pooled connection to node3:
--02T17::51.884+ I ASIO [ReplicationExecutor] after drop, pool was empty, going to spawn some connections
--02T17::51.884+ I ASIO [NetworkInterfaceASIO-Replication-] Connecting to node3:
--02T17::51.885+ I ASIO [NetworkInterfaceASIO-Replication-] Failed to connect to node3: - HostUnreachable: Connection refused
--02T17::51.885+ I REPL [ReplicationExecutor] Error in heartbeat request to node3:; HostUnreachable: Connection refused
--02T17::51.885+ I ASIO [NetworkInterfaceASIO-Replication-] Connecting to node3:
--02T17::51.885+ I ASIO [NetworkInterfaceASIO-Replication-] Failed to connect to node3: - HostUnreachable: Connection refused
--02T17::51.885+ I REPL [ReplicationExecutor] Error in heartbeat request to node3:; HostUnreachable: Connection refused
--02T17::51.885+ I ASIO [NetworkInterfaceASIO-Replication-] Connecting to node3:
--02T17::51.885+ I ASIO [NetworkInterfaceASIO-Replication-] Failed to connect to node3: - HostUnreachable: Connection refused
--02T17::51.886+ I REPL [ReplicationExecutor] Error in heartbeat request to node3:; HostUnreachable: Connection refused
--02T17::54.837+ I REPL [SyncSourceFeedback] SyncSourceFeedback error sending update to node3:: InvalidSyncSource: Sync source was cleared. Was node3:
--02T17::56.886+ I ASIO [NetworkInterfaceASIO-Replication-] Connecting to node3:
--02T17::56.886+ I ASIO [NetworkInterfaceASIO-Replication-] Failed to connect to node3: - HostUnreachable: Connection refused
--02T17::56.886+ I REPL [ReplicationExecutor] Error in heartbeat request to node3:; HostUnreachable: Connection refused
--02T17::56.886+ I ASIO [NetworkInterfaceASIO-Replication-] Connecting to node3:
--02T17::56.887+ I ASIO [NetworkInterfaceASIO-Replication-] Failed to connect to node3: - HostUnreachable: Connection refused
--02T17::56.887+ I REPL [ReplicationExecutor] Error in heartbeat request to node3:; HostUnreachable: Connection refused
--02T17::56.887+ I ASIO [NetworkInterfaceASIO-Replication-] Connecting to node3:
--02T17::56.887+ I ASIO [NetworkInterfaceASIO-Replication-] Failed to connect to node3: - HostUnreachable: Connection refused
--02T17::56.887+ I REPL [ReplicationExecutor] Error in heartbeat request to node3:; HostUnreachable: Connection refused
--02T17::01.560+ I REPL [ReplicationExecutor] Starting an election, since we've seen no PRIMARY in the past 10000ms
--02T17::01.605+ I REPL [ReplicationExecutor] conducting a dry run election to see if we could be elected
--02T17::01.616+ I ASIO [NetworkInterfaceASIO-Replication-] Connecting to node3:
--02T17::01.626+ I ASIO [NetworkInterfaceASIO-Replication-] Failed to connect to node3: - HostUnreachable: Connection refused
--02T17::01.630+ I REPL [ReplicationExecutor] VoteRequester(term dry run) failed to receive response from node3:: HostUnreachable: Connection refused
--02T17::01.637+ I REPL [ReplicationExecutor] VoteRequester(term dry run) received a yes vote from node3:; response message: { term: , voteGranted: true, reason: "", ok: 1.0 }
--02T17::01.638+ I REPL [ReplicationExecutor] dry election run succeeded, running for election
--02T17::01.670+ I ASIO [NetworkInterfaceASIO-Replication-] Connecting to node3:
--02T17::01.670+ I ASIO [NetworkInterfaceASIO-Replication-] Failed to connect to node3: - HostUnreachable: Connection refused
--02T17::01.672+ I REPL [ReplicationExecutor] VoteRequester(term ) failed to receive response from node3:: HostUnreachable: Connection refused
--02T17::01.689+ I REPL [ReplicationExecutor] VoteRequester(term ) received a yes vote from node3:; response message: { term: , voteGranted: true, reason: "", ok: 1.0 }
--02T17::01.689+ I REPL [ReplicationExecutor] election succeeded, assuming primary role in term
--02T17::01.689+ I REPL [ReplicationExecutor] transition to PRIMARY
--02T17::01.691+ I ASIO [NetworkInterfaceASIO-Replication-] Connecting to node3:
--02T17::01.692+ I ASIO [NetworkInterfaceASIO-Replication-] Connecting to node3:
--02T17::01.692+ I ASIO [NetworkInterfaceASIO-Replication-] Connecting to node3:
--02T17::01.693+ I ASIO [NetworkInterfaceASIO-Replication-] Failed to connect to node3: - HostUnreachable: Connection refused
--02T17::01.693+ I REPL [ReplicationExecutor] My optime is most up-to-date, skipping catch-up and completing transition to primary.
--02T17::01.693+ I REPL [ReplicationExecutor] Error in heartbeat request to node3:; HostUnreachable: Connection refused
--02T17::01.693+ I ASIO [NetworkInterfaceASIO-Replication-] Failed to connect to node3: - HostUnreachable: Connection refused
--02T17::01.693+ I ASIO [NetworkInterfaceASIO-Replication-] Connecting to node3:
--02T17::01.694+ I ASIO [NetworkInterfaceASIO-Replication-] Failed to connect to node3: - HostUnreachable: Connection refused
--02T17::01.694+ I REPL [ReplicationExecutor] Error in heartbeat request to node3:; HostUnreachable: Connection refused
--02T17::01.694+ I ASIO [NetworkInterfaceASIO-Replication-] Connecting to node3:
--02T17::01.694+ I ASIO [NetworkInterfaceASIO-Replication-] Failed to connect to node3: - HostUnreachable: Connection refused
--02T17::01.694+ I REPL [ReplicationExecutor] Error in heartbeat request to node3:; HostUnreachable: Connection refused
--02T17::01.694+ I ASIO [NetworkInterfaceASIO-Replication-] Successfully connected to node3:
--02T17::02.094+ I REPL [rsSync] transition to primary complete; database writes are now permitted

From the log output, we can see the following:

When the Primary is first detected as unavailable, MongoDB drops the unhealthy pooled connection (dropping unhealthy pooled connection to node3:27017) and keeps retrying. Once no Primary has been seen for 10s (10000ms, the default electionTimeoutMillis, which matches the 10s heartbeatTimeoutSecs default), an election is triggered and the Primary fails over automatically.

--02T17::01.560+ I REPL     [ReplicationExecutor] Starting an election, since we've seen no PRIMARY in the past 10000ms
--02T17::01.605+ I REPL [ReplicationExecutor] conducting a dry run election to see if we could be elected
--02T17::01.616+ I ASIO [NetworkInterfaceASIO-Replication-] Connecting to node3:
--02T17::01.626+ I ASIO [NetworkInterfaceASIO-Replication-] Failed to connect to node3: - HostUnreachable: Connection refused
--02T17::01.630+ I REPL [ReplicationExecutor] VoteRequester(term dry run) failed to receive response from node3:: HostUnreachable: Connection refused
--02T17::01.637+ I REPL [ReplicationExecutor] VoteRequester(term dry run) received a yes vote from node3:; response message: { term: , voteGranted: true, reason: "", ok: 1.0 }
--02T17::01.638+ I REPL [ReplicationExecutor] dry election run succeeded, running for election
--02T17::01.670+ I ASIO [NetworkInterfaceASIO-Replication-] Connecting to node3:
--02T17::01.670+ I ASIO [NetworkInterfaceASIO-Replication-] Failed to connect to node3: - HostUnreachable: Connection refused
--02T17::01.672+ I REPL [ReplicationExecutor] VoteRequester(term ) failed to receive response from node3:: HostUnreachable: Connection refused
--02T17::01.689+ I REPL [ReplicationExecutor] VoteRequester(term ) received a yes vote from node3:; response message: { term: , voteGranted: true, reason: "", ok: 1.0 }
--02T17::01.689+ I REPL [ReplicationExecutor] election succeeded, assuming primary role in term
--02T17::01.689+ I REPL [ReplicationExecutor] transition to PRIMARY
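
The 10000ms threshold in the log corresponds to electionTimeoutMillis. If faster failover is desired, it can be lowered via rs.reconfig(); a sketch, with 5000 ms as an arbitrary example value:

myapp:PRIMARY> cfg = rs.conf()
myapp:PRIMARY> cfg.settings.electionTimeoutMillis = 5000   // treat the primary as lost after 5 seconds
myapp:PRIMARY> rs.reconfig(cfg)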

In fact, while the instance on port 27017 is down, the other two members keep sending heartbeat requests to it:

--02T17::08.384+ I ASIO     [NetworkInterfaceASIO-Replication-] Connecting to node3:
--02T17::08.384+ I ASIO     [NetworkInterfaceASIO-Replication-] Failed to connect to node3: - HostUnreachable: Connection refused
--02T17::08.384+ I REPL     [ReplicationExecutor] Error in heartbeat request to node3:; HostUnreachable: Connection refused

When the instance on port 27017 comes back online, it automatically rejoins the replica set as a Secondary.

The log output of the 27017 instance as it starts up and rejoins the replica set is as follows:

--02T17::10.616+ I CONTROL  [initandlisten] MongoDB starting : pid= port= dbpath=/data/ -bit host=node3
--02T17::10.616+ I CONTROL [initandlisten] db version v3.4.2
--02T17::10.616+ I CONTROL [initandlisten] git version: 3f76e40c105fc223b3e5aac3e20dcd026b83b38b
--02T17::10.616+ I CONTROL [initandlisten] OpenSSL version: OpenSSL 1.0.1e-fips Feb
--02T17::10.616+ I CONTROL [initandlisten] allocator: tcmalloc
--02T17::10.616+ I CONTROL [initandlisten] modules: none
--02T17::10.616+ I CONTROL [initandlisten] build environment:
--02T17::10.616+ I CONTROL [initandlisten] distmod: rhel62
--02T17::10.616+ I CONTROL [initandlisten] distarch: x86_64
--02T17::10.616+ I CONTROL [initandlisten] target_arch: x86_64
--02T17::10.616+ I CONTROL [initandlisten] options: { net: { port: }, processManagement: { fork: true }, replication: { replSet: "myapp" }, storage: { dbPath: "/data/27017" }, systemLog: { destination: "file", path: "/var/log/mongodb/27017.log" } }
--02T17::10.616+ W - [initandlisten] Detected unclean shutdown - /data//mongod.lock is not empty.
--02T17::10.645+ I - [initandlisten] Detected data files in /data/ created by the 'wiredTiger' storage engine, so setting the active storage engine to 'wiredTiger'.
--02T17::10.645+ W STORAGE [initandlisten] Recovering data from the last clean checkpoint.
--02T17::10.645+ I STORAGE [initandlisten]
--02T17::10.645+ I STORAGE [initandlisten] ** WARNING: Using the XFS filesystem is strongly recommended with the WiredTiger storage engine
--02T17::10.645+ I STORAGE [initandlisten] ** See http://dochub.mongodb.org/core/prodnotes-filesystem
--02T17::10.645+ I STORAGE [initandlisten] wiredtiger_open config: create,cache_size=256M,session_max=,eviction=(threads_max=),config_base=false,statistics=(fast),log=(enabled=true,archive=true,path=journal,compressor=snappy),file_manager=(close_idle_time=),checkpoint=(wait=,log_size=2GB),statistics_log=(wait=),
--02T17::11.402+ I STORAGE [initandlisten] Starting WiredTigerRecordStoreThread local.oplog.rs
--02T17::11.436+ I STORAGE [initandlisten] The size storer reports that the oplog contains records totaling to bytes
--02T17::11.436+ I STORAGE [initandlisten] Scanning the oplog to determine where to place markers for truncation
--02T17::11.502+ I CONTROL [initandlisten]
--02T17::11.502+ I CONTROL [initandlisten] ** WARNING: Access control is not enabled for the database.
--02T17::11.502+ I CONTROL [initandlisten] ** Read and write access to data and configuration is unrestricted.
--02T17::11.502+ I CONTROL [initandlisten] ** WARNING: You are running this process as the root user, which is not recommended.
--02T17::11.502+ I CONTROL [initandlisten]
--02T17::11.675+ I FTDC [initandlisten] Initializing full-time diagnostic data capture with directory '/data/27017/diagnostic.data'
--02T17::11.744+ I NETWORK [thread1] waiting for connections on port
--02T17::11.797+ I REPL [replExecDBWorker-] New replica set config in use: { _id: "myapp", version: , protocolVersion: , members: [ { _id: , host: "node3:27017", arbiterOnly: false, buildIndexes: true, hidden: false, priority: 1.0, tags: {}, slaveDelay: , votes: }, { _id: , host: "node3:27018", arbiterOnly: false, buildIndexes: true, hidden: false, priority: 1.0, tags: {}, slaveDelay: , votes: }, { _id: , host: "node3:27019", arbiterOnly: true, buildIndexes: true, hidden: false, priority: 1.0, tags: {}, slaveDelay: , votes: } ], settings: { chainingAllowed: true, heartbeatIntervalMillis: , heartbeatTimeoutSecs: , electionTimeoutMillis: , catchUpTimeoutMillis: , getLastErrorModes: {}, getLastErrorDefaults: { w: , wtimeout: }, replicaSetId: ObjectId('59082229517dd35bb9fd0d2a') } }
--02T17::11.797+ I REPL [replExecDBWorker-] This node is node3: in the config
--02T17::11.797+ I REPL [replExecDBWorker-] transition to STARTUP2
--02T17::11.797+ I REPL [replExecDBWorker-] Starting replication storage threads
--02T17::11.798+ I REPL [replExecDBWorker-] Starting replication fetcher thread
--02T17::11.798+ I REPL [replExecDBWorker-] Starting replication applier thread
--02T17::11.798+ I REPL [replExecDBWorker-] Starting replication reporter thread
--02T17::11.799+ I ASIO [NetworkInterfaceASIO-Replication-] Connecting to node3:
--02T17::11.799+ I ASIO [NetworkInterfaceASIO-Replication-] Connecting to node3:
--02T17::11.799+ I REPL [rsSync] transition to RECOVERING
--02T17::11.801+ I REPL [rsSync] transition to SECONDARY
--02T17::11.801+ I ASIO [NetworkInterfaceASIO-Replication-] Successfully connected to node3:
--02T17::11.801+ I ASIO [NetworkInterfaceASIO-Replication-] Successfully connected to node3:
--02T17::11.802+ I REPL [ReplicationExecutor] Member node3: is now in state ARBITER
--02T17::11.803+ I REPL [ReplicationExecutor] Member node3: is now in state PRIMARY
--02T17::12.116+ I FTDC [ftdc] Unclean full-time diagnostic data capture shutdown detected, found interim file, some metrics may have been lost. OK
--02T17::12.388+ I NETWORK [thread1] connection accepted from 192.168.244.30: # ( connection now open)
--02T17::12.390+ I NETWORK [conn1] received client metadata from 192.168.244.30: conn1: { driver: { name: "NetworkInterfaceASIO-Replication", version: "3.4.2" }, os: { type: "Linux", name: "Red Hat Enterprise Linux Server release 6.7 (Santiago)", architecture: "x86_64", version: "Kernel 2.6.32-573.el6.x86_64" } }
--02T17::15.744+ I NETWORK [thread1] connection accepted from 192.168.244.30: # ( connections now open)
--02T17::15.745+ I NETWORK [conn2] received client metadata from 192.168.244.30: conn2: { driver: { name: "NetworkInterfaceASIO-Replication", version: "3.4.2" }, os: { type: "Linux", name: "Red Hat Enterprise Linux Server release 6.7 (Santiago)", architecture: "x86_64", version: "Kernel 2.6.32-573.el6.x86_64" } }
--02T17::17.802+ I REPL [rsBackgroundSync] sync source candidate: node3:
--02T17::17.873+ I ASIO [NetworkInterfaceASIO-RS-] Connecting to node3:
--02T17::17.875+ I ASIO [NetworkInterfaceASIO-RS-] Successfully connected to node3:
--02T17::18.203+ I ASIO [NetworkInterfaceASIO-RS-] Connecting to node3:
--02T17::18.211+ I ASIO [NetworkInterfaceASIO-RS-] Successfully connected to node3:

References

1. MongoDB in Action

2. MongoDB: The Definitive Guide

3. The official MongoDB documentation