Hadoop 2.7.4 HDFS+YARN HA: adding a datanode and nodemanager

Current cluster

Hostname	IP address	Roles	Install directories (same on all nodes)	Install user (same on all nodes)
sht-sgmhadoopnn-01	172.16.101.55	namenode,resourcemanager	/usr/local/hadoop (symlink), /usr/local/hadoop-2.7.4, /usr/local/zookeeper (symlink), /usr/local/zookeeper-3.4.9	root
sht-sgmhadoopnn-02	172.16.101.56	namenode,resourcemanager
sht-sgmhadoopdn-01	172.16.101.58	datanode,nodemanager,journalnode,zookeeper
sht-sgmhadoopdn-02	172.16.101.59	datanode,nodemanager,journalnode,zookeeper
sht-sgmhadoopdn-03	172.16.101.60	datanode,nodemanager,journalnode,zookeeper

After the cluster has been deployed, add the new node sht-sgmhadoopdn-04 (172.16.101.66) as a datanode and nodemanager.

Reference for the initial deployment: https://www.cnblogs.com/ilifeilong/p/10610993.html

1. Prepare the new datanode exactly as for a fresh installation: passwordless SSH login, environment variables, hostname resolution, and so on.
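For the hostname-resolution part of this step, a minimal sketch follows. It assumes names are resolved through /etc/hosts rather than DNS, and takes the new node's IP, 172.16.101.66, from the rsync targets and balancer log later in this post; `CLUSTER_HOSTS` and `missing_host_entries` are hypothetical names. The function prints only the entries a hosts file still lacks, so it can be run on every node, old and new, without creating duplicates.

```shell
#!/usr/bin/env bash
# Hostname resolution sketch: assumes /etc/hosts is used instead of DNS.
# Entries come from the cluster table plus the new node sht-sgmhadoopdn-04.
CLUSTER_HOSTS="172.16.101.55 sht-sgmhadoopnn-01
172.16.101.56 sht-sgmhadoopnn-02
172.16.101.58 sht-sgmhadoopdn-01
172.16.101.59 sht-sgmhadoopdn-02
172.16.101.60 sht-sgmhadoopdn-03
172.16.101.66 sht-sgmhadoopdn-04"

# Print the CLUSTER_HOSTS entries whose hostname does not yet appear in the
# given hosts file (default /etc/hosts). Comment lines are ignored.
missing_host_entries() {
  local hosts_file="${1:-/etc/hosts}" ip name
  printf '%s\n' "$CLUSTER_HOSTS" | while read -r ip name; do
    if ! awk -v n="$name" '
        /^[ \t]*#/ { next }
        { for (i = 2; i <= NF; i++) if ($i == n) found = 1 }
        END { exit !found }' "$hosts_file" 2>/dev/null; then
      printf '%s %s\n' "$ip" "$name"
    fi
  done
}
```

On each node, `entries=$(missing_host_entries /etc/hosts) && [ -n "$entries" ] && printf '%s\n' "$entries" >> /etc/hosts` appends whatever is missing.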

2. On the active NameNode, sht-sgmhadoopnn-01, edit the following configuration files:

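1) slaves

Add the hostname sht-sgmhadoopdn-04 to the slaves file.

2) hdfs-site.xml

Change dfs.replication to 4. This value is the default replication factor applied when a file is created, so files already stored in HDFS keep their existing replication factor unless it is changed explicitly (for example with hdfs dfs -setrep).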
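A minimal sketch of these two edits as shell helpers, assuming the configuration directory is /usr/local/hadoop/etc/hadoop (the rsync destination used in step 3). `add_slave` and `get_property` are hypothetical names. `get_property` does naive text matching and assumes `<value>` directly follows `<name>` inside a `<property>`; on a running node, `hdfs getconf -confKey dfs.replication` is the authoritative way to read the effective value.

```shell
#!/usr/bin/env bash
# Assumed configuration directory (the rsync destination used in step 3).
CONF_DIR="${CONF_DIR:-/usr/local/hadoop/etc/hadoop}"

# Append a hostname to the slaves file unless it is already listed, so the
# step can be re-run safely. The file defaults to $CONF_DIR/slaves.
add_slave() {
  local host="$1" file="${2:-$CONF_DIR/slaves}"
  grep -qxF "$host" "$file" 2>/dev/null || printf '%s\n' "$host" >> "$file"
}

# Print the <value> of a property from a Hadoop *-site.xml file.
# Naive text matching, not a real XML parser: newlines are removed first,
# then <value> is expected to follow <name> with only whitespace between.
get_property() {
  local name="$1" file="$2"
  tr -d '\n' < "$file" |
    sed -n "s|.*<name>[[:space:]]*${name}[[:space:]]*</name>[[:space:]]*<value>[[:space:]]*\([^<]*\)</value>.*|\1|p"
}
```

Usage: `add_slave sht-sgmhadoopdn-04`, then `get_property dfs.replication "$CONF_DIR/hdfs-site.xml"` should print 4.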

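3. From the active NameNode sht-sgmhadoopnn-01, rsync the two modified files to every other node in the cluster, including the new node 172.16.101.66. The file names below are relative paths, so run the commands from the configuration directory, /usr/local/hadoop/etc/hadoop: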

# rsync -az --progress hdfs-site.xml root@172.16.101.56:/usr/local/hadoop/etc/hadoop/

# rsync -az --progress hdfs-site.xml root@172.16.101.58:/usr/local/hadoop/etc/hadoop/

# rsync -az --progress hdfs-site.xml root@172.16.101.59:/usr/local/hadoop/etc/hadoop/

# rsync -az --progress hdfs-site.xml root@172.16.101.60:/usr/local/hadoop/etc/hadoop/

# rsync -az --progress hdfs-site.xml root@172.16.101.66:/usr/local/hadoop/etc/hadoop/

# rsync -az --progress slaves root@172.16.101.56:/usr/local/hadoop/etc/hadoop/

# rsync -az --progress slaves root@172.16.101.58:/usr/local/hadoop/etc/hadoop/

# rsync -az --progress slaves root@172.16.101.59:/usr/local/hadoop/etc/hadoop/

# rsync -az --progress slaves root@172.16.101.60:/usr/local/hadoop/etc/hadoop/

# rsync -az --progress slaves root@172.16.101.66:/usr/local/hadoop/etc/hadoop/
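The ten commands above can be collapsed into a loop. This is a sketch, not the author's script: the node list is copied from the commands above, absolute source paths replace the relative ones so the working directory no longer matters, and `sync_conf` and the `DRY_RUN` switch are hypothetical. With `DRY_RUN=1` it only prints the rsync commands it would run.

```shell
#!/usr/bin/env bash
# Every node except sht-sgmhadoopnn-01 itself, as in the commands above.
NODES="172.16.101.56 172.16.101.58 172.16.101.59 172.16.101.60 172.16.101.66"
CONF_DIR=/usr/local/hadoop/etc/hadoop

# Push hdfs-site.xml and slaves to each node. With DRY_RUN=1 the rsync
# commands are only printed, which is useful for checking the list first.
sync_conf() {
  local node f
  for node in $NODES; do
    for f in hdfs-site.xml slaves; do
      if [ "${DRY_RUN:-0}" = 1 ]; then
        echo rsync -az --progress "$CONF_DIR/$f" "root@$node:$CONF_DIR/"
      else
        rsync -az --progress "$CONF_DIR/$f" "root@$node:$CONF_DIR/" || return 1
      fi
    done
  done
}
```

Run `DRY_RUN=1 sync_conf` first to review the list, then `sync_conf` to copy; it stops at the first failed rsync.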

4. From the active NameNode sht-sgmhadoopnn-01, copy the Hadoop installation directory to the new node, excluding the data and logs directories:

# rsync -az --progress --exclude=data --exclude=logs  /usr/local/hadoop-2.7. root@sht-sgmhadoopdn-:/usr/local/

5. On the new node, start the datanode and nodemanager daemons:

# hadoop-daemon.sh start datanode

# yarn-daemon.sh start nodemanager
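To confirm the daemons actually came up on the new node, `jps` (shipped with the JDK) lists running JVMs by main class name. A minimal sketch with a hypothetical `check_daemons` helper that reads `jps` output on stdin and reports any expected class that is missing:

```shell
#!/usr/bin/env bash
# Read `jps` output ("<pid> <MainClass>" per line) on stdin and check that
# every class name given as an argument is present. Prints each missing
# name to stderr and returns non-zero if any is absent.
check_daemons() {
  local listing d rc=0
  listing=$(cat)
  for d in "$@"; do
    if ! printf '%s\n' "$listing" |
        awk -v want="$d" '$2 == want { found = 1 } END { exit !found }'; then
      echo "missing: $d" >&2
      rc=1
    fi
  done
  return "$rc"
}

# Usage on the new node:
#   jps | check_daemons DataNode NodeManager
```

If a daemon is reported missing, its log file (by default under $HADOOP_HOME/logs) usually shows why it exited.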

6. Verify in the web UI of the NameNode and the ResourceManager (either the active or the standby node can be used):

http://172.16.101.55:50070/dfshealth.html#tab-datanode


http://172.16.101.55:8088/cluster/nodes
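The same check can be scripted against HTTP endpoints instead of the pages. A minimal sketch, assuming the NameNode JMX bean `Hadoop:service=NameNode,name=FSNamesystemState` (attribute `NumLiveDataNodes`) and the ResourceManager REST resource `/ws/v1/cluster/metrics` (field `activeNodes`); `json_int` is a hypothetical helper that pulls an integer field out of JSON text with sed rather than a real JSON parser. After adding the node, both numbers are expected to be 4.

```shell
#!/usr/bin/env bash
# Print the first integer value of a JSON field, by name, from stdin.
# sed-based shortcut for flat numeric fields only; a JSON tool such as jq
# is more robust for anything else.
json_int() {
  local key="$1"
  sed -n "s/.*\"${key}\"[[:space:]]*:[[:space:]]*\([0-9][0-9]*\).*/\1/p" | head -n 1
}

# Live datanodes as seen by the NameNode (JMX bean FSNamesystemState):
#   curl -s 'http://172.16.101.55:50070/jmx?qry=Hadoop:service=NameNode,name=FSNamesystemState' | json_int NumLiveDataNodes
# Active nodemanagers as seen by the ResourceManager (REST cluster metrics);
# -L follows a redirect, in case this ResourceManager is the standby:
#   curl -sL 'http://172.16.101.55:8088/ws/v1/cluster/metrics' | json_int activeNodes
```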


7. Rebalance block data across the datanodes (running this on the standby NameNode is recommended):

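# hdfs balancer -threshold <threshold>

Here <threshold> is the maximum allowed difference, in percentage points, between each datanode's disk utilization and the cluster's average utilization; it defaults to 10 if omitted.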
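To make the threshold concrete: the balancer computes a cluster-wide average utilization (total used divided by total capacity) and groups each datanode by how far its own utilization lies from that average, which is where the over-utilized, above-average, below-average and underutilized groups in the log below come from. The awk sketch below is a simplified reimplementation of that grouping, assuming a single storage type and input lines of `name used capacity` (for example DFS Used and Configured Capacity per node from `hdfs dfsadmin -report`); `classify_nodes` is a hypothetical helper, not part of Hadoop.

```shell
#!/usr/bin/env bash
# Sketch of how the balancer groups datanodes for a given threshold (in
# percentage points). Input on stdin, one node per line:
#   <name> <used> <capacity>     (any consistent unit)
# Simplified: one storage type, no excluded or included node lists.
classify_nodes() {
  awk -v t="$1" '
    { name[NR] = $1; used[NR] = $2; cap[NR] = $3; tu += $2; tc += $3 }
    END {
      avg = tu * 100 / tc                      # cluster-wide utilization in percent
      for (i = 1; i <= NR; i++) {
        util = used[i] * 100 / cap[i]
        diff = util - avg
        dist = (diff < 0 ? -diff : diff) - t   # distance beyond the threshold
        if (diff > 0) cls = (dist <= 0 ? "above-avg" : "over-utilized")
        else          cls = (dist <= 0 ? "below-avg" : "under-utilized")
        printf "%s %.2f%% %s\n", name[i], util, cls
      }
    }'
}
```

For example, `printf 'a 35 100\nb 22 100\nc 19 100\nd 4 100\n' | classify_nodes 5` uses hypothetical nodes with an average of 20%, so the band from 15% to 25% counts as balanced: a (35%) is over-utilized, b is above average, c is below average and d (4%) is under-utilized, which is the node the balancer would move blocks to.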

Output log of a run with -threshold 1:

# hdfs balancer -threshold 1

19/03/29 23:59:21 INFO balancer.Balancer: Using a threshold of 1.0

19/03/29 23:59:21 INFO balancer.Balancer: namenodes  = [hdfs://mycluster]

19/03/29 23:59:21 INFO balancer.Balancer: parameters = Balancer.Parameters [BalancingPolicy.Node, threshold = 1.0, max idle iteration = 5, number of nodes to be excluded = 0, number of nodes to be included = 0, run during upgrade = false]

Time Stamp               Iteration#  Bytes Already Moved  Bytes Left To Move  Bytes Being Moved

19/03/29 23:59:24 INFO balancer.Balancer: dfs.balancer.movedWinWidth = 5400000 (default=5400000)

19/03/29 23:59:24 INFO balancer.Balancer: dfs.balancer.moverThreads = 1000 (default=1000)

19/03/29 23:59:24 INFO balancer.Balancer: dfs.balancer.dispatcherThreads = 200 (default=200)

19/03/29 23:59:24 INFO balancer.Balancer: dfs.datanode.balance.max.concurrent.moves = 5 (default=5)

19/03/29 23:59:24 INFO balancer.Balancer: dfs.balancer.getBlocks.size = 2147483648 (default=2147483648)

19/03/29 23:59:24 INFO balancer.Balancer: dfs.balancer.getBlocks.min-block-size = 10485760 (default=10485760)

19/03/29 23:59:24 INFO balancer.Balancer: dfs.balancer.max-size-to-move = 10737418240 (default=10737418240)

19/03/29 23:59:24 INFO net.NetworkTopology: Adding a new node: /default-rack/172.16.101.66:50010

19/03/29 23:59:24 INFO net.NetworkTopology: Adding a new node: /default-rack/172.16.101.58:50010

19/03/29 23:59:24 INFO net.NetworkTopology: Adding a new node: /default-rack/172.16.101.60:50010

19/03/29 23:59:24 INFO net.NetworkTopology: Adding a new node: /default-rack/172.16.101.59:50010

19/03/29 23:59:24 INFO balancer.Balancer: 0 over-utilized: []

19/03/29 23:59:24 INFO balancer.Balancer: 1 underutilized: [172.16.101.66:50010:DISK]

19/03/29 23:59:24 INFO balancer.Balancer: Need to move 1.10 GB to make the cluster balanced.

19/03/29 23:59:24 INFO balancer.Balancer: chooseStorageGroups for SAME_RACK: overUtilized => underUtilized

19/03/29 23:59:24 INFO balancer.Balancer: chooseStorageGroups for SAME_RACK: overUtilized => belowAvgUtilized

19/03/29 23:59:24 INFO balancer.Balancer: chooseStorageGroups for SAME_RACK: underUtilized => aboveAvgUtilized

19/03/29 23:59:24 INFO balancer.Balancer: Decided to move 635.63 MB bytes from 172.16.101.58:50010:DISK to 172.16.101.66:50010:DISK

19/03/29 23:59:24 INFO balancer.Balancer: Decided to move 147.43 MB bytes from 172.16.101.60:50010:DISK to 172.16.101.66:50010:DISK

19/03/29 23:59:24 INFO balancer.Balancer: chooseStorageGroups for ANY_OTHER: overUtilized => underUtilized

19/03/29 23:59:24 INFO balancer.Balancer: chooseStorageGroups for ANY_OTHER: overUtilized => belowAvgUtilized

19/03/29 23:59:24 INFO balancer.Balancer: chooseStorageGroups for ANY_OTHER: underUtilized => aboveAvgUtilized

19/03/29 23:59:24 INFO balancer.Balancer: Will move 783.06 MB in this iteration

19/03/29 23:59:24 INFO balancer.Dispatcher: Limiting threads per target to the specified max.

19/03/29 23:59:24 INFO balancer.Dispatcher: Allocating 5 threads per target.

19/03/29 23:59:24 INFO balancer.Dispatcher: Start moving blk_1073741839_1015 with size=134217728 from 172.16.101.58:50010:DISK to 172.16.101.66:50010:DISK through 172.16.101.58:50010

19/03/29 23:59:24 INFO balancer.Dispatcher: Start moving blk_1073741846_1022 with size=134217728 from 172.16.101.60:50010:DISK to 172.16.101.66:50010:DISK through 172.16.101.58:50010

19/03/29 23:59:24 INFO balancer.Dispatcher: Start moving blk_1073741838_1014 with size=134217728 from 172.16.101.58:50010:DISK to 172.16.101.66:50010:DISK through 172.16.101.58:50010

19/03/29 23:59:24 INFO balancer.Dispatcher: Start moving blk_1073741845_1021 with size=134217728 from 172.16.101.60:50010:DISK to 172.16.101.66:50010:DISK through 172.16.101.58:50010

19/03/29 23:59:24 INFO balancer.Dispatcher: Start moving blk_1073741837_1013 with size=134217728 from 172.16.101.58:50010:DISK to 172.16.101.66:50010:DISK through 172.16.101.58:50010

19/03/29 23:59:52 INFO balancer.Dispatcher: Successfully moved blk_1073741838_1014 with size=134217728 from 172.16.101.58:50010:DISK to 172.16.101.66:50010:DISK through 172.16.101.58:50010

19/03/29 23:59:52 INFO balancer.Dispatcher: Start moving blk_1073741836_1012 with size=134217728 from 172.16.101.58:50010:DISK to 172.16.101.66:50010:DISK through 172.16.101.58:50010

19/03/30 00:00:14 INFO balancer.Dispatcher: Successfully moved blk_1073741836_1012 with size=134217728 from 172.16.101.58:50010:DISK to 172.16.101.66:50010:DISK through 172.16.101.58:50010

19/03/30 00:00:14 INFO balancer.Dispatcher: Start moving blk_1073741835_1011 with size=134217728 from 172.16.101.58:50010:DISK to 172.16.101.66:50010:DISK through 172.16.101.58:50010

19/03/30 00:00:38 INFO balancer.Dispatcher: Successfully moved blk_1073741835_1011 with size=134217728 from 172.16.101.58:50010:DISK to 172.16.101.66:50010:DISK through 172.16.101.58:50010

19/03/30 00:01:44 WARN balancer.Dispatcher: Failed to move blk_1073741837_1013 with size=134217728 from 172.16.101.58:50010:DISK to 172.16.101.66:50010:DISK through 172.16.101.58:50010: Got error, status message opReplaceBlock BP-698223843-172.16.101.55-1553701973789:blk_1073741837_1013 received exception java.net.SocketTimeoutException: 60000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/172.16.101.66:22240 remote=/172.16.101.58:50010], block move is failed

19/03/30 00:01:44 INFO balancer.Dispatcher: DDatanode:172.16.101.58:50010 activateDelay 10.0 seconds

19/03/30 00:01:44 INFO balancer.Dispatcher: DDatanode:172.16.101.66:50010 activateDelay 10.0 seconds

19/03/30 00:02:07 WARN balancer.Dispatcher: Failed to move blk_1073741845_1021 with size=134217728 from 172.16.101.60:50010:DISK to 172.16.101.66:50010:DISK through 172.16.101.58:50010: Got error, status message opReplaceBlock BP-698223843-172.16.101.55-1553701973789:blk_1073741845_1021 received exception java.net.SocketTimeoutException: 60000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/172.16.101.66:22238 remote=/172.16.101.58:50010], block move is failed

19/03/30 00:02:07 INFO balancer.Dispatcher: DDatanode:172.16.101.58:50010 activateDelay 10.0 seconds

19/03/30 00:02:07 INFO balancer.Dispatcher: DDatanode:172.16.101.66:50010 activateDelay 10.0 seconds

19/03/30 00:02:11 WARN balancer.Dispatcher: Failed to move blk_1073741839_1015 with size=134217728 from 172.16.101.58:50010:DISK to 172.16.101.66:50010:DISK through 172.16.101.58:50010: Got error, status message opReplaceBlock BP-698223843-172.16.101.55-1553701973789:blk_1073741839_1015 received exception java.net.SocketTimeoutException: 60000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/172.16.101.66:22232 remote=/172.16.101.58:50010], block move is failed

19/03/30 00:02:11 INFO balancer.Dispatcher: DDatanode:172.16.101.58:50010 activateDelay 10.0 seconds

19/03/30 00:02:11 INFO balancer.Dispatcher: DDatanode:172.16.101.66:50010 activateDelay 10.0 seconds

19/03/30 00:02:35 WARN balancer.Dispatcher: Failed to move blk_1073741846_1022 with size=134217728 from 172.16.101.60:50010:DISK to 172.16.101.66:50010:DISK through 172.16.101.58:50010: Got error, status message opReplaceBlock BP-698223843-172.16.101.55-1553701973789:blk_1073741846_1022 received exception java.net.SocketTimeoutException: 60000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/172.16.101.66:22234 remote=/172.16.101.58:50010], block move is failed

19/03/30 00:02:35 INFO balancer.Dispatcher: DDatanode:172.16.101.58:50010 activateDelay 10.0 seconds

19/03/30 00:02:35 INFO balancer.Dispatcher: DDatanode:172.16.101.66:50010 activateDelay 10.0 seconds

Mar 30, 2019 12:02:36 AM          0               384 MB             1.10 GB          783.06 MB

19/03/30 00:02:41 INFO balancer.Balancer: dfs.balancer.movedWinWidth = 5400000 (default=5400000)

19/03/30 00:02:41 INFO balancer.Balancer: dfs.balancer.moverThreads = 1000 (default=1000)

19/03/30 00:02:41 INFO balancer.Balancer: dfs.balancer.dispatcherThreads = 200 (default=200)

19/03/30 00:02:41 INFO balancer.Balancer: dfs.datanode.balance.max.concurrent.moves = 5 (default=5)

19/03/30 00:02:41 INFO balancer.Balancer: dfs.balancer.getBlocks.size = 2147483648 (default=2147483648)

19/03/30 00:02:41 INFO balancer.Balancer: dfs.balancer.getBlocks.min-block-size = 10485760 (default=10485760)

19/03/30 00:02:41 INFO balancer.Balancer: dfs.balancer.max-size-to-move = 10737418240 (default=10737418240)

19/03/30 00:02:41 INFO net.NetworkTopology: Adding a new node: /default-rack/172.16.101.66:50010

19/03/30 00:02:41 INFO net.NetworkTopology: Adding a new node: /default-rack/172.16.101.59:50010

19/03/30 00:02:41 INFO net.NetworkTopology: Adding a new node: /default-rack/172.16.101.58:50010

19/03/30 00:02:41 INFO net.NetworkTopology: Adding a new node: /default-rack/172.16.101.60:50010

19/03/30 00:02:41 INFO balancer.Balancer: 0 over-utilized: []

19/03/30 00:02:41 INFO balancer.Balancer: 1 underutilized: [172.16.101.66:50010:DISK]

19/03/30 00:02:41 INFO balancer.Balancer: Need to move 833.58 MB to make the cluster balanced.

19/03/30 00:02:41 INFO balancer.Balancer: chooseStorageGroups for SAME_RACK: overUtilized => underUtilized

19/03/30 00:02:41 INFO balancer.Balancer: chooseStorageGroups for SAME_RACK: overUtilized => belowAvgUtilized

19/03/30 00:02:41 INFO balancer.Balancer: chooseStorageGroups for SAME_RACK: underUtilized => aboveAvgUtilized

19/03/30 00:02:41 INFO balancer.Balancer: Decided to move 538.88 MB bytes from 172.16.101.59:50010:DISK to 172.16.101.66:50010:DISK

19/03/30 00:02:41 INFO balancer.Balancer: Decided to move 244.18 MB bytes from 172.16.101.58:50010:DISK to 172.16.101.66:50010:DISK

19/03/30 00:02:41 INFO balancer.Balancer: chooseStorageGroups for ANY_OTHER: overUtilized => underUtilized

19/03/30 00:02:41 INFO balancer.Balancer: chooseStorageGroups for ANY_OTHER: overUtilized => belowAvgUtilized

19/03/30 00:02:41 INFO balancer.Balancer: chooseStorageGroups for ANY_OTHER: underUtilized => aboveAvgUtilized

19/03/30 00:02:41 INFO balancer.Balancer: Will move 783.06 MB in this iteration

19/03/30 00:02:41 INFO balancer.Dispatcher: Limiting threads per target to the specified max.

19/03/30 00:02:41 INFO balancer.Dispatcher: Allocating 5 threads per target.

19/03/30 00:02:41 INFO balancer.Dispatcher: Start moving blk_1073741837_1013 with size=134217728 from 172.16.101.58:50010:DISK to 172.16.101.66:50010:DISK through 172.16.101.58:50010

19/03/30 00:02:41 INFO balancer.Dispatcher: Start moving blk_1073741834_1010 with size=134217728 from 172.16.101.58:50010:DISK to 172.16.101.66:50010:DISK through 172.16.101.58:50010

19/03/30 00:02:41 INFO balancer.Dispatcher: Start moving blk_1073741842_1018 with size=134217728 from 172.16.101.59:50010:DISK to 172.16.101.66:50010:DISK through 172.16.101.58:50010

19/03/30 00:02:41 INFO balancer.Dispatcher: Start moving blk_1073741841_1017 with size=134217728 from 172.16.101.59:50010:DISK to 172.16.101.66:50010:DISK through 172.16.101.58:50010

19/03/30 00:02:41 INFO balancer.Dispatcher: Start moving blk_1073741840_1016 with size=134217728 from 172.16.101.59:50010:DISK to 172.16.101.66:50010:DISK through 172.16.101.58:50010

19/03/30 00:02:41 WARN balancer.Dispatcher: Failed to move blk_1073741834_1010 with size=134217728 from 172.16.101.58:50010:DISK to 172.16.101.66:50010:DISK through 172.16.101.58:50010: Got error, status message opReplaceBlock BP-698223843-172.16.101.55-1553701973789:blk_1073741834_1010 received exception java.io.IOException: Got error, status message Not able to copy block 1073741834 to /172.16.101.66:22256 because threads quota is exceeded., copy block BP-698223843-172.16.101.55-1553701973789:blk_1073741834_1010 from /172.16.101.58:50010, block move is failed

19/03/30 00:02:41 INFO balancer.Dispatcher: DDatanode:172.16.101.58:50010 activateDelay 10.0 seconds

19/03/30 00:02:41 INFO balancer.Dispatcher: DDatanode:172.16.101.66:50010 activateDelay 10.0 seconds

19/03/30 00:02:41 INFO balancer.Dispatcher: Start moving blk_1073741839_1015 with size=134217728 from 172.16.101.59:50010:DISK to 172.16.101.66:50010:DISK through 172.16.101.58:50010

19/03/30 00:02:41 WARN balancer.Dispatcher: Failed to move blk_1073741841_1017 with size=134217728 from 172.16.101.59:50010:DISK to 172.16.101.66:50010:DISK through 172.16.101.58:50010: Got error, status message opReplaceBlock BP-698223843-172.16.101.55-1553701973789:blk_1073741841_1017 received exception java.io.IOException: Got error, status message Not able to copy block 1073741841 to /172.16.101.66:22258 because threads quota is exceeded., copy block BP-698223843-172.16.101.55-1553701973789:blk_1073741841_1017 from /172.16.101.58:50010, block move is failed

19/03/30 00:02:41 INFO balancer.Dispatcher: DDatanode:172.16.101.58:50010 activateDelay 10.0 seconds

19/03/30 00:02:41 INFO balancer.Dispatcher: DDatanode:172.16.101.66:50010 activateDelay 10.0 seconds

19/03/30 00:02:41 WARN balancer.Dispatcher: Failed to move blk_1073741840_1016 with size=134217728 from 172.16.101.59:50010:DISK to 172.16.101.66:50010:DISK through 172.16.101.58:50010: Got error, status message opReplaceBlock BP-698223843-172.16.101.55-1553701973789:blk_1073741840_1016 received exception java.io.IOException: Got error, status message Not able to copy block 1073741840 to /172.16.101.66:22260 because threads quota is exceeded., copy block BP-698223843-172.16.101.55-1553701973789:blk_1073741840_1016 from /172.16.101.58:50010, block move is failed

19/03/30 00:02:41 INFO balancer.Dispatcher: DDatanode:172.16.101.58:50010 activateDelay 10.0 seconds

19/03/30 00:02:41 INFO balancer.Dispatcher: DDatanode:172.16.101.66:50010 activateDelay 10.0 seconds

19/03/30 00:02:41 WARN balancer.Dispatcher: Failed to move blk_1073741839_1015 with size=134217728 from 172.16.101.59:50010:DISK to 172.16.101.66:50010:DISK through 172.16.101.58:50010: Got error, status message opReplaceBlock BP-698223843-172.16.101.55-1553701973789:blk_1073741839_1015 received exception java.io.IOException: Got error, status message Not able to copy block 1073741839 to /172.16.101.66:22262 because threads quota is exceeded., copy block BP-698223843-172.16.101.55-1553701973789:blk_1073741839_1015 from /172.16.101.58:50010, block move is failed

19/03/30 00:02:41 INFO balancer.Dispatcher: DDatanode:172.16.101.58:50010 activateDelay 10.0 seconds

19/03/30 00:02:41 INFO balancer.Dispatcher: DDatanode:172.16.101.66:50010 activateDelay 10.0 seconds

19/03/30 00:02:58 INFO balancer.Dispatcher: Successfully moved blk_1073741842_1018 with size=134217728 from 172.16.101.59:50010:DISK to 172.16.101.66:50010:DISK through 172.16.101.58:50010

19/03/30 00:02:58 INFO balancer.Dispatcher: Successfully moved blk_1073741837_1013 with size=134217728 from 172.16.101.58:50010:DISK to 172.16.101.66:50010:DISK through 172.16.101.58:50010

Mar 30, 2019 12:02:58 AM          1               640 MB           833.58 MB          783.06 MB

19/03/30 00:03:03 INFO balancer.Balancer: dfs.balancer.movedWinWidth = 5400000 (default=5400000)

19/03/30 00:03:03 INFO balancer.Balancer: dfs.balancer.moverThreads = 1000 (default=1000)

19/03/30 00:03:03 INFO balancer.Balancer: dfs.balancer.dispatcherThreads = 200 (default=200)

19/03/30 00:03:03 INFO balancer.Balancer: dfs.datanode.balance.max.concurrent.moves = 5 (default=5)

19/03/30 00:03:03 INFO balancer.Balancer: dfs.balancer.getBlocks.size = 2147483648 (default=2147483648)

19/03/30 00:03:03 INFO balancer.Balancer: dfs.balancer.getBlocks.min-block-size = 10485760 (default=10485760)

19/03/30 00:03:03 INFO balancer.Balancer: dfs.balancer.max-size-to-move = 10737418240 (default=10737418240)

19/03/30 00:03:03 INFO net.NetworkTopology: Adding a new node: /default-rack/172.16.101.58:50010

19/03/30 00:03:03 INFO net.NetworkTopology: Adding a new node: /default-rack/172.16.101.66:50010

19/03/30 00:03:03 INFO net.NetworkTopology: Adding a new node: /default-rack/172.16.101.59:50010

19/03/30 00:03:03 INFO net.NetworkTopology: Adding a new node: /default-rack/172.16.101.60:50010

19/03/30 00:03:03 INFO balancer.Balancer: 0 over-utilized: []

19/03/30 00:03:03 INFO balancer.Balancer: 1 underutilized: [172.16.101.66:50010:DISK]

19/03/30 00:03:03 INFO balancer.Balancer: Need to move 640.08 MB to make the cluster balanced.

19/03/30 00:03:03 INFO balancer.Balancer: chooseStorageGroups for SAME_RACK: overUtilized => underUtilized

19/03/30 00:03:03 INFO balancer.Balancer: chooseStorageGroups for SAME_RACK: overUtilized => belowAvgUtilized

19/03/30 00:03:03 INFO balancer.Balancer: chooseStorageGroups for SAME_RACK: underUtilized => aboveAvgUtilized

19/03/30 00:03:03 INFO balancer.Balancer: Decided to move 474.38 MB bytes from 172.16.101.58:50010:DISK to 172.16.101.66:50010:DISK

19/03/30 00:03:03 INFO balancer.Balancer: Decided to move 308.67 MB bytes from 172.16.101.59:50010:DISK to 172.16.101.66:50010:DISK

19/03/30 00:03:03 INFO balancer.Balancer: chooseStorageGroups for ANY_OTHER: overUtilized => underUtilized

19/03/30 00:03:03 INFO balancer.Balancer: chooseStorageGroups for ANY_OTHER: overUtilized => belowAvgUtilized

19/03/30 00:03:03 INFO balancer.Balancer: chooseStorageGroups for ANY_OTHER: underUtilized => aboveAvgUtilized

19/03/30 00:03:03 INFO balancer.Balancer: Will move 783.06 MB in this iteration

19/03/30 00:03:03 INFO balancer.Dispatcher: Limiting threads per target to the specified max.

19/03/30 00:03:03 INFO balancer.Dispatcher: Allocating 5 threads per target.

19/03/30 00:03:03 INFO balancer.Dispatcher: Start moving blk_1073741834_1010 with size=134217728 from 172.16.101.58:50010:DISK to 172.16.101.66:50010:DISK through 172.16.101.58:50010

19/03/30 00:03:03 INFO balancer.Dispatcher: Start moving blk_1073741833_1009 with size=134217728 from 172.16.101.58:50010:DISK to 172.16.101.66:50010:DISK through 172.16.101.58:50010

19/03/30 00:03:03 INFO balancer.Dispatcher: Start moving blk_1073741832_1008 with size=134217728 from 172.16.101.58:50010:DISK to 172.16.101.66:50010:DISK through 172.16.101.58:50010

19/03/30 00:03:03 INFO balancer.Dispatcher: Start moving blk_1073741828_1004 with size=21901927 from 172.16.101.58:50010:DISK to 172.16.101.66:50010:DISK through 172.16.101.58:50010

19/03/30 00:03:03 INFO balancer.Dispatcher: Start moving blk_1073741827_1003 with size=134217728 from 172.16.101.58:50010:DISK to 172.16.101.66:50010:DISK through 172.16.101.58:50010

19/03/30 00:03:03 WARN balancer.Dispatcher: Failed to move blk_1073741828_1004 with size=21901927 from 172.16.101.58:50010:DISK to 172.16.101.66:50010:DISK through 172.16.101.58:50010: Got error, status message opReplaceBlock BP-698223843-172.16.101.55-1553701973789:blk_1073741828_1004 received exception java.io.IOException: Got error, status message Not able to copy block 1073741828 to /172.16.101.66:22272 because threads quota is exceeded., copy block BP-698223843-172.16.101.55-1553701973789:blk_1073741828_1004 from /172.16.101.58:50010, block move is failed

19/03/30 00:03:03 INFO balancer.Dispatcher: DDatanode:172.16.101.58:50010 activateDelay 10.0 seconds

19/03/30 00:03:03 INFO balancer.Dispatcher: DDatanode:172.16.101.66:50010 activateDelay 10.0 seconds

19/03/30 00:03:03 INFO balancer.Dispatcher: Start moving blk_1073741826_1002 with size=134217728 from 172.16.101.59:50010:DISK to 172.16.101.66:50010:DISK through 172.16.101.58:50010

19/03/30 00:03:03 WARN balancer.Dispatcher: Failed to move blk_1073741826_1002 with size=134217728 from 172.16.101.59:50010:DISK to 172.16.101.66:50010:DISK through 172.16.101.58:50010: Got error, status message opReplaceBlock BP-698223843-172.16.101.55-1553701973789:blk_1073741826_1002 received exception java.io.IOException: Got error, status message Not able to copy block 1073741826 to /172.16.101.66:22274 because threads quota is exceeded., copy block BP-698223843-172.16.101.55-1553701973789:blk_1073741826_1002 from /172.16.101.58:50010, block move is failed

19/03/30 00:03:03 INFO balancer.Dispatcher: DDatanode:172.16.101.58:50010 activateDelay 10.0 seconds

19/03/30 00:03:03 INFO balancer.Dispatcher: DDatanode:172.16.101.66:50010 activateDelay 10.0 seconds

19/03/30 00:03:47 INFO balancer.Dispatcher: Successfully moved blk_1073741833_1009 with size=134217728 from 172.16.101.58:50010:DISK to 172.16.101.66:50010:DISK through 172.16.101.58:50010

19/03/30 00:05:12 WARN balancer.Dispatcher: Failed to move blk_1073741834_1010 with size=134217728 from 172.16.101.58:50010:DISK to 172.16.101.66:50010:DISK through 172.16.101.58:50010: Got error, status message opReplaceBlock BP-698223843-172.16.101.55-1553701973789:blk_1073741834_1010 received exception java.net.SocketTimeoutException: 60000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/172.16.101.66:22266 remote=/172.16.101.58:50010], block move is failed

19/03/30 00:05:12 INFO balancer.Dispatcher: DDatanode:172.16.101.58:50010 activateDelay 10.0 seconds

19/03/30 00:05:12 INFO balancer.Dispatcher: DDatanode:172.16.101.66:50010 activateDelay 10.0 seconds

19/03/30 00:05:36 WARN balancer.Dispatcher: Failed to move blk_1073741827_1003 with size=134217728 from 172.16.101.58:50010:DISK to 172.16.101.66:50010:DISK through 172.16.101.58:50010: Got error, status message opReplaceBlock BP-698223843-172.16.101.55-1553701973789:blk_1073741827_1003 received exception java.net.SocketTimeoutException: 60000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/172.16.101.66:22270 remote=/172.16.101.58:50010], block move is failed

19/03/30 00:05:36 INFO balancer.Dispatcher: DDatanode:172.16.101.58:50010 activateDelay 10.0 seconds

19/03/30 00:05:36 INFO balancer.Dispatcher: DDatanode:172.16.101.66:50010 activateDelay 10.0 seconds

19/03/30 00:06:11 WARN balancer.Dispatcher: Failed to move blk_1073741832_1008 with size=134217728 from 172.16.101.58:50010:DISK to 172.16.101.66:50010:DISK through 172.16.101.58:50010: Got error, status message opReplaceBlock BP-698223843-172.16.101.55-1553701973789:blk_1073741832_1008 received exception java.net.SocketTimeoutException: 60000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/172.16.101.66:22268 remote=/172.16.101.58:50010], block move is failed

19/03/30 00:06:11 INFO balancer.Dispatcher: DDatanode:172.16.101.58:50010 activateDelay 10.0 seconds

19/03/30 00:06:11 INFO balancer.Dispatcher: DDatanode:172.16.101.66:50010 activateDelay 10.0 seconds

Mar 30, 2019 12:06:11 AM          2               768 MB           640.08 MB          783.06 MB

19/03/30 00:06:16 INFO balancer.Balancer: dfs.balancer.movedWinWidth = 5400000 (default=5400000)

19/03/30 00:06:16 INFO balancer.Balancer: dfs.balancer.moverThreads = 1000 (default=1000)

19/03/30 00:06:16 INFO balancer.Balancer: dfs.balancer.dispatcherThreads = 200 (default=200)

19/03/30 00:06:16 INFO balancer.Balancer: dfs.datanode.balance.max.concurrent.moves = 5 (default=5)

19/03/30 00:06:16 INFO balancer.Balancer: dfs.balancer.getBlocks.size = 2147483648 (default=2147483648)

19/03/30 00:06:16 INFO balancer.Balancer: dfs.balancer.getBlocks.min-block-size = 10485760 (default=10485760)

19/03/30 00:06:16 INFO balancer.Balancer: dfs.balancer.max-size-to-move = 10737418240 (default=10737418240)

19/03/30 00:06:16 INFO net.NetworkTopology: Adding a new node: /default-rack/172.16.101.59:50010

19/03/30 00:06:16 INFO net.NetworkTopology: Adding a new node: /default-rack/172.16.101.58:50010

19/03/30 00:06:16 INFO net.NetworkTopology: Adding a new node: /default-rack/172.16.101.66:50010

19/03/30 00:06:16 INFO net.NetworkTopology: Adding a new node: /default-rack/172.16.101.60:50010

19/03/30 00:06:16 INFO balancer.Balancer: 0 over-utilized: []

19/03/30 00:06:16 INFO balancer.Balancer: 1 underutilized: [172.16.101.66:50010:DISK]

19/03/30 00:06:16 INFO balancer.Balancer: Need to move 458.28 MB to make the cluster balanced.

19/03/30 00:06:16 INFO balancer.Balancer: chooseStorageGroups for SAME_RACK: overUtilized => underUtilized

19/03/30 00:06:16 INFO balancer.Balancer: chooseStorageGroups for SAME_RACK: overUtilized => belowAvgUtilized

19/03/30 00:06:16 INFO balancer.Balancer: chooseStorageGroups for SAME_RACK: underUtilized => aboveAvgUtilized

19/03/30 00:06:16 INFO balancer.Balancer: Decided to move 413.78 MB bytes from 172.16.101.59:50010:DISK to 172.16.101.66:50010:DISK

19/03/30 00:06:16 INFO balancer.Balancer: Decided to move 369.28 MB bytes from 172.16.101.58:50010:DISK to 172.16.101.66:50010:DISK

19/03/30 00:06:16 INFO balancer.Balancer: chooseStorageGroups for ANY_OTHER: overUtilized => underUtilized

19/03/30 00:06:16 INFO balancer.Balancer: chooseStorageGroups for ANY_OTHER: overUtilized => belowAvgUtilized

19/03/30 00:06:16 INFO balancer.Balancer: chooseStorageGroups for ANY_OTHER: underUtilized => aboveAvgUtilized

19/03/30 00:06:16 INFO balancer.Balancer: Will move 783.06 MB in this iteration

19/03/30 00:06:16 INFO balancer.Dispatcher: Limiting threads per target to the specified max.

19/03/30 00:06:16 INFO balancer.Dispatcher: Allocating 5 threads per target.

19/03/30 00:06:16 INFO balancer.Dispatcher: Start moving blk_1073741832_1008 with size=134217728 from 172.16.101.58:50010:DISK to 172.16.101.66:50010:DISK through 172.16.101.58:50010

19/03/30 00:06:16 INFO balancer.Dispatcher: Start moving blk_1073741828_1004 with size=21901927 from 172.16.101.58:50010:DISK to 172.16.101.66:50010:DISK through 172.16.101.58:50010

19/03/30 00:06:16 INFO balancer.Dispatcher: Start moving blk_1073741826_1002 with size=134217728 from 172.16.101.58:50010:DISK to 172.16.101.66:50010:DISK through 172.16.101.58:50010

19/03/30 00:06:16 INFO balancer.Dispatcher: Start moving blk_1073741827_1003 with size=134217728 from 172.16.101.58:50010:DISK to 172.16.101.66:50010:DISK through 172.16.101.58:50010

19/03/30 00:06:16 INFO balancer.Dispatcher: Start moving blk_1073741834_1010 with size=134217728 from 172.16.101.59:50010:DISK to 172.16.101.66:50010:DISK through 172.16.101.58:50010

19/03/30 00:06:16 WARN balancer.Dispatcher: Failed to move blk_1073741834_1010 with size=134217728 from 172.16.101.59:50010:DISK to 172.16.101.66:50010:DISK through 172.16.101.58:50010: Got error, status message opReplaceBlock BP-698223843-172.16.101.55-1553701973789:blk_1073741834_1010 received exception java.io.IOException: Got error, status message Not able to copy block 1073741834 to /172.16.101.66:22284 because threads quota is exceeded., copy block BP-698223843-172.16.101.55-1553701973789:blk_1073741834_1010 from /172.16.101.58:50010, block move is failed

19/03/30 00:06:16 INFO balancer.Dispatcher: DDatanode:172.16.101.58:50010 activateDelay 10.0 seconds

19/03/30 00:06:16 INFO balancer.Dispatcher: DDatanode:172.16.101.66:50010 activateDelay 10.0 seconds

19/03/30 00:06:16 INFO balancer.Dispatcher: Start moving blk_1073741825_1001 with size=134217728 from 172.16.101.59:50010:DISK to 172.16.101.66:50010:DISK through 172.16.101.58:50010

19/03/30 00:06:16 WARN balancer.Dispatcher: Failed to move blk_1073741825_1001 with size=134217728 from 172.16.101.59:50010:DISK to 172.16.101.66:50010:DISK through 172.16.101.58:50010: Got error, status message opReplaceBlock BP-698223843-172.16.101.55-1553701973789:blk_1073741825_1001 received exception java.io.IOException: Got error, status message Not able to copy block 1073741825 to /172.16.101.66:22286 because threads quota is exceeded., copy block BP-698223843-172.16.101.55-1553701973789:blk_1073741825_1001 from /172.16.101.58:50010, block move is failed

19/03/30 00:06:16 INFO balancer.Dispatcher: DDatanode:172.16.101.58:50010 activateDelay 10.0 seconds

19/03/30 00:06:16 INFO balancer.Dispatcher: DDatanode:172.16.101.66:50010 activateDelay 10.0 seconds

19/03/30 00:06:19 INFO balancer.Dispatcher: Successfully moved blk_1073741828_1004 with size=21901927 from 172.16.101.58:50010:DISK to 172.16.101.66:50010:DISK through 172.16.101.58:50010

19/03/30 00:06:49 INFO balancer.Dispatcher: Successfully moved blk_1073741832_1008 with size=134217728 from 172.16.101.58:50010:DISK to 172.16.101.66:50010:DISK through 172.16.101.58:50010

19/03/30 00:06:53 INFO balancer.Dispatcher: Successfully moved blk_1073741827_1003 with size=134217728 from 172.16.101.58:50010:DISK to 172.16.101.66:50010:DISK through 172.16.101.58:50010

19/03/30 00:08:36 WARN balancer.Dispatcher: Failed to move blk_1073741826_1002 with size=134217728 from 172.16.101.58:50010:DISK to 172.16.101.66:50010:DISK through 172.16.101.58:50010: Got error, status message opReplaceBlock BP-698223843-172.16.101.55-1553701973789:blk_1073741826_1002 received exception java.net.SocketTimeoutException: 60000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/172.16.101.66:22280 remote=/172.16.101.58:50010], block move is failed

19/03/30 00:08:36 INFO balancer.Dispatcher: DDatanode:172.16.101.58:50010 activateDelay 10.0 seconds

19/03/30 00:08:36 INFO balancer.Dispatcher: DDatanode:172.16.101.66:50010 activateDelay 10.0 seconds

Mar 30, 2019 12:08:36 AM          3              1.02 GB           458.28 MB          783.06 MB

19/03/30 00:08:41 INFO balancer.Balancer: dfs.balancer.movedWinWidth = 5400000 (default=5400000)

19/03/30 00:08:41 INFO balancer.Balancer: dfs.balancer.moverThreads = 1000 (default=1000)

19/03/30 00:08:41 INFO balancer.Balancer: dfs.balancer.dispatcherThreads = 200 (default=200)

19/03/30 00:08:41 INFO balancer.Balancer: dfs.datanode.balance.max.concurrent.moves = 5 (default=5)

19/03/30 00:08:41 INFO balancer.Balancer: dfs.balancer.getBlocks.size = 2147483648 (default=2147483648)

19/03/30 00:08:41 INFO balancer.Balancer: dfs.balancer.getBlocks.min-block-size = 10485760 (default=10485760)

19/03/30 00:08:41 INFO balancer.Balancer: dfs.balancer.max-size-to-move = 10737418240 (default=10737418240)

19/03/30 00:08:41 INFO net.NetworkTopology: Adding a new node: /default-rack/172.16.101.59:50010

19/03/30 00:08:41 INFO net.NetworkTopology: Adding a new node: /default-rack/172.16.101.58:50010

19/03/30 00:08:41 INFO net.NetworkTopology: Adding a new node: /default-rack/172.16.101.66:50010

19/03/30 00:08:41 INFO net.NetworkTopology: Adding a new node: /default-rack/172.16.101.60:50010

19/03/30 00:08:41 INFO balancer.Balancer: 0 over-utilized: []

19/03/30 00:08:41 INFO balancer.Balancer: 1 underutilized: [172.16.101.66:50010:DISK]

19/03/30 00:08:41 INFO balancer.Balancer: Need to move 248.99 MB to make the cluster balanced.

19/03/30 00:08:41 INFO balancer.Balancer: chooseStorageGroups for SAME_RACK: overUtilized => underUtilized

19/03/30 00:08:41 INFO balancer.Balancer: chooseStorageGroups for SAME_RACK: overUtilized => belowAvgUtilized

19/03/30 00:08:41 INFO balancer.Balancer: chooseStorageGroups for SAME_RACK: underUtilized => aboveAvgUtilized

19/03/30 00:08:41 INFO balancer.Balancer: Decided to move 344.02 MB bytes from 172.16.101.59:50010:DISK to 172.16.101.66:50010:DISK

19/03/30 00:08:41 INFO balancer.Balancer: Decided to move 344.02 MB bytes from 172.16.101.58:50010:DISK to 172.16.101.66:50010:DISK

19/03/30 00:08:41 INFO balancer.Balancer: Decided to move 95.03 MB bytes from 172.16.101.60:50010:DISK to 172.16.101.66:50010:DISK

19/03/30 00:08:41 INFO balancer.Balancer: chooseStorageGroups for ANY_OTHER: overUtilized => underUtilized

19/03/30 00:08:41 INFO balancer.Balancer: chooseStorageGroups for ANY_OTHER: overUtilized => belowAvgUtilized

19/03/30 00:08:41 INFO balancer.Balancer: chooseStorageGroups for ANY_OTHER: underUtilized => aboveAvgUtilized

19/03/30 00:08:41 INFO balancer.Balancer: Will move 783.06 MB in this iteration

19/03/30 00:08:41 INFO balancer.Dispatcher: Limiting threads per target to the specified max.

19/03/30 00:08:41 INFO balancer.Dispatcher: Allocating 5 threads per target.

19/03/30 00:08:41 INFO balancer.Dispatcher: Start moving blk_1073741839_1015 with size=134217728 from 172.16.101.59:50010:DISK to 172.16.101.66:50010:DISK through 172.16.101.58:50010

19/03/30 00:08:41 INFO balancer.Dispatcher: Start moving blk_1073741834_1010 with size=134217728 from 172.16.101.59:50010:DISK to 172.16.101.66:50010:DISK through 172.16.101.58:50010

19/03/30 00:08:41 INFO balancer.Dispatcher: Start moving blk_1073741825_1001 with size=134217728 from 172.16.101.58:50010:DISK to 172.16.101.66:50010:DISK through 172.16.101.58:50010

19/03/30 00:08:41 INFO balancer.Dispatcher: Start moving blk_1073741826_1002 with size=134217728 from 172.16.101.58:50010:DISK to 172.16.101.66:50010:DISK through 172.16.101.58:50010

19/03/30 00:08:41 INFO balancer.Dispatcher: Start moving blk_1073741848_1024 with size=73209856 from 172.16.101.58:50010:DISK to 172.16.101.66:50010:DISK through 172.16.101.58:50010

19/03/30 00:09:35 INFO balancer.Dispatcher: Successfully moved blk_1073741848_1024 with size=73209856 from 172.16.101.58:50010:DISK to 172.16.101.66:50010:DISK through 172.16.101.58:50010

19/03/30 00:09:35 INFO balancer.Dispatcher: Start moving blk_1073741847_1023 with size=134217728 from 172.16.101.58:50010:DISK to 172.16.101.66:50010:DISK through 172.16.101.58:50010

19/03/30 00:09:40 INFO balancer.Dispatcher: Successfully moved blk_1073741839_1015 with size=134217728 from 172.16.101.59:50010:DISK to 172.16.101.66:50010:DISK through 172.16.101.58:50010

19/03/30 00:09:41 INFO balancer.Dispatcher: Successfully moved blk_1073741826_1002 with size=134217728 from 172.16.101.58:50010:DISK to 172.16.101.66:50010:DISK through 172.16.101.58:50010

19/03/30 00:09:57 INFO balancer.Dispatcher: Successfully moved blk_1073741825_1001 with size=134217728 from 172.16.101.58:50010:DISK to 172.16.101.66:50010:DISK through 172.16.101.58:50010

19/03/30 00:09:57 INFO balancer.Dispatcher: Successfully moved blk_1073741834_1010 with size=134217728 from 172.16.101.59:50010:DISK to 172.16.101.66:50010:DISK through 172.16.101.58:50010

19/03/30 00:12:28 WARN balancer.Dispatcher: Failed to move blk_1073741847_1023 with size=134217728 from 172.16.101.58:50010:DISK to 172.16.101.66:50010:DISK through 172.16.101.58:50010: Got error, status message opReplaceBlock BP-698223843-172.16.101.55-1553701973789:blk_1073741847_1023 received exception java.net.SocketTimeoutException: 60000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/172.16.101.66:22298 remote=/172.16.101.58:50010], block move is failed

19/03/30 00:12:28 INFO balancer.Dispatcher: DDatanode:172.16.101.58:50010 activateDelay 10.0 seconds

19/03/30 00:12:28 INFO balancer.Dispatcher: DDatanode:172.16.101.66:50010 activateDelay 10.0 seconds

Mar 30, 2019 12:12:28 AM          4              1.59 GB           248.99 MB          783.06 MB

19/03/30 00:12:33 INFO balancer.Balancer: dfs.balancer.movedWinWidth = 5400000 (default=5400000)

19/03/30 00:12:33 INFO balancer.Balancer: dfs.balancer.moverThreads = 1000 (default=1000)

19/03/30 00:12:33 INFO balancer.Balancer: dfs.balancer.dispatcherThreads = 200 (default=200)

19/03/30 00:12:33 INFO balancer.Balancer: dfs.datanode.balance.max.concurrent.moves = 5 (default=5)

19/03/30 00:12:33 INFO balancer.Balancer: dfs.balancer.getBlocks.size = 2147483648 (default=2147483648)

19/03/30 00:12:33 INFO balancer.Balancer: dfs.balancer.getBlocks.min-block-size = 10485760 (default=10485760)

19/03/30 00:12:33 INFO balancer.Balancer: dfs.balancer.max-size-to-move = 10737418240 (default=10737418240)

19/03/30 00:12:33 INFO net.NetworkTopology: Adding a new node: /default-rack/172.16.101.59:50010

19/03/30 00:12:33 INFO net.NetworkTopology: Adding a new node: /default-rack/172.16.101.60:50010

19/03/30 00:12:33 INFO net.NetworkTopology: Adding a new node: /default-rack/172.16.101.58:50010

19/03/30 00:12:33 INFO net.NetworkTopology: Adding a new node: /default-rack/172.16.101.66:50010

19/03/30 00:12:33 INFO balancer.Balancer: 0 over-utilized: []

19/03/30 00:12:33 INFO balancer.Balancer: 0 underutilized: []

The cluster is balanced. Exiting...

Mar 30, 2019 12:12:33 AM          5              1.59 GB                 0 B               -1 B

Mar 30, 2019 12:12:34 AM Balancing took 13.216533333333333 minutes
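The log above mixes successful and failed block moves. The failures include `threads quota is exceeded` and `SocketTimeoutException`. The balancer keeps iterating until the cluster is balanced, and a block that failed in one iteration can succeed in a later one (for example blk_1073741825_1001). To tally the outcomes of a long run, you can save the output and summarize it. This is a minimal sketch. It assumes the balancer output was captured to a file, for example with `hdfs balancer 2>&1 | tee balancer.log`. The file name `balancer.log` and the function `summarize_balancer_log` are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: tally block moves in a saved balancer log.
# Assumption: balancer output was captured to a file (name is hypothetical), e.g.
#   hdfs balancer 2>&1 | tee balancer.log

summarize_balancer_log() {
  # Reads log text on stdin, counts Dispatcher success/failure lines,
  # and groups failures by the exception text seen in this log.
  awk '
    /balancer\.Dispatcher: Successfully moved/ { ok++ }
    /balancer\.Dispatcher: Failed to move/ {
      failed++
      if ($0 ~ /threads quota is exceeded/)      reason["threads quota exceeded"]++
      else if ($0 ~ /SocketTimeoutException/)    reason["socket timeout"]++
      else                                       reason["other"]++
    }
    END {
      # Uninitialized counters print as 0 with %d.
      printf "succeeded=%d failed=%d\n", ok, failed
      for (r in reason) printf "  %s: %d\n", r, reason[r]
    }
  '
}

# Usage: summarize_balancer_log < balancer.log
```

The per-reason lines are printed in awk's unspecified array order. Only the first summary line has a fixed position.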

Check the HDFS cluster usage again


8. Change the replication factor of existing files/directories in the HDFS cluster

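Existing files keep their original replication factor. Hadoop does not automatically adjust them to the new dfs.replication value, so this has to be done manually.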
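Before and after running setrep, it helps to see which files are still at the old factor. `hdfs dfs -stat %r <path>` prints the replication factor of a single path. For the whole namespace, you can filter the listing from `hdfs dfs -ls -R`. This is a minimal sketch. It assumes the usual `-ls` column layout (permissions, replication, owner, group, size, date, time, path), and the function name `find_off_target_replication` is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: list files whose replication factor differs from a target value.
# Assumed `hdfs dfs -ls -R` columns:
#   permissions replication owner group size date time path
# Directories show "-" in the replication column and are skipped.
# Paths containing spaces are not handled by this simple field split.

find_off_target_replication() {
  local target="$1"
  awk -v target="$target" '
    # Regular files start with "-" in the permissions column.
    $1 ~ /^-/ && $2 != target { print $2, $8 }
  '
}

# Usage (target 4, matching dfs.replication in hdfs-site.xml):
#   hdfs dfs -ls -R / | find_off_target_replication 4
```

The output should be empty once setrep has finished for every file.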

# hdfs dfs -setrep -R -w 4 /

Output:

Replication 4 set: /CentOS-6.8-x86_64-bin-DVD2.iso

Replication 4 set: /hadoop-2.8.1.tar.gz

Replication 4 set: /slaves

Waiting for /CentOS-6.8-x86_64-bin-DVD2.iso ..................... done

Waiting for /hadoop-2.8.1.tar.gz ... done

Waiting for /slaves ... done

Verify with fsck:

# hdfs fsck /

Connecting to namenode via http://sht-sgmhadoopnn-01:50070/fsck?ugi=root&path=%2F

FSCK started by root (auth:SIMPLE) from /172.16.101.55 for path / at Sat Mar  :: CST

...Status: HEALTHY

 Total size:     B

 Total dirs:

 Total files:

 Total symlinks:

 Total blocks (validated):     (avg. block size  B)

 Minimally replicated blocks:     (100.0 %)

 Over-replicated blocks:     (0.0 %)

 Under-replicated blocks:     (0.0 %)

 Mis-replicated blocks:         (0.0 %)

 Default replication factor:

 Average block replication:    4.0

 Corrupt blocks:

 Missing replicas:         (0.0 %)

 Number of data-nodes:

 Number of racks:

FSCK ended at Sat Mar  :: CST  in  milliseconds

The filesystem under path '/' is HEALTHY
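To script the check above, for example after a long-running setrep, you can test the fsck summary for the HEALTHY status and a zero under-replicated count. This is a minimal sketch. It assumes the summary line format shown above, such as ` Under-replicated blocks:     0 (0.0 %)`, and the function name `fsck_replication_ok` is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: succeed only if fsck reports HEALTHY and 0 under-replicated blocks.
# Assumes summary lines like:
#   " Under-replicated blocks:     0 (0.0 %)"
#   "The filesystem under path '/' is HEALTHY"

fsck_replication_ok() {
  awk '
    /^The filesystem under path .* is HEALTHY/ { healthy = 1 }
    /Under-replicated blocks:/ {
      # Text after the colon starts with the block count;
      # adding 0 converts its leading numeric prefix.
      split($0, parts, ":")
      under = parts[2] + 0
      seen = 1
    }
    # Exit status 0 only when all conditions hold.
    END { exit !(healthy && seen && under == 0) }
  '
}

# Usage:
#   hdfs fsck / | fsck_replication_ok && echo "replication OK"
```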

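The steps above add a datanode dynamically without restarting the HDFS cluster. It is still recommended to restart the HDFS cluster at a suitable time.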