Hadoop 2.7.4 HDFS+YARN HA: Adding a DataNode and NodeManager

Date: 2023-03-09 17:35:33

Current cluster

Hostname            IP address     Roles
sht-sgmhadoopnn-01  172.16.101.55  namenode, resourcemanager
sht-sgmhadoopnn-02  172.16.101.56  namenode, resourcemanager
sht-sgmhadoopdn-01  172.16.101.58  datanode, nodemanager, journalnode, zookeeper
sht-sgmhadoopdn-02  172.16.101.59  datanode, nodemanager, journalnode, zookeeper
sht-sgmhadoopdn-03  172.16.101.60  datanode, nodemanager, journalnode, zookeeper

Install directories (same on all hosts):
/usr/local/hadoop (symlink)
/usr/local/hadoop-2.7.4
/usr/local/zookeeper (symlink)
/usr/local/zookeeper-3.4.9

Install user (same on all hosts): root

With the cluster already deployed, this post adds a new datanode, sht-sgmhadoopdn-04 (172.16.101.66).

For the original deployment, see https://www.cnblogs.com/ilifeilong/p/10610993.html

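1. On the new datanode, configure passwordless SSH login, environment variables, hostname resolution, and so on, exactly as for a fresh installation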
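
Before continuing, it may help to confirm that hostname resolution is actually in place. The following is a minimal sketch, not part of the original procedure: check_hosts is a hypothetical helper that looks for each hostname in a hosts-format file, on the assumption that name resolution is done through /etc/hosts.

```shell
# Hypothetical helper: report whether each hostname has an entry in a
# hosts-format file. Assumes resolution is handled via /etc/hosts.
check_hosts() {   # usage: check_hosts <hosts-file> <hostname>...
  file=$1; shift
  missing=0
  for h in "$@"; do
    # Skip comment lines; fields after the IP address are hostnames/aliases.
    if awk -v h="$h" '!/^#/ { for (i = 2; i <= NF; i++) if ($i == h) found = 1 }
                      END { exit !found }' "$file"; then
      echo "ok       $h"
    else
      echo "MISSING  $h"
      missing=1
    fi
  done
  return $missing
}

# Example (run on every node, old and new):
#   check_hosts /etc/hosts sht-sgmhadoopnn-01 sht-sgmhadoopnn-02 \
#       sht-sgmhadoopdn-01 sht-sgmhadoopdn-02 sht-sgmhadoopdn-03 sht-sgmhadoopdn-04
```

The function returns a non-zero status if any name is missing, so it can gate later steps in a script.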

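2. On the active namenode sht-sgmhadoopnn-01, edit the configuration files

1) slaves

Add the hostname sht-sgmhadoopdn-04 to the slaves file

2) hdfs-site.xml

Change the value of dfs.replication to 4 (this is the default replication factor for newly written files; files already in HDFS keep their current replication)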
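
The slaves edit above can be made repeatable with a small helper. This is a sketch only: add_slave is a hypothetical function, and it assumes the slaves file ends with a newline.

```shell
# Hypothetical helper: append a hostname to a slaves file only if it is
# not already listed (exact whole-line match), so re-running is harmless.
add_slave() {   # usage: add_slave <slaves-file> <hostname>
  grep -qxF "$2" "$1" 2>/dev/null || printf '%s\n' "$2" >> "$1"
}

# Example, using the paths from this post:
#   add_slave /usr/local/hadoop/etc/hadoop/slaves sht-sgmhadoopdn-04
# After editing hdfs-site.xml, the value the client configuration resolves
# to can be checked with:
#   hdfs getconf -confKey dfs.replication
```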

3. On the active namenode sht-sgmhadoopnn-01, rsync the two modified files to the other nodes in the cluster

# rsync -az --progress hdfs-site.xml root@172.16.101.56:/usr/local/hadoop/etc/hadoop/
# rsync -az --progress hdfs-site.xml root@172.16.101.58:/usr/local/hadoop/etc/hadoop/
# rsync -az --progress hdfs-site.xml root@172.16.101.59:/usr/local/hadoop/etc/hadoop/
# rsync -az --progress hdfs-site.xml root@172.16.101.60:/usr/local/hadoop/etc/hadoop/
# rsync -az --progress hdfs-site.xml root@172.16.101.66:/usr/local/hadoop/etc/hadoop/
# rsync -az --progress slaves root@172.16.101.56:/usr/local/hadoop/etc/hadoop/
# rsync -az --progress slaves root@172.16.101.58:/usr/local/hadoop/etc/hadoop/
# rsync -az --progress slaves root@172.16.101.59:/usr/local/hadoop/etc/hadoop/
# rsync -az --progress slaves root@172.16.101.60:/usr/local/hadoop/etc/hadoop/
# rsync -az --progress slaves root@172.16.101.66:/usr/local/hadoop/etc/hadoop/
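
The ten commands above follow one pattern, so they can be expressed as a loop. The sketch below is an illustration rather than the author's script: sync_conf and DRY_RUN are hypothetical names, and it uses absolute source paths where the original commands were run from inside the configuration directory. By default it only prints the commands; set DRY_RUN=0 to execute them.

```shell
# Hypothetical wrapper around the rsync commands in step 3.
CONF_DIR=/usr/local/hadoop/etc/hadoop
HOSTS="172.16.101.56 172.16.101.58 172.16.101.59 172.16.101.60 172.16.101.66"
FILES="hdfs-site.xml slaves"

sync_conf() {
  for f in $FILES; do
    for h in $HOSTS; do
      if [ "${DRY_RUN:-1}" = "1" ]; then
        # Dry run: show what would be executed.
        echo "rsync -az --progress $CONF_DIR/$f root@$h:$CONF_DIR/"
      else
        rsync -az --progress "$CONF_DIR/$f" "root@$h:$CONF_DIR/"
      fi
    done
  done
}

sync_conf
```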

4. On the active namenode sht-sgmhadoopnn-01, sync the hadoop directory to the new node

# rsync -az --progress --exclude=data --exclude=logs  /usr/local/hadoop-2.7. root@sht-sgmhadoopdn-:/usr/local/

5. On the new node, start the datanode and nodemanager roles

# hadoop-daemon.sh start datanode
# yarn-daemon.sh start nodemanager
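
A quick local check that both daemons came up is to look at the JVM process list. The sketch below assumes the JDK's jps tool is on the PATH of the new node; check_roles is a hypothetical helper that reads jps output from standard input.

```shell
# Hypothetical helper: report whether the DataNode and NodeManager JVMs
# appear in `jps` output supplied on stdin.
check_roles() {
  procs=$(cat)
  for role in DataNode NodeManager; do
    if printf '%s\n' "$procs" | grep -qw "$role"; then
      echo "$role: running"
    else
      echo "$role: NOT running"
    fi
  done
}

# Example, on the new node:
#   jps | check_roles
```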

6. Verify in the web UI of the namenode and resourcemanager (either the active or the standby node)

http://172.16.101.55:50070/dfshealth.html#tab-datanode

http://172.16.101.55:8088/cluster/nodes
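
The same check can be done from the command line. The following sketch assumes that `hdfs dfsadmin -report` prints a line of the form "Live datanodes (N):", as Hadoop 2.x releases do; live_datanodes is a hypothetical helper that extracts N from that report.

```shell
# Hypothetical helper: extract the live-datanode count from the text of
# `hdfs dfsadmin -report` supplied on stdin.
live_datanodes() {
  sed -n 's/^Live datanodes (\([0-9][0-9]*\)).*/\1/p'
}

# Examples:
#   hdfs dfsadmin -report | live_datanodes   # 4 is expected once the new node has joined
#   yarn node -list                          # lists the running NodeManagers
```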

7. Rebalance the datanode data across the cluster (running this on the standby namenode host is recommended)

# hdfs balancer -threshold 
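
The -threshold argument is a percentage: the balancer treats the cluster as balanced once every datanode's disk utilization is within that many percentage points of the cluster-wide average (the run logged in this post uses 1). The sketch below illustrates how a node falls into the four groups named in the balancer log (overUtilized, aboveAvgUtilized, belowAvgUtilized, underUtilized). classify_node is a hypothetical helper and the numbers are made up for illustration, not taken from this cluster.

```shell
# Hypothetical helper: classify a datanode the way the balancer log names
# its groups. All three arguments are percentages.
classify_node() {   # usage: classify_node <node-util> <cluster-avg> <threshold>
  awk -v u="$1" -v avg="$2" -v t="$3" 'BEGIN {
    u += 0; avg += 0; t += 0              # force numeric comparison
    if (u > avg + t)       print "overUtilized"
    else if (u > avg)      print "aboveAvgUtilized"
    else if (u >= avg - t) print "belowAvgUtilized"
    else                   print "underUtilized"
  }'
}

# A nearly empty new node (0.5% used) in a cluster averaging 4% with threshold 1:
classify_node 0.5 4.0 1   # prints underUtilized
```

Blocks are moved from over-utilized and above-average nodes toward under-utilized ones until no node lies outside the threshold band.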

Log output:

# hdfs balancer -threshold 1
19/03/29 23:59:21 INFO balancer.Balancer: Using a threshold of 1.0
19/03/29 23:59:21 INFO balancer.Balancer: namenodes = [hdfs://mycluster]
19/03/29 23:59:21 INFO balancer.Balancer: parameters = Balancer.Parameters [BalancingPolicy.Node, threshold = 1.0, max idle iteration = 5, number of nodes to be excluded = 0, number of nodes to be included = 0, run during upgrade = false]
Time Stamp Iteration# Bytes Already Moved Bytes Left To Move Bytes Being Moved
19/03/29 23:59:24 INFO balancer.Balancer: dfs.balancer.movedWinWidth = 5400000 (default=5400000)
19/03/29 23:59:24 INFO balancer.Balancer: dfs.balancer.moverThreads = 1000 (default=1000)
19/03/29 23:59:24 INFO balancer.Balancer: dfs.balancer.dispatcherThreads = 200 (default=200)
19/03/29 23:59:24 INFO balancer.Balancer: dfs.datanode.balance.max.concurrent.moves = 5 (default=5)
19/03/29 23:59:24 INFO balancer.Balancer: dfs.balancer.getBlocks.size = 2147483648 (default=2147483648)
19/03/29 23:59:24 INFO balancer.Balancer: dfs.balancer.getBlocks.min-block-size = 10485760 (default=10485760)
19/03/29 23:59:24 INFO balancer.Balancer: dfs.balancer.max-size-to-move = 10737418240 (default=10737418240)
19/03/29 23:59:24 INFO net.NetworkTopology: Adding a new node: /default-rack/172.16.101.66:50010
19/03/29 23:59:24 INFO net.NetworkTopology: Adding a new node: /default-rack/172.16.101.58:50010
19/03/29 23:59:24 INFO net.NetworkTopology: Adding a new node: /default-rack/172.16.101.60:50010
19/03/29 23:59:24 INFO net.NetworkTopology: Adding a new node: /default-rack/172.16.101.59:50010
19/03/29 23:59:24 INFO balancer.Balancer: 0 over-utilized: []
19/03/29 23:59:24 INFO balancer.Balancer: 1 underutilized: [172.16.101.66:50010:DISK]
19/03/29 23:59:24 INFO balancer.Balancer: Need to move 1.10 GB to make the cluster balanced.
19/03/29 23:59:24 INFO balancer.Balancer: chooseStorageGroups for SAME_RACK: overUtilized => underUtilized
19/03/29 23:59:24 INFO balancer.Balancer: chooseStorageGroups for SAME_RACK: overUtilized => belowAvgUtilized
19/03/29 23:59:24 INFO balancer.Balancer: chooseStorageGroups for SAME_RACK: underUtilized => aboveAvgUtilized
19/03/29 23:59:24 INFO balancer.Balancer: Decided to move 635.63 MB bytes from 172.16.101.58:50010:DISK to 172.16.101.66:50010:DISK
19/03/29 23:59:24 INFO balancer.Balancer: Decided to move 147.43 MB bytes from 172.16.101.60:50010:DISK to 172.16.101.66:50010:DISK
19/03/29 23:59:24 INFO balancer.Balancer: chooseStorageGroups for ANY_OTHER: overUtilized => underUtilized
19/03/29 23:59:24 INFO balancer.Balancer: chooseStorageGroups for ANY_OTHER: overUtilized => belowAvgUtilized
19/03/29 23:59:24 INFO balancer.Balancer: chooseStorageGroups for ANY_OTHER: underUtilized => aboveAvgUtilized
19/03/29 23:59:24 INFO balancer.Balancer: Will move 783.06 MB in this iteration
19/03/29 23:59:24 INFO balancer.Dispatcher: Limiting threads per target to the specified max.
19/03/29 23:59:24 INFO balancer.Dispatcher: Allocating 5 threads per target.
19/03/29 23:59:24 INFO balancer.Dispatcher: Start moving blk_1073741839_1015 with size=134217728 from 172.16.101.58:50010:DISK to 172.16.101.66:50010:DISK through 172.16.101.58:50010
19/03/29 23:59:24 INFO balancer.Dispatcher: Start moving blk_1073741846_1022 with size=134217728 from 172.16.101.60:50010:DISK to 172.16.101.66:50010:DISK through 172.16.101.58:50010
19/03/29 23:59:24 INFO balancer.Dispatcher: Start moving blk_1073741838_1014 with size=134217728 from 172.16.101.58:50010:DISK to 172.16.101.66:50010:DISK through 172.16.101.58:50010
19/03/29 23:59:24 INFO balancer.Dispatcher: Start moving blk_1073741845_1021 with size=134217728 from 172.16.101.60:50010:DISK to 172.16.101.66:50010:DISK through 172.16.101.58:50010
19/03/29 23:59:24 INFO balancer.Dispatcher: Start moving blk_1073741837_1013 with size=134217728 from 172.16.101.58:50010:DISK to 172.16.101.66:50010:DISK through 172.16.101.58:50010
19/03/29 23:59:52 INFO balancer.Dispatcher: Successfully moved blk_1073741838_1014 with size=134217728 from 172.16.101.58:50010:DISK to 172.16.101.66:50010:DISK through 172.16.101.58:50010
19/03/29 23:59:52 INFO balancer.Dispatcher: Start moving blk_1073741836_1012 with size=134217728 from 172.16.101.58:50010:DISK to 172.16.101.66:50010:DISK through 172.16.101.58:50010
19/03/30 00:00:14 INFO balancer.Dispatcher: Successfully moved blk_1073741836_1012 with size=134217728 from 172.16.101.58:50010:DISK to 172.16.101.66:50010:DISK through 172.16.101.58:50010
19/03/30 00:00:14 INFO balancer.Dispatcher: Start moving blk_1073741835_1011 with size=134217728 from 172.16.101.58:50010:DISK to 172.16.101.66:50010:DISK through 172.16.101.58:50010
19/03/30 00:00:38 INFO balancer.Dispatcher: Successfully moved blk_1073741835_1011 with size=134217728 from 172.16.101.58:50010:DISK to 172.16.101.66:50010:DISK through 172.16.101.58:50010
19/03/30 00:01:44 WARN balancer.Dispatcher: Failed to move blk_1073741837_1013 with size=134217728 from 172.16.101.58:50010:DISK to 172.16.101.66:50010:DISK through 172.16.101.58:50010: Got error, status message opReplaceBlock BP-698223843-172.16.101.55-1553701973789:blk_1073741837_1013 received exception java.net.SocketTimeoutException: 60000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/172.16.101.66:22240 remote=/172.16.101.58:50010], block move is failed
19/03/30 00:01:44 INFO balancer.Dispatcher: DDatanode:172.16.101.58:50010 activateDelay 10.0 seconds
19/03/30 00:01:44 INFO balancer.Dispatcher: DDatanode:172.16.101.66:50010 activateDelay 10.0 seconds
19/03/30 00:02:07 WARN balancer.Dispatcher: Failed to move blk_1073741845_1021 with size=134217728 from 172.16.101.60:50010:DISK to 172.16.101.66:50010:DISK through 172.16.101.58:50010: Got error, status message opReplaceBlock BP-698223843-172.16.101.55-1553701973789:blk_1073741845_1021 received exception java.net.SocketTimeoutException: 60000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/172.16.101.66:22238 remote=/172.16.101.58:50010], block move is failed
19/03/30 00:02:07 INFO balancer.Dispatcher: DDatanode:172.16.101.58:50010 activateDelay 10.0 seconds
19/03/30 00:02:07 INFO balancer.Dispatcher: DDatanode:172.16.101.66:50010 activateDelay 10.0 seconds
19/03/30 00:02:11 WARN balancer.Dispatcher: Failed to move blk_1073741839_1015 with size=134217728 from 172.16.101.58:50010:DISK to 172.16.101.66:50010:DISK through 172.16.101.58:50010: Got error, status message opReplaceBlock BP-698223843-172.16.101.55-1553701973789:blk_1073741839_1015 received exception java.net.SocketTimeoutException: 60000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/172.16.101.66:22232 remote=/172.16.101.58:50010], block move is failed
19/03/30 00:02:11 INFO balancer.Dispatcher: DDatanode:172.16.101.58:50010 activateDelay 10.0 seconds
19/03/30 00:02:11 INFO balancer.Dispatcher: DDatanode:172.16.101.66:50010 activateDelay 10.0 seconds
19/03/30 00:02:35 WARN balancer.Dispatcher: Failed to move blk_1073741846_1022 with size=134217728 from 172.16.101.60:50010:DISK to 172.16.101.66:50010:DISK through 172.16.101.58:50010: Got error, status message opReplaceBlock BP-698223843-172.16.101.55-1553701973789:blk_1073741846_1022 received exception java.net.SocketTimeoutException: 60000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/172.16.101.66:22234 remote=/172.16.101.58:50010], block move is failed
19/03/30 00:02:35 INFO balancer.Dispatcher: DDatanode:172.16.101.58:50010 activateDelay 10.0 seconds
19/03/30 00:02:35 INFO balancer.Dispatcher: DDatanode:172.16.101.66:50010 activateDelay 10.0 seconds
Mar 30, 2019 12:02:36 AM 0 384 MB 1.10 GB 783.06 MB
19/03/30 00:02:41 INFO balancer.Balancer: dfs.balancer.movedWinWidth = 5400000 (default=5400000)
19/03/30 00:02:41 INFO balancer.Balancer: dfs.balancer.moverThreads = 1000 (default=1000)
19/03/30 00:02:41 INFO balancer.Balancer: dfs.balancer.dispatcherThreads = 200 (default=200)
19/03/30 00:02:41 INFO balancer.Balancer: dfs.datanode.balance.max.concurrent.moves = 5 (default=5)
19/03/30 00:02:41 INFO balancer.Balancer: dfs.balancer.getBlocks.size = 2147483648 (default=2147483648)
19/03/30 00:02:41 INFO balancer.Balancer: dfs.balancer.getBlocks.min-block-size = 10485760 (default=10485760)
19/03/30 00:02:41 INFO balancer.Balancer: dfs.balancer.max-size-to-move = 10737418240 (default=10737418240)
19/03/30 00:02:41 INFO net.NetworkTopology: Adding a new node: /default-rack/172.16.101.66:50010
19/03/30 00:02:41 INFO net.NetworkTopology: Adding a new node: /default-rack/172.16.101.59:50010
19/03/30 00:02:41 INFO net.NetworkTopology: Adding a new node: /default-rack/172.16.101.58:50010
19/03/30 00:02:41 INFO net.NetworkTopology: Adding a new node: /default-rack/172.16.101.60:50010
19/03/30 00:02:41 INFO balancer.Balancer: 0 over-utilized: []
19/03/30 00:02:41 INFO balancer.Balancer: 1 underutilized: [172.16.101.66:50010:DISK]
19/03/30 00:02:41 INFO balancer.Balancer: Need to move 833.58 MB to make the cluster balanced.
19/03/30 00:02:41 INFO balancer.Balancer: chooseStorageGroups for SAME_RACK: overUtilized => underUtilized
19/03/30 00:02:41 INFO balancer.Balancer: chooseStorageGroups for SAME_RACK: overUtilized => belowAvgUtilized
19/03/30 00:02:41 INFO balancer.Balancer: chooseStorageGroups for SAME_RACK: underUtilized => aboveAvgUtilized
19/03/30 00:02:41 INFO balancer.Balancer: Decided to move 538.88 MB bytes from 172.16.101.59:50010:DISK to 172.16.101.66:50010:DISK
19/03/30 00:02:41 INFO balancer.Balancer: Decided to move 244.18 MB bytes from 172.16.101.58:50010:DISK to 172.16.101.66:50010:DISK
19/03/30 00:02:41 INFO balancer.Balancer: chooseStorageGroups for ANY_OTHER: overUtilized => underUtilized
19/03/30 00:02:41 INFO balancer.Balancer: chooseStorageGroups for ANY_OTHER: overUtilized => belowAvgUtilized
19/03/30 00:02:41 INFO balancer.Balancer: chooseStorageGroups for ANY_OTHER: underUtilized => aboveAvgUtilized
19/03/30 00:02:41 INFO balancer.Balancer: Will move 783.06 MB in this iteration
19/03/30 00:02:41 INFO balancer.Dispatcher: Limiting threads per target to the specified max.
19/03/30 00:02:41 INFO balancer.Dispatcher: Allocating 5 threads per target.
19/03/30 00:02:41 INFO balancer.Dispatcher: Start moving blk_1073741837_1013 with size=134217728 from 172.16.101.58:50010:DISK to 172.16.101.66:50010:DISK through 172.16.101.58:50010
19/03/30 00:02:41 INFO balancer.Dispatcher: Start moving blk_1073741834_1010 with size=134217728 from 172.16.101.58:50010:DISK to 172.16.101.66:50010:DISK through 172.16.101.58:50010
19/03/30 00:02:41 INFO balancer.Dispatcher: Start moving blk_1073741842_1018 with size=134217728 from 172.16.101.59:50010:DISK to 172.16.101.66:50010:DISK through 172.16.101.58:50010
19/03/30 00:02:41 INFO balancer.Dispatcher: Start moving blk_1073741841_1017 with size=134217728 from 172.16.101.59:50010:DISK to 172.16.101.66:50010:DISK through 172.16.101.58:50010
19/03/30 00:02:41 INFO balancer.Dispatcher: Start moving blk_1073741840_1016 with size=134217728 from 172.16.101.59:50010:DISK to 172.16.101.66:50010:DISK through 172.16.101.58:50010
19/03/30 00:02:41 WARN balancer.Dispatcher: Failed to move blk_1073741834_1010 with size=134217728 from 172.16.101.58:50010:DISK to 172.16.101.66:50010:DISK through 172.16.101.58:50010: Got error, status message opReplaceBlock BP-698223843-172.16.101.55-1553701973789:blk_1073741834_1010 received exception java.io.IOException: Got error, status message Not able to copy block 1073741834 to /172.16.101.66:22256 because threads quota is exceeded., copy block BP-698223843-172.16.101.55-1553701973789:blk_1073741834_1010 from /172.16.101.58:50010, block move is failed
19/03/30 00:02:41 INFO balancer.Dispatcher: DDatanode:172.16.101.58:50010 activateDelay 10.0 seconds
19/03/30 00:02:41 INFO balancer.Dispatcher: DDatanode:172.16.101.66:50010 activateDelay 10.0 seconds
19/03/30 00:02:41 INFO balancer.Dispatcher: Start moving blk_1073741839_1015 with size=134217728 from 172.16.101.59:50010:DISK to 172.16.101.66:50010:DISK through 172.16.101.58:50010
19/03/30 00:02:41 WARN balancer.Dispatcher: Failed to move blk_1073741841_1017 with size=134217728 from 172.16.101.59:50010:DISK to 172.16.101.66:50010:DISK through 172.16.101.58:50010: Got error, status message opReplaceBlock BP-698223843-172.16.101.55-1553701973789:blk_1073741841_1017 received exception java.io.IOException: Got error, status message Not able to copy block 1073741841 to /172.16.101.66:22258 because threads quota is exceeded., copy block BP-698223843-172.16.101.55-1553701973789:blk_1073741841_1017 from /172.16.101.58:50010, block move is failed
19/03/30 00:02:41 INFO balancer.Dispatcher: DDatanode:172.16.101.58:50010 activateDelay 10.0 seconds
19/03/30 00:02:41 INFO balancer.Dispatcher: DDatanode:172.16.101.66:50010 activateDelay 10.0 seconds
19/03/30 00:02:41 WARN balancer.Dispatcher: Failed to move blk_1073741840_1016 with size=134217728 from 172.16.101.59:50010:DISK to 172.16.101.66:50010:DISK through 172.16.101.58:50010: Got error, status message opReplaceBlock BP-698223843-172.16.101.55-1553701973789:blk_1073741840_1016 received exception java.io.IOException: Got error, status message Not able to copy block 1073741840 to /172.16.101.66:22260 because threads quota is exceeded., copy block BP-698223843-172.16.101.55-1553701973789:blk_1073741840_1016 from /172.16.101.58:50010, block move is failed
19/03/30 00:02:41 INFO balancer.Dispatcher: DDatanode:172.16.101.58:50010 activateDelay 10.0 seconds
19/03/30 00:02:41 INFO balancer.Dispatcher: DDatanode:172.16.101.66:50010 activateDelay 10.0 seconds
19/03/30 00:02:41 WARN balancer.Dispatcher: Failed to move blk_1073741839_1015 with size=134217728 from 172.16.101.59:50010:DISK to 172.16.101.66:50010:DISK through 172.16.101.58:50010: Got error, status message opReplaceBlock BP-698223843-172.16.101.55-1553701973789:blk_1073741839_1015 received exception java.io.IOException: Got error, status message Not able to copy block 1073741839 to /172.16.101.66:22262 because threads quota is exceeded., copy block BP-698223843-172.16.101.55-1553701973789:blk_1073741839_1015 from /172.16.101.58:50010, block move is failed
19/03/30 00:02:41 INFO balancer.Dispatcher: DDatanode:172.16.101.58:50010 activateDelay 10.0 seconds
19/03/30 00:02:41 INFO balancer.Dispatcher: DDatanode:172.16.101.66:50010 activateDelay 10.0 seconds
19/03/30 00:02:58 INFO balancer.Dispatcher: Successfully moved blk_1073741842_1018 with size=134217728 from 172.16.101.59:50010:DISK to 172.16.101.66:50010:DISK through 172.16.101.58:50010
19/03/30 00:02:58 INFO balancer.Dispatcher: Successfully moved blk_1073741837_1013 with size=134217728 from 172.16.101.58:50010:DISK to 172.16.101.66:50010:DISK through 172.16.101.58:50010
Mar 30, 2019 12:02:58 AM 1 640 MB 833.58 MB 783.06 MB
19/03/30 00:03:03 INFO balancer.Balancer: dfs.balancer.movedWinWidth = 5400000 (default=5400000)
19/03/30 00:03:03 INFO balancer.Balancer: dfs.balancer.moverThreads = 1000 (default=1000)
19/03/30 00:03:03 INFO balancer.Balancer: dfs.balancer.dispatcherThreads = 200 (default=200)
19/03/30 00:03:03 INFO balancer.Balancer: dfs.datanode.balance.max.concurrent.moves = 5 (default=5)
19/03/30 00:03:03 INFO balancer.Balancer: dfs.balancer.getBlocks.size = 2147483648 (default=2147483648)
19/03/30 00:03:03 INFO balancer.Balancer: dfs.balancer.getBlocks.min-block-size = 10485760 (default=10485760)
19/03/30 00:03:03 INFO balancer.Balancer: dfs.balancer.max-size-to-move = 10737418240 (default=10737418240)
19/03/30 00:03:03 INFO net.NetworkTopology: Adding a new node: /default-rack/172.16.101.58:50010
19/03/30 00:03:03 INFO net.NetworkTopology: Adding a new node: /default-rack/172.16.101.66:50010
19/03/30 00:03:03 INFO net.NetworkTopology: Adding a new node: /default-rack/172.16.101.59:50010
19/03/30 00:03:03 INFO net.NetworkTopology: Adding a new node: /default-rack/172.16.101.60:50010
19/03/30 00:03:03 INFO balancer.Balancer: 0 over-utilized: []
19/03/30 00:03:03 INFO balancer.Balancer: 1 underutilized: [172.16.101.66:50010:DISK]
19/03/30 00:03:03 INFO balancer.Balancer: Need to move 640.08 MB to make the cluster balanced.
19/03/30 00:03:03 INFO balancer.Balancer: chooseStorageGroups for SAME_RACK: overUtilized => underUtilized
19/03/30 00:03:03 INFO balancer.Balancer: chooseStorageGroups for SAME_RACK: overUtilized => belowAvgUtilized
19/03/30 00:03:03 INFO balancer.Balancer: chooseStorageGroups for SAME_RACK: underUtilized => aboveAvgUtilized
19/03/30 00:03:03 INFO balancer.Balancer: Decided to move 474.38 MB bytes from 172.16.101.58:50010:DISK to 172.16.101.66:50010:DISK
19/03/30 00:03:03 INFO balancer.Balancer: Decided to move 308.67 MB bytes from 172.16.101.59:50010:DISK to 172.16.101.66:50010:DISK
19/03/30 00:03:03 INFO balancer.Balancer: chooseStorageGroups for ANY_OTHER: overUtilized => underUtilized
19/03/30 00:03:03 INFO balancer.Balancer: chooseStorageGroups for ANY_OTHER: overUtilized => belowAvgUtilized
19/03/30 00:03:03 INFO balancer.Balancer: chooseStorageGroups for ANY_OTHER: underUtilized => aboveAvgUtilized
19/03/30 00:03:03 INFO balancer.Balancer: Will move 783.06 MB in this iteration
19/03/30 00:03:03 INFO balancer.Dispatcher: Limiting threads per target to the specified max.
19/03/30 00:03:03 INFO balancer.Dispatcher: Allocating 5 threads per target.
19/03/30 00:03:03 INFO balancer.Dispatcher: Start moving blk_1073741834_1010 with size=134217728 from 172.16.101.58:50010:DISK to 172.16.101.66:50010:DISK through 172.16.101.58:50010
19/03/30 00:03:03 INFO balancer.Dispatcher: Start moving blk_1073741833_1009 with size=134217728 from 172.16.101.58:50010:DISK to 172.16.101.66:50010:DISK through 172.16.101.58:50010
19/03/30 00:03:03 INFO balancer.Dispatcher: Start moving blk_1073741832_1008 with size=134217728 from 172.16.101.58:50010:DISK to 172.16.101.66:50010:DISK through 172.16.101.58:50010
19/03/30 00:03:03 INFO balancer.Dispatcher: Start moving blk_1073741828_1004 with size=21901927 from 172.16.101.58:50010:DISK to 172.16.101.66:50010:DISK through 172.16.101.58:50010
19/03/30 00:03:03 INFO balancer.Dispatcher: Start moving blk_1073741827_1003 with size=134217728 from 172.16.101.58:50010:DISK to 172.16.101.66:50010:DISK through 172.16.101.58:50010
19/03/30 00:03:03 WARN balancer.Dispatcher: Failed to move blk_1073741828_1004 with size=21901927 from 172.16.101.58:50010:DISK to 172.16.101.66:50010:DISK through 172.16.101.58:50010: Got error, status message opReplaceBlock BP-698223843-172.16.101.55-1553701973789:blk_1073741828_1004 received exception java.io.IOException: Got error, status message Not able to copy block 1073741828 to /172.16.101.66:22272 because threads quota is exceeded., copy block BP-698223843-172.16.101.55-1553701973789:blk_1073741828_1004 from /172.16.101.58:50010, block move is failed
19/03/30 00:03:03 INFO balancer.Dispatcher: DDatanode:172.16.101.58:50010 activateDelay 10.0 seconds
19/03/30 00:03:03 INFO balancer.Dispatcher: DDatanode:172.16.101.66:50010 activateDelay 10.0 seconds
19/03/30 00:03:03 INFO balancer.Dispatcher: Start moving blk_1073741826_1002 with size=134217728 from 172.16.101.59:50010:DISK to 172.16.101.66:50010:DISK through 172.16.101.58:50010
19/03/30 00:03:03 WARN balancer.Dispatcher: Failed to move blk_1073741826_1002 with size=134217728 from 172.16.101.59:50010:DISK to 172.16.101.66:50010:DISK through 172.16.101.58:50010: Got error, status message opReplaceBlock BP-698223843-172.16.101.55-1553701973789:blk_1073741826_1002 received exception java.io.IOException: Got error, status message Not able to copy block 1073741826 to /172.16.101.66:22274 because threads quota is exceeded., copy block BP-698223843-172.16.101.55-1553701973789:blk_1073741826_1002 from /172.16.101.58:50010, block move is failed
19/03/30 00:03:03 INFO balancer.Dispatcher: DDatanode:172.16.101.58:50010 activateDelay 10.0 seconds
19/03/30 00:03:03 INFO balancer.Dispatcher: DDatanode:172.16.101.66:50010 activateDelay 10.0 seconds
19/03/30 00:03:47 INFO balancer.Dispatcher: Successfully moved blk_1073741833_1009 with size=134217728 from 172.16.101.58:50010:DISK to 172.16.101.66:50010:DISK through 172.16.101.58:50010
19/03/30 00:05:12 WARN balancer.Dispatcher: Failed to move blk_1073741834_1010 with size=134217728 from 172.16.101.58:50010:DISK to 172.16.101.66:50010:DISK through 172.16.101.58:50010: Got error, status message opReplaceBlock BP-698223843-172.16.101.55-1553701973789:blk_1073741834_1010 received exception java.net.SocketTimeoutException: 60000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/172.16.101.66:22266 remote=/172.16.101.58:50010], block move is failed
19/03/30 00:05:12 INFO balancer.Dispatcher: DDatanode:172.16.101.58:50010 activateDelay 10.0 seconds
19/03/30 00:05:12 INFO balancer.Dispatcher: DDatanode:172.16.101.66:50010 activateDelay 10.0 seconds
19/03/30 00:05:36 WARN balancer.Dispatcher: Failed to move blk_1073741827_1003 with size=134217728 from 172.16.101.58:50010:DISK to 172.16.101.66:50010:DISK through 172.16.101.58:50010: Got error, status message opReplaceBlock BP-698223843-172.16.101.55-1553701973789:blk_1073741827_1003 received exception java.net.SocketTimeoutException: 60000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/172.16.101.66:22270 remote=/172.16.101.58:50010], block move is failed
19/03/30 00:05:36 INFO balancer.Dispatcher: DDatanode:172.16.101.58:50010 activateDelay 10.0 seconds
19/03/30 00:05:36 INFO balancer.Dispatcher: DDatanode:172.16.101.66:50010 activateDelay 10.0 seconds
19/03/30 00:06:11 WARN balancer.Dispatcher: Failed to move blk_1073741832_1008 with size=134217728 from 172.16.101.58:50010:DISK to 172.16.101.66:50010:DISK through 172.16.101.58:50010: Got error, status message opReplaceBlock BP-698223843-172.16.101.55-1553701973789:blk_1073741832_1008 received exception java.net.SocketTimeoutException: 60000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/172.16.101.66:22268 remote=/172.16.101.58:50010], block move is failed
19/03/30 00:06:11 INFO balancer.Dispatcher: DDatanode:172.16.101.58:50010 activateDelay 10.0 seconds
19/03/30 00:06:11 INFO balancer.Dispatcher: DDatanode:172.16.101.66:50010 activateDelay 10.0 seconds
Mar 30, 2019 12:06:11 AM 2 768 MB 640.08 MB 783.06 MB
19/03/30 00:06:16 INFO balancer.Balancer: dfs.balancer.movedWinWidth = 5400000 (default=5400000)
19/03/30 00:06:16 INFO balancer.Balancer: dfs.balancer.moverThreads = 1000 (default=1000)
19/03/30 00:06:16 INFO balancer.Balancer: dfs.balancer.dispatcherThreads = 200 (default=200)
19/03/30 00:06:16 INFO balancer.Balancer: dfs.datanode.balance.max.concurrent.moves = 5 (default=5)
19/03/30 00:06:16 INFO balancer.Balancer: dfs.balancer.getBlocks.size = 2147483648 (default=2147483648)
19/03/30 00:06:16 INFO balancer.Balancer: dfs.balancer.getBlocks.min-block-size = 10485760 (default=10485760)
19/03/30 00:06:16 INFO balancer.Balancer: dfs.balancer.max-size-to-move = 10737418240 (default=10737418240)
19/03/30 00:06:16 INFO net.NetworkTopology: Adding a new node: /default-rack/172.16.101.59:50010
19/03/30 00:06:16 INFO net.NetworkTopology: Adding a new node: /default-rack/172.16.101.58:50010
19/03/30 00:06:16 INFO net.NetworkTopology: Adding a new node: /default-rack/172.16.101.66:50010
19/03/30 00:06:16 INFO net.NetworkTopology: Adding a new node: /default-rack/172.16.101.60:50010
19/03/30 00:06:16 INFO balancer.Balancer: 0 over-utilized: []
19/03/30 00:06:16 INFO balancer.Balancer: 1 underutilized: [172.16.101.66:50010:DISK]
19/03/30 00:06:16 INFO balancer.Balancer: Need to move 458.28 MB to make the cluster balanced.
19/03/30 00:06:16 INFO balancer.Balancer: chooseStorageGroups for SAME_RACK: overUtilized => underUtilized
19/03/30 00:06:16 INFO balancer.Balancer: chooseStorageGroups for SAME_RACK: overUtilized => belowAvgUtilized
19/03/30 00:06:16 INFO balancer.Balancer: chooseStorageGroups for SAME_RACK: underUtilized => aboveAvgUtilized
19/03/30 00:06:16 INFO balancer.Balancer: Decided to move 413.78 MB bytes from 172.16.101.59:50010:DISK to 172.16.101.66:50010:DISK
19/03/30 00:06:16 INFO balancer.Balancer: Decided to move 369.28 MB bytes from 172.16.101.58:50010:DISK to 172.16.101.66:50010:DISK
19/03/30 00:06:16 INFO balancer.Balancer: chooseStorageGroups for ANY_OTHER: overUtilized => underUtilized
19/03/30 00:06:16 INFO balancer.Balancer: chooseStorageGroups for ANY_OTHER: overUtilized => belowAvgUtilized
19/03/30 00:06:16 INFO balancer.Balancer: chooseStorageGroups for ANY_OTHER: underUtilized => aboveAvgUtilized
19/03/30 00:06:16 INFO balancer.Balancer: Will move 783.06 MB in this iteration
19/03/30 00:06:16 INFO balancer.Dispatcher: Limiting threads per target to the specified max.
19/03/30 00:06:16 INFO balancer.Dispatcher: Allocating 5 threads per target.
19/03/30 00:06:16 INFO balancer.Dispatcher: Start moving blk_1073741832_1008 with size=134217728 from 172.16.101.58:50010:DISK to 172.16.101.66:50010:DISK through 172.16.101.58:50010
19/03/30 00:06:16 INFO balancer.Dispatcher: Start moving blk_1073741828_1004 with size=21901927 from 172.16.101.58:50010:DISK to 172.16.101.66:50010:DISK through 172.16.101.58:50010
19/03/30 00:06:16 INFO balancer.Dispatcher: Start moving blk_1073741826_1002 with size=134217728 from 172.16.101.58:50010:DISK to 172.16.101.66:50010:DISK through 172.16.101.58:50010
19/03/30 00:06:16 INFO balancer.Dispatcher: Start moving blk_1073741827_1003 with size=134217728 from 172.16.101.58:50010:DISK to 172.16.101.66:50010:DISK through 172.16.101.58:50010
19/03/30 00:06:16 INFO balancer.Dispatcher: Start moving blk_1073741834_1010 with size=134217728 from 172.16.101.59:50010:DISK to 172.16.101.66:50010:DISK through 172.16.101.58:50010
19/03/30 00:06:16 WARN balancer.Dispatcher: Failed to move blk_1073741834_1010 with size=134217728 from 172.16.101.59:50010:DISK to 172.16.101.66:50010:DISK through 172.16.101.58:50010: Got error, status message opReplaceBlock BP-698223843-172.16.101.55-1553701973789:blk_1073741834_1010 received exception java.io.IOException: Got error, status message Not able to copy block 1073741834 to /172.16.101.66:22284 because threads quota is exceeded., copy block BP-698223843-172.16.101.55-1553701973789:blk_1073741834_1010 from /172.16.101.58:50010, block move is failed
19/03/30 00:06:16 INFO balancer.Dispatcher: DDatanode:172.16.101.58:50010 activateDelay 10.0 seconds
19/03/30 00:06:16 INFO balancer.Dispatcher: DDatanode:172.16.101.66:50010 activateDelay 10.0 seconds
19/03/30 00:06:16 INFO balancer.Dispatcher: Start moving blk_1073741825_1001 with size=134217728 from 172.16.101.59:50010:DISK to 172.16.101.66:50010:DISK through 172.16.101.58:50010
19/03/30 00:06:16 WARN balancer.Dispatcher: Failed to move blk_1073741825_1001 with size=134217728 from 172.16.101.59:50010:DISK to 172.16.101.66:50010:DISK through 172.16.101.58:50010: Got error, status message opReplaceBlock BP-698223843-172.16.101.55-1553701973789:blk_1073741825_1001 received exception java.io.IOException: Got error, status message Not able to copy block 1073741825 to /172.16.101.66:22286 because threads quota is exceeded., copy block BP-698223843-172.16.101.55-1553701973789:blk_1073741825_1001 from /172.16.101.58:50010, block move is failed
19/03/30 00:06:16 INFO balancer.Dispatcher: DDatanode:172.16.101.58:50010 activateDelay 10.0 seconds
19/03/30 00:06:16 INFO balancer.Dispatcher: DDatanode:172.16.101.66:50010 activateDelay 10.0 seconds
19/03/30 00:06:19 INFO balancer.Dispatcher: Successfully moved blk_1073741828_1004 with size=21901927 from 172.16.101.58:50010:DISK to 172.16.101.66:50010:DISK through 172.16.101.58:50010
19/03/30 00:06:49 INFO balancer.Dispatcher: Successfully moved blk_1073741832_1008 with size=134217728 from 172.16.101.58:50010:DISK to 172.16.101.66:50010:DISK through 172.16.101.58:50010
19/03/30 00:06:53 INFO balancer.Dispatcher: Successfully moved blk_1073741827_1003 with size=134217728 from 172.16.101.58:50010:DISK to 172.16.101.66:50010:DISK through 172.16.101.58:50010
19/03/30 00:08:36 WARN balancer.Dispatcher: Failed to move blk_1073741826_1002 with size=134217728 from 172.16.101.58:50010:DISK to 172.16.101.66:50010:DISK through 172.16.101.58:50010: Got error, status message opReplaceBlock BP-698223843-172.16.101.55-1553701973789:blk_1073741826_1002 received exception java.net.SocketTimeoutException: 60000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/172.16.101.66:22280 remote=/172.16.101.58:50010], block move is failed
19/03/30 00:08:36 INFO balancer.Dispatcher: DDatanode:172.16.101.58:50010 activateDelay 10.0 seconds
19/03/30 00:08:36 INFO balancer.Dispatcher: DDatanode:172.16.101.66:50010 activateDelay 10.0 seconds
Mar 30, 2019 12:08:36 AM 3 1.02 GB 458.28 MB 783.06 MB
19/03/30 00:08:41 INFO balancer.Balancer: dfs.balancer.movedWinWidth = 5400000 (default=5400000)
19/03/30 00:08:41 INFO balancer.Balancer: dfs.balancer.moverThreads = 1000 (default=1000)
19/03/30 00:08:41 INFO balancer.Balancer: dfs.balancer.dispatcherThreads = 200 (default=200)
19/03/30 00:08:41 INFO balancer.Balancer: dfs.datanode.balance.max.concurrent.moves = 5 (default=5)
19/03/30 00:08:41 INFO balancer.Balancer: dfs.balancer.getBlocks.size = 2147483648 (default=2147483648)
19/03/30 00:08:41 INFO balancer.Balancer: dfs.balancer.getBlocks.min-block-size = 10485760 (default=10485760)
19/03/30 00:08:41 INFO balancer.Balancer: dfs.balancer.max-size-to-move = 10737418240 (default=10737418240)
19/03/30 00:08:41 INFO net.NetworkTopology: Adding a new node: /default-rack/172.16.101.59:50010
19/03/30 00:08:41 INFO net.NetworkTopology: Adding a new node: /default-rack/172.16.101.58:50010
19/03/30 00:08:41 INFO net.NetworkTopology: Adding a new node: /default-rack/172.16.101.66:50010
19/03/30 00:08:41 INFO net.NetworkTopology: Adding a new node: /default-rack/172.16.101.60:50010
19/03/30 00:08:41 INFO balancer.Balancer: 0 over-utilized: []
19/03/30 00:08:41 INFO balancer.Balancer: 1 underutilized: [172.16.101.66:50010:DISK]
19/03/30 00:08:41 INFO balancer.Balancer: Need to move 248.99 MB to make the cluster balanced.
19/03/30 00:08:41 INFO balancer.Balancer: chooseStorageGroups for SAME_RACK: overUtilized => underUtilized
19/03/30 00:08:41 INFO balancer.Balancer: chooseStorageGroups for SAME_RACK: overUtilized => belowAvgUtilized
19/03/30 00:08:41 INFO balancer.Balancer: chooseStorageGroups for SAME_RACK: underUtilized => aboveAvgUtilized
19/03/30 00:08:41 INFO balancer.Balancer: Decided to move 344.02 MB bytes from 172.16.101.59:50010:DISK to 172.16.101.66:50010:DISK
19/03/30 00:08:41 INFO balancer.Balancer: Decided to move 344.02 MB bytes from 172.16.101.58:50010:DISK to 172.16.101.66:50010:DISK
19/03/30 00:08:41 INFO balancer.Balancer: Decided to move 95.03 MB bytes from 172.16.101.60:50010:DISK to 172.16.101.66:50010:DISK
19/03/30 00:08:41 INFO balancer.Balancer: chooseStorageGroups for ANY_OTHER: overUtilized => underUtilized
19/03/30 00:08:41 INFO balancer.Balancer: chooseStorageGroups for ANY_OTHER: overUtilized => belowAvgUtilized
19/03/30 00:08:41 INFO balancer.Balancer: chooseStorageGroups for ANY_OTHER: underUtilized => aboveAvgUtilized
19/03/30 00:08:41 INFO balancer.Balancer: Will move 783.06 MB in this iteration
19/03/30 00:08:41 INFO balancer.Dispatcher: Limiting threads per target to the specified max.
19/03/30 00:08:41 INFO balancer.Dispatcher: Allocating 5 threads per target.
19/03/30 00:08:41 INFO balancer.Dispatcher: Start moving blk_1073741839_1015 with size=134217728 from 172.16.101.59:50010:DISK to 172.16.101.66:50010:DISK through 172.16.101.58:50010
19/03/30 00:08:41 INFO balancer.Dispatcher: Start moving blk_1073741834_1010 with size=134217728 from 172.16.101.59:50010:DISK to 172.16.101.66:50010:DISK through 172.16.101.58:50010
19/03/30 00:08:41 INFO balancer.Dispatcher: Start moving blk_1073741825_1001 with size=134217728 from 172.16.101.58:50010:DISK to 172.16.101.66:50010:DISK through 172.16.101.58:50010
19/03/30 00:08:41 INFO balancer.Dispatcher: Start moving blk_1073741826_1002 with size=134217728 from 172.16.101.58:50010:DISK to 172.16.101.66:50010:DISK through 172.16.101.58:50010
19/03/30 00:08:41 INFO balancer.Dispatcher: Start moving blk_1073741848_1024 with size=73209856 from 172.16.101.58:50010:DISK to 172.16.101.66:50010:DISK through 172.16.101.58:50010
19/03/30 00:09:35 INFO balancer.Dispatcher: Successfully moved blk_1073741848_1024 with size=73209856 from 172.16.101.58:50010:DISK to 172.16.101.66:50010:DISK through 172.16.101.58:50010
19/03/30 00:09:35 INFO balancer.Dispatcher: Start moving blk_1073741847_1023 with size=134217728 from 172.16.101.58:50010:DISK to 172.16.101.66:50010:DISK through 172.16.101.58:50010
19/03/30 00:09:40 INFO balancer.Dispatcher: Successfully moved blk_1073741839_1015 with size=134217728 from 172.16.101.59:50010:DISK to 172.16.101.66:50010:DISK through 172.16.101.58:50010
19/03/30 00:09:41 INFO balancer.Dispatcher: Successfully moved blk_1073741826_1002 with size=134217728 from 172.16.101.58:50010:DISK to 172.16.101.66:50010:DISK through 172.16.101.58:50010
19/03/30 00:09:57 INFO balancer.Dispatcher: Successfully moved blk_1073741825_1001 with size=134217728 from 172.16.101.58:50010:DISK to 172.16.101.66:50010:DISK through 172.16.101.58:50010
19/03/30 00:09:57 INFO balancer.Dispatcher: Successfully moved blk_1073741834_1010 with size=134217728 from 172.16.101.59:50010:DISK to 172.16.101.66:50010:DISK through 172.16.101.58:50010
19/03/30 00:12:28 WARN balancer.Dispatcher: Failed to move blk_1073741847_1023 with size=134217728 from 172.16.101.58:50010:DISK to 172.16.101.66:50010:DISK through 172.16.101.58:50010: Got error, status message opReplaceBlock BP-698223843-172.16.101.55-1553701973789:blk_1073741847_1023 received exception java.net.SocketTimeoutException: 60000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/172.16.101.66:22298 remote=/172.16.101.58:50010], block move is failed
19/03/30 00:12:28 INFO balancer.Dispatcher: DDatanode:172.16.101.58:50010 activateDelay 10.0 seconds
19/03/30 00:12:28 INFO balancer.Dispatcher: DDatanode:172.16.101.66:50010 activateDelay 10.0 seconds
Mar 30, 2019 12:12:28 AM 4 1.59 GB 248.99 MB 783.06 MB
19/03/30 00:12:33 INFO balancer.Balancer: dfs.balancer.movedWinWidth = 5400000 (default=5400000)
19/03/30 00:12:33 INFO balancer.Balancer: dfs.balancer.moverThreads = 1000 (default=1000)
19/03/30 00:12:33 INFO balancer.Balancer: dfs.balancer.dispatcherThreads = 200 (default=200)
19/03/30 00:12:33 INFO balancer.Balancer: dfs.datanode.balance.max.concurrent.moves = 5 (default=5)
19/03/30 00:12:33 INFO balancer.Balancer: dfs.balancer.getBlocks.size = 2147483648 (default=2147483648)
19/03/30 00:12:33 INFO balancer.Balancer: dfs.balancer.getBlocks.min-block-size = 10485760 (default=10485760)
19/03/30 00:12:33 INFO balancer.Balancer: dfs.balancer.max-size-to-move = 10737418240 (default=10737418240)
19/03/30 00:12:33 INFO net.NetworkTopology: Adding a new node: /default-rack/172.16.101.59:50010
19/03/30 00:12:33 INFO net.NetworkTopology: Adding a new node: /default-rack/172.16.101.60:50010
19/03/30 00:12:33 INFO net.NetworkTopology: Adding a new node: /default-rack/172.16.101.58:50010
19/03/30 00:12:33 INFO net.NetworkTopology: Adding a new node: /default-rack/172.16.101.66:50010
19/03/30 00:12:33 INFO balancer.Balancer: 0 over-utilized: []
19/03/30 00:12:33 INFO balancer.Balancer: 0 underutilized: []
The cluster is balanced. Exiting...
Mar 30, 2019 12:12:33 AM 5 1.59 GB 0 B -1 B
Mar 30, 2019 12:12:34 AM Balancing took 13.216533333333333 minutes
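
The Dispatcher lines above mix successful and failed block moves. A rough way to tally them is sketched below, assuming the balancer output was saved to a file; the function name `summarize_balancer_log` and the file name `balancer.log` are hypothetical and not part of Hadoop.

```shell
#!/usr/bin/env bash
# Sketch: count successful and failed block moves in saved balancer output.
# Reads the files given as arguments, or stdin when none are given.
summarize_balancer_log() {
  awk '
    /Successfully moved blk_/ {
      ok++
      # The block size appears as a field such as "size=134217728".
      for (i = 1; i <= NF; i++)
        if ($i ~ /^size=[0-9]+$/) { split($i, kv, "="); bytes += kv[2] }
    }
    /Failed to move blk_/ { failed++ }
    END { printf "moved=%d failed=%d bytes=%.0f\n", ok, failed, bytes }
  ' "$@"
}

# Example call (hypothetical file name, commented out):
# summarize_balancer_log balancer.log
```

Blocks whose move failed with a timeout are only counted here; whether they were retried in a later iteration still has to be checked in the log itself.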

Check the HDFS cluster load again
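
One way to check per-datanode usage from the command line is `hdfs dfsadmin -report`. The sketch below assumes the report prints a `Name:` line followed by a `DFS Used%:` line for each datanode; the helper name `datanode_usage` is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: print "<datanode address> <DFS Used%>" for each datanode
# found in `hdfs dfsadmin -report` output read from stdin.
datanode_usage() {
  awk '
    /^Name: /      { name = $2 }
    # The cluster-wide "DFS Used%" line comes before any Name line and is skipped.
    /^DFS Used%: / { if (name != "") { print name, $3; name = "" } }
  '
}

# Example call (needs a running cluster, so it is commented out):
# hdfs dfsadmin -report | datanode_usage
```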


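8. Change the replication factor of existing files/directories in the HDFS cluster

Existing files keep their original replication factor. Hadoop does not adjust them automatically to the new value, so this has to be done manually.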

# hdfs dfs -setrep -R -w 4 /

Output log:

Replication 4 set: /CentOS-6.8-x86_64-bin-DVD2.iso
Replication 4 set: /hadoop-2.8.1.tar.gz
Replication 4 set: /slaves
Waiting for /CentOS-6.8-x86_64-bin-DVD2.iso ..................... done
Waiting for /hadoop-2.8.1.tar.gz ... done
Waiting for /slaves ... done
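
To spot-check a single path after `-setrep`, the replication factor can be read back with `hdfs dfs -stat %r`. The following is a minimal sketch; `check_replication` is a hypothetical helper name, and the path and expected value in the example call are taken from the output above.

```shell
#!/usr/bin/env bash
# Sketch: compare the replication factor reported for one path
# with the expected value.
check_replication() {
  local path="$1" expected="$2" actual
  # %r makes -stat print only the replication factor of the path.
  actual=$(hdfs dfs -stat %r "$path") || return 2
  if [ "$actual" = "$expected" ]; then
    echo "OK: $path has replication $actual"
  else
    echo "MISMATCH: $path has replication $actual, expected $expected"
    return 1
  fi
}

# Example call (needs a running cluster, so it is commented out):
# check_replication /slaves 4
```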

Verify with the following command:

# hdfs fsck /
Connecting to namenode via http://sht-sgmhadoopnn-01:50070/fsck?ugi=root&path=%2F
FSCK started by root (auth:SIMPLE) from /172.16.101.55 for path / at Sat Mar :: CST
...Status: HEALTHY
Total size: B
Total dirs:
Total files:
Total symlinks:
Total blocks (validated): (avg. block size B)
Minimally replicated blocks: (100.0 %)
Over-replicated blocks: (0.0 %)
Under-replicated blocks: (0.0 %)
Mis-replicated blocks: (0.0 %)
Default replication factor:
Average block replication: 4.0
Corrupt blocks:
Missing replicas: (0.0 %)
Number of data-nodes:
Number of racks:
FSCK ended at Sat Mar :: CST in milliseconds
The filesystem under path '/' is HEALTHY
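
When only one value from the fsck summary is of interest, such as the average block replication, it can be pulled out with a small filter. This is a sketch that assumes the `label: value` layout shown above; `fsck_field` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch: print the value of one "label: value" line from fsck output on stdin.
fsck_field() {
  awk -F':[ \t]*' -v label="$1" '
    { sub(/^[ \t]+/, "", $1) }          # drop leading whitespace before the label
    $1 == label { print $2; exit }
  '
}

# Example call (needs a running cluster, so it is commented out):
# hdfs fsck / | fsck_field "Average block replication"
```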

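The steps above add a datanode dynamically without restarting the HDFS cluster. Restarting the HDFS cluster at a suitable time is still recommended.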