Setting Up a Hadoop 2.7.3 Cluster on Linux

Date: 2023-03-08 15:57:26


This article provides the minimal steps needed to set up a distributed Hadoop/HDFS environment suitable for production use. It serves as a summary and reference for myself, and hopefully as a convenient guide for newcomers.

Basic Environment

Installing and Configuring the JDK

Finding the JDK 7 installer on the Oracle website (http://www.oracle.com/) is no longer straightforward, since Oracle now recommends JDK 8. After some searching I found the JDK 7 download list page (http://www.oracle.com/technetwork/java/javase/downloads/jdk7-downloads-1880260.html). Since we are deploying on Linux, choose the 64-bit build; I picked jdk-7u79-linux-x64.gz. (Note that the command below actually installs the newer 7u131 RPM.)

Here we install directly from the RPM package:

rpm -ivh jdk-7u131-linux-x64.rpm

Return to the /home/hadoop directory and configure the Java environment variables:

vi .bash_profile

Add the following to .bash_profile:

(screenshot of the .bash_profile entries)
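The screenshot of the entries has been lost; a likely version, assuming the RPM placed the JDK under /usr/java/jdk1.7.0_131 (the same path used later in hadoop-env.sh), is:

```shell
# Assumed install location of the jdk-7u131 RPM
export JAVA_HOME=/usr/java/jdk1.7.0_131
export PATH=$PATH:$JAVA_HOME/bin
```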

Apply the Java environment variables immediately by running:

source .bash_profile

Finally, verify that Java is installed and configured correctly:

(screenshot of the verification output)
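The verification itself is just:

```shell
# Should report the installed JDK version, e.g. 1.7.0_131
java -version
```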

Hosts

Since my Hadoop cluster consists of three machines, the hosts file on each machine needs to be adjusted accordingly:

vi /etc/hosts

If you lack sufficient permissions, switch to the root user.

Add the same host entries on all three machines:

(screenshot of the hosts entries)
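The exact entries were in the lost screenshot; with hypothetical IP addresses, they would look like this, using the Master/Slave1/Slave2 hostnames assumed throughout this article:

```
# /etc/hosts — IP addresses are illustrative; use your machines' real ones
192.168.1.100 Master
192.168.1.101 Slave1
192.168.1.102 Slave2
```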

SSH Trust

The NameNode and DataNodes communicate over SSH, so passwordless login must be configured: the Master needs to be able to ssh into each Slave without a password.

For detailed configuration steps, see:

http://www.cnblogs.com/chenjunjie/p/4000228.html
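In case the link goes stale, the usual key-based setup, run as the hadoop user on the Master (assuming ssh-copy-id is available), is roughly:

```shell
# Generate a key pair (accept the defaults, empty passphrase)
ssh-keygen -t rsa
# Push the public key to each node, including the Master itself
ssh-copy-id hadoop@Master
ssh-copy-id hadoop@Slave1
ssh-copy-id hadoop@Slave2
```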

Directory Layout

For easier management, create directories on the Master for the HDFS NameNode, DataNode, and temporary files under the user's home directory:

/home/hadoop/hdfs/name

/home/hadoop/hdfs/data

/home/hadoop/hdfs/tmp

Then copy these directories to the same locations on Slave1 and Slave2 using scp.
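Creating the directories and copying them over can be done as follows:

```shell
# On the Master
mkdir -p /home/hadoop/hdfs/name /home/hadoop/hdfs/data /home/hadoop/hdfs/tmp
# Replicate the layout to both Slaves
scp -r /home/hadoop/hdfs hadoop@Slave1:/home/hadoop/
scp -r /home/hadoop/hdfs hadoop@Slave2:/home/hadoop/
```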

Installing and Configuring Hadoop

Download

First, go to the Apache site (http://www.apache.org/dyn/closer.cgi/hadoop/common/) to download Hadoop, choosing the recommended download mirror (http://mirrors.hust.edu.cn/apache/hadoop/common/). I chose the hadoop-2.7.3 release.

Extract hadoop-2.7.3.tar.gz into the /home/hadoop directory with:

tar -zxvf hadoop-2.7.3.tar.gz

Environment Variables

Return to the /home/hadoop directory and configure the Hadoop environment variables:

vi .bash_profile

Add the following to .bash_profile:

export HADOOP_DEV_HOME=/home/hadoop/hadoop-2.7.3

export PATH=$PATH:$HADOOP_DEV_HOME/bin

export PATH=$PATH:$HADOOP_DEV_HOME/sbin

export HADOOP_MAPRED_HOME=${HADOOP_DEV_HOME}

export HADOOP_COMMON_HOME=${HADOOP_DEV_HOME}

export HADOOP_HDFS_HOME=${HADOOP_DEV_HOME}

export YARN_HOME=${HADOOP_DEV_HOME}

export HADOOP_CONF_DIR=${HADOOP_DEV_HOME}/etc/hadoop

export HDFS_CONF_DIR=${HADOOP_DEV_HOME}/etc/hadoop

export YARN_CONF_DIR=${HADOOP_DEV_HOME}/etc/hadoop

export JAVA_LIBRARY_PATH='/home/hadoop/hadoop-2.7.3/lib/native'

export HBASE_HOME=/home/hadoop/hbase-1.2.4

export PATH=$PATH:$HBASE_HOME/bin

Apply the Hadoop environment variables immediately by running:

source .bash_profile

Configuring Hadoop

Enter the configuration directory of hadoop-2.7.3:

cd /home/hadoop/hadoop-2.7.3/etc/hadoop

Edit core-site.xml, hdfs-site.xml, mapred-site.xml, and yarn-site.xml in turn.

core-site.xml

<?xml version="1.0" encoding="UTF-8"?>

<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!--

Licensed under the Apache License, Version 2.0 (the "License");

you may not use this file except in compliance with the License.

You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software

distributed under the License is distributed on an "AS IS" BASIS,

WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.

See the License for the specific language governing permissions and

limitations under the License. See accompanying LICENSE file.

-->

<!-- Put site-specific property overrides in this file. -->

<configuration>

<property>

<name>hadoop.tmp.dir</name>

<value>file:/home/hadoop/hdfs/tmp</value>

<description>A base for other temporary directories.</description>

</property>

<property>

<name>io.file.buffer.size</name>

<value>131072</value>

</property>

<property>

<name>fs.default.name</name>

<value>hdfs://Master:9000</value>

</property>

<property>

<name>hadoop.proxyuser.root.hosts</name>

<value>*</value>

</property>

<property>

<name>hadoop.proxyuser.root.groups</name>

<value>*</value>

</property>

</configuration>

hdfs-site.xml

<?xml version="1.0" encoding="UTF-8"?>

<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!--

Licensed under the Apache License, Version 2.0 (the "License");

you may not use this file except in compliance with the License.

You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software

distributed under the License is distributed on an "AS IS" BASIS,

WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.

See the License for the specific language governing permissions and

limitations under the License. See accompanying LICENSE file.

-->

<!-- Put site-specific property overrides in this file. -->

<configuration>

<property>

<name>dfs.replication</name>

<value>2</value>

</property>

<property>

<name>dfs.namenode.name.dir</name>

<value>file:/home/hadoop/hdfs/name</value>

<final>true</final>

</property>

<property>

<name>dfs.datanode.data.dir</name>

<value>file:/home/hadoop/hdfs/data</value>

<final>true</final>

</property>

<property>

<name>dfs.namenode.secondary.http-address</name>

<value>Master:9001</value>

</property>

<property>

<name>dfs.webhdfs.enabled</name>

<value>true</value>

</property>

<property>

<name>dfs.permissions</name>

<value>false</value>

</property>

</configuration>

mapred-site.xml

<?xml version="1.0"?>

<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!--

Licensed under the Apache License, Version 2.0 (the "License");

you may not use this file except in compliance with the License.

You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software

distributed under the License is distributed on an "AS IS" BASIS,

WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.

See the License for the specific language governing permissions and

limitations under the License. See accompanying LICENSE file.

-->

<!-- Put site-specific property overrides in this file. -->

<configuration>

<property>

<name>mapreduce.framework.name</name>

<value>yarn</value>

</property>

</configuration>

yarn-site.xml

<?xml version="1.0"?>

<!--

Licensed under the Apache License, Version 2.0 (the "License");

you may not use this file except in compliance with the License.

You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software

distributed under the License is distributed on an "AS IS" BASIS,

WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.

See the License for the specific language governing permissions and

limitations under the License. See accompanying LICENSE file.

-->

<configuration>

<!-- Site specific YARN configuration properties -->

<property>

<name>yarn.resourcemanager.address</name>

<value>Master:18040</value>

</property>

<property>

<name>yarn.resourcemanager.scheduler.address</name>

<value>Master:18030</value>

</property>

<property>

<name>yarn.resourcemanager.webapp.address</name>

<value>Master:18088</value>

</property>

<property>

<name>yarn.resourcemanager.resource-tracker.address</name>

<value>Master:18025</value>

</property>

<property>

<name>yarn.resourcemanager.admin.address</name>

<value>Master:18141</value>

</property>

<property>

<name>yarn.nodemanager.aux-services</name>

<value>mapreduce_shuffle</value>

</property>

<property>

<name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>

<value>org.apache.hadoop.mapred.ShuffleHandler</value>

</property>

</configuration>

Add the following line to hadoop-env.sh:

export JAVA_HOME=/usr/java/jdk1.7.0_131

Add to the masters file:

master

Add to the slaves file:

slave1

slave2


Finally, copy the whole hadoop-2.7.3 directory tree to the same location on both Slaves with scp:

scp -r hadoop-2.7.3 hadoop@Slave1:/home/hadoop/

scp -r hadoop-2.7.3 hadoop@Slave2:/home/hadoop/

Running Hadoop

Running HDFS

Formatting the NameNode

Run the following command (in Hadoop 2.x this form is deprecated in favor of hdfs namenode -format, but it still works):

hadoop namenode -format

The format process runs through and should finish with a message stating that the name directory has been successfully formatted. (screenshots omitted)

Starting the NameNode

hadoop-daemon.sh start namenode

(screenshot of the command output omitted)

On the Master, running ps -ef | grep hadoop and then jps shows a NameNode process, which indicates that the NameNode started successfully.

Starting the DataNodes

Run:

hadoop-daemons.sh start datanode

Running jps on Slave1 and on Slave2 then shows a DataNode process on each machine, which indicates that the DataNodes on both Slaves are running normally. (screenshots omitted)

Instead of starting the NameNode and DataNodes individually as above, the start-dfs.sh script can do both in one step. (screenshot omitted)

Running YARN

YARN can be started in much the same way as HDFS. To start the ResourceManager:

yarn-daemon.sh start resourcemanager

To start the NodeManagers on all slaves at once:

yarn-daemons.sh start nodemanager

We won't repeat those individual steps here; instead, use the more concise start-yarn.sh script. Afterwards, running jps on the Master shows a ResourceManager process, indicating that the ResourceManager is running normally, and running jps on both Slaves shows a NodeManager process on each. (screenshots omitted)

Testing Hadoop

Testing HDFS

Finally, test whether the hand-built Hadoop cluster actually works. (screenshot of the test commands omitted)
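The exact commands were in the lost screenshot; a typical HDFS smoke test (file and directory names are illustrative) looks like:

```shell
# Create a directory in HDFS, upload a local file, and list it back
hdfs dfs -mkdir -p /test
hdfs dfs -put /etc/hosts /test/
hdfs dfs -ls /test
```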

Testing YARN

To verify YARN, open its web management interface at http://Master:18088 (the port set by yarn.resourcemanager.webapp.address above). (screenshot omitted)

Testing MapReduce

I'm too lazy to write MapReduce code myself. Fortunately, the Hadoop distribution ships with ready-made examples under share/hadoop/mapreduce. Run one of them:

(screenshots of the example job run omitted)
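For instance, the bundled pi estimator can be submitted like this (the jar name matches the 2.7.3 distribution):

```shell
# Estimate pi with 10 map tasks of 100 samples each
hadoop jar /home/hadoop/hadoop-2.7.3/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.3.jar pi 10 100
```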

Problems Encountered While Configuring and Running Hadoop

The yarn.nodemanager.aux-services error

When starting YARN with the start-yarn.sh script, jps on Slave1 and Slave2 showed no NodeManager process, so I logged into the Slaves and checked the logs, which contained the following error:

(screenshot of the error log omitted)

According to solutions found online, this happens because the value mapreduce.shuffle for yarn.nodemanager.aux-services has been replaced by mapreduce_shuffle. Some reference books also incorrectly give it as mapreduce-shuffle.
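The fix is simply to use the value already shown in the yarn-site.xml listing above:

```xml
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
```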
