Deploying a Pseudo-Distributed Hadoop Environment with Docker (Part 2)

Date: 2022-09-24 23:29:29

Continuing from the previous post, let's carry on configuring Hadoop.

For the earlier steps, see here.

5. Hadoop configuration

(1) core-site.xml configuration

Open the core-site.xml file (nano core-site.xml):
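These configuration files all live under Hadoop's etc/hadoop directory. Assuming the HADOOP_CONFIG_HOME variable set up in the previous post (it is used again in section 7 below), you can switch there first:

cd $HADOOP_CONFIG_HOME
nano core-site.xml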

<configuration>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/root/soft/apache/hadoop/hadoop-2.6.0/tmp</value>
        <description>A base for other temporary directories.</description>
    </property>

    <property>
        <name>fs.default.name</name>
        <value>hdfs://master:9000</value>
        <final>true</final>
        <description>The name of the default file system. A URI whose
        scheme and authority determine the FileSystem implementation. The
        uri's scheme determines the config property (fs.SCHEME.impl) naming
        the FileSystem implementation class. The uri's authority is used to
        determine the host, port, etc. for a filesystem.</description>
    </property>
</configuration>

Note:

hadoop.tmp.dir is the temporary directory path created with the commands in the previous post.
fs.default.name is set to hdfs://master:9000 and points to the master node's host. (fs.default.name is the legacy key; it still works in Hadoop 2.x, where fs.defaultFS is the preferred replacement.)

(2) hdfs-site.xml configuration

<configuration>
    <property>
        <name>dfs.replication</name>
        <value>2</value>
        <final>true</final>
        <description>Default block replication.
        The actual number of replications can be specified when the file is created.
        The default is used if replication is not specified in create time.
        </description>
    </property>

    <property>
        <name>dfs.namenode.name.dir</name>
        <value>/root/soft/apache/hadoop/hadoop-2.6.0/namenode</value>
        <final>true</final>
    </property>

    <property>
        <name>dfs.datanode.data.dir</name>
        <value>/root/soft/apache/hadoop/hadoop-2.6.0/datanode</value>
        <final>true</final>
    </property>
</configuration>

Note

  • When we build the cluster environment later, there will be one master node and two slave nodes, so dfs.replication is set to 2.
  • dfs.namenode.name.dir and dfs.datanode.data.dir are set to the NameNode and DataNode directory paths created earlier (if they do not exist yet, see the command below).
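The tmp, namenode and datanode directories should already exist from the previous post; if not, a quick way to create all three at once (paths are the ones used in the configs above) is:

mkdir -p /root/soft/apache/hadoop/hadoop-2.6.0/{tmp,namenode,datanode}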

(3) mapred-site.xml configuration
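The Hadoop 2.6 distribution ships only mapred-site.xml.template by default; if mapred-site.xml is not already present in the configuration directory, create it from the template first:

cp mapred-site.xml.template mapred-site.xml
nano mapred-site.xml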

<configuration>
    <property>
        <name>mapred.job.tracker</name>
        <value>master:9001</value>
        <description>The host and port that the MapReduce job tracker runs
        at. If "local", then jobs are run in-process as a single map
        and reduce task.
        </description>
    </property>
</configuration>

(4) Set the JAVA_HOME variable

Edit hadoop-env.sh with nano hadoop-env.sh and change the following line:

# The java implementation to use.
export JAVA_HOME=/usr/lib/jvm/java-8-oracle
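The exact path depends on how Java was installed in the previous post; /usr/lib/jvm/java-8-oracle is the path the Oracle Java 8 package typically uses. If you are unsure, check what is actually present:

ls /usr/lib/jvm/
readlink -f $(which java)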


(5) Format the NameNode

This is an important step; run the command hadoop namenode -format
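As the dfsadmin output later in this post also shows, Hadoop 2.x marks the hadoop-script form of HDFS commands as deprecated; the equivalent current form of the format command is:

hdfs namenode -format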


6. Install SSH

Building the cluster environment naturally requires SSH: it gives us password-less access, which is very convenient when logging in to the cluster machines.

root@8ef06706f88d:~# sudo apt-get install ssh


Once SSH is installed, note that because we are running inside a Docker container, the SSH service will not start automatically; after the container starts we would have to launch it manually with /usr/sbin/sshd. That is a bit of a hassle, so for convenience we add the command to ~/.bashrc. Edit the file with nano ~/.bashrc (install nano yourself if it is missing, or use vi) and append the following:

#autorun
/usr/sbin/sshd


PS: if you see an error like Missing privilege separation directory: /var/run/sshd, create the /var/run/sshd directory manually.
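A minimal fix (worth putting in ~/.bashrc just before the sshd line, so the directory is recreated on every container start):

mkdir -p /var/run/sshd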

Next, generate the access keys:

root@8ef06706f88d:/# cd ~/
root@8ef06706f88d:~# ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
root@8ef06706f88d:~# cd .ssh
root@8ef06706f88d:~/.ssh# cat id_rsa.pub >> authorized_keys
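To confirm that password-less login works, ssh back into the local host (assuming sshd is already running); it should print the hostname without asking for a password:

root@8ef06706f88d:~/.ssh# ssh -o StrictHostKeyChecking=no localhost hostname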


7. Building the Hadoop distributed cluster

Start the master container:

docker run -ti -h master ubuntu:14.04

Start the slave1 container:

docker run -ti -h slave1 ubuntu:14.04

Start the slave2 container:

docker run -ti -h slave2 ubuntu:14.04
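Note that a bare ubuntu:14.04 container contains none of the Hadoop, Java and SSH setup done above, so the intent here is presumably to start the three containers from the image you committed after finishing that configuration. A sketch, assuming the container ID from the earlier prompts and an example image name:

docker commit 8ef06706f88d ubuntu:hadoop
docker run -ti -h master ubuntu:hadoop
docker run -ti -h slave1 ubuntu:hadoop
docker run -ti -h slave2 ubuntu:hadoop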


Configure hosts

(1) Get each node's IP address with the ifconfig command. The addresses may differ depending on your environment, for example:

172.17.0.2      master
172.17.0.3      slave1
172.17.0.4      slave2

(2) Use sudo nano /etc/hosts to write entries like the following into the hosts file on every node, substituting the IP addresses you obtained above:

172.17.0.2      master
172.17.0.3      slave1
172.17.0.4      slave2
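If you prefer not to edit the file by hand, the same entries can be appended in one command (run as root inside each container; the IPs are the example values from above):

cat >> /etc/hosts <<EOF
172.17.0.2 master
172.17.0.3 slave1
172.17.0.4 slave2
EOF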

Configure slaves

Run the following commands in the master node container:

root@master:~# cd $HADOOP_CONFIG_HOME/
root@master:~/soft/apache/hadoop/hadoop-2.6.0/etc/hadoop# nano slaves

Write the hostnames of the slave nodes into this file:

slave1
slave2

8. Start Hadoop

On the master node, run the start-all.sh command to start Hadoop. (In Hadoop 2.x start-all.sh is deprecated in favor of start-dfs.sh and start-yarn.sh, but it still works and simply invokes those two scripts.)
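Once the scripts finish, running jps on each node (jps ships with the JDK) is a quick sanity check; with this configuration the master should roughly show NameNode, SecondaryNameNode and ResourceManager, and each slave should show DataNode and NodeManager:

root@master:~# jps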

If startup stops at a prompt like The authenticity of host 'master (172.17.0.2)' can't be established., either answer yes for each host, or ssh to each node once with host-key checking disabled (for example ssh -o UserKnownHostsFile=/dev/null -o StrictHostKeyChecking=no master), and then start Hadoop again.
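To avoid the prompt permanently, the same options can be written into the SSH client configuration instead (a minimal sketch; 0.0.0.0 is listed because the secondary NameNode startup script typically connects to it):

cat >> ~/.ssh/config <<EOF
Host master slave1 slave2 0.0.0.0
    StrictHostKeyChecking no
    UserKnownHostsFile /dev/null
EOF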

After startup, run hadoop dfsadmin -report in the terminal; output like the following indicates that the configuration is complete:

root@master:/# hadoop dfsadmin -report
DEPRECATED: Use of this script to execute hdfs command is deprecated.
Instead use the hdfs command for it.

Configured Capacity: 494517731328 (460.56 GB)
Present Capacity: 383011176448 (356.71 GB)
DFS Remaining: 383011127296 (356.71 GB)
DFS Used: 49152 (48 KB)
DFS Used%: 0.00%
Under replicated blocks: 0
Blocks with corrupt replicas: 0
Missing blocks: 0

-------------------------------------------------
Live datanodes (2):

Name: 172.17.0.4:50010 (slave1)
Hostname: slave1
Decommission Status : Normal
Configured Capacity: 247258865664 (230.28 GB)
DFS Used: 24576 (24 KB)
Non DFS Used: 55753277440 (51.92 GB)
DFS Remaining: 191505563648 (178.35 GB)
DFS Used%: 0.00%
DFS Remaining%: 77.45%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 1
Last contact: Fri Aug 05 07:03:08 UTC 2016


Name: 172.17.0.5:50010 (slave2)
Hostname: slave2
Decommission Status : Normal
Configured Capacity: 247258865664 (230.28 GB)
DFS Used: 24576 (24 KB)
Non DFS Used: 55753277440 (51.92 GB)
DFS Remaining: 191505563648 (178.35 GB)
DFS Used%: 0.00%
DFS Remaining%: 77.45%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 1
Last contact: Fri Aug 05 07:03:08 UTC 2016

You can also check the cluster in a browser via the NameNode web UI at http://master:50070/ (or the master's IP, e.g. http://172.17.0.2:50070).


This article borrows heavily from the post referenced here; if you have any questions about this article, that post is worth consulting.