Step 1: Install the operating system and create the hadoop user
OS: RHEL 6.5
[root@hadoop ~]# useradd hadoop
[root@hadoop ~]# passwd hadoop
Step 2: Install Java
RHEL 6.5 ships with OpenJDK, so the bundled Java can be used; verify the version:
[root@hadoop ~]# java -version
java version "1.7.0_45"
OpenJDK Runtime Environment (rhel-2.4.3.3.el6-x86_64 u45-b15)
OpenJDK 64-Bit Server VM (build 24.45-b08, mixed mode)
JAVA_HOME is /usr/lib/jvm/java-1.7.0-openjdk-1.7.0.45.x86_64
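One way to locate this path (a sketch; readlink -f follows the /etc/alternatives symlink chain down to the real binary, and the JAVA_HOME above is the directory that contains it):
[root@hadoop ~]# readlink -f $(which java)
This should print something like /usr/lib/jvm/java-1.7.0-openjdk-1.7.0.45.x86_64/jre/bin/java.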
Step 3: Set up SSH login permissions
In both pseudo-distributed and fully distributed mode, the Hadoop NameNode has to start the Hadoop daemons on every machine in the cluster, which it does over SSH, so the hadoop user needs passwordless SSH access to each node.
Configure SSH:
su - hadoop
mkdir ~/.ssh
chmod 700 ~/.ssh
/usr/bin/ssh-keygen -t rsa
/usr/bin/ssh-keygen -t dsa
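Press Enter at each prompt so the keys get an empty passphrase; otherwise the login will not be password-free. As a non-interactive sketch, the same key pair can be generated with standard ssh-keygen flags (-P sets the passphrase, -f the key file):
ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa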
Check whether ~/.ssh/authorized_keys already exists; if it does not, run the commands below, otherwise skip them:
$ touch ~/.ssh/authorized_keys
$ cd ~/.ssh
$ ls
----------------------------------
Append every node's public keys to authorized_keys (rac1 and rac2 are the two example hosts), then copy the file to the other node so both sides hold the same set:
ssh rac1 cat /home/hadoop/.ssh/id_rsa.pub >> authorized_keys
ssh rac1 cat /home/hadoop/.ssh/id_dsa.pub >> authorized_keys
ssh rac2 cat /home/hadoop/.ssh/id_rsa.pub >> authorized_keys
ssh rac2 cat /home/hadoop/.ssh/id_dsa.pub >> authorized_keys
chmod 600 authorized_keys
scp authorized_keys rac2:/home/hadoop/.ssh/
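To verify, a login from each node to every node (itself included) should now succeed without a password prompt, along these lines:
ssh rac1 date
ssh rac2 date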
Step 4: Standalone Hadoop installation
Download the package hadoop-2.8.1.tar.gz and upload it to the server.
Create a suitable directory and unpack the archive:
cd /usr/local
mkdir hadoop
cp /usr/hadoop-2.8.1.tar.gz /usr/local/hadoop/
cd hadoop
tar -xzvf hadoop-2.8.1.tar.gz
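If the archive was unpacked as root, hand the tree over to the hadoop user before continuing, since the example job below writes its output inside this directory; a minimal sketch:
chown -R hadoop:hadoop /usr/local/hadoop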
[hadoop@hadoop hadoop-2.8.1]$ export JAVA_HOME=/usr/lib/jvm/java-1.7.0-openjdk-1.7.0.45.x86_64/jre
[hadoop@hadoop hadoop-2.8.1]$ ./bin/hadoop version
Hadoop 2.8.1
Subversion https://git-wip-us.apache.org/repos/asf/hadoop.git -r 20fe5304904fc2f5a18053c389e43cd26f7a70fe
Compiled by vinodkv on 2017-06-02T06:14Z
Compiled with protoc 2.5.0
From source with checksum 60125541c2b3e266cbf3becc5bda666
This command was run using /usr/local/hadoop/hadoop-2.8.1/share/hadoop/common/hadoop-common-2.8.1.jar
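To avoid re-exporting JAVA_HOME in every new shell, it can also be set in Hadoop's own environment script; a sketch, assuming the default config layout under etc/hadoop:
echo 'export JAVA_HOME=/usr/lib/jvm/java-1.7.0-openjdk-1.7.0.45.x86_64/jre' >> etc/hadoop/hadoop-env.sh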
Test: run the bundled grep example, which extracts every string matching the given regular expression from the input files and counts the occurrences:
mkdir input
cp /usr/local/hadoop/hadoop-2.8.1/etc/hadoop/* /usr/local/hadoop/hadoop-2.8.1/input/
./bin/hadoop jar ./share/hadoop/mapreduce/hadoop-mapreduce-examples-2.8.1.jar grep input output 'dfs[a-z.]+'
Result:
...
    File System Counters
        FILE: Number of bytes read=1500730
        FILE: Number of bytes written=2509126
        FILE: Number of read operations=0
        FILE: Number of large read operations=0
        FILE: Number of write operations=0
    Map-Reduce Framework
        Map input records=12
        Map output records=12
        Map output bytes=274
        Map output materialized bytes=304
        Input split bytes=133
        Combine input records=0
        Combine output records=0
        Reduce input groups=5
        Reduce shuffle bytes=304
        Reduce input records=12
        Reduce output records=12
        Spilled Records=24
        Shuffled Maps =1
        Failed Shuffles=0
        Merged Map outputs=1
        GC time elapsed (ms)=34
        Total committed heap usage (bytes)=274628608
    Shuffle Errors
        BAD_ID=0
        CONNECTION=0
        IO_ERROR=0
        WRONG_LENGTH=0
        WRONG_MAP=0
        WRONG_REDUCE=0
    File Input Format Counters
        Bytes Read=468
    File Output Format Counters
        Bytes Written=214
Contents of the output directory:
[root@hadoop output]# ll
total 4
-rw-r--r--. 1 hadoop hadoop 202 Jul 23 14:57 part-r-00000
-rw-r--r--. 1 hadoop hadoop 0 Jul 23 14:57 _SUCCESS
[root@hadoop output]# vi part-r-00000
6 dfs.audit.logger
4 dfs.class
3 dfs.server.namenode.
3 dfs.logger
2 dfs.period
2 dfs.audit.log.maxfilesize
2 dfs.audit.log.maxbackupindex
1 dfsmetrics.log
1 dfsadmin
1 dfs.servers
1 dfs.log
1 dfs.file
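Note that Hadoop refuses to overwrite an existing output directory, so remove it before re-running the example; the result files can also be printed without opening an editor:
rm -r output
cat output/*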