1. Preparation:
- Install a JDK >= 1.7: http://www.oracle.com/technetwork/java/javase/downloads/jdk8-downloads-2133151.html
- Verify the install: java -version
- Download hadoop 2.6: http://hadoop.apache.org/releases.html
- Set up passwordless ssh: ssh-keygen -t rsa ---> cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
- Test the login: ssh localhost
2. Configure environment variables:
- JAVA:
- vim ~/.bash_profile:
export JAVA_HOME="/Library/Java/JavaVirtualMachines/jdk1.8.0_77.jdk/Contents/Home" (the jdk install path)
export PATH=${JAVA_HOME}/bin:$PATH (skip if already present)
- HADOOP:
- vim ~/.bash_profile:
export HADOOP_HOME=/XXX/hadoop-2.6.4 (the hadoop extraction path)
export YARN_HOME=/XXX/Code/hadoop-2.6.4
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
export YARN_CONF_DIR=$YARN_HOME/etc/hadoop
- Apply the changes: source ~/.bash_profile
3. Edit the hadoop configuration files: (cd $HADOOP_HOME/etc/hadoop)
- hadoop-env.sh
-
export JAVA_HOME=/Library/Java/JavaVirtualMachines/jdk1.8.0_77.jdk/Contents/Home
export YARN_HOME=/XXX/Code/hadoop-2.6.4
export PATH=$PATH:/XXX/hadoop-2.6.4/bin
- Apply the changes: source hadoop-env.sh
- At this point you can run a standalone test:
- cd $HADOOP_HOME
- mkdir in
- cp file1 in (copy some input files in)
- hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.4.jar wordcount in out (out is created automatically; do not create it beforehand)
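What the wordcount example computes can be sketched with plain coreutils; this is only an illustration (the sample input below is made up, not from the original):

```shell
# a rough local equivalent of the wordcount job above:
# split input into one word per line, then count identical words
mkdir -p in
printf 'hello hadoop\nhello world\n' > in/file1
tr -s ' \t' '\n' < in/file1 | sort | uniq -c
```

Each output line is a count followed by the word, which is the same shape as the job's part-r-00000 output.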
- Continue with the pseudo-distributed configuration
- core-site.xml
-
<property>
<name>fs.default.name</name>
<value>hdfs://localhost:9000</value>
<description>The name of the default file system. A URI whose
scheme and authority determine the FileSystem implementation.
</description>
</property>
- yarn-site.xml
-
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
- mapred-site.xml (create it, or copy mapred-site.xml.template to mapred-site.xml)
-
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
- hdfs-site.xml
- Specify the directories on this host for the namenode and datanode:
- $HADOOP_PATH/hdfs/name
- $HADOOP_PATH/hdfs/data
-
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>file:/$HADOOP_PATH/hdfs/name</value> (replace HADOOP_PATH with the hadoop extraction path)
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>file:/$HADOOP_PATH/hdfs/data</value>
</property>
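The two directories above can also be created up front (Hadoop will create them itself when formatting and starting, so this is optional); a minimal sketch, where HADOOP_PATH is a stand-in for your actual extraction path:

```shell
# create the namenode/datanode directories referenced in hdfs-site.xml
# HADOOP_PATH here is a placeholder value; substitute your own path
HADOOP_PATH=/tmp/hadoop-2.6.4
mkdir -p "$HADOOP_PATH/hdfs/name" "$HADOOP_PATH/hdfs/data"
ls "$HADOOP_PATH/hdfs"
```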
4. Start up:
- Format hdfs: hdfs namenode -format
- Start hadoop
- cd $HADOOP_PATH/sbin
- ./start-dfs.sh
- ./start-yarn.sh
- http://localhost:50070/ --- HDFS web UI; http://localhost:8088/ --- YARN ResourceManager web UI
- jps should show five Hadoop processes: NameNode, DataNode, SecondaryNameNode, ResourceManager, NodeManager
- Test
- hadoop fs -mkdir -p /user/zhangsan/in
- hadoop fs -copyFromLocal ... /user/zhangsan/in (copy some files into hdfs)
- hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.4.jar wordcount /user/zhangsan/in /user/zhangsan/out
- hadoop fs -cat /user/zhangsan/out/* (shows the word count results)
Local test:
cat in/* | ./map | sort | ./reduce
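The `./map` and `./reduce` executables above are whatever streaming scripts you want to test; a minimal wordcount pair in shell, with hypothetical script contents and sample input (not from the original):

```shell
# write a hypothetical map script: emit one word per line on stdout
cat > map <<'EOF'
#!/bin/sh
tr -s ' \t' '\n'
EOF
# write a hypothetical reduce script: count consecutive identical words
# (this is why the pipeline sorts between map and reduce)
cat > reduce <<'EOF'
#!/bin/sh
uniq -c
EOF
chmod +x map reduce
mkdir -p in
printf 'hello hadoop\nhello world\n' > in/file1
cat in/* | ./map | sort | ./reduce
```

The sort between the two stages plays the role of Hadoop's shuffle: it groups identical keys together so the reducer only has to count adjacent lines.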