Hadoop HelloWorld

Getting started with Hadoop:

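1. Download the latest release and extract it to a directory of your choice.

2. Configure Hadoop. The configuration files are in the ~/hadoop/conf/ directory. For now I only edited core-site.xml: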

 <?xml version="1.0"?>
 <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
 <!-- Put site-specific property overrides in this file. -->
 <configuration>
     <property>
         <name>fs.default.name</name>
         <value>hdfs://localhost:9000</value>
     </property>
     <property>
         <name>hadoop.tmp.dir</name>
         <value>/home/Jack/dfs</value>
     </property>
 </configuration>

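Here fs.default.name points the default file system at the HDFS instance on hdfs://localhost:9000, and hadoop.tmp.dir sets /home/Jack/dfs as the base directory under which Hadoop keeps its local data, including the DFS storage.

3. Format the DFS file system (run from Hadoop's bin directory): > ./hadoop namenode -format

4. Start Hadoop: > ./start-all.sh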

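Hadoop should now be running. You can check the cluster status in a browser on port 50070 (NameNode web UI) and port 50030 (JobTracker web UI).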
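If you would rather check the ports 50070 and 50030 from code than from a browser, here is a minimal plain-JDK sketch. The class name PortCheck and the one-second timeout are my own choices, not anything from Hadoop; it only tries to open a TCP connection, so success shows that something is listening on the port, not that the daemon is healthy.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

public class PortCheck {
    // Returns true if a TCP connection to host:port succeeds within timeoutMs milliseconds.
    public static boolean isListening(String host, int port, int timeoutMs) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMs);
            return true;
        } catch (IOException e) {
            // Connection refused, timed out, or host not resolvable.
            return false;
        }
    }

    public static void main(String[] args) {
        // 50070 = NameNode web UI, 50030 = JobTracker web UI (the ports mentioned above).
        int[] ports = {50070, 50030};
        for (int port : ports) {
            boolean up = isListening("localhost", port, 1000);
            System.out.println("localhost:" + port + " -> " + (up ? "listening" : "not reachable"));
        }
    }
}
```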

======================================================================

First program: HadoopHelloWorld

import java.io.IOException;
import java.util.*;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.*;
import org.apache.hadoop.mapred.*;

public class HadoopHelloWorld {

    // Mapper: splits each input line into words and emits (word, 1) for every word.
    public static class Map extends MapReduceBase implements Mapper<LongWritable, Text, Text, IntWritable> {
        // Reused across calls to avoid allocating new objects per word;
        // output.collect serializes the pair immediately, so reuse is safe.
        private final static IntWritable one = new IntWritable(1);
        private Text word = new Text();

        public void map(LongWritable key, Text value, OutputCollector<Text, IntWritable> output, Reporter reporter)
                throws IOException {
            String line = value.toString();
            StringTokenizer tokenizer = new StringTokenizer(line);
            while (tokenizer.hasMoreTokens()) {
                word.set(tokenizer.nextToken());
                output.collect(word, one);
            }
        }
    }

    // Reducer: sums all the counts emitted for one word and writes (word, total).
    public static class Reduce extends MapReduceBase implements Reducer<Text, IntWritable, Text, IntWritable> {
        public void reduce(Text key, Iterator<IntWritable> values, OutputCollector<Text, IntWritable> output, Reporter reporter)
                throws IOException {
            int sum = 0;
            while (values.hasNext()) {
                sum += values.next().get();
            }
            output.collect(key, new IntWritable(sum));
        }
    }

    // Driver: args[0] is the input path, args[1] the output path (which must not exist yet).
    public static void main(String[] args) throws Exception {
        JobConf conf = new JobConf(HadoopHelloWorld.class);
        conf.setJobName("wordcount");

        // Types of the final (reducer) output key and value.
        conf.setOutputKeyClass(Text.class);
        conf.setOutputValueClass(IntWritable.class);

        conf.setMapperClass(Map.class);
        conf.setReducerClass(Reduce.class);

        // Read plain text line by line; write "key<TAB>value" lines.
        conf.setInputFormat(TextInputFormat.class);
        conf.setOutputFormat(TextOutputFormat.class);

        FileInputFormat.setInputPaths(conf, new Path(args[0]));
        FileOutputFormat.setOutputPath(conf, new Path(args[1]));

        // Submits the job and blocks until it finishes.
        JobClient.runJob(conf);
    }
}
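To see what the Mapper and Reducer above compute without a cluster, here is a minimal single-process sketch in plain Java. It is not Hadoop code: the class LocalWordCount and its methods are hypothetical, the input lines are made up for illustration, and a sorted TreeMap stands in for the shuffle step that groups the map output by key before it reaches the reducer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

public class LocalWordCount {
    // Map phase: like HadoopHelloWorld.Map, emit (word, 1) for every token of every line.
    public static List<Map.Entry<String, Integer>> mapPhase(List<String> lines) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String line : lines) {
            StringTokenizer tokenizer = new StringTokenizer(line);
            while (tokenizer.hasMoreTokens()) {
                pairs.add(Map.entry(tokenizer.nextToken(), 1));
            }
        }
        return pairs;
    }

    // Shuffle: group values by key; the TreeMap keeps keys sorted, as Hadoop sorts keys before reducing.
    public static Map<String, List<Integer>> shuffle(List<Map.Entry<String, Integer>> pairs) {
        Map<String, List<Integer>> groups = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            groups.computeIfAbsent(pair.getKey(), k -> new ArrayList<>()).add(pair.getValue());
        }
        return groups;
    }

    // Reduce phase: like HadoopHelloWorld.Reduce, sum the values collected for each key.
    public static Map<String, Integer> reducePhase(Map<String, List<Integer>> groups) {
        Map<String, Integer> counts = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> group : groups.entrySet()) {
            int sum = 0;
            for (int value : group.getValue()) {
                sum += value;
            }
            counts.put(group.getKey(), sum);
        }
        return counts;
    }

    public static Map<String, Integer> wordCount(List<String> lines) {
        return reducePhase(shuffle(mapPhase(lines)));
    }

    public static void main(String[] args) {
        // Hypothetical sample input, not the author's file01/file02.
        List<String> lines = List.of("Hello World Bye World", "Hello Hadoop Goodbye Hadoop");
        System.out.println(wordCount(lines)); // prints {Bye=1, Goodbye=1, Hadoop=2, Hello=2, World=2}
    }
}
```

With this sample input the map phase emits 8 (word, 1) pairs, the shuffle groups them into 5 keys, and the reducer writes one line per key.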


Libraries to add to the project's build path:

JRE system Library

Hadoop-core.jar

commons-logging.jar

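Note: other write-ups don't mention that commons-logging.jar is needed, but without it I kept getting this error: java.lang.NoClassDefFoundError: org/apache/commons/logging/LogFactory

Once this is set up, compile HadoopHelloWorld.java, put all the generated class files (including the nested HadoopHelloWorld$Map.class and HadoopHelloWorld$Reduce.class) into ~/source/java2013/HadoopHelloWorld/, and package them into a jar: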
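A quick way to confirm that a class such as LogFactory is actually visible at run time is to try loading it by name. This is a minimal sketch; ClasspathCheck is a hypothetical helper, and the jar each class comes from is noted in the comments based on the library list above.

```java
public class ClasspathCheck {
    // Returns true if the named class can be found by this class's loader (without initializing it).
    public static boolean isOnClasspath(String className) {
        try {
            Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            // LinkageError covers NoClassDefFoundError for classes with missing dependencies.
            return false;
        }
    }

    public static void main(String[] args) {
        String[] required = {
            "org.apache.commons.logging.LogFactory", // expected in commons-logging.jar
            "org.apache.hadoop.mapred.JobConf"        // expected in Hadoop-core.jar
        };
        for (String name : required) {
            System.out.println(name + (isOnClasspath(name) ? ": found" : ": MISSING"));
        }
    }
}
```

Run it with the same classpath as the job, for example java -cp .:/path/to/Hadoop-core.jar:/path/to/commons-logging.jar ClasspathCheck, with the placeholder paths adjusted to your installation.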

[Jack@win bin]$ jar -cvf HadoopHelloWorld.jar -C ~/source/java2013/HadoopHelloWorld/ .
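Because javac writes each nested class to its own file, the jar has to contain HadoopHelloWorld$Map.class and HadoopHelloWorld$Reduce.class as well as HadoopHelloWorld.class, or the job cannot load its mapper and reducer. jar -tf HadoopHelloWorld.jar lists the contents; the same check in Java, as a minimal sketch with a hypothetical JarEntries class, could look like this:

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;

public class JarEntries {
    // Lists every entry name in the jar, similar to `jar -tf`.
    public static List<String> listEntries(String jarPath) throws IOException {
        List<String> names = new ArrayList<>();
        try (JarFile jar = new JarFile(jarPath)) {
            Enumeration<JarEntry> entries = jar.entries();
            while (entries.hasMoreElements()) {
                names.add(entries.nextElement().getName());
            }
        }
        return names;
    }

    // True if an entry equals `name` or ends with "/name" (tolerates a "./" or directory prefix).
    public static boolean containsEntry(List<String> names, String name) {
        for (String n : names) {
            if (n.equals(name) || n.endsWith("/" + name)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) throws IOException {
        String jarPath = args.length > 0 ? args[0] : "HadoopHelloWorld.jar";
        List<String> names = listEntries(jarPath);
        String[] expected = {"HadoopHelloWorld.class", "HadoopHelloWorld$Map.class", "HadoopHelloWorld$Reduce.class"};
        for (String name : expected) {
            System.out.println(name + (containsEntry(names, name) ? ": present" : ": MISSING"));
        }
    }
}
```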

Upload two input files (file01 and file02) to HDFS as the program's input:

[Jack@win bin]$ ./hadoop fs -mkdir input

[Jack@win bin]$ ./hadoop dfs -put ~/source/java2012/FirstJar/input/file* input
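The file* pattern is expanded by the local shell before hadoop runs, so every file in that local directory whose name starts with "file" gets uploaded. The log below reports 4 input paths, so it can be worth checking what the pattern actually matches. Here is a minimal plain-JDK sketch (GlobPreview is a hypothetical helper) using java.nio glob matching, which treats a simple pattern like file* the way the shell does; the default directory is the local input folder from the command above.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class GlobPreview {
    // Returns the names of regular files in dir that match the glob, sorted alphabetically.
    public static List<String> matching(Path dir, String glob) throws IOException {
        List<String> names = new ArrayList<>();
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir, glob)) {
            for (Path entry : stream) {
                if (Files.isRegularFile(entry)) {
                    names.add(entry.getFileName().toString());
                }
            }
        }
        Collections.sort(names);
        return names;
    }

    public static void main(String[] args) throws IOException {
        String defaultDir = System.getProperty("user.home") + "/source/java2012/FirstJar/input";
        Path dir = Paths.get(args.length > 0 ? args[0] : defaultDir);
        System.out.println(matching(dir, "file*"));
    }
}
```

Editor backup files such as file01~ would match the same pattern, for example.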

Run the program:

[Jack@win bin]$ ./hadoop jar HadoopHelloWorld.jar HadoopHelloWorld input output

13/06/20 03:16:44 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
13/06/20 03:16:45 INFO util.NativeCodeLoader: Loaded the native-hadoop library
13/06/20 03:16:45 WARN snappy.LoadSnappy: Snappy native library not loaded
13/06/20 03:16:45 INFO mapred.FileInputFormat: Total input paths to process : 4
13/06/20 03:16:45 INFO mapred.JobClient: Running job: job_201306200226_0002
13/06/20 03:16:46 INFO mapred.JobClient: map 0% reduce 0%
13/06/20 03:16:59 INFO mapred.JobClient: map 40% reduce 0%
13/06/20 03:17:05 INFO mapred.JobClient: map 80% reduce 0%
13/06/20 03:17:08 INFO mapred.JobClient: map 80% reduce 26%
13/06/20 03:17:11 INFO mapred.JobClient: map 100% reduce 26%
13/06/20 03:17:23 INFO mapred.JobClient: map 100% reduce 100%
13/06/20 03:17:28 INFO mapred.JobClient: Job complete: job_201306200226_0002
13/06/20 03:17:28 INFO mapred.JobClient: Counters: 30
13/06/20 03:17:28 INFO mapred.JobClient:   Job Counters
13/06/20 03:17:28 INFO mapred.JobClient:     Launched reduce tasks=1
13/06/20 03:17:28 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=32074
13/06/20 03:17:28 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0
13/06/20 03:17:28 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0
13/06/20 03:17:28 INFO mapred.JobClient:     Launched map tasks=5
13/06/20 03:17:28 INFO mapred.JobClient:     Data-local map tasks=3
13/06/20 03:17:28 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=23534
13/06/20 03:17:28 INFO mapred.JobClient:   File Input Format Counters
13/06/20 03:17:28 INFO mapred.JobClient:     Bytes Read=54
13/06/20 03:17:28 INFO mapred.JobClient:   File Output Format Counters
13/06/20 03:17:28 INFO mapred.JobClient:     Bytes Written=41
13/06/20 03:17:28 INFO mapred.JobClient:   FileSystemCounters
13/06/20 03:17:28 INFO mapred.JobClient:     FILE_BYTES_READ=104
13/06/20 03:17:28 INFO mapred.JobClient:     HDFS_BYTES_READ=541
13/06/20 03:17:28 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=128481
13/06/20 03:17:28 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=41
13/06/20 03:17:28 INFO mapred.JobClient:   Map-Reduce Framework
13/06/20 03:17:28 INFO mapred.JobClient:     Map output materialized bytes=128
13/06/20 03:17:28 INFO mapred.JobClient:     Map input records=2
13/06/20 03:17:28 INFO mapred.JobClient:     Reduce shuffle bytes=122
13/06/20 03:17:28 INFO mapred.JobClient:     Spilled Records=16
13/06/20 03:17:28 INFO mapred.JobClient:     Map output bytes=82
13/06/20 03:17:28 INFO mapred.JobClient:     Total committed heap usage (bytes)=912719872
13/06/20 03:17:28 INFO mapred.JobClient:     CPU time spent (ms)=5190
13/06/20 03:17:28 INFO mapred.JobClient:     Map input bytes=50
13/06/20 03:17:28 INFO mapred.JobClient:     SPLIT_RAW_BYTES=487
13/06/20 03:17:28 INFO mapred.JobClient:     Combine input records=0
13/06/20 03:17:28 INFO mapred.JobClient:     Reduce input records=8
13/06/20 03:17:28 INFO mapred.JobClient:     Reduce input groups=5
13/06/20 03:17:28 INFO mapred.JobClient:     Combine output records=0
13/06/20 03:17:28 INFO mapred.JobClient:     Physical memory (bytes) snapshot=932745216
13/06/20 03:17:28 INFO mapred.JobClient:     Reduce output records=5
13/06/20 03:17:28 INFO mapred.JobClient:     Virtual memory (bytes) snapshot=2390478848
13/06/20 03:17:28 INFO mapred.JobClient:     Map output records=8
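Some of these counters should agree with each other for this job. No combiner is configured (Combine input records=0), so every map output record reaches the reducer, and Map output records should equal Reduce input records (8 and 8 here). Reduce writes exactly one record per key, so Reduce input groups should equal Reduce output records (5 and 5). Below is a minimal sketch that parses counter lines from a saved copy of the console output and checks those two relations. CounterCheck is hypothetical, and its regular expression assumes the "name=value" counter lines shown above.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CounterCheck {
    // Captures "name" and "value" from the tail of a JobClient counter line "...JobClient:   name=value".
    private static final Pattern COUNTER = Pattern.compile("JobClient:\\s+(.+?)=(\\d+)\\s*$");

    public static Map<String, Long> parseCounters(String log) {
        Map<String, Long> counters = new HashMap<>();
        for (String line : log.split("\\R")) {
            Matcher m = COUNTER.matcher(line);
            if (m.find()) {
                counters.put(m.group(1).trim(), Long.parseLong(m.group(2)));
            }
        }
        return counters;
    }

    // True only if both counters are present and equal.
    private static boolean same(Map<String, Long> c, String a, String b) {
        Long x = c.get(a);
        Long y = c.get(b);
        return x != null && x.equals(y);
    }

    public static boolean isConsistent(Map<String, Long> c) {
        boolean ok = true;
        // Without a combiner, map output records pass to the reducer unchanged.
        if (c.getOrDefault("Combine input records", 0L) == 0L) {
            ok &= same(c, "Map output records", "Reduce input records");
        }
        // HadoopHelloWorld.Reduce calls output.collect exactly once per key.
        ok &= same(c, "Reduce input groups", "Reduce output records");
        return ok;
    }

    public static void main(String[] args) throws IOException {
        // args[0]: a text file holding the job's console output (for example, saved with tee).
        String log = new String(Files.readAllBytes(Paths.get(args[0])), StandardCharsets.UTF_8);
        Map<String, Long> counters = parseCounters(log);
        System.out.println("Counters parsed: " + counters.size());
        System.out.println("Consistent: " + isConsistent(counters));
    }
}
```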

Result
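The results are written to the output directory. With the old mapred API and the single reduce task launched here, that is typically one file named part-00000, which ./hadoop fs -cat output/part-00000 prints. TextOutputFormat writes one key, a tab, and the value per line. Here is a minimal sketch that reads such a file after copying it locally (for example with ./hadoop fs -get output/part-00000 .). ResultReader is a hypothetical helper, and the default file name assumes that single reducer output.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class ResultReader {
    // Parses TextOutputFormat lines of the form "word<TAB>count", keeping file order.
    public static Map<String, Integer> parse(List<String> lines) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String line : lines) {
            // Words never contain tabs: StringTokenizer in the mapper splits on whitespace, including \t.
            int tab = line.indexOf('\t');
            if (tab < 0) {
                continue; // skip lines without a key/value separator
            }
            counts.put(line.substring(0, tab), Integer.parseInt(line.substring(tab + 1).trim()));
        }
        return counts;
    }

    public static void main(String[] args) throws IOException {
        String file = args.length > 0 ? args[0] : "part-00000";
        List<String> lines = Files.readAllLines(Paths.get(file), StandardCharsets.UTF_8);
        parse(lines).forEach((word, count) -> System.out.println(word + " -> " + count));
    }
}
```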
