Purpose:

Get a first hands-on feel for Hadoop MapReduce.

Environment:

hadoop 2.6.4

1 Prepare the input file

paper.txt holds ordinary English text, such as an article; any content will do.

hadoop@ssmaster:~$ hadoop fs -mkdir /input

hadoop@ssmaster:~$ ls

Desktop  Documents  Downloads  examples.desktop  hadoop-2.6..tar.gz  Music  paper.txt  Pictures  Public  Templates  Videos

hadoop@ssmaster:~$ hadoop fs -put paper.txt  /input

hadoop@ssmaster:~$ hadoop fs -ls /input

Found 1 items

-rw-r--r--    hadoop supergroup        -- : /input/paper.txt

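Note: do not create the output directory /output in advance. The job creates it itself, and it refuses to run if the directory already exists.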

2 Run the job

hadoop@ssmaster:~$ hadoop jar /opt/hadoop-2.6./share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6..jar  wordcount /input /output
// :: INFO client.RMProxy: Connecting to ResourceManager at ssmaster/192.168.249.144:
// :: INFO input.FileInputFormat: Total input paths to process :
// :: INFO mapreduce.JobSubmitter: number of splits:
// :: INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1477208120905_0001
// :: INFO impl.YarnClientImpl: Submitted application application_1477208120905_0001
// :: INFO mapreduce.Job: The url to track the job: http://ssmaster:8088/proxy/application_1477208120905_0001/
// :: INFO mapreduce.Job: Running job: job_1477208120905_0001
// :: INFO mapreduce.Job: Job job_1477208120905_0001 running in uber mode : false
16/10/23 00:51:38 INFO mapreduce.Job: map 0% reduce 0%
16/10/23 00:52:17 INFO mapreduce.Job: map 100% reduce 0%
16/10/23 00:52:39 INFO mapreduce.Job: map 100% reduce 100%
16/10/23 00:52:41 INFO mapreduce.Job: Job job_1477208120905_0001 completed successfully
16/10/23 00:52:41 INFO mapreduce.Job: Counters: 49
    File System Counters
        FILE: Number of bytes read=2061
        FILE: Number of bytes written=217797
        FILE: Number of read operations=0
        FILE: Number of large read operations=0
        FILE: Number of write operations=0
        HDFS: Number of bytes read=1863
        HDFS: Number of bytes written=1425
        HDFS: Number of read operations=6
        HDFS: Number of large read operations=0
        HDFS: Number of write operations=2
    Job Counters
        Launched map tasks=1
        Launched reduce tasks=1
        Data-local map tasks=1
        Total time spent by all maps in occupied slots (ms)=35792
        Total time spent by all reduces in occupied slots (ms)=18540
        Total time spent by all map tasks (ms)=35792
        Total time spent by all reduce tasks (ms)=18540
        Total vcore-milliseconds taken by all map tasks=35792
        Total vcore-milliseconds taken by all reduce tasks=18540
        Total megabyte-milliseconds taken by all map tasks=36651008
        Total megabyte-milliseconds taken by all reduce tasks=18984960
    Map-Reduce Framework
        Map input records=11
        Map output records=303
        Map output bytes=2969
        Map output materialized bytes=2061
        Input split bytes=101
        Combine input records=303
        Combine output records=158
        Reduce input groups=158
        Reduce shuffle bytes=2061
        Reduce input records=158
        Reduce output records=158
        Spilled Records=316
        Shuffled Maps =1
        Failed Shuffles=0
        Merged Map outputs=1
        GC time elapsed (ms)=1093
        CPU time spent (ms)=5550
        Physical memory (bytes) snapshot=442781696
        Virtual memory (bytes) snapshot=1448112128
        Total committed heap usage (bytes)=276299776
    Shuffle Errors
        BAD_ID=0
        CONNECTION=0
        IO_ERROR=0
        WRONG_LENGTH=0
        WRONG_MAP=0
        WRONG_REDUCE=0
    File Input Format Counters
        Bytes Read=1762
    File Output Format Counters
        Bytes Written=1425
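The Map-Reduce Framework counters follow from how the wordcount example works: the mapper emits one (word, 1) pair per token (Map output records=303), the combiner collapses them into one pair per distinct word (Combine output records=158), and with a single map task the reducer receives and writes those same 158 keys. Below is a minimal local Python simulation of that flow, not Hadoop code; the function names are made up for illustration, and `str.split()` only approximates the Java `StringTokenizer` the example uses.

```python
from collections import Counter
from itertools import groupby

def map_phase(lines):
    """Mapper: emit (word, 1) for every whitespace-separated token."""
    for line in lines:
        for word in line.split():  # approximates java.util.StringTokenizer
            yield (word, 1)

def combine(pairs):
    """Combiner: sum the counts per word inside one map task's output,
    then sort by key bytes as the map-side sort would."""
    totals = Counter()
    for word, n in pairs:
        totals[word] += n
    return sorted(totals.items(), key=lambda kv: kv[0].encode("utf-8"))

def reduce_phase(sorted_pairs):
    """Reducer: equal keys are adjacent after the sort; sum each group."""
    for word, group in groupby(sorted_pairs, key=lambda kv: kv[0]):
        yield (word, sum(n for _, n in group))

def run(lines):
    """Simulate one map task and one reduce task; return output and counters."""
    lines = list(lines)
    mapped = list(map_phase(lines))
    combined = combine(mapped)  # with one map task this is also the reducer's input
    reduced = list(reduce_phase(combined))
    counters = {
        "Map input records": len(lines),
        "Map output records": len(mapped),
        "Combine output records": len(combined),
        "Reduce output records": len(reduced),
    }
    return reduced, counters

output, counters = run(["a rose is a rose", "Always a rose"])
print(output)    # [('Always', 1), ('a', 3), ('is', 1), ('rose', 3)]
print(counters)  # 2 input lines, 8 tokens, 4 distinct words in and out
```

The same ratio shows up in the real run: 303 tokens shrink to 158 records before the shuffle, which is why Map output materialized bytes (2061) is smaller than Map output bytes (2969).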

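The job's progress can also be followed in the ResourceManager web UI; the snapshot below was taken while the job was still running: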

http://ssmaster:8088/cluster

Cluster Metrics

Apps Submitted: 1
Apps Pending: 0
Apps Running: 1
Apps Completed: 0
Containers Running: 2
Memory Used: 3 GB
Memory Total: 8 GB
Memory Reserved: 0 B
VCores Used: 2
VCores Total: 8
VCores Reserved: 0
Active Nodes: 1
Decommissioned Nodes: 0
Lost Nodes: 0
Unhealthy Nodes: 0
Rebooted Nodes: 0
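The memory figures can be cross-checked against the job counters. YARN's "Total megabyte-milliseconds" counters multiply each task's requested container memory by its running time, so with one map task and one reduce task, dividing by the task time recovers the container size. A small Python calculation using this run's counter values (the helper name is illustrative):

```python
def container_mb(megabyte_ms, task_ms):
    """Per-container memory = megabyte-milliseconds / milliseconds (one task of each type)."""
    return megabyte_ms / task_ms

# Counter values copied from this job's output.
map_mb = container_mb(36651008, 35792)
reduce_mb = container_mb(18984960, 18540)
print(map_mb, reduce_mb)  # 1024.0 1024.0
```

Both come out at 1024 MB, which matches the Hadoop 2.x defaults for mapreduce.map.memory.mb and mapreduce.reduce.memory.mb. The other running container is presumably the MRAppMaster, which would account for roughly the remaining 2 GB of "Memory Used"; the counters themselves do not report its size, so treat that as an inference.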


ID: application_1477208120905_0001
User: hadoop
Name: word count
Application Type: MAPREDUCE
Queue: default
StartTime: Sun, 23 Oct 2016 07:51:13 GMT
FinishTime: N/A
State: RUNNING
FinalStatus: UNDEFINED
Tracking UI: ApplicationMaster
Blacklisted Nodes: 0
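The same information is exposed as JSON by the ResourceManager REST API (`/ws/v1/cluster/apps/{appid}` for one application, `/ws/v1/cluster/metrics` for the Cluster Metrics block). A minimal standard-library Python sketch, assuming the RM web address ssmaster:8088 from this article is reachable; `fetch_app` and `summarize` are hypothetical helper names:

```python
import json
import urllib.request

def fetch_app(rm_address, app_id, timeout=10):
    """Fetch one application from the ResourceManager REST API and return its "app" object."""
    url = f"http://{rm_address}/ws/v1/cluster/apps/{app_id}"
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)["app"]

def summarize(app):
    """Format the fields that also appear in the web UI's application table."""
    return (f'{app["id"]}  {app["name"]}  state={app["state"]}  '
            f'final={app["finalStatus"]}  progress={app["progress"]:.0f}%')
```

Usage, from a machine that can reach the cluster: print(summarize(fetch_app("ssmaster:8088", "application_1477208120905_0001"))). Polling this in a loop until state becomes FINISHED is a scriptable alternative to refreshing the web page.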

3 View the output

hadoop@ssmaster:~$ hadoop fs -ls /output

Found 2 items

-rw-r--r--    hadoop supergroup           -- : /output/_SUCCESS

-rw-r--r--    hadoop supergroup        -- : /output/part-r-

hadoop@ssmaster:~$ hadoop fs -cat  /output/part-r-

Always

Dream

There

a

all

along

always

...........

...........
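The listing keeps "Always" and "always" as separate keys and puts capitalized words first. Two things explain this: the example's mapper splits lines on whitespace only (no lowercasing, no punctuation stripping), and the shuffle sorts Text keys by comparing their bytes. Each line of a part-r-* file is "word<TAB>count" (TextOutputFormat's default separator). A small Python sketch with hypothetical helper names:

```python
def parse_part_file(text):
    """Parse TextOutputFormat output: one "key<TAB>value" record per line."""
    records = []
    for line in text.splitlines():
        if line:
            word, count = line.rsplit("\t", 1)
            records.append((word, int(count)))
    return records

def hadoop_key_order(words):
    """Order keys the way Hadoop compares Text: by raw UTF-8 bytes,
    so every uppercase ASCII letter sorts before every lowercase one."""
    return sorted(words, key=lambda w: w.encode("utf-8"))

print(hadoop_key_order(["always", "a", "There", "along", "Always", "all", "Dream"]))
# ['Always', 'Dream', 'There', 'a', 'all', 'along', 'always']
```

This reproduces the order of the words shown above. Case-insensitive counting would require changing the mapper itself, for example by lowercasing each token before emitting it.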

Q Summary

Very simple; it didn't leave much of an impression.

Next steps:

Write my own MapReduce wordcount program
Set up a fully distributed cluster and run the same program on a large file to observe the speed
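For writing one's own wordcount, the usual route is a Java Mapper/Reducer like the bundled example. A lower-effort alternative, assumed here rather than taken from this article, is Hadoop Streaming, which pipes records through any executable over stdin/stdout and sorts the mapper output by key before the reducer reads it. A minimal Python sketch; the file name wc_streaming.py and its map/reduce arguments are hypothetical:

```python
import sys
from itertools import groupby

def mapper(lines):
    """Streaming mapper: emit "word<TAB>1" for every whitespace-separated token."""
    for line in lines:
        for word in line.split():
            yield f"{word}\t1"

def reducer(lines):
    """Streaming reducer: Hadoop sorts mapper output by key, so equal words arrive adjacent."""
    pairs = (line.rstrip("\n").split("\t", 1) for line in lines if line.strip())
    for word, group in groupby(pairs, key=lambda kv: kv[0]):
        yield f"{word}\t{sum(int(n) for _, n in group)}"

if __name__ == "__main__" and len(sys.argv) == 2:
    # Hypothetical invocation: "python3 wc_streaming.py map" or "python3 wc_streaming.py reduce"
    step = mapper if sys.argv[1] == "map" else reducer
    for record in step(sys.stdin):
        print(record)
```

It can be tried locally first with cat paper.txt | python3 wc_streaming.py map | sort | python3 wc_streaming.py reduce, since grouping only needs equal keys to be adjacent. On the cluster it would be submitted with the hadoop-streaming-*.jar shipped under share/hadoop/tools/lib, e.g. with -files wc_streaming.py -mapper "python3 wc_streaming.py map" -reducer "python3 wc_streaming.py reduce" -input /input -output /output-streaming; check the jar path on your installation, make sure python3 exists on every node, and again use an output directory that does not exist yet.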

秒客网

[b0004] Running the Hadoop "hello world": MapReduce wordcount
