Running 20newsgroups on Mahout 0.8 in distributed mode (dataguru Mahout course, week 1 assignment)

Date: 2021-08-24 16:04:46

Install Mahout, run the 20newsgroups example, and document the process with screenshots.

1. Download and unpack the binary distribution

http://mirror.bit.edu.cn/apache/mahout/0.8/mahout-distribution-0.8.tar.gz

tar -zxvf mahout-distribution-0.8.tar.gz
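
Before unpacking, it can help to confirm that the download is intact and that it expands to the directory that MAHOUT_HOME will point to. The following is a minimal sketch; the function name tar_top_level is hypothetical, and it assumes archive entries are not prefixed with "./".

```shell
#!/bin/sh
# Hypothetical helper: list the distinct top-level entries of a .tar.gz archive.
# A truncated or corrupt download makes `tar -tzf` fail instead of listing entries.
tar_top_level() {
    tar -tzf "$1" | cut -d/ -f1 | sort -u
}

# Usage:
#   tar_top_level mahout-distribution-0.8.tar.gz
# For the archive used here, the expected single entry is mahout-distribution-0.8.
```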

 

2. Configure environment variables

Add the following lines to /etc/profile and /usr/grid/.bashrc:

export MAHOUT_HOME=/usr/grid/mahout-distribution-0.8

export MAHOUT_CONF_DIR=/usr/grid/mahout-distribution-0.8/conf

export PATH=$PATH:$MAHOUT_HOME/conf:$MAHOUT_HOME/bin
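
After re-reading the profile (for example by logging in again), the settings can be sanity-checked with a small function such as the one below. This is only a sketch: check_mahout_env is a hypothetical name, and the one thing it assumes about Mahout is that the launcher script sits at $MAHOUT_HOME/bin/mahout, as in the 0.8 binary distribution.

```shell
#!/bin/sh
# Hypothetical helper: verify the variables exported above before running Mahout.
check_mahout_env() {
    # MAHOUT_HOME must be set and point at an existing directory.
    if [ -z "${MAHOUT_HOME:-}" ] || [ ! -d "$MAHOUT_HOME" ]; then
        echo "MAHOUT_HOME is unset or not a directory" >&2
        return 1
    fi
    # The launcher script is expected at $MAHOUT_HOME/bin/mahout.
    if [ ! -x "$MAHOUT_HOME/bin/mahout" ]; then
        echo "no executable launcher at $MAHOUT_HOME/bin/mahout" >&2
        return 1
    fi
    # $MAHOUT_HOME/bin must be one of the PATH entries.
    case ":$PATH:" in
        *":$MAHOUT_HOME/bin:"*) ;;
        *) echo "$MAHOUT_HOME/bin is not on PATH" >&2; return 1 ;;
    esac
    echo "Mahout environment looks consistent"
}
```

Calling check_mahout_env in a fresh shell returns a non-zero status and prints the reason if any of the three conditions fails.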

3. Start Hadoop

[grid@h1 data]$ start-all.sh

[grid@h1 data]$ jps

10163 Jps

4178 SecondaryNameNode

3997 NameNode

4260 JobTracker
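
The jps listing can also be checked automatically. The sketch below uses a hypothetical function, missing_daemons, and takes the expected daemon names from the listing above; which daemons should run on which node depends on the cluster layout, which this walkthrough does not describe.

```shell
#!/bin/sh
# Hypothetical helper: read `jps` output on stdin and print every expected
# daemon name (given as arguments) that does not appear in it.
missing_daemons() {
    jps_output=$(cat)
    for daemon in "$@"; do
        # jps prints "<pid> <MainClass>"; compare the second field exactly,
        # so that NameNode does not match SecondaryNameNode.
        if ! printf '%s\n' "$jps_output" |
            awk -v d="$daemon" '$2 == d { found = 1 } END { exit !found }'; then
            echo "$daemon"
        fi
    done
}

# Usage on the master node (daemon names taken from the listing above):
#   jps | missing_daemons NameNode SecondaryNameNode JobTracker
```

An empty result means every listed daemon was found.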

4. Check that Mahout is installed correctly by confirming that it lists its algorithms

[grid@h1 data]$ mahout --help

MAHOUT_LOCAL is not set; adding HADOOP_CONF_DIR to classpath.

Warning: $HADOOP_HOME is deprecated.

Running on hadoop, using /user/grid/hadoop/bin/hadoop and HADOOP_CONF_DIR=/user/grid/hadoop/conf

MAHOUT-JOB:/usr/grid/mahout-distribution-0.8/mahout-examples-0.8-job.jar

Warning: $HADOOP_HOME is deprecated.

14/01/26 09:43:37 WARN driver.MahoutDriver: Unable to add class: --help

14/01/26 09:43:38 WARN driver.MahoutDriver: No --help.props found on classpath, will use command-line arguments only

Unknown program '--help' chosen.

Valid program names are:

 arff.vector: : Generate Vectors from an ARFF file or directory

 baumwelch: : Baum-Welch algorithm for unsupervised HMM training

 canopy: : Canopy clustering

 cat: : Print a file or resource as the logistic regression models would see it

 cleansvd: : Cleanup and verification of SVD output

 clusterdump: : Dump cluster output to text

 clusterpp: : Groups Clustering Output In Clusters

 cmdump: : Dump confusion matrix in HTML or text formats

 concatmatrices: : Concatenates 2 matrices of same cardinality into a single matrix

 cvb: : LDA via Collapsed Variation Bayes (0th deriv. approx)

 cvb0_local: : LDA via Collapsed Variation Bayes, in memory locally.

 dirichlet: : Dirichlet Clustering

 eigencuts: : Eigencuts spectral clustering

 evaluateFactorization: : compute RMSE and MAE of a rating matrix factorization against probes

 fkmeans: : Fuzzy K-means clustering

 fpg: : Frequent Pattern Growth

 hmmpredict: : Generate random sequence of observations by given HMM

 itemsimilarity: : Compute the item-item-similarities for item-based collaborative filtering

 kmeans: : K-means clustering

 lucene.vector: : Generate Vectors from a Lucene index

 lucene2seq: : Generate Text SequenceFiles from a Lucene index

 matrixdump: : Dump matrix in CSV format

 matrixmult: : Take the product of two matrices

 meanshift: : Mean Shift clustering

 minhash: : Run Minhash clustering

 parallelALS: : ALS-WR factorization of a rating matrix

 qualcluster: : Runs clustering experiments and summarizes results in a CSV

 recommendfactorized: : Compute recommendations using the factorization of a rating matrix

 recommenditembased: : Compute recommendations using item-based collaborative filtering

 regexconverter: : Convert text files on a per line basis based on regular expressions

 resplit: : Splits a set of SequenceFiles into a number of equal splits

 rowid: : Map SequenceFile<Text,VectorWritable> to {SequenceFile<IntWritable,VectorWritable>, SequenceFile<IntWritable,Text>}

 rowsimilarity: : Compute the pairwise similarities of the rows of a matrix

 runAdaptiveLogistic: : Score new production data using a probably trained and validated AdaptivelogisticRegression model

 runlogistic: : Run a logistic regression model against CSV data

 seq2encoded: : Encoded Sparse Vector generation from Text sequence files

 seq2sparse: : Sparse Vector generation from Text sequence files

 seqdirectory: : Generate sequence files (of Text) from a directory

 seqdumper: : Generic Sequence File dumper

 seqmailarchives: : Creates SequenceFile from a directory containing gzipped mail archives

 seqwiki: : Wikipedia xml dump to sequence file

 spectralkmeans: : Spectral k-means clustering

 split: : Split Input data into test and train sets

 splitDataset: : split a rating dataset into training and probe parts

 ssvd: : Stochastic SVD

 streamingkmeans: : Streaming k-means clustering

 svd: : Lanczos Singular Value Decomposition

 testnb: : Test the Vector-based Bayes classifier

 trainAdaptiveLogistic: : Train an AdaptivelogisticRegression model

 trainlogistic: : Train a logistic regression using stochastic gradient descent

 trainnb: : Train the Vector-based Bayes classifier

 transpose: : Take the transpose of a matrix

 validateAdaptiveLogistic: : Validate an AdaptivelogisticRegression model against hold-out data set

 vecdist: : Compute the distances between a set of Vectors (or Cluster or Canopy, they must fit in memory) and a list of Vectors

 vectordump: : Dump vectors from a sequence file to text

 viterbi: : Viterbi decoding of hidden states from given output states sequence

[grid@h1 data]$


Download the test data

wget http://archive.ics.uci.edu/ml/databases/synthetic_control/synthetic_control.data
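
A quick check of the downloaded file before uploading it can catch a truncated or HTML-error download. The sketch below uses a hypothetical helper, data_shape, which assumes the file is plain whitespace-separated numbers with one sample per line.

```shell
#!/bin/sh
# Hypothetical helper: print "<rows> <columns>" for a whitespace-separated
# data file and fail if the rows do not all have the same number of columns.
data_shape() {
    awk '
        NF == 0    { next }          # ignore blank lines
        cols == 0  { cols = NF }     # remember the width of the first row
        NF != cols { bad = 1 }       # any other width is an error
                   { rows++ }
        END        { print rows + 0, cols + 0; exit bad + 0 }
    ' "$1"
}

# Usage:
#   data_shape synthetic_control.data
# For an intact download this should report 600 rows, the same count the
# k-means job logs as "Map input records=600".
```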

5. Copy the test data to HDFS

hadoop fs -mkdir ./testdata

hadoop fs -put ./synthetic_control.data ./testdata

hadoop fs -ls ./testdata

 

 

6. Run a k-means clustering test

mahout org.apache.mahout.clustering.syntheticcontrol.kmeans.Job

[grid@h1 ~]$ mahout org.apache.mahout.clustering.syntheticcontrol.kmeans.Job

MAHOUT_LOCAL is not set; adding HADOOP_CONF_DIR to classpath.

Warning: $HADOOP_HOME is deprecated.

Running on hadoop, using /usr/grid/hadoop/bin/hadoop and HADOOP_CONF_DIR=/usr/grid/hadoop/conf

MAHOUT-JOB:/usr/grid/mahout-distribution-0.8/mahout-examples-0.8-job.jar

Warning: $HADOOP_HOME is deprecated.

14/01/26 00:43:36 WARN driver.MahoutDriver: No org.apache.mahout.clustering.syntheticcontrol.kmeans.Job.props found on classpath, will use command-line arguments only

14/01/26 00:43:36 INFO kmeans.Job: Running with default arguments

14/01/26 00:43:37 INFO common.HadoopUtil: Deleting output

14/01/26 00:43:37 INFO kmeans.Job: Preparing Input

14/01/26 00:43:37 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.

14/01/26 00:43:45 INFO input.FileInputFormat: Total input paths to process : 1

14/01/26 00:43:45 INFO util.NativeCodeLoader: Loaded the native-hadoop library

14/01/26 00:43:45 WARN snappy.LoadSnappy: Snappy native library not loaded

14/01/26 00:43:45 INFO mapred.JobClient: Running job: job_201401252331_0013

14/01/26 00:43:46 INFO mapred.JobClient: map 0% reduce 0%

14/01/26 00:44:12 INFO mapred.JobClient: map 100% reduce 0%

14/01/26 00:44:17 INFO mapred.JobClient: Job complete: job_201401252331_0013

14/01/26 00:44:17 INFO mapred.JobClient: Counters: 19

14/01/26 00:44:17 INFO mapred.JobClient:  Job Counters

14/01/26 00:44:17 INFO mapred.JobClient:    SLOTS_MILLIS_MAPS=15321

14/01/26 00:44:17 INFO mapred.JobClient:    Total time spent by all reduces waiting after reserving slots (ms)=0

14/01/26 00:44:17 INFO mapred.JobClient:    Total time spent by all maps waiting after reserving slots (ms)=0

14/01/26 00:44:17 INFO mapred.JobClient:    Launched map tasks=1

14/01/26 00:44:17 INFO mapred.JobClient:    Data-local map tasks=1

14/01/26 00:44:17 INFO mapred.JobClient:    SLOTS_MILLIS_REDUCES=0

14/01/26 00:44:17 INFO mapred.JobClient:  File Output Format Counters

14/01/26 00:44:17 INFO mapred.JobClient:    Bytes Written=335470

14/01/26 00:44:17 INFO mapred.JobClient:  FileSystemCounters

14/01/26 00:44:17 INFO mapred.JobClient:    HDFS_BYTES_READ=288495

14/01/26 00:44:17 INFO mapred.JobClient:    FILE_BYTES_WRITTEN=21400

14/01/26 00:44:17 INFO mapred.JobClient:    HDFS_BYTES_WRITTEN=335470

14/01/26 00:44:17 INFO mapred.JobClient:  File Input Format Counters

14/01/26 00:44:17 INFO mapred.JobClient:    Bytes Read=288374

14/01/26 00:44:17 INFO mapred.JobClient:  Map-Reduce Framework

14/01/26 00:44:17 INFO mapred.JobClient:    Map input records=600

14/01/26 00:44:17 INFO mapred.JobClient:    Physical memory (bytes) snapshot=33697792

14/01/26 00:44:17 INFO mapred.JobClient:    Spilled Records=0

14/01/26 00:44:17 INFO mapred.JobClient:    CPU time spent (ms)=330

14/01/26 00:44:17 INFO mapred.JobClient:    Total committed heap usage (bytes)=7929856

14/01/26 00:44:17 INFO mapred.JobClient:    Virtual memory (bytes) snapshot=376700928

14/01/26 00:44:17 INFO mapred.JobClient:    Map output records=600

14/01/26 00:44:17 INFO mapred.JobClient:    SPLIT_RAW_BYTES=121

14/01/26 00:44:17 INFO kmeans.Job: Running random seed to get initial clusters

14/01/26 00:44:17 INFO zlib.ZlibFactory: Successfully loaded & initialized native-zlib library

14/01/26 00:44:17 INFO compress.CodecPool: Got brand-new compressor

14/01/26 00:44:18 INFO kmeans.RandomSeedGenerator: Wrote 6 Klusters to output/random-seeds/part-randomSeed

14/01/26 00:44:18 INFO kmeans.Job: Running KMeans with k = 6

14/01/26 00:44:18 INFO kmeans.KMeansDriver: Input: output/data Clusters In: output/random-seeds/part-randomSeed Out: output Distance: org.apache.mahout.common.distance.EuclideanDistanceMeasure

14/01/26 00:44:18 INFO kmeans.KMeansDriver: convergence: 0.5 max Iterations: 10

14/01/26 00:44:18 INFO compress.CodecPool: Got brand-new decompressor

14/01/26 00:44:19 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.

14/01/26 00:44:23 INFO input.FileInputFormat: Total input paths to process : 1

14/01/26 00:44:24 INFO mapred.JobClient: Running job: job_201401252331_0014

14/01/26 00:44:25 INFO mapred.JobClient: map 0% reduce 0%

14/01/26 00:44:48 INFO mapred.JobClient: map 100% reduce 0%

.404, 25.369, 21.068, 19.346, 20.055, 23.319, 24.743, 16.394, 16.527, 25.255, 15.532, 23.677, 16.800, 16.444, 24.945, 14.802, 21.979, 17.191, 23.474, 14.164, 24.928, 13.213, 22.669, 14.831, 17.453, 13.798, 22.499, 11.606, 12.931, 15.505, 13.456, 14.295]

      1.0: [35.162, 30.783, 33.848, 26.778, 32.632, 30.928, 33.958, 30.005, 32.792, 23.600, 32.514, 31.969, 23.302, 22.740, 26.831, 25.599, 28.648, 23.295, 25.424, 23.333, 23.906, 20.742, 21.021, 19.262, 19.733, 22.366, 24.415, 21.432, 21.119, 28.102, 21.169, 26.818, 25.745, 24.934, 19.991, 22.085, 17.193, 20.809, 18.696, 15.019, 22.573,

      1.0: [35.899, 26.672, 34.191, 35.827, 25.101, 24.856, 25.814, 30.630, 34.212, 32.587, 31.032, 34.304, 24.555, 35.870, 30.683, 29.058, 28.637, 29.855, 32.037, 32.979, 26.118, 26.107, 25.096, 22.703, 17.698, 16.281, 18.186, 24.016, 24.553, 21.452, 15.836, 21.311, 20.879, 22.559, 21.694, 25.856, 20.533, 21.542, 25.766, 26.018, 20.820, 24.959, 18.959, 23.346, 16.068, 22.836, 21.939, 25.722, 19.671, 26.299, 21.879, 16.002, 15.288, 16.946, 17.534, 16.846, 16.546, 15.927, 18.084, 17.475]

14/01/26 00:39:32 INFO clustering.ClusterDumper: Wrote 6 clusters

14/01/26 00:39:32 INFO driver.MahoutDriver: Program took 860026 ms (Minutes: 14.333766666666667)

[grid@h1 ~]$
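
Console output like the above is long, so pulling a single counter out of a saved log is often easier than scrolling. The sketch below assumes the output was captured to a file (for example with `mahout ... 2>&1 | tee kmeans.log`, which is not part of the original run); job_counter and the log file name are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print the value of counter $1 for every job found in
# the saved console log $2, one value per line, in log order.
job_counter() {
    awk -v name="$1" '
        /mapred\.JobClient:/ {
            line = $0
            # Strip the timestamp, log level and logger name.
            sub(/^.*mapred\.JobClient:[ \t]*/, "", line)
            # Counter lines have the form "<name>=<value>".
            eq = index(line, "=")
            if (eq > 0 && substr(line, 1, eq - 1) == name)
                print substr(line, eq + 1)
        }
    ' "$2"
}

# Usage:
#   job_counter 'Map input records' kmeans.log
```

Matching on the exact counter name keeps "Map input records" separate from similarly named counters such as "Reduce input records".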

7. Test the naive Bayes classifier

1. Download and unpack the dataset

http://qwone.com/~jason/20Newsgroups/20news-bydate.tar.gz

cd data

[grid@h1 data]$ pwd

/usr/grid/data

[grid@h1 data]$ tar -zxvf 20news-bydate.tar.gz

………

20news-bydate-train/talk.religion.misc/84200

20news-bydate-train/talk.religion.misc/84131

20news-bydate-train/talk.religion.misc/84201

20news-bydate-train/talk.religion.misc/84101

20news-bydate-train/talk.religion.misc/84202

20news-bydate-train/talk.religion.misc/84203

2. Build the training set

Data preparation

Unpack 20news-bydate.tar.gz and copy the contents of every subfolder of 20news-bydate into 20news-all, using the following commands:

[grid@h1 data]$ hadoop fs -mkdir 20news-all

[grid@h1 data]$ hadoop fs -ls 20news-all

[grid@h1 data]$ ll

total 14140

-rwxrw-rw-.  1 grid grid 14464277 Jan 26 09:09 20news-bydate.tar.gz

drwxr-xr-x. 22 grid grid     4096 Mar 18  2003 20news-bydate-test

drwxr-xr-x. 22 grid grid     4096 Mar 18  2003 20news-bydate-train

drwxrwxr-x.  6 grid grid     4096 Jan 26 10:41 tmp

drwxrwxr-x. 22 grid grid     4096 Jan 27 00:05 20news-all

drwxrwxr-x.  4 grid grid     4096 Jan 27 00:04 20news-bydate

[grid@h1 mahout-work-grid]$ hadoop fs -put 20news-all /usr/grid/mahout-work-grid/20news-all

Warning: $HADOOP_HOME is deprecated.
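
The local copy step that fills 20news-all is not shown as a command in this walkthrough. A minimal sketch of it follows; the function name merge_newsgroups is hypothetical, and the `*/*` glob assumes the layout used here: a 20news-bydate directory holding the 20news-bydate-train and 20news-bydate-test splits, each with one subdirectory per newsgroup.

```shell
#!/bin/sh
# Hypothetical helper: merge the train and test splits into one directory per
# newsgroup, e.g. merge_newsgroups 20news-bydate 20news-all
merge_newsgroups() {
    src=$1
    dst=$2
    mkdir -p "$dst"
    # "$src"/*/* expands to <split>/<newsgroup>; copying each newsgroup
    # directory into $dst merges both splits under the same group name.
    cp -R "$src"/*/* "$dst"
}
```

Because article file names differ between the two splits, both copies of a newsgroup end up side by side in a single directory such as 20news-all/alt.atheism.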

[grid@h1 mahout-work-grid]$ cd $MAHOUT_HOME

[grid@h1 mahout-distribution-0.8]$ ./examples/bin/classify-20newsgroups.sh

Please select a number to choose the corresponding task to run

1. cnaivebayes

2. naivebayes

3. sgd

4. clean -- cleans up the work area in /usr/grid/mahout-work-grid

Enter your choice : 2

ok. You chose 2 and we'll use naivebayes

creating work directory at /usr/grid/mahout-work-grid

+ echo 'Preparing 20newsgroups data'

Preparing 20newsgroups data

+ rm -rf /usr/grid/mahout-work-grid/20news-all

+ mkdir /usr/grid/mahout-work-grid/20news-all

+ cp -R /usr/grid/mahout-work-grid/20news-bydate/20news-bydate-test/alt.atheism /usr/grid/mahout-work-grid/20news-bydate/20news-bydate-test/comp.graphics /usr/grid/mahout-work-grid/20news-bydate/20news-bydate-test/comp.os.ms-windows.misc /usr/grid/mahout-work-grid/20news-bydate/20news-bydate-test/comp.sys.ibm.pc.hardware /usr/grid/mahout-work-grid/20news-bydate/20news-bydate-test/comp.sys.mac.hardware /usr/grid/mahout-work-grid/20news-bydate/20news-bydate-test/comp.windows.x /usr/grid/mahout-work-grid/20news-bydate/20news-bydate-test/misc.forsale /usr/grid/mahout-work-grid/20news-bydate/20news-bydate-test/rec.autos /usr/grid/mahout-work-grid/20news-bydate/20news-bydate-test/rec.motorcycles /usr/grid/mahout-work-grid/20news-bydate/20news-bydate-test/rec.sport.baseball /usr/grid/mahout-work-grid/20news-bydate/20news-bydate-test/rec.sport.hockey /usr/grid/mahout-work-grid/20news-bydate/20news-bydate-test/sci.crypt /usr/grid/mahout-work-grid/20news-bydate/20news-bydate-test/sci.electronics /usr/grid/mahout-work-grid/20news-bydate/20news-bydate-test/sci.med /usr/grid/mahout-work-grid/20news-bydate/20news-bydate-test/sci.space /usr/grid/mahout-work-grid/20news-bydate/20news-bydate-test/soc.religion.christian /usr/grid/mahout-work-grid/20news-bydate/20news-bydate-test/talk.politics.guns /usr/grid/mahout-work-grid/20news-bydate/20news-bydate-test/talk.politics.mideast /usr/grid/mahout-work-grid/20news-bydate/20news-bydate-test/talk.politics.misc /usr/grid/mahout-work-grid/20news-bydate/20news-bydate-test/talk.religion.misc /usr/grid/mahout-work-grid/20news-bydate/20news-bydate-train/alt.atheism /usr/grid/mahout-work-grid/20news-bydate/20news-bydate-train/comp.graphics /usr/grid/mahout-work-grid/20news-bydate/20news-bydate-train/comp.os.ms-windows.misc /usr/grid/mahout-work-grid/20news-bydate/20news-bydate-train/comp.sys.ibm.pc.hardware /usr/grid/mahout-work-grid/20news-bydate/20news-bydate-train/comp.sys.mac.hardware /usr/grid/mahout-work-grid/20news-bydate/20news-bydate-train/comp.windows.x /usr/grid/mahout-work-grid/20news-bydate/20news-bydate-train/misc.forsale /usr/grid/mahout-work-grid/20news-bydate/20news-bydate-train/rec.autos /usr/grid/mahout-work-grid/20news-bydate/20news-bydate-train/rec.motorcycles /usr/grid/mahout-work-grid/20news-bydate/20news-bydate-train/rec.sport.baseball /usr/grid/mahout-work-grid/20news-bydate/20news-bydate-train/rec.sport.hockey /usr/grid/mahout-work-grid/20news-bydate/20news-bydate-train/sci.crypt /usr/grid/mahout-work-grid/20news-bydate/20news-bydate-train/sci.electronics /usr/grid/mahout-work-grid/20news-bydate/20news-bydate-train/sci.med /usr/grid/mahout-work-grid/20news-bydate/20news-bydate-train/sci.space /usr/grid/mahout-work-grid/20news-bydate/20news-bydate-train/soc.religion.christian /usr/grid/mahout-work-grid/20news-bydate/20news-bydate-train/talk.politics.guns /usr/grid/mahout-work-grid/20news-bydate/20news-bydate-train/talk.politics.mideast /usr/grid/mahout-work-grid/20news-bydate/20news-bydate-train/talk.politics.misc /usr/grid/mahout-work-grid/20news-bydate/20news-bydate-train/talk.religion.misc /usr/grid/mahout-work-grid/20news-all

+ echo 'Creating sequence files from 20newsgroups data'

Creating sequence files from 20newsgroups data

+ ./bin/mahout seqdirectory -i /usr/grid/mahout-work-grid/20news-all -o /usr/grid/mahout-work-grid/20news-seq -ow

MAHOUT_LOCAL is not set; adding HADOOP_CONF_DIR to classpath.

Warning: $HADOOP_HOME is deprecated.

Running on hadoop, using /usr/grid/hadoop/bin/hadoop and HADOOP_CONF_DIR=/usr/grid/hadoop/conf

MAHOUT-JOB:/usr/grid/mahout-distribution-0.8/mahout-examples-0.8-job.jar

Warning: $HADOOP_HOME is deprecated.

14/01/27 00:25:06 INFO common.AbstractJob: Command line arguments: {--charset=[UTF-8], --chunkSize=[64], --endPhase=[2147483647], --fileFilterClass=[org.apache.mahout.text.PrefixAdditionFilter], --input=[/usr/grid/mahout-work-grid/20news-all], --keyPrefix=[], --method=[mapreduce], --output=[/usr/grid/mahout-work-grid/20news-seq], --overwrite=null, --startPhase=[0], --tempDir=[temp]}

14/01/27 00:25:34 INFO input.FileInputFormat: Total input paths to process : 18846

14/01/27 00:25:35 INFO util.NativeCodeLoader: Loaded the native-hadoop library

14/01/27 00:25:35 WARN snappy.LoadSnappy: Snappy native library not loaded

14/01/27 00:25:53 INFO mapred.JobClient: Running job: job_201401261418_0001

14/01/27 00:25:54 INFO mapred.JobClient:  map 0% reduce 0%

14/01/27 00:26:41 INFO mapred.JobClient:  map 1% reduce 0%

14/01/27 00:26:44 INFO mapred.JobClient:  map 2% reduce 0%

14/01/27 00:26:47 INFO mapred.JobClient:  map 4% reduce 0%

14/01/27 00:26:50 INFO mapred.JobClient:  map 5% reduce 0%

14/01/27 00:26:53 INFO mapred.JobClient:  map 7% reduce 0%

14/01/27 00:26:56 INFO mapred.JobClient:  map 8% reduce 0%

14/01/27 00:26:59 INFO mapred.JobClient:  map 9% reduce 0%

14/01/27 00:27:05 INFO mapred.JobClient:  map 10% reduce 0%

14/01/27 00:27:08 INFO mapred.JobClient:  map 12% reduce 0%

14/01/27 00:27:11 INFO mapred.JobClient:  map 13% reduce 0%

14/01/27 00:27:14 INFO mapred.JobClient:  map 14% reduce 0%

14/01/27 00:27:17 INFO mapred.JobClient:  map 17% reduce 0%

14/01/27 00:27:20 INFO mapred.JobClient:  map 18% reduce 0%

14/01/27 00:27:23 INFO mapred.JobClient:  map 19% reduce 0%

14/01/27 00:27:26 INFO mapred.JobClient:  map 21% reduce 0%

14/01/27 00:27:29 INFO mapred.JobClient:  map 22% reduce 0%

14/01/27 00:27:32 INFO mapred.JobClient:  map 25% reduce 0%

14/01/27 00:27:35 INFO mapred.JobClient:  map 26% reduce 0%

14/01/27 00:27:38 INFO mapred.JobClient:  map 28% reduce 0%

14/01/27 00:27:41 INFO mapred.JobClient:  map 30% reduce 0%

14/01/27 00:27:44 INFO mapred.JobClient:  map 31% reduce 0%

14/01/27 00:27:47 INFO mapred.JobClient:  map 32% reduce 0%

14/01/27 00:27:50 INFO mapred.JobClient:  map 34% reduce 0%

14/01/27 00:27:53 INFO mapred.JobClient:  map 35% reduce 0%

14/01/27 00:27:56 INFO mapred.JobClient:  map 36% reduce 0%

14/01/27 00:27:59 INFO mapred.JobClient:  map 38% reduce 0%

14/01/27 00:28:05 INFO mapred.JobClient:  map 40% reduce 0%

14/01/27 00:28:08 INFO mapred.JobClient:  map 41% reduce 0%

14/01/27 00:28:11 INFO mapred.JobClient:  map 42% reduce 0%

14/01/27 00:28:17 INFO mapred.JobClient:  map 43% reduce 0%

14/01/27 00:28:24 INFO mapred.JobClient:  map 45% reduce 0%

14/01/27 00:28:30 INFO mapred.JobClient:  map 46% reduce 0%

14/01/27 00:28:45 INFO mapred.JobClient:  map 47% reduce 0%

14/01/27 00:28:54 INFO mapred.JobClient:  map 49% reduce 0%

14/01/27 00:29:00 INFO mapred.JobClient:  map 51% reduce 0%

14/01/27 00:29:03 INFO mapred.JobClient:  map 52% reduce 0%

14/01/27 00:29:06 INFO mapred.JobClient:  map 54% reduce 0%

14/01/27 00:29:08 INFO mapred.JobClient:  map 56% reduce 0%

14/01/27 00:29:11 INFO mapred.JobClient:  map 57% reduce 0%

14/01/27 00:29:14 INFO mapred.JobClient:  map 58% reduce 0%

14/01/27 00:29:20 INFO mapred.JobClient:  map 59% reduce 0%

14/01/27 00:29:23 INFO mapred.JobClient:  map 61% reduce 0%

14/01/27 00:29:26 INFO mapred.JobClient:  map 63% reduce 0%

14/01/27 00:29:29 INFO mapred.JobClient:  map 65% reduce 0%

14/01/27 00:29:32 INFO mapred.JobClient:  map 66% reduce 0%

14/01/27 00:29:35 INFO mapred.JobClient:  map 68% reduce 0%

14/01/27 00:29:38 INFO mapred.JobClient:  map 70% reduce 0%

14/01/27 00:29:41 INFO mapred.JobClient:  map 72% reduce 0%

14/01/27 00:29:44 INFO mapred.JobClient:  map 74% reduce 0%

14/01/27 00:29:47 INFO mapred.JobClient:  map 76% reduce 0%

14/01/27 00:29:50 INFO mapred.JobClient:  map 79% reduce 0%

14/01/27 00:29:53 INFO mapred.JobClient:  map 81% reduce 0%

14/01/27 00:29:56 INFO mapred.JobClient:  map 84% reduce 0%

14/01/27 00:29:59 INFO mapred.JobClient:  map 86% reduce 0%

14/01/27 00:30:02 INFO mapred.JobClient:  map 88% reduce 0%

14/01/27 00:30:05 INFO mapred.JobClient:  map 91% reduce 0%

14/01/27 00:30:08 INFO mapred.JobClient:  map 94% reduce 0%

14/01/27 00:30:14 INFO mapred.JobClient:  map 100% reduce 0%

14/01/27 00:30:23 INFO mapred.JobClient: Job complete: job_201401261418_0001

14/01/27 00:30:23 INFO mapred.JobClient: Counters: 18

14/01/27 00:30:23 INFO mapred.JobClient:   Job Counters

14/01/27 00:30:23 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=237675

14/01/27 00:30:23 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0

14/01/27 00:30:23 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0

14/01/27 00:30:23 INFO mapred.JobClient:     Launched map tasks=1

14/01/27 00:31:27 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0

14/01/27 00:31:27 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0

14/01/27 00:31:27 INFO mapred.JobClient:     Launched map tasks=1

14/01/27 00:31:27 INFO mapred.JobClient:     Data-local map tasks=1

14/01/27 00:31:27 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=0

14/01/27 00:31:27 INFO mapred.JobClient:   File Output Format Counters

14/01/27 00:31:27 INFO mapred.JobClient:     Bytes Written=27503580

14/01/27 00:31:27 INFO mapred.JobClient:   FileSystemCounters

14/01/27 00:31:27 INFO mapred.JobClient:     HDFS_BYTES_READ=19202520

14/01/27 00:31:27 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=21795

14/01/27 00:31:27 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=27503580

14/01/27 00:31:27 INFO mapred.JobClient:   File Input Format Counters

14/01/27 00:31:27 INFO mapred.JobClient:     Bytes Read=19202391

14/01/27 00:31:27 INFO mapred.JobClient:   Map-Reduce Framework

14/01/27 00:31:27 INFO mapred.JobClient:     Map input records=18846

14/01/27 00:31:27 INFO mapred.JobClient:     Physical memory (bytes) snapshot=41730048

14/01/27 00:31:27 INFO mapred.JobClient:     Spilled Records=0

14/01/27 00:31:27 INFO mapred.JobClient:     CPU time spent (ms)=10280

14/01/27 00:31:27 INFO mapred.JobClient:     Total committed heap usage (bytes)=8060928

14/01/27 00:31:27 INFO mapred.JobClient:     Virtual memory (bytes) snapshot=1744265216

14/01/27 00:31:27 INFO mapred.JobClient:     Map output records=18846

14/01/27 00:31:27 INFO mapred.JobClient:     SPLIT_RAW_BYTES=129

14/01/27 00:31:27 INFO vectorizer.SparseVectorsFromSequenceFiles: Creating Term Frequency Vectors

14/01/27 00:31:27 INFO vectorizer.DictionaryVectorizer: Creating dictionary from /usr/grid/mahout-work-grid/20news-vectors/tokenized-documents and saving at /usr/grid/mahout-work-grid/20news-vectors/wordcount

14/01/27 00:31:31 INFO input.FileInputFormat: Total input paths to process : 1

14/01/27 00:31:32 INFO mapred.JobClient: Running job: job_201401261418_0003

14/01/27 00:31:33 INFO mapred.JobClient:  map 0% reduce 0%

14/01/27 00:32:46 INFO mapred.JobClient:  map 4% reduce 0%

14/01/27 00:32:49 INFO mapred.JobClient:  map 20% reduce 0%

14/01/27 00:32:52 INFO mapred.JobClient:  map 41% reduce 0%

14/01/27 00:32:55 INFO mapred.JobClient:  map 66% reduce 0%

14/01/27 00:32:58 INFO mapred.JobClient:  map 90% reduce 0%

14/01/27 00:33:01 INFO mapred.JobClient:  map 100% reduce 0%

14/01/27 00:33:22 INFO mapred.JobClient:  map 100% reduce 100%

14/01/27 00:33:27 INFO mapred.JobClient: Job complete: job_201401261418_0003

14/01/27 00:33:27 INFO mapred.JobClient: Counters: 29

14/01/27 00:33:27 INFO mapred.JobClient:   Job Counters

14/01/27 00:33:27 INFO mapred.JobClient:     Launched reduce tasks=1

14/01/27 00:33:27 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=50887

14/01/27 00:33:27 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0

14/01/27 00:33:27 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0

14/01/27 00:33:27 INFO mapred.JobClient:     Launched map tasks=1

14/01/27 00:33:27 INFO mapred.JobClient:     Data-local map tasks=1

14/01/27 00:33:27 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=20520

14/01/27 00:33:27 INFO mapred.JobClient:   File Output Format Counters

14/01/27 00:33:27 INFO mapred.JobClient:     Bytes Written=2315037

14/01/27 00:33:27 INFO mapred.JobClient:   FileSystemCounters

14/01/27 00:33:27 INFO mapred.JobClient:     FILE_BYTES_READ=11857906

14/01/27 00:33:27 INFO mapred.JobClient:     HDFS_BYTES_READ=27503733

14/01/27 00:33:27 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=15440177

14/01/27 00:33:27 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=2315037

14/01/27 00:33:27 INFO mapred.JobClient:   File Input Format Counters

14/01/27 00:33:27 INFO mapred.JobClient:     Bytes Read=27503580

14/01/27 00:33:27 INFO mapred.JobClient:   Map-Reduce Framework

14/01/27 00:33:27 INFO mapred.JobClient:     Map output materialized bytes=3538084

14/01/27 00:33:27 INFO mapred.JobClient:     Map input records=18846

14/01/27 00:33:27 INFO mapred.JobClient:     Reduce shuffle bytes=0

14/01/27 00:33:27 INFO mapred.JobClient:     Spilled Records=849345

14/01/27 00:33:27 INFO mapred.JobClient:     Map output bytes=39462740

14/01/27 00:33:27 INFO mapred.JobClient:     CPU time spent (ms)=20780

14/01/27 00:33:27 INFO mapred.JobClient:     Total committed heap usage (bytes)=264306688

14/01/27 00:33:27 INFO mapred.JobClient:     Combine input records=3026242

14/01/27 00:33:27 INFO mapred.JobClient:     SPLIT_RAW_BYTES=153

14/01/27 00:33:27 INFO mapred.JobClient:     Reduce input records=192904

14/01/27 00:33:27 INFO mapred.JobClient:     Reduce input groups=192904

14/01/27 00:33:27 INFO mapred.JobClient:     Combine output records=554873

14/01/27 00:33:27 INFO mapred.JobClient:     Physical memory (bytes) snapshot=253255680

14/01/27 00:33:27 INFO mapred.JobClient:     Reduce output records=93563

14/01/27 00:33:27 INFO mapred.JobClient:     Virtual memory (bytes) snapshot=3491995648

14/01/27 00:33:27 INFO mapred.JobClient:     Map output records=2664273

14/01/27 00:33:31 INFO input.FileInputFormat: Total input paths to process : 1

14/01/27 00:33:32 INFO mapred.JobClient: Running job: job_201401261418_0004

14/01/27 00:33:33 INFO mapred.JobClient:  map 0% reduce 0%

14/01/27 00:34:35 INFO mapred.JobClient:  map 36% reduce 0%

14/01/27 00:34:38 INFO mapred.JobClient:  map 93% reduce 0%

14/01/27 00:34:41 INFO mapred.JobClient:  map 100% reduce 0%

14/01/27 00:35:02 INFO mapred.JobClient:  map 100% reduce 73%

14/01/27 00:35:05 INFO mapred.JobClient:  map 100% reduce 84%

14/01/27 00:35:11 INFO mapred.JobClient:  map 100% reduce 100%

14/01/27 00:35:16 INFO mapred.JobClient: Job complete: job_201401261418_0004

14/01/27 00:35:16 INFO mapred.JobClient: Counters: 29

14/01/27 00:35:16 INFO mapred.JobClient:   Job Counters

14/01/27 00:35:16 INFO mapred.JobClient:     Launched reduce tasks=1

14/01/27 00:35:16 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=25499

14/01/27 00:35:16 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0

14/01/27 00:35:16 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0

14/01/27 00:35:16 INFO mapred.JobClient:     Launched map tasks=1

14/01/27 00:35:16 INFO mapred.JobClient:     Data-local map tasks=1

14/01/27 00:35:16 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=27870

14/01/27 00:35:16 INFO mapred.JobClient:   File Output Format Counters

14/01/27 00:35:16 INFO mapred.JobClient:     Bytes Written=29314118

14/01/27 00:35:16 INFO mapred.JobClient:   FileSystemCounters

14/01/27 00:35:16 INFO mapred.JobClient:     FILE_BYTES_READ=29226519

14/01/27 00:35:16 INFO mapred.JobClient:     HDFS_BYTES_READ=27503733

14/01/27 00:35:16 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=54594825

14/01/27 00:35:16 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=29314118

14/01/27 00:35:16 INFO mapred.JobClient:   File Input Format Counters

14/01/27 00:35:16 INFO mapred.JobClient:     Bytes Read=27503580

14/01/27 00:35:16 INFO mapred.JobClient:   Map-Reduce Framework

14/01/27 00:35:16 INFO mapred.JobClient:     Map output materialized bytes=27274291

14/01/27 00:35:16 INFO mapred.JobClient:     Map input records=18846

14/01/27 00:35:16 INFO mapred.JobClient:     Reduce shuffle bytes=27274291

14/01/27 00:35:16 INFO mapred.JobClient:     Spilled Records=37692

14/01/27 00:35:16 INFO mapred.JobClient:     Map output bytes=27199343

14/01/27 00:35:16 INFO mapred.JobClient:     CPU time spent (ms)=18110

14/01/27 00:35:16 INFO mapred.JobClient:     Total committed heap usage (bytes)=304148480

14/01/27 00:35:16 INFO mapred.JobClient:     Combine input records=0

14/01/27 00:35:16 INFO mapred.JobClient:     SPLIT_RAW_BYTES=153

14/01/27 00:35:16 INFO mapred.JobClient:     Reduce input records=18846

14/01/27 00:35:16 INFO mapred.JobClient:     Reduce input groups=18846

14/01/27 00:35:16 INFO mapred.JobClient:     Combine output records=0

14/01/27 00:35:16 INFO mapred.JobClient:     Physical memory (bytes) snapshot=298504192

14/01/27 00:35:16 INFO mapred.JobClient:     Reduce output records=18846

14/01/27 00:35:16 INFO mapred.JobClient:     Virtual memory (bytes) snapshot=3491328000

14/01/27 00:35:16 INFO mapred.JobClient:     Map output records=18846

14/01/27 00:35:23 INFO input.FileInputFormat: Total input paths to process : 1

14/01/27 00:35:24 INFO mapred.JobClient: Running job: job_201401261418_0005

14/01/27 00:35:25 INFO mapred.JobClient:  map 0% reduce 0%

14/01/27 00:35:53 INFO mapred.JobClient:  map 20% reduce 0%

14/01/27 00:35:56 INFO mapred.JobClient:  map 100% reduce 0%

14/01/27 00:36:14 INFO mapred.JobClient:  map 100% reduce 84%

14/01/27 00:36:20 INFO mapred.JobClient:  map 100% reduce 100%

14/01/27 00:36:25 INFO mapred.JobClient: Job complete: job_201401261418_0005

14/01/27 00:36:25 INFO mapred.JobClient: Counters: 29

14/01/27 00:36:25 INFO mapred.JobClient:   Job Counters

14/01/27 00:36:25 INFO mapred.JobClient:     Launched reduce tasks=1

14/01/27 00:36:25 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=20908

14/01/27 00:36:25 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0

14/01/27 00:36:25 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0

14/01/27 00:36:25 INFO mapred.JobClient:     Launched map tasks=1

14/01/27 00:36:25 INFO mapred.JobClient:     Data-local map tasks=1

14/01/27 00:36:25 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=23135

14/01/27 00:36:25 INFO mapred.JobClient:   File Output Format Counters

14/01/27 00:36:25 INFO mapred.JobClient:     Bytes Written=29314118

14/01/27 00:36:25 INFO mapred.JobClient:   FileSystemCounters

14/01/27 00:36:25 INFO mapred.JobClient:     FILE_BYTES_READ=29059398

14/01/27 00:36:25 INFO mapred.JobClient:     HDFS_BYTES_READ=29314269

14/01/27 00:36:25 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=58163213

14/01/27 00:36:25 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=29314118

14/01/27 00:36:25 INFO mapred.JobClient:   File Input Format Counters

14/01/27 00:36:25 INFO mapred.JobClient:     Bytes Read=29314118

14/01/27 00:36:25 INFO mapred.JobClient:   Map-Reduce Framework

14/01/27 00:36:25 INFO mapred.JobClient:     Map output materialized bytes=29059398

14/01/27 00:36:25 INFO mapred.JobClient:     Map input records=18846

14/01/27 00:36:25 INFO mapred.JobClient:     Reduce shuffle bytes=0

14/01/27 00:36:25 INFO mapred.JobClient:     Spilled Records=37692

14/01/27 00:36:25 INFO mapred.JobClient:     Map output bytes=28984080

14/01/27 00:36:25 INFO mapred.JobClient:     CPU time spent (ms)=10800

14/01/27 00:36:25 INFO mapred.JobClient:     Total committed heap usage (bytes)=293572608

14/01/27 00:36:25 INFO mapred.JobClient:     Combine input records=0

14/01/27 00:36:25 INFO mapred.JobClient:     SPLIT_RAW_BYTES=151

14/01/27 00:36:25 INFO mapred.JobClient:     Reduce input records=18846

14/01/27 00:36:25 INFO mapred.JobClient:     Reduce input groups=18846

14/01/27 00:36:25 INFO mapred.JobClient:     Combine output records=0

14/01/27 00:36:25 INFO mapred.JobClient:     Physical memory (bytes) snapshot=282771456

14/01/27 00:36:25 INFO mapred.JobClient:     Reduce output records=18846

14/01/27 00:36:25 INFO mapred.JobClient:     Virtual memory (bytes) snapshot=3491999744

14/01/27 00:36:25 INFO mapred.JobClient:     Map output records=18846

14/01/27 00:36:25 INFO common.HadoopUtil:Deleting /usr/grid/mahout-work-grid/20news-vectors/partial-vectors-0

14/01/27 00:36:25 INFO vectorizer.SparseVectorsFromSequenceFiles: Calculating IDF

14/01/27 00:36:30 INFO input.FileInputFormat: Total input paths to process : 1

14/01/27 00:36:30 INFO mapred.JobClient: Running job: job_201401261418_0006

14/01/27 00:36:31 INFO mapred.JobClient:  map 0% reduce 0%

14/01/27 00:37:05 INFO mapred.JobClient:  map 14% reduce 0%

14/01/27 00:37:08 INFO mapred.JobClient:  map 49% reduce 0%

14/01/27 00:37:11 INFO mapred.JobClient:  map 83% reduce 0%

14/01/27 00:37:14 INFO mapred.JobClient:  map 100% reduce 0%

14/01/27 00:37:32 INFO mapred.JobClient:  map 100% reduce 100%

14/01/27 00:37:37 INFO mapred.JobClient: Job complete: job_201401261418_0006

14/01/27 00:37:37 INFO mapred.JobClient: Counters: 29

14/01/27 00:37:37 INFO mapred.JobClient:   Job Counters

14/01/27 00:37:37 INFO mapred.JobClient:     Launched reduce tasks=1

14/01/27 00:37:37 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=30299

14/01/27 00:37:37 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0

14/01/27 00:37:37 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0

14/01/27 00:37:37 INFO mapred.JobClient:     Launched map tasks=1

14/01/27 00:37:37 INFO mapred.JobClient:     Data-local map tasks=1

14/01/27 00:37:37 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=16561

14/01/27 00:37:37 INFO mapred.JobClient:   File Output Format Counters

14/01/27 00:37:37 INFO mapred.JobClient:     Bytes Written=1890073

14/01/27 00:37:37 INFO mapred.JobClient:   FileSystemCounters

14/01/27 00:37:37 INFO mapred.JobClient:     FILE_BYTES_READ=4880816

14/01/27 00:37:37 INFO mapred.JobClient:     HDFS_BYTES_READ=29314270

14/01/27 00:37:37 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=6234855

14/01/27 00:37:37 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=1890073

14/01/27 00:37:37 INFO mapred.JobClient:   File Input Format Counters

14/01/27 00:37:37 INFO mapred.JobClient:     Bytes Read=29314118

14/01/27 00:37:37 INFO mapred.JobClient:   Map-Reduce Framework

14/01/27 00:37:37 INFO mapred.JobClient:     Map output materialized bytes=1309902

14/01/27 00:37:37 INFO mapred.JobClient:     Map input records=18846

14/01/27 00:37:37 INFO mapred.JobClient:     Reduce shuffle bytes=1309902

14/01/27 00:37:37 INFO mapred.JobClient:     Spilled Records=442189

14/01/27 00:37:37 INFO mapred.JobClient:     Map output bytes=31005336

14/01/27 00:37:37 INFO mapred.JobClient:     CPU time spent (ms)=12610

14/01/27 00:37:37 INFO mapred.JobClient:     Total committed heap usage (bytes)=264384512

14/01/27 00:37:37 INFO mapred.JobClient:     Combine input records=2838839

14/01/27 00:37:37 INFO mapred.JobClient:     SPLIT_RAW_BYTES=152

14/01/27 00:37:37 INFO mapred.JobClient:     Reduce input records=93564

14/01/27 00:37:37 INFO mapred.JobClient:     Reduce input groups=93564

14/01/27 00:37:37 INFO mapred.JobClient:     Combine output records=348625

14/01/27 00:37:37 INFO mapred.JobClient:     Physical memory (bytes) snapshot=249851904

14/01/27 00:37:37 INFO mapred.JobClient:     Reduce output records=93564

14/01/27 00:37:37 INFO mapred.JobClient:     Virtual memory (bytes) snapshot=3491995648

14/01/27 00:37:37 INFO mapred.JobClient:     Map output records=2583778

14/01/27 00:37:38 INFO vectorizer.SparseVectorsFromSequenceFiles: Pruning

14/01/27 00:37:40 INFO input.FileInputFormat: Total input paths to process : 1

14/01/27 00:37:41 INFO mapred.JobClient: Running job: job_201401261418_0007

14/01/27 00:37:42 INFO mapred.JobClient:  map 0% reduce 0%

14/01/27 00:38:12 INFO mapred.JobClient:  map 38% reduce 0%

14/01/27 00:38:15 INFO mapred.JobClient:  map 100% reduce 0%

14/01/27 00:38:33 INFO mapred.JobClient:  map 100% reduce 66%

14/01/27 00:38:36 INFO mapred.JobClient:  map 100% reduce 85%

14/01/27 00:38:45 INFO mapred.JobClient:  map 100% reduce 100%

14/01/27 00:38:50 INFO mapred.JobClient: Job complete: job_201401261418_0007

14/01/27 00:38:50 INFO mapred.JobClient: Counters: 29

14/01/27 00:38:50 INFO mapred.JobClient:   Job Counters

14/01/27 00:38:50 INFO mapred.JobClient:     Launched reduce tasks=1

14/01/27 00:38:50 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=26595

14/01/27 00:38:50 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0

14/01/27 00:38:50 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0

14/01/27 00:38:50 INFO mapred.JobClient:     Launched map tasks=1

14/01/27 00:38:50 INFO mapred.JobClient:     Data-local map tasks=1

14/01/27 00:38:50 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=24054

14/01/27 00:38:50 INFO mapred.JobClient:   File Output Format Counters

14/01/27 00:38:50 INFO mapred.JobClient:     Bytes Written=28689283

14/01/27 00:38:50 INFO mapred.JobClient:   FileSystemCounters

14/01/27 00:38:50 INFO mapred.JobClient:     FILE_BYTES_READ=9597304

14/01/27 00:38:50 INFO mapred.JobClient:     HDFS_BYTES_READ=29314270

14/01/27 00:38:50 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=15430363

14/01/27 00:38:50 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=28689283

14/01/27 00:38:50 INFO mapred.JobClient:   File Input Format Counters

14/01/27 00:38:50 INFO mapred.JobClient:     Bytes Read=29314118

14/01/27 00:38:50 INFO mapred.JobClient:   Map-Reduce Framework

14/01/27 00:38:50 INFO mapred.JobClient:     Map output materialized bytes=7692467

14/01/27 00:38:50 INFO mapred.JobClient:     Map input records=18846

14/01/27 00:38:50 INFO mapred.JobClient:     Reduce shuffle bytes=7692467

14/01/27 00:38:50 INFO mapred.JobClient:     Spilled Records=37692

14/01/27 00:38:50 INFO mapred.JobClient:     Map output bytes=28984080

14/01/27 00:38:50 INFO mapred.JobClient:     CPU time spent (ms)=16230

14/01/27 00:38:50 INFO mapred.JobClient:     Total committed heap usage (bytes)=306655232

14/01/27 00:38:50 INFO mapred.JobClient:     Combine input records=0

14/01/27 00:38:50 INFO mapred.JobClient:     SPLIT_RAW_BYTES=152

14/01/27 00:38:50 INFO mapred.JobClient:     Reduce input records=18846

14/01/27 00:38:50 INFO mapred.JobClient:     Reduce input groups=18846

14/01/27 00:38:50 INFO mapred.JobClient:     Combine output records=0

14/01/27 00:38:50 INFO mapred.JobClient:     Physical memory (bytes) snapshot=299081728

14/01/27 00:38:50 INFO mapred.JobClient:     Reduce output records=18846

14/01/27 00:38:50 INFO mapred.JobClient:     Virtual memory (bytes) snapshot=3492966400

14/01/27 00:38:50 INFO mapred.JobClient:     Map output records=18846

14/01/27 00:38:54 INFO input.FileInputFormat: Total input paths to process : 1

14/01/27 00:38:55 INFO mapred.JobClient: Running job: job_201401261418_0008

14/01/27 00:38:56 INFO mapred.JobClient:  map 0% reduce 0%

14/01/27 00:39:24 INFO mapred.JobClient:  map 86% reduce 0%

14/01/27 00:39:27 INFO mapred.JobClient:  map 100% reduce 0%

14/01/27 00:39:45 INFO mapred.JobClient:  map 100% reduce 86%

14/01/27 00:39:51 INFO mapred.JobClient:  map 100% reduce 100%

14/01/27 00:39:56 INFO mapred.JobClient: Job complete: job_201401261418_0008

14/01/27 00:39:56 INFO mapred.JobClient: Counters: 29

14/01/27 00:39:56 INFO mapred.JobClient:   Job Counters

14/01/27 00:39:56 INFO mapred.JobClient:     Launched reduce tasks=1

14/01/27 00:39:56 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=20854

14/01/27 00:39:56 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0

14/01/27 00:39:56 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0

14/01/27 00:39:56 INFO mapred.JobClient:     Launched map tasks=1

14/01/27 00:39:56 INFO mapred.JobClient:     Data-local map tasks=1

14/01/27 00:39:56 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=23724

14/01/27 00:39:56 INFO mapred.JobClient:   File Output Format Counters

14/01/27 00:39:56 INFO mapred.JobClient:     Bytes Written=28689283

14/01/27 00:39:56 INFO mapred.JobClient:   FileSystemCounters

14/01/27 00:39:56 INFO mapred.JobClient:     FILE_BYTES_READ=28437750

14/01/27 00:39:56 INFO mapred.JobClient:     HDFS_BYTES_READ=28689445

14/01/27 00:39:56 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=56919517

14/01/27 00:39:56 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=28689283

14/01/27 00:39:56 INFO mapred.JobClient:   File Input Format Counters

14/01/27 00:39:56 INFO mapred.JobClient:     Bytes Read=28689283

14/01/27 00:39:56 INFO mapred.JobClient:   Map-Reduce Framework

14/01/27 00:39:56 INFO mapred.JobClient:     Map output materialized bytes=28437750

14/01/27 00:39:56 INFO mapred.JobClient:     Map input records=18846

14/01/27 00:39:56 INFO mapred.JobClient:     Reduce shuffle bytes=0

14/01/27 00:39:56 INFO mapred.JobClient:     Spilled Records=37692

14/01/27 00:39:56 INFO mapred.JobClient:     Map output bytes=28362505

14/01/27 00:39:56 INFO mapred.JobClient:     CPU time spent (ms)=10160

14/01/27 00:39:56 INFO mapred.JobClient:     Total committed heap usage (bytes)=292847616

14/01/27 00:39:56 INFO mapred.JobClient:     Combine input records=0

14/01/27 00:39:56 INFO mapred.JobClient:     SPLIT_RAW_BYTES=162

14/01/27 00:39:56 INFO mapred.JobClient:     Reduce input records=18846

14/01/27 00:39:56 INFO mapred.JobClient:     Reduce input groups=18846

14/01/27 00:39:56 INFO mapred.JobClient:     Combine output records=0

14/01/27 00:39:56 INFO mapred.JobClient:     Physical memory (bytes) snapshot=282537984

14/01/27 00:39:56 INFO mapred.JobClient:     Reduce output records=18846

14/01/27 00:39:56 INFO mapred.JobClient:     Virtual memory (bytes) snapshot=3492261888

14/01/27 00:39:56 INFO mapred.JobClient:     Map output records=18846

14/01/27 00:39:56 INFO common.HadoopUtil: Deleting /usr/grid/mahout-work-grid/20news-vectors/tf-vectors-partial

14/01/27 00:39:56 INFO common.HadoopUtil: Deleting /usr/grid/mahout-work-grid/20news-vectors/tf-vectors-toprune

14/01/27 00:40:00 INFO input.FileInputFormat: Total input paths to process : 1

14/01/27 00:40:00 INFO mapred.JobClient: Running job: job_201401261418_0009

14/01/27 00:40:01 INFO mapred.JobClient:  map 0% reduce 0%

14/01/27 00:40:33 INFO mapred.JobClient:  map 59% reduce 0%

14/01/27 00:40:36 INFO mapred.JobClient:  map 100% reduce 0%

14/01/27 00:40:54 INFO mapred.JobClient:  map 100% reduce 83%

14/01/27 00:41:00 INFO mapred.JobClient:  map 100% reduce 100%

14/01/27 00:41:05 INFO mapred.JobClient: Job complete: job_201401261418_0009

14/01/27 00:41:05 INFO mapred.JobClient: Counters: 29

14/01/27 00:41:05 INFO mapred.JobClient:   Job Counters

14/01/27 00:41:05 INFO mapred.JobClient:     Launched reduce tasks=1

14/01/27 00:41:05 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=20673

14/01/27 00:41:05 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0

14/01/27 00:41:05 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0

14/01/27 00:41:05 INFO mapred.JobClient:     Launched map tasks=1

14/01/27 00:41:05 INFO mapred.JobClient:     Data-local map tasks=1

14/01/27 00:41:05 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=22591

14/01/27 00:41:05 INFO mapred.JobClient:   File Output Format Counters

14/01/27 00:41:05 INFO mapred.JobClient:     Bytes Written=28689283

14/01/27 00:41:05 INFO mapred.JobClient:   FileSystemCounters

14/01/27 00:41:05 INFO mapred.JobClient:     FILE_BYTES_READ=30342579

14/01/27 00:41:05 INFO mapred.JobClient:     HDFS_BYTES_READ=28689427

14/01/27 00:41:05 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=56921481

14/01/27 00:41:05 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=28689283

14/01/27 00:41:05 INFO mapred.JobClient:   File Input Format Counters

14/01/27 00:41:05 INFO mapred.JobClient:     Bytes Read=28689283

14/01/27 00:41:05 INFO mapred.JobClient:   Map-Reduce Framework

14/01/27 00:41:05 INFO mapred.JobClient:     Map output materialized bytes=28437750

14/01/27 00:41:05 INFO mapred.JobClient:     Map input records=18846

14/01/27 00:41:05 INFO mapred.JobClient:     Reduce shuffle bytes=28437750

14/01/27 00:41:05 INFO mapred.JobClient:     Spilled Records=37692

14/01/27 00:41:05 INFO mapred.JobClient:     Map output bytes=28362505

14/01/27 00:41:05 INFO mapred.JobClient:     CPU time spent (ms)=10990

14/01/27 00:41:05 INFO mapred.JobClient:     Total committed heap usage (bytes)=305639424

14/01/27 00:41:05 INFO mapred.JobClient:     Combine input records=0

14/01/27 00:41:05 INFO mapred.JobClient:     SPLIT_RAW_BYTES=144

14/01/27 00:41:05 INFO mapred.JobClient:     Reduce input records=18846

14/01/27 00:41:05 INFO mapred.JobClient:     Reduce input groups=18846

14/01/27 00:41:05 INFO mapred.JobClient:     Combine output records=0

14/01/27 00:41:05 INFO mapred.JobClient:     Physical memory (bytes) snapshot=296620032

14/01/27 00:41:05 INFO mapred.JobClient:     Reduce output records=18846

14/01/27 00:41:05 INFO mapred.JobClient:     Virtual memory (bytes) snapshot=3492110336

14/01/27 00:41:05 INFO mapred.JobClient:     Map output records=18846

14/01/27 00:41:08 INFO input.FileInputFormat: Total input paths to process : 1

14/01/27 00:41:08 INFO mapred.JobClient: Running job: job_201401261418_0010

14/01/27 00:41:09 INFO mapred.JobClient:  map 0% reduce 0%

14/01/27 00:41:47 INFO mapred.JobClient:  map 16% reduce 0%

14/01/27 00:41:53 INFO mapred.JobClient:  map 21% reduce 0%

14/01/27 00:41:56 INFO mapred.JobClient:  map 28% reduce 0%

14/01/27 00:41:58 INFO mapred.JobClient:  map 100% reduce 0%

14/01/27 00:42:19 INFO mapred.JobClient:  map 100% reduce 85%

14/01/27 00:42:25 INFO mapred.JobClient:  map 100% reduce 100%

14/01/27 00:42:31 INFO mapred.JobClient: Job complete: job_201401261418_0010

14/01/27 00:42:31 INFO mapred.JobClient: Counters: 29

14/01/27 00:42:31 INFO mapred.JobClient:   Job Counters

14/01/27 00:42:31 INFO mapred.JobClient:     Launched reduce tasks=1

14/01/27 00:42:31 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=41166

14/01/27 00:42:31 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0

14/01/27 00:42:31 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0

14/01/27 00:42:31 INFO mapred.JobClient:     Launched map tasks=1

14/01/27 00:42:31 INFO mapred.JobClient:     Data-local map tasks=1

14/01/27 00:42:31 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=22239

14/01/27 00:42:31 INFO mapred.JobClient:   File Output Format Counters

14/01/27 00:42:31 INFO mapred.JobClient:     Bytes Written=28689283

14/01/27 00:42:31 INFO mapred.JobClient:   FileSystemCounters

14/01/27 00:42:31 INFO mapred.JobClient:     FILE_BYTES_READ=28437750

14/01/27 00:42:31 INFO mapred.JobClient:     HDFS_BYTES_READ=28689434

14/01/27 00:42:31 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=56919905

14/01/27 00:42:31 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=28689283

14/01/27 00:42:31 INFO mapred.JobClient:   File Input Format Counters

14/01/27 00:42:31 INFO mapred.JobClient:     Bytes Read=28689283

14/01/27 00:42:31 INFO mapred.JobClient:   Map-Reduce Framework

14/01/27 00:42:31 INFO mapred.JobClient:     Map output materialized bytes=28437750

14/01/27 00:42:31 INFO mapred.JobClient:     Map input records=18846

14/01/27 00:42:31 INFO mapred.JobClient:     Reduce shuffle bytes=28437750

14/01/27 00:42:31 INFO mapred.JobClient:     Spilled Records=37692

14/01/27 00:42:31 INFO mapred.JobClient:     Map output bytes=28362505

14/01/27 00:42:31 INFO mapred.JobClient:     CPU time spent (ms)=11410

14/01/27 00:42:31 INFO mapred.JobClient:     Total committed heap usage (bytes)=292954112

14/01/27 00:42:31 INFO mapred.JobClient:     Combine input records=0

14/01/27 00:42:31 INFO mapred.JobClient:     SPLIT_RAW_BYTES=151

14/01/27 00:42:31 INFO mapred.JobClient:     Reduce input records=18846

14/01/27 00:42:31 INFO mapred.JobClient:     Reduce input groups=18846

14/01/27 00:42:31 INFO mapred.JobClient:     Combine output records=0

14/01/27 00:42:31 INFO mapred.JobClient:     Physical memory (bytes) snapshot=281976832

14/01/27 00:42:31 INFO mapred.JobClient:     Reduce output records=18846

14/01/27 00:42:31 INFO mapred.JobClient:     Virtual memory (bytes) snapshot=3491901440

14/01/27 00:42:31 INFO mapred.JobClient:     Map output records=18846

14/01/27 00:42:31 INFO common.HadoopUtil: Deleting /usr/grid/mahout-work-grid/20news-vectors/partial-vectors-0

14/01/27 00:42:31 INFO driver.MahoutDriver: Program took 715875 ms (Minutes: 11.93125)
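
The JobClient counter dumps in this log all follow a `name=value` layout, so they can be pulled out of a saved copy of the log for comparison between jobs. The following is only a minimal sketch in Python and was not part of the original run; the function name `parse_counters` is hypothetical, and the sample lines are copied from the counters of job_201401261418_0006. The pattern tolerates a missing space after `INFO`, which occurs in pasted copies of this log.

```python
import re

# Matches counter lines such as:
# 14/01/27 00:37:37 INFO mapred.JobClient:     Reduce output records=93564
# "\s*" after INFO tolerates logs where the space was lost when pasting.
COUNTER_RE = re.compile(r"INFO\s*mapred\.JobClient:\s+(.+?)=(\d+)\s*$")


def parse_counters(lines):
    """Return {counter name: integer value} for JobClient counter lines."""
    counters = {}
    for line in lines:
        match = COUNTER_RE.search(line)
        if match:  # group headers and progress lines have no "=" and are skipped
            counters[match.group(1).strip()] = int(match.group(2))
    return counters


sample = [
    "14/01/27 00:37:37 INFO mapred.JobClient:   Map-Reduce Framework",
    "14/01/27 00:37:37 INFO mapred.JobClient:     Map input records=18846",
    "14/01/27 00:37:37 INFO mapred.JobClient:     Map output records=2583778",
    "14/01/27 00:37:37 INFO mapred.JobClient:     Reduce output records=93564",
]
counters = parse_counters(sample)
print(counters)
```

Feeding the whole log through `parse_counters` one job at a time would give one dictionary per job, which makes it easier to compare values such as `Map input records` across the vectorizing steps.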

+ echo 'Creating training and holdout set with a random 80-20 split of the generated vector dataset'

Creating training and holdout set with a random 80-20 split of the generated vector dataset

+ ./bin/mahout split -i /usr/grid/mahout-work-grid/20news-vectors/tfidf-vectors --trainingOutput /usr/grid/mahout-work-grid/20news-train-vectors --testOutput /usr/grid/mahout-work-grid/20news-test-vectors --randomSelectionPct 40 --overwrite --sequenceFiles -xm sequential

MAHOUT_LOCAL is not set; adding HADOOP_CONF_DIR to classpath.

Warning: $HADOOP_HOME is deprecated.

 

Running on hadoop, using /usr/grid/hadoop/bin/hadoop and HADOOP_CONF_DIR=/usr/grid/hadoop/conf

MAHOUT-JOB:/usr/grid/mahout-distribution-0.8/mahout-examples-0.8-job.jar

Warning: $HADOOP_HOME is deprecated.

 

14/01/27 00:42:43 WARN driver.MahoutDriver: No split.props found on classpath, will use command-line arguments only

14/01/27 00:42:44 INFO common.AbstractJob: Command line arguments: {--endPhase=[2147483647], --input=[/usr/grid/mahout-work-grid/20news-vectors/tfidf-vectors], --method=[sequential], --overwrite=null, --randomSelectionPct=[40], --sequenceFiles=null, --startPhase=[0], --tempDir=[temp], --testOutput=[/usr/grid/mahout-work-grid/20news-test-vectors], --trainingOutput=[/usr/grid/mahout-work-grid/20news-train-vectors]}

14/01/27 00:42:48 INFO utils.SplitInput: part-r-00000 has 162419 lines

14/01/27 00:42:48 INFO utils.SplitInput: part-r-00000 test split size is 64968 based on random selection percentage 40

14/01/27 00:42:48 INFO util.NativeCodeLoader: Loaded the native-hadoop library

14/01/27 00:42:48 INFO zlib.ZlibFactory: Successfully loaded & initialized native-zlib library

14/01/27 00:42:48 INFO compress.CodecPool: Got brand-new compressor

14/01/27 00:42:48 INFO compress.CodecPool: Got brand-new compressor

14/01/27 00:42:48 INFO utils.SplitInput: file: part-r-00000, input: 162419 train: 11205, test: 7641 starting at 0

14/01/27 00:43:01 INFO driver.MahoutDriver: Program took 17995 ms (Minutes: 0.29991666666666666)
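
Note that the echoed message speaks of an 80-20 split, while the command actually passes `--randomSelectionPct 40`. The proportions that were really produced can be checked from the train and test counts in the SplitInput log line. This is a small Python sketch added for illustration only; the variable names are not from the original script.

```python
# Counts copied from the SplitInput log line of this run.
train_docs = 11205
test_docs = 7641

# 11205 + 7641 = 18846, the same record count reported by the vectorizing jobs.
total_docs = train_docs + test_docs

train_pct = 100.0 * train_docs / total_docs
test_pct = 100.0 * test_docs / total_docs

# prints: total=18846 train=59.46% test=40.54%
print(f"total={total_docs} train={train_pct:.2f}% test={test_pct:.2f}%")
```

So in this run roughly 40% of the documents were held out for testing, which matches the `--randomSelectionPct 40` argument rather than the 80-20 wording of the message.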

+ echo 'Training Naive Bayes model'

Training Naive Bayes model

+ ./bin/mahout trainnb -i /usr/grid/mahout-work-grid/20news-train-vectors -el -o /usr/grid/mahout-work-grid/model -li /usr/grid/mahout-work-grid/labelindex -ow

MAHOUT_LOCAL is not set; adding HADOOP_CONF_DIR to classpath.

Warning: $HADOOP_HOME is deprecated.

 

Running on hadoop, using /usr/grid/hadoop/bin/hadoop and HADOOP_CONF_DIR=/usr/grid/hadoop/conf

MAHOUT-JOB:/usr/grid/mahout-distribution-0.8/mahout-examples-0.8-job.jar

Warning: $HADOOP_HOME is deprecated.

 

14/01/27 00:43:13 WARN driver.MahoutDriver: No trainnb.props found on classpath, will use command-line arguments only

14/01/27 00:43:14 INFO common.AbstractJob: Command line arguments: {--alphaI=[1.0], --endPhase=[2147483647], --extractLabels=null, --input=[/usr/grid/mahout-work-grid/20news-train-vectors], --labelIndex=[/usr/grid/mahout-work-grid/labelindex], --output=[/usr/grid/mahout-work-grid/model], --overwrite=null, --startPhase=[0], --tempDir=[temp]}

14/01/27 00:43:14 INFO common.HadoopUtil: Deleting temp

14/01/27 00:43:14 INFO util.NativeCodeLoader: Loaded the native-hadoop library

14/01/27 00:43:14 INFO zlib.ZlibFactory: Successfully loaded & initialized native-zlib library

14/01/27 00:43:14 INFO compress.CodecPool: Got brand-new decompressor

14/01/27 00:43:21 INFO input.FileInputFormat: Total input paths to process : 1

14/01/27 00:43:22 INFO mapred.JobClient: Running job: job_201401261418_0011

14/01/27 00:43:23 INFO mapred.JobClient:  map 0% reduce 0%

14/01/27 00:43:56 INFO mapred.JobClient:  map 39% reduce 0%

14/01/27 00:43:59 INFO mapred.JobClient:  map 100% reduce 0%

14/01/27 00:44:20 INFO mapred.JobClient:  map 100% reduce 100%

14/01/27 00:44:25 INFO mapred.JobClient: Job complete: job_201401261418_0011

14/01/27 00:44:25 INFO mapred.JobClient: Counters: 29

14/01/27 00:44:25 INFO mapred.JobClient:   Job Counters

14/01/27 00:44:25 INFO mapred.JobClient:     Launched reduce tasks=1

14/01/27 00:44:25 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=27091

14/01/27 00:44:25 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0

14/01/27 00:44:25 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0

14/01/27 00:44:25 INFO mapred.JobClient:     Launched map tasks=1

14/01/27 00:44:25 INFO mapred.JobClient:     Data-local map tasks=1

14/01/27 00:44:25 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=18002

14/01/27 00:44:25 INFO mapred.JobClient:   File Output Format Counters

14/01/27 00:44:25 INFO mapred.JobClient:     Bytes Written=2727579

14/01/27 00:44:25 INFO mapred.JobClient:   FileSystemCounters

14/01/27 00:44:25 INFO mapred.JobClient:     FILE_BYTES_READ=1409402

14/01/27 00:44:25 INFO mapred.JobClient:     HDFS_BYTES_READ=12578676

14/01/27 00:44:25 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=2862959

14/01/27 00:44:25 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=2727579

14/01/27 00:44:25 INFO mapred.JobClient:   File Input Format Counters

14/01/27 00:44:25 INFO mapred.JobClient:     Bytes Read=12578537

14/01/27 00:44:25 INFO mapred.JobClient:   Map-Reduce Framework

14/01/27 00:44:25 INFO mapred.JobClient:     Map output materialized bytes=1408720

14/01/27 00:44:25 INFO mapred.JobClient:     Map input records=11205

14/01/27 00:44:25 INFO mapred.JobClient:     Reduce shuffle bytes=1408720

14/01/27 00:44:25 INFO mapred.JobClient:     Spilled Records=40

14/01/27 00:44:25 INFO mapred.JobClient:     Map output bytes=16592779

14/01/27 00:44:25 INFO mapred.JobClient:     CPU time spent (ms)=9950

14/01/27 00:44:25 INFO mapred.JobClient:     Total committed heap usage (bytes)=264400896

14/01/27 00:44:25 INFO mapred.JobClient:     Combine input records=11205

14/01/27 00:44:25 INFO mapred.JobClient:     SPLIT_RAW_BYTES=139

14/01/27 00:44:25 INFO mapred.JobClient:     Reduce input records=20

14/01/27 00:44:25 INFO mapred.JobClient:     Reduce input groups=20

14/01/27 00:44:25 INFO mapred.JobClient:     Combine output records=20

14/01/27 00:44:25 INFO mapred.JobClient:     Physical memory (bytes) snapshot=258179072

14/01/27 00:44:25 INFO mapred.JobClient:     Reduce output records=20

14/01/27 00:44:25 INFO mapred.JobClient:     Virtual memory (bytes) snapshot=3493203968

14/01/27 00:44:25 INFO mapred.JobClient:     Map output records=11205

14/01/27 00:44:28 INFO input.FileInputFormat: Total input paths to process : 1

14/01/27 00:44:28 INFO mapred.JobClient: Running job: job_201401261418_0012

14/01/27 00:44:29 INFO mapred.JobClient:  map 0% reduce 0%

14/01/27 00:44:56 INFO mapred.JobClient:  map 100% reduce 0%

14/01/27 00:45:11 INFO mapred.JobClient:  map 100% reduce 100%

14/01/27 00:45:16 INFO mapred.JobClient: Job complete: job_201401261418_0012

14/01/27 00:45:16 INFO mapred.JobClient: Counters: 29

14/01/27 00:45:16 INFO mapred.JobClient:   Job Counters

14/01/27 00:45:16 INFO mapred.JobClient:     Launched reduce tasks=1

14/01/27 00:45:16 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=14633

14/01/27 00:45:16 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0

14/01/27 00:45:16 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0

14/01/27 00:45:16 INFO mapred.JobClient:     Launched map tasks=1

14/01/27 00:45:16 INFO mapred.JobClient:     Data-local map tasks=1

14/01/27 00:45:16 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=13687

14/01/27 00:45:16 INFO mapred.JobClient:   File Output Format Counters

14/01/27 00:45:16 INFO mapred.JobClient:     Bytes Written=902324

14/01/27 00:45:16 INFO mapred.JobClient:   FileSystemCounters

14/01/27 00:45:16 INFO mapred.JobClient:     FILE_BYTES_READ=365663

14/01/27 00:45:16 INFO mapred.JobClient:     HDFS_BYTES_READ=2727705

14/01/27 00:45:16 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=776951

14/01/27 00:45:16 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=902324

14/01/27 00:45:16 INFO mapred.JobClient:   File Input Format Counters

14/01/27 00:45:16 INFO mapred.JobClient:     Bytes Read=2727579

14/01/27 00:45:16 INFO mapred.JobClient:   Map-Reduce Framework

14/01/27 00:45:16 INFO mapred.JobClient:     Map output materialized bytes=365655

14/01/27 00:45:16 INFO mapred.JobClient:     Map input records=20

14/01/27 00:45:16 INFO mapred.JobClient:     Reduce shuffle bytes=365655

14/01/27 00:45:16 INFO mapred.JobClient:     Spilled Records=4

14/01/27 00:45:16 INFO mapred.JobClient:     Map output bytes=902198

14/01/27 00:45:16 INFO mapred.JobClient:     CPU time spent (ms)=3740

14/01/27 00:45:16 INFO mapred.JobClient:     Total committed heap usage (bytes)=272609280

14/01/27 00:45:16 INFO mapred.JobClient:     Combine input records=2

14/01/27 00:45:16 INFO mapred.JobClient:     SPLIT_RAW_BYTES=126

14/01/27 00:45:16 INFO mapred.JobClient:     Reduce input records=2

14/01/27 00:45:16 INFO mapred.JobClient:     Reduce input groups=2

14/01/27 00:45:16 INFO mapred.JobClient:     Combine output records=2

14/01/27 00:45:16 INFO mapred.JobClient:     Physical memory (bytes) snapshot=233705472

14/01/27 00:45:16 INFO mapred.JobClient:     Reduce output records=2

14/01/27 00:45:16 INFO mapred.JobClient:     Virtual memory (bytes) snapshot=3492974592

14/01/27 00:45:16 INFO mapred.JobClient:     Map output records=2

14/01/27 00:45:17 INFO driver.MahoutDriver: Program took 124098 ms (Minutes: 2.0683)

+ echo 'Self testing on training set'

Self testing on training set

+ ./bin/mahout testnb -i /usr/grid/mahout-work-grid/20news-train-vectors -m /usr/grid/mahout-work-grid/model -l /usr/grid/mahout-work-grid/labelindex -ow -o /usr/grid/mahout-work-grid/20news-testing

MAHOUT_LOCAL is not set; adding HADOOP_CONF_DIR to classpath.

Warning: $HADOOP_HOME is deprecated.

 

Running on hadoop, using /usr/grid/hadoop/bin/hadoop and HADOOP_CONF_DIR=/usr/grid/hadoop/conf

MAHOUT-JOB:/usr/grid/mahout-distribution-0.8/mahout-examples-0.8-job.jar

Warning: $HADOOP_HOME is deprecated.

 

14/01/27 00:45:28 WARN driver.MahoutDriver: No testnb.props found on classpath, will use command-line arguments only

14/01/27 00:45:28 INFO common.AbstractJob: Command line arguments: {--endPhase=[2147483647], --input=[/usr/grid/mahout-work-grid/20news-train-vectors], --labelIndex=[/usr/grid/mahout-work-grid/labelindex], --model=[/usr/grid/mahout-work-grid/model], --output=[/usr/grid/mahout-work-grid/20news-testing], --overwrite=null, --startPhase=[0], --tempDir=[temp]}

14/01/27 00:45:31 INFO input.FileInputFormat: Total input paths to process : 1

14/01/27 00:45:31 INFO mapred.JobClient: Running job: job_201401261418_0013

14/01/27 00:45:33 INFO mapred.JobClient:  map 0% reduce 0%

14/01/27 00:45:59 INFO mapred.JobClient:  map 36% reduce 0%

14/01/27 00:46:02 INFO mapred.JobClient:  map 61% reduce 0%

14/01/27 00:46:05 INFO mapred.JobClient:  map 84% reduce 0%

14/01/27 00:46:11 INFO mapred.JobClient:  map 100% reduce 0%

14/01/27 00:46:16 INFO mapred.JobClient: Job complete: job_201401261418_0013

14/01/27 00:46:16 INFO mapred.JobClient: Counters: 20

14/01/27 00:46:16 INFO mapred.JobClient:   Job Counters

14/01/27 00:46:16 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=29672

14/01/27 00:46:16 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0

14/01/27 00:46:16 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0

14/01/27 00:46:16 INFO mapred.JobClient:     Launched map tasks=1

14/01/27 00:46:16 INFO mapred.JobClient:     Data-local map tasks=1

14/01/27 00:46:16 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=0

14/01/27 00:46:16 INFO mapred.JobClient:   File Output Format Counters

14/01/27 00:46:16 INFO mapred.JobClient:     Bytes Written=2110958

14/01/27 00:46:16 INFO mapred.JobClient:   FileSystemCounters

14/01/27 00:46:16 INFO mapred.JobClient:     FILE_BYTES_READ=3657578

14/01/27 00:46:16 INFO mapred.JobClient:     HDFS_BYTES_READ=12578676

14/01/27 00:46:16 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=22400

14/01/27 00:46:16 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=2110958

14/01/27 00:46:16 INFO mapred.JobClient:   File Input Format Counters

14/01/27 00:46:16 INFO mapred.JobClient:     Bytes Read=12578537

14/01/27 00:46:16 INFO mapred.JobClient:   Map-Reduce Framework

14/01/27 00:46:16 INFO mapred.JobClient:     Map input records=11205

14/01/27 00:46:16 INFO mapred.JobClient:     Physical memory (bytes) snapshot=58740736

14/01/27 00:46:16 INFO mapred.JobClient:     Spilled Records=0

14/01/27 00:46:16 INFO mapred.JobClient:     CPU time spent (ms)=13350

14/01/27 00:46:16 INFO mapred.JobClient:     Total committed heap usage (bytes)=26341376

14/01/27 00:46:16 INFO mapred.JobClient:     Virtual memory (bytes) snapshot=1744363520

14/01/27 00:46:16 INFO mapred.JobClient:     Map output records=11205

14/01/27 00:46:16 INFO mapred.JobClient:     SPLIT_RAW_BYTES=139

14/01/27 00:46:17 INFO test.TestNaiveBayesDriver: Standard NB Results:

=======================================================

Summary

-------------------------------------------------------

Correctly Classified Instances          :     11117        99.2146%

Incorrectly Classified Instances        :         88          0.7854%

Total Classified Instances              :      11205
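
The percentages in this summary follow directly from the two instance counts. Because this step tests the model on the same vectors it was trained on ("Self testing on training set"), the figure is a training-set accuracy rather than a held-out one. A short Python check, added here for illustration and not part of the Mahout output:

```python
# Counts copied from the Summary block of the testnb output.
correct = 11117
incorrect = 88

total = correct + incorrect
accuracy_pct = 100.0 * correct / total
error_pct = 100.0 * incorrect / total

# prints: 11205 99.2146% 0.7854%
print(f"{total} {accuracy_pct:.4f}% {error_pct:.4f}%")
```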

 

=======================================================

Confusion Matrix

-------------------------------------------------------

a          b        c          d         e          f          g         h         i          j          k         l          m        n           o         p         q         r          s          t          <--Classifiedas

476       0         0         0         0         0         0         0         0         0         0         0         0         0           0        0         0         0         0         0          | 476        a     = alt.atheism

0         565       0         2         1         2         1         0         0         0         0         1         0         0           0        0         0         0         0         0          | 572        b     = comp.graphics

0         6         530       21        1         1         0         0         0         0         0         0         0         0           1        0         0         0         0         0          | 560        c     =comp.os.ms-windows.misc

0         0         0         587       1         0         1         0         0         0         0         0         0         0           0        0         0         0         0         0          | 589        d     = comp.sys.ibm.pc.hardware

0         0         1         1         562       0         0         0         0         0         0         0         1         0           0        0         0         0         0         0          | 565        e     =comp.sys.mac.hardware

0         1         0         1         0         580       0         0         0         0         0         0         0         0           0        0         0         0         0         0          | 582        f     = comp.windows.x

0         0         0         1         0         0         565       1         0         0         0         0         2         0           0        0         0         0         0         0          | 569        g     = misc.forsale

0         0         0         0         0         0         2         591       0         0         0         0         1         0           0         0         0         0         0         0          | 594        h     = rec.autos

0         0         0         0         0         0         1         1         579       0         0         0         0         0           0        0         0         0         0         0          | 581        i     = rec.motorcycles

0         0         0         0         0         0         0         0         0         616       1         0         0         0           0        0         0         0         0         0          | 617        j     = rec.sport.baseball

0         0         0         0         0         0         0         0         1         0         591       0         0         0           0        0         0         0         0         0          | 592        k     = rec.sport.hockey

0         0         0         0         0         0         0         0         0         0         0         591       0         0           0        0         0         0         0         0          | 591        l     = sci.crypt

0         0         0         5         1         0         2         0         0         0         0         0         596       0           0        0         0         0         0         0          | 604        m     = sci.electronics

1         1         0         0         0         0         0         0         0         0         0         0         1           589      1         0         0         0         0         0          | 593        n     = sci.med

0         0         0         0         0         0         0         0         0         0         0         0         0         0           579      0         0         0         0         0          | 579        o     = sci.space

0         0         0         0         0         0         0         0         0         0         0         0         0         0           0        601       1         0         0         0          | 602        p     = soc.religion.christian

1         0         0         0         0         0         0         0         0         0         0         0         0         0           0        1         584       0         0         0          | 586        q     = talk.politics.mideast

0         0         1         0         0         0         0         0         0         0         0         1         0         0           0        0         0         542       0         0          | 544        r     = talk.politics.guns

6         0         0         0         0         0         0         0         0         0         0         0         0         0           0        2         0         3         348       2          | 361        s     = talk.religion.misc

0         0         0         0         0         0         0         0         0         0         0         1         0         0           0        0         0         2         0         445        | 448        t     = talk.politics.misc

 

=======================================================

Statistics

-------------------------------------------------------

Kappa                                       0.9858

Accuracy                                   99.2146%

Reliability                                 94.437%

Reliability (standard deviation)            0.2168

 

14/01/27 00:46:17 INFO driver.MahoutDriver: Program took 49046 ms (Minutes: 0.8174333333333333)

+ echo 'Testing on holdout set'

Testing on holdout set

+ ./bin/mahout testnb -i /usr/grid/mahout-work-grid/20news-test-vectors -m /usr/grid/mahout-work-grid/model -l /usr/grid/mahout-work-grid/labelindex -ow -o /usr/grid/mahout-work-grid/20news-testing

MAHOUT_LOCAL is not set; adding HADOOP_CONF_DIR to classpath.

Warning: $HADOOP_HOME is deprecated.

 

Running on hadoop, using /usr/grid/hadoop/bin/hadoop and HADOOP_CONF_DIR=/usr/grid/hadoop/conf

MAHOUT-JOB: /usr/grid/mahout-distribution-0.8/mahout-examples-0.8-job.jar

Warning: $HADOOP_HOME is deprecated.

 

14/01/27 00:46:31 WARN driver.MahoutDriver: No testnb.props found on classpath, will use command-line arguments only

14/01/27 00:46:32 INFO common.AbstractJob: Command line arguments: {--endPhase=[2147483647], --input=[/usr/grid/mahout-work-grid/20news-test-vectors], --labelIndex=[/usr/grid/mahout-work-grid/labelindex], --model=[/usr/grid/mahout-work-grid/model], --output=[/usr/grid/mahout-work-grid/20news-testing], --overwrite=null, --startPhase=[0], --tempDir=[temp]}

14/01/27 00:46:32 INFO common.HadoopUtil: Deleting /usr/grid/mahout-work-grid/20news-testing

14/01/27 00:46:37 INFO input.FileInputFormat: Total input paths to process : 1

14/01/27 00:46:37 INFO mapred.JobClient: Running job: job_201401261418_0014

14/01/27 00:46:38 INFO mapred.JobClient:  map 0% reduce 0%

14/01/27 00:47:08 INFO mapred.JobClient:  map 46% reduce 0%

14/01/27 00:47:11 INFO mapred.JobClient:  map 82% reduce 0%

14/01/27 00:47:17 INFO mapred.JobClient:  map 100% reduce 0%

14/01/27 00:47:22 INFO mapred.JobClient: Job complete: job_201401261418_0014

14/01/27 00:47:22 INFO mapred.JobClient: Counters: 20

14/01/27 00:47:22 INFO mapred.JobClient:   Job Counters

14/01/27 00:47:22 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=26885

14/01/27 00:47:22 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0

14/01/27 00:47:22 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0

14/01/27 00:47:22 INFO mapred.JobClient:     Launched map tasks=1

14/01/27 00:47:22 INFO mapred.JobClient:     Data-local map tasks=1

14/01/27 00:47:22 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=0

14/01/27 00:47:22 INFO mapred.JobClient:   File Output Format Counters

14/01/27 00:47:22 INFO mapred.JobClient:     Bytes Written=1439470

14/01/27 00:47:22 INFO mapred.JobClient:   FileSystemCounters

14/01/27 00:47:22 INFO mapred.JobClient:     FILE_BYTES_READ=3657578

14/01/27 00:47:22 INFO mapred.JobClient:     HDFS_BYTES_READ=8630847

14/01/27 00:47:22 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=22398

14/01/27 00:47:22 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=1439470

14/01/27 00:47:22 INFO mapred.JobClient:   File Input Format Counters

14/01/27 00:47:22 INFO mapred.JobClient:     Bytes Read=8630709

14/01/27 00:47:22 INFO mapred.JobClient:   Map-Reduce Framework

14/01/27 00:47:22 INFO mapred.JobClient:     Map input records=7641

14/01/27 00:47:22 INFO mapred.JobClient:     Physical memory (bytes) snapshot=57962496

14/01/27 00:47:22 INFO mapred.JobClient:     Spilled Records=0

14/01/27 00:47:22 INFO mapred.JobClient:     CPU time spent (ms)=9810

14/01/27 00:47:22 INFO mapred.JobClient:     Total committed heap usage (bytes)=24989696

14/01/27 00:47:22 INFO mapred.JobClient:     Virtual memory (bytes) snapshot=1744236544

14/01/27 00:47:22 INFO mapred.JobClient:     Map output records=7641

14/01/27 00:47:22 INFO mapred.JobClient:     SPLIT_RAW_BYTES=138

14/01/27 00:47:23 INFO test.TestNaiveBayesDriver: Standard NB Results:

=======================================================

Summary

-------------------------------------------------------

Correctly Classified Instances          :       6921        90.5771%

Incorrectly Classified Instances        :       720          9.4229%

Total Classified Instances              :       7641

 

=======================================================

Confusion Matrix

-------------------------------------------------------

a          b        c          d         e          f          g         h         i          j          k         l          m        n           o         p         q         r          s          t          <--Classified as

292       0         0         1         0         0         0         0         0         1         1         1         0         1           1        7         0         0         17        1          | 323        a     = alt.atheism

0         335       7         17        6         18        7         1         0         0         0         2         5         1           2        0         0         0         0         0          | 401        b     = comp.graphics

1         25        240       91        23        28        3         1         0         0         0         0         9         0           1        0         0         0         1         2          | 425        c     = comp.os.ms-windows.misc

1         5         1         352       16        3         7         1         0         0         1         0         6         0           0        0         0         0         0         0          | 393        d     = comp.sys.ibm.pc.hardware

0         2         0         9         372       1         5         0         0         0         0         1         7         1           0        0         0         0         0         0          | 398        e     = comp.sys.mac.hardware

0         25        2         7         2         365       1         0         0         0         0         0         2         0           2        0         0         0         0         0          | 406        f     = comp.windows.x

0         1         1         20        6         0         352       7         1         1         2         0         11         1          1         1         0         1         0         0          | 406        g     = misc.forsale

0         1         0         1         3         1         6         368       7         0         0         0         5         0           1        0         0         2         0         1          | 396        h     = rec.autos

0         1         0         2         0         2         6         6         394       0         1         0         0         1           0         0         0         1         0         1          | 415        i     = rec.motorcycles

0         0         0         0         3         0         1         1         0         365       5         0         1         1           0        0         0         0         0         0          | 377        j     = rec.sport.baseball

0         0         1         0         0         0         0         1         1         2         395       0         1         1           0        0         0         0         0         5          | 407        k     = rec.sport.hockey

0         3         1         0         1         3         1         0         0         0         0         385       0         2           0         0         0         3         0         1          | 400        l     = sci.crypt

0         2         0         13        3         2         3         2         0         0         0         0         350       1           3        0         0         1         0         0          | 380        m     = sci.electronics

1         2         1         2         3         0         1         3         2         1         0         0         3           369       5         0         0         2         0         2          | 397        n     = sci.med

1         3         0         0         2         0         0         2         0         1         0         1         2         4           389      0         2         0         1         0          | 408        o     = sci.space

3         0         0         1         0         0         0         0         0         1         1         0         0         1           0        383       0         2         3         0          | 395        p     = soc.religion.christian

0         1         0         0         0         0         0         0         0         0         0         0         1         0           1         2         347       0         1         1          | 354        q     = talk.politics.mideast

0         0         0         0         0         0         1         0         1         1         0         2         0         0           1        0         0         355       0         5          | 366        r     = talk.politics.guns

27        1         0         0         0         1         0         0         0         0         0         0         0         0           1        9         3         6         216       3          | 267        s     = talk.religion.misc

1         0         0         0         0         0         0         0         0         0         1         1         0         0           5        1         5         14        2         297        | 327        t     = talk.politics.misc

 

=======================================================

Statistics

-------------------------------------------------------

Kappa                                       0.8776

Accuracy                                   90.5771%

Reliability                                86.2922%

Reliability (standard deviation)            0.2174

 

14/01/27 00:47:23 INFO driver.MahoutDriver: Program took 51646 ms (Minutes: 0.8607666666666667)

[grid@h1 mahout-distribution-0.8]$
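
The percentages in the holdout-set report can be checked by hand from the counts printed above. The following is only a minimal sketch in Python (not part of Mahout; the variable names are chosen here for illustration). It recomputes the accuracy and error rate from the Summary block, and the per-class recall for two rows of the confusion matrix, assuming recall is the diagonal cell divided by the row total shown after the `|`.

```python
# Counts copied from the holdout-set "Summary" block above.
correct = 6921
incorrect = 720
total = correct + incorrect  # should match "Total Classified Instances"

accuracy = 100.0 * correct / total
error_rate = 100.0 * incorrect / total

# Per-class recall = diagonal cell / row total (illustrative helper, not a Mahout API).
def recall_percent(diagonal, row_total):
    return 100.0 * diagonal / row_total

# Row "c = comp.os.ms-windows.misc": 240 of 425 documents classified as c.
recall_c = recall_percent(240, 425)
# Row "s = talk.religion.misc": 216 of 267 documents classified as s.
recall_s = recall_percent(216, 267)

# Gap between accuracy on the training set (99.2146%) and on the holdout set.
train_accuracy = 99.2146
gap = train_accuracy - accuracy

print(f"total instances : {total}")
print(f"accuracy        : {accuracy:.4f}%")
print(f"error rate      : {error_rate:.4f}%")
print(f"recall (c)      : {recall_c:.2f}%")
print(f"recall (s)      : {recall_s:.2f}%")
print(f"train - holdout : {gap:.2f} percentage points")
```

Read this way, the weakest rows are c (comp.os.ms-windows.misc, with 91 documents sent to d = comp.sys.ibm.pc.hardware) and s (talk.religion.misc, with 27 documents sent to a = alt.atheism), which is where most of the drop from the training-set result comes from.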