Hadoop on Mac with IntelliJ IDEA - 8 单表关联NullPointerException

时间:2023-03-09 08:53:17
Hadoop on Mac with IntelliJ IDEA - 8 单表关联NullPointerException

简化陆喜恒. Hadoop实战(第2版)5.4单表关联的代码时遇到空指向异常,经分析是逻辑问题,在此做个记录。

环境:Mac OS X 10.9.5, IntelliJ IDEA 13.1.5, Hadoop 1.2.1

改好的代码如下,在reduce阶段遇到了NullPointerException。

 public class STjoinEx {
private static final String TIMES = "TIMES"; public static void main(String[] args) throws Exception {
Configuration configuration = new Configuration();
configuration.setInt(TIMES, 1);
String[] remainingArgs = new GenericOptionsParser(configuration, args).getRemainingArgs();
if (remainingArgs.length != 2) {
System.err.println("STjoinEx <input> <output>");
System.exit(2);
} Job job = new Job(configuration, STjoinEx.class.getSimpleName());
job.setJarByClass(STjoinEx.class);
job.setMapperClass(Map.class);
job.setReducerClass(Reduce.class);
job.setInputFormatClass(KeyValueTextInputFormat.class);
job.setOutputFormatClass(TextOutputFormat.class);
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(Text.class); FileInputFormat.setInputPaths(job, new Path(remainingArgs[0]));
FileOutputFormat.setOutputPath(job, new Path(remainingArgs[1])); System.exit(job.waitForCompletion(true) ? 0 : 1); } public static class Map extends Mapper<Text, Text, Text, Text> {
final static Text LEFT_TABLE = new Text();
final static Text RIGHT_TABLE = new Text(); @Override
protected void map(Text key, Text value, Context context) throws IOException, InterruptedException {
// left table
LEFT_TABLE.set("1 " + value);
context.write(key, LEFT_TABLE);
// right table
RIGHT_TABLE.set("2 " + key);
context.write(value, RIGHT_TABLE);
}
} public static class Reduce extends Reducer<Text, Text, Text, Text> {
private static final int INDENT = 2;
private static final Text GRAND_PARENT = new Text();
private static final Text GRAND_CHILD = new Text(); @Override
protected void reduce(Text key, Iterable<Text> values, Context context) throws IOException, InterruptedException {
// output header
int times = context.getConfiguration().getInt(TIMES, 1);
if (times == 1) {
context.write(new Text("grandChild"), new Text("grandParent"));
context.getConfiguration().setInt(TIMES, ++times);
} // prepare matrix
int headChar = 0;
String[] grandChild = new String[10];
String[] grandParent = new String[10];
int grandChildNum = 0;
int grandParentNum = 0; for (Text value : values) {
headChar = value.charAt(0);
if (headChar == '1') {
grandParent[grandParentNum] = value.toString().substring(2);
grandParentNum++;
} else {
grandChild[grandChildNum] = value.toString().substring(2);
grandChildNum++;
}
} // multiply
if (grandChildNum != 0 && grandChildNum != 0) {
for (int i = 0; i < grandChildNum; i++) {
GRAND_CHILD.set(grandChild[i]);
for (int j = 0; j < grandParentNum; j++) {
GRAND_PARENT.set(grandParent[j]);
context.write(GRAND_CHILD, GRAND_PARENT);
}
}
}
}
}
}

执行输出为

 14/10/07 11:12:51 INFO mapred.JobClient:  map 0% reduce 0%
14/10/07 11:12:54 INFO mapred.JobClient: map 100% reduce 0%
14/10/07 11:13:01 INFO mapred.JobClient: map 100% reduce 33%
14/10/07 11:13:04 INFO mapred.JobClient: Task Id : attempt_201410021756_0048_r_000000_0, Status : FAILED
java.lang.NullPointerException
at org.apache.hadoop.io.Text.encode(Text.java:388)
at org.apache.hadoop.io.Text.set(Text.java:178)
at main.ch5.STjoinEx$Reduce.reduce(STjoinEx.java:96)
at main.ch5.STjoinEx$Reduce.reduce(STjoinEx.java:61)
at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:177)
at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:649)
at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:418)
at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1190)
at org.apache.hadoop.mapred.Child.main(Child.java:249)

从输出信息可发现,源码96行if (grandChildNum != 0 && grandChildNum != 0)为出错行。两个判断条件重复了,将其中一个改成grandParentNum即可。

执行结果

 grandChild    grandParent
Jone Alice
Jone Jesse
Tom Alice
Tom Jesse
Tom Mary
Tom Ben
Jone Mary
Jone Ben
Philip Alice
Philip Jesse
Mark Alice
Mark Jesse