Hadoop Custom Data Types and Interacting with Relational Databases

Date: 2023-03-09 06:31:50

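Recently I had a requirement where, during modeling, a small part of the data lived in Postgres. The data had to be read from Postgres directly into Hadoop for model testing; exporting it to HDFS beforehand was not an option.

There are two ways to read data from a Postgres database. One is Hadoop's DBInputFormat (the Hadoop 2.4.1 jars ship it in two packages, org.apache.hadoop.mapreduce.lib.db and org.apache.hadoop.mapred.lib.db; the former is the newer one). The other is Postgres's CopyManager class.

Let's start with DBInputFormat.

First, create a table in the database and insert a few rows of test data.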


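Because the rows of the table serve as the map input value, a custom data type is needed.

A custom Hadoop data type must implement the Writable interface. If the custom type is used as a key, it must implement WritableComparable instead, including its comparison method. Comparing through WritableComparable means deserializing the objects first, which is cumbersome; to avoid that, you can extend the WritableComparator class and compare the serialized bytes directly.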
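To make the idea of comparing serialized bytes concrete, here is a minimal sketch in plain Java. It does not use Hadoop's WritableComparator; the class RawIntKeyCompare and its method names are made up for illustration, and it assumes the key is a single int stored as 4 big-endian bytes, which is the layout DataOutput.writeInt produces.

```java
import java.nio.ByteBuffer;

// Hypothetical helper, not part of Hadoop: compares two serialized int keys
// directly on their bytes, without rebuilding the key objects.
class RawIntKeyCompare {

    // Serialize an int as 4 big-endian bytes (ByteBuffer's default byte order),
    // the same layout DataOutput.writeInt uses.
    static byte[] serialize(int value) {
        return ByteBuffer.allocate(4).putInt(value).array();
    }

    // Decode a big-endian int that starts at the given offset in the buffer.
    static int readInt(byte[] buf, int offset) {
        return ((buf[offset] & 0xff) << 24)
             | ((buf[offset + 1] & 0xff) << 16)
             | ((buf[offset + 2] & 0xff) << 8)
             |  (buf[offset + 3] & 0xff);
    }

    // Compare two keys in place, given the buffers and the key start offsets.
    static int compare(byte[] b1, int s1, byte[] b2, int s2) {
        return Integer.compare(readInt(b1, s1), readInt(b2, s2));
    }

    public static void main(String[] args) {
        byte[] a = serialize(3);
        byte[] b = serialize(42);
        System.out.println(compare(a, 0, b, 0) < 0); // prints "true"
    }
}
```

The point of the design is that only the bytes needed for ordering are decoded, so no key object has to be created during sorting.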

When configuring the input for DBInputFormat, the record class must implement DBWritable, so the custom value type here implements both DBWritable and Writable.

package com.qldhlbs.hadoop.demo0420;

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.Objects;

import org.apache.hadoop.io.Writable;
import org.apache.hadoop.mapreduce.lib.db.DBWritable;

// Map input value backed by one row of the Postgres table.
// The columns are assumed to be NOT NULL: writeInt and writeUTF
// throw a NullPointerException if a field is null.
public class PgDbWritable implements DBWritable, Writable {

    private Integer call_type_id;
    private String call_type;
    private String remark;

    public PgDbWritable() {
    }

    public PgDbWritable(Integer call_type_id, String call_type, String remark) {
        set(call_type_id, call_type, remark);
    }

    public void set(Integer call_type_id, String call_type, String remark) {
        this.call_type_id = call_type_id;
        this.call_type = call_type;
        this.remark = remark;
    }

    // Read one row from the JDBC result set
    @Override
    public void readFields(ResultSet set) throws SQLException {
        this.call_type_id = set.getInt(1);
        this.call_type = set.getString(2);
        this.remark = set.getString(3);
    }

    // Bind the fields as parameters of a prepared statement
    @Override
    public void write(PreparedStatement ps) throws SQLException {
        ps.setInt(1, this.call_type_id);
        ps.setString(2, this.call_type);
        ps.setString(3, this.remark);
    }

    // Deserialize (must read the fields in the order they were written)
    @Override
    public void readFields(DataInput in) throws IOException {
        this.call_type_id = in.readInt();
        this.call_type = in.readUTF();
        this.remark = in.readUTF();
    }

    // Serialize
    @Override
    public void write(DataOutput out) throws IOException {
        out.writeInt(this.call_type_id);
        out.writeUTF(this.call_type);
        out.writeUTF(this.remark);
    }

    public Integer getCall_type_id() {
        return call_type_id;
    }

    public String getCall_type() {
        return call_type;
    }

    public String getRemark() {
        return remark;
    }

    @Override
    public String toString() {
        return call_type_id + "\t" + call_type + "\t" + remark;
    }

    @Override
    public int hashCode() {
        return Objects.hash(call_type, call_type_id, remark);
    }

    @Override
    public boolean equals(Object obj) {
        if (this == obj)
            return true;
        if (obj == null || getClass() != obj.getClass())
            return false;
        PgDbWritable other = (PgDbWritable) obj;
        return Objects.equals(call_type, other.call_type)
                && Objects.equals(call_type_id, other.call_type_id)
                && Objects.equals(remark, other.remark);
    }
}

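PgDbWritable holds the three fields that correspond to the columns of the database table and overrides the four key methods; the comments in the code describe what each one does. It also overrides toString, hashCode and equals.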
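The serialization pair can be tried out without a cluster. The following is a minimal sketch, assuming a stand-in class (CallTypeRecord, a hypothetical name) that copies only the write(DataOutput) and readFields(DataInput) logic so that it compiles without the Hadoop jars. The sample values are invented.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Stand-in for PgDbWritable that mirrors only its Writable half.
// It deliberately does not implement the Hadoop interfaces.
class CallTypeRecord {
    int callTypeId;
    String callType;
    String remark;

    CallTypeRecord() {
    }

    CallTypeRecord(int callTypeId, String callType, String remark) {
        this.callTypeId = callTypeId;
        this.callType = callType;
        this.remark = remark;
    }

    // Serialize: the field order here defines the byte layout.
    void write(DataOutput out) throws IOException {
        out.writeInt(callTypeId);
        out.writeUTF(callType);
        out.writeUTF(remark);
    }

    // Deserialize: fields must be read in exactly the order they were written.
    void readFields(DataInput in) throws IOException {
        callTypeId = in.readInt();
        callType = in.readUTF();
        remark = in.readUTF();
    }

    // Write the record to a byte array and read it back into a fresh instance.
    static CallTypeRecord roundTrip(CallTypeRecord original) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            original.write(new DataOutputStream(bytes));
            CallTypeRecord copy = new CallTypeRecord();
            copy.readFields(new DataInputStream(
                    new ByteArrayInputStream(bytes.toByteArray())));
            return copy;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        CallTypeRecord copy = roundTrip(new CallTypeRecord(1, "inbound", "test row"));
        System.out.println(copy.callTypeId + "\t" + copy.callType + "\t" + copy.remark);
    }
}
```

If the read order and the write order ever drift apart, the round trip returns wrong values or fails, which is the usual symptom of a broken custom Writable.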

With the custom data type in place, the next step is to read the data from the database.

package com.qldhlbs.hadoop.demo0420;

import java.io.IOException;
import java.sql.SQLException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.filecache.DistributedCache;
import org.apache.hadoop.mapreduce.lib.db.DBConfiguration;
import org.apache.hadoop.mapreduce.lib.db.DBInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class MapreducePackageDbApp {

    // Emits every row exactly as it was read
    static class DbReadMapper extends Mapper<LongWritable, PgDbWritable, LongWritable, PgDbWritable> {
        @Override
        protected void map(LongWritable key, PgDbWritable value, Context context)
                throws IOException, InterruptedException {
            context.write(key, value);
        }
    }

    // Writes every value through unchanged
    static class DbReadReduce extends Reducer<LongWritable, PgDbWritable, LongWritable, PgDbWritable> {
        @Override
        protected void reduce(LongWritable key, Iterable<PgDbWritable> values, Context context)
                throws IOException, InterruptedException {
            for (PgDbWritable value : values) {
                context.write(key, value);
            }
        }
    }

    @SuppressWarnings("deprecation")
    public static void main(String[] args)
            throws IOException, ClassNotFoundException, InterruptedException, SQLException {
        Configuration conf = new Configuration();

        // JDBC settings: driver class, database URL, user name, password
        DBConfiguration.configureDB(conf, "org.postgresql.Driver",
                "jdbc:postgresql://192.168.0.203/test", "hb", "xxx");

        // Put the Postgres driver jar (already uploaded to HDFS) on the classpath.
        // Done before Job.getInstance(conf), because the Job works on its own copy of conf.
        DistributedCache.addFileToClassPath(
                new Path("hdfs://192.168.0.201:49000/user/qldhlbs/lib/postgresql-9.3-1101.jdbc3.jar"), conf);

        Job job = Job.getInstance(conf);
        job.setJarByClass(MapreducePackageDbApp.class);
        job.setJobName(MapreducePackageDbApp.class.getSimpleName());

        String[] fields = {"call_type_id", "call_type", "remark"};

        DBInputFormat<PgDbWritable> in = new DBInputFormat<PgDbWritable>();
        in.setConf(conf);

        // DBInputFormat settings: job, input DBWritable class, table name,
        // query conditions, ORDER BY clause, array of column names
        DBInputFormat.setInput(job, PgDbWritable.class, "dim_160_168_call_type", null, null, fields);

        job.setMapperClass(DbReadMapper.class);
        // The reducer is optional: without one, Hadoop uses the default Reducer,
        // which (as its source shows) writes the map output through unchanged.
        job.setReducerClass(DbReadReduce.class);

        job.setOutputKeyClass(LongWritable.class);
        job.setOutputValueClass(PgDbWritable.class);

        job.setInputFormatClass(DBInputFormat.class);
        FileOutputFormat.setOutputPath(job, new Path("hdfs://192.168.0.201:49000/user/qldhlbs/db5"));

        boolean isSuccess = job.waitForCompletion(true);
        System.exit(isSuccess ? 0 : 1);
    }
}

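This is only a demo, so the map function simply emits what it reads. When no reduce function is supplied, the map output is written out as is, so the reduce function here could also be left out.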

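A few points need attention here.

First, every class is imported from the mapreduce packages rather than the mapred packages; mixing the two causes errors.

Second, a copy of the Postgres driver jar has to be uploaded to HDFS. Create a directory on HDFS first:

hadoop fs -mkdir /user/qldhlbs/lib

Then upload the file:

hadoop fs -copyFromLocal postgresql-9.3-1101.jdbc3.jar /user/qldhlbs/lib

In the code, DistributedCache.addFileToClassPath(new Path("hdfs://192.168.0.201:49000/user/qldhlbs/lib/postgresql-9.3-1101.jdbc3.jar"), conf) adds the jar to the classpath.

Third, configure DBConfiguration. The arguments are, in order: the Configuration, the JDBC driver class, the database URL, the user name, and the password. After DBConfiguration is configured, create the input format and hand it the configuration:

DBInputFormat<PgDbWritable> in = new DBInputFormat<PgDbWritable>();
in.setConf(conf);

Do not forget the setConf() call. At first I did not call it to pass conf to DBInputFormat, and I kept getting a NullPointerException. Debugging showed that the connection was never obtained, even though DBConfiguration itself could produce one. Further debugging showed that DBInputFormat never received a DBConfiguration object, so it had no way to get a connection. I only solved the problem after reading the hadoop-mapreduce-client-core-2.4.1 source.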

public void setConf(Configuration conf)
{
    this.dbConf = new DBConfiguration(conf);
    try
    {
        getConnection();
        DatabaseMetaData dbMeta = this.connection.getMetaData();
        this.dbProductName = dbMeta.getDatabaseProductName().toUpperCase();
    }
    catch (Exception ex) {
        throw new RuntimeException(ex);
    }
    this.tableName = this.dbConf.getInputTableName();
    this.fieldNames = this.dbConf.getInputFieldNames();
    this.conditions = this.dbConf.getInputConditions();
}

public Connection getConnection() {
    try {
        if (null == this.connection)
        {
            this.connection = this.dbConf.getConnection();
            this.connection.setAutoCommit(false);
            // 8 is the value of Connection.TRANSACTION_SERIALIZABLE
            this.connection.setTransactionIsolation(8);
        }
    }
    catch (Exception e) {
        throw new RuntimeException(e);
    }
    return this.connection;
}

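This is part of the decompiled source; it shows that the connection is obtained from the DBConfiguration object. The fourth point is configuring DBInputFormat itself: the arguments are the job, the input DBWritable class, the table name, the query conditions, the ORDER BY clause, and the array of column names.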
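To show how those arguments relate to the SQL that ends up being run, here is a rough sketch in plain Java. It is not Hadoop's code: SelectQuerySketch and build are hypothetical names, and Hadoop's own record reader does more than this (for example, it pages the query per input split), which is left out here.

```java
// Hypothetical helper, not Hadoop code: maps the table name, conditions,
// ORDER BY clause and column names onto the parts of a SELECT statement.
class SelectQuerySketch {

    static String build(String table, String conditions, String orderBy, String... fields) {
        StringBuilder query = new StringBuilder("SELECT ");
        query.append(String.join(", ", fields));
        query.append(" FROM ").append(table);
        // A null or empty condition means "no WHERE clause".
        if (conditions != null && !conditions.isEmpty()) {
            query.append(" WHERE (").append(conditions).append(")");
        }
        // A null or empty orderBy means "no ORDER BY clause".
        if (orderBy != null && !orderBy.isEmpty()) {
            query.append(" ORDER BY ").append(orderBy);
        }
        return query.toString();
    }

    public static void main(String[] args) {
        // Same table and columns as the job above: no conditions, no ordering.
        System.out.println(build("dim_160_168_call_type", null, null,
                "call_type_id", "call_type", "remark"));
        // prints: SELECT call_type_id, call_type, remark FROM dim_160_168_call_type
    }
}
```

Passing null for the conditions and the ORDER BY clause, as the job does, therefore amounts to reading the whole table.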

With all of that done, the Hadoop job can be run.


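The job writes its output file to HDFS, which shows that the data was read from Postgres into HDFS.

Using the mapred package instead of the mapreduce package also works. I won't show that code because it is almost the same; the only difference is that it works even without calling setConf() to bind the conf.

That was the first approach. The second is to use the org.postgresql.copy.CopyManager class directly.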

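import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.InputStream;

import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.postgresql.copy.CopyManager;
import org.postgresql.core.BaseConnection;

// Runs COPY ... TO STDOUT and buffers the whole result in memory.
// getConnection() is expected to return an open Postgres JDBC connection.
public ByteArrayOutputStream copyToStream(String tableOrQuery, String delimiter) {
    try {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        CopyManager copyManager = new CopyManager((BaseConnection) getConnection());
        String copySql = "COPY " + tableOrQuery + " TO STDOUT";
        if (delimiter != null) {
            copySql = copySql + " WITH DELIMITER AS '" + delimiter + "'";
        }
        copyManager.copyOut(copySql, out);
        return out;
    } catch (Exception e) {
        e.printStackTrace();
    }
    return null;
}

// Usage: dump the query result, then wrap the bytes in an input stream
ByteArrayOutputStream out = copyToStream(sql.toString(), ",");
ByteArrayInputStream in = new ByteArrayInputStream(out.toByteArray());

// Writes the stream to the given HDFS path
public void uploadFile(String hdfsPath, InputStream in) {
    try {
        FileSystem hdfs = FileSystem.get(conf);
        FSDataOutputStream out = hdfs.create(new Path(hdfsPath));
        org.apache.hadoop.io.IOUtils.copyBytes(in, out, 4096, false);
        // hflush() replaces the deprecated sync()
        out.hflush();
        out.close();
    } catch (Exception e) {
        // Report the failure instead of silently swallowing it
        e.printStackTrace();
    }
}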

Read the data out as a stream, then write it to HDFS with Hadoop's built-in IOUtils.copyBytes() method, and that is all.
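For readers who have not used a buffered stream copy before, the sketch below shows the general technique in plain Java. It is not Hadoop's IOUtils; StreamCopySketch is a hypothetical class, the in-memory streams stand in for the Postgres output and the HDFS file, and the sample CSV line is invented. The buffer size of 4096 matches the value passed to copyBytes above.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: copies a stream in fixed-size chunks so that the
// whole content never has to be held in a single read.
class StreamCopySketch {

    // Copies everything from in to out and returns the number of bytes copied.
    static long copy(InputStream in, OutputStream out, int bufferSize) {
        byte[] buffer = new byte[bufferSize];
        long total = 0;
        try {
            int read;
            // read() returns -1 once the end of the stream is reached
            while ((read = in.read(buffer)) != -1) {
                out.write(buffer, 0, read);
                total += read;
            }
            out.flush();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return total;
    }

    public static void main(String[] args) {
        byte[] csv = "1,inbound,test row\n".getBytes(StandardCharsets.UTF_8);
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        long copied = copy(new ByteArrayInputStream(csv), sink, 4096);
        System.out.println(copied + " bytes copied");
    }
}
```

Note that copyToStream above still collects the entire COPY result in a ByteArrayOutputStream first, so this approach is best suited to small tables like the one in this demo.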