Java API 读取HDFS的单文件

时间:2023-03-09 18:47:28
Java API 读取HDFS的单文件

HDFS上的单文件:

-bash-3.2$ hadoop fs -ls /user/pms/ouyangyewei/data/input/combineorder/repeat_rec_category
Found 1 items
-rw-r--r-- 2 deploy supergroup 520 2014-08-14 17:03 /user/pms/ouyangyewei/data/input/combineorder/repeat_rec_category/repeatRecCategory.txt

文件内容:

-bash-3.2$ hadoop fs -cat /user/pms/ouyangyewei/data/input/combineorder/repeat_rec_category/repeatRecCategory.txt | more
8104
960985
5472
971917
5320
971895
971902
971922
958261
972047
972050

Java API使用FileSystem方式 读取HDFS单文件的方法

/**
* 获取可反复推荐的类目。以英文逗号分隔
* @param filePath
* @param conf
* @return
*/
public String getRepeatRecCategoryStr(String filePath) {
final String DELIMITER = "\t";
final String INNER_DELIMITER = ","; String categoryFilterStrs = new String();
BufferedReader br = null;
try {
FileSystem fs = FileSystem.get(new Configuration());
FSDataInputStream inputStream = fs.open(new Path(filePath));
br = new BufferedReader(new InputStreamReader(inputStream)); String line = null;
while (null != (line = br.readLine())) {
String[] strs = line.split(DELIMITER);
categoryFilterStrs += (strs[0] + INNER_DELIMITER);
}
} catch (IOException e) {
e.printStackTrace();
} finally {
if (null != br) {
try {
br.close();
} catch (IOException e) {
e.printStackTrace();
}
}
} return categoryFilterStrs;
}