Reading, Writing, and Lightly Processing CSV Files with opencsv and BufferedReader/BufferedWriter

Posted: 2021-05-16 10:02:23

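I recently had to process a large batch of .csv files. Most rows in the collected data have 9 columns, but some rows have more than 9. I therefore wanted a program that batch-processes these .csv files so that the output is regular and easy to import into a database.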

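Method 1:

My first idea was to use opencsv for the processing, but I ran into a problem along the way. One note up front: every file I process is a .csv named after the pinyin spelling of a Chinese province, for example anhui.csv.

Here is the code first: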

package anhui;

import java.io.File;
import java.io.FileReader;
import java.io.FileWriter;

import com.opencsv.CSVReader;
import com.opencsv.CSVWriter;

public class Anhui {

	// Declared as "throws Exception" because readNext() also throws the checked
	// CsvValidationException in opencsv 5.x; older versions throw only IOException.
	public static void main(String[] args) throws Exception {
		File input = new File("F:\\Porject\\data\\WeiboDataShare-master\\anhui.csv");
		File output = new File("F:\\Porject\\data\\Sina_dataAfterDeal\\anhui_new.csv");
		int count = 0;      // records with exactly 9 columns (kept)
		int wrongCount = 0; // records with any other column count (dropped)
		int k = 0;          // number of records read so far
		// try-with-resources closes both files even if an exception is thrown.
		// Note: FileReader and FileWriter use the platform default charset.
		try (CSVReader reader = new CSVReader(new FileReader(input));
				CSVWriter writer = new CSVWriter(new FileWriter(output))) {
			String[] strs;
			while ((strs = reader.readNext()) != null) {
				k++;
				if (strs.length == 9) {
					writer.writeNext(strs);
					count++;
				} else {
					wrongCount++;
					System.out.println("Wrong line: " + k);
					System.out.println("Wrong count: " + wrongCount);
				}
				System.out.println("Processed record " + k);
			}
		}
		System.out.println("Number of right lines: " + count);
		System.out.println("Number of wrong lines: " + wrongCount);
		System.out.println("END!!");
	}

}


The main job of this code is to drop every row whose column count is not 9 and write the remaining rows to a new .csv file.

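When I ran this code, a few of the files triggered a problem: on reaching a certain line of the file, the program stalled. It neither finished nor threw an exception. I still have not figured this out. I looked closely at the line where it stalled and found no structural difference from the other lines. Most of the other files were processed without any trouble. If anyone has solved this problem, I would be very grateful for some pointers!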
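
One possible explanation, which is only a guess and not something confirmed above, is a stray double quote in the data: a CSV parser that supports quoted multi-line fields keeps reading the following lines while it looks for the closing quote. The sketch below uses only the standard library to list the lines that contain an odd number of double quotes, so that suspicious lines can be inspected by hand. The class name QuoteCheck, its methods, and the sample data are all hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class QuoteCheck {

	// Counts the double-quote characters in one physical line.
	static int countQuotes(String line) {
		int quotes = 0;
		for (int i = 0; i < line.length(); i++) {
			if (line.charAt(i) == '"') {
				quotes++;
			}
		}
		return quotes;
	}

	// Returns the 1-based numbers of the lines with an odd number of double quotes.
	static List<Integer> suspiciousLines(List<String> lines) {
		List<Integer> result = new ArrayList<>();
		for (int i = 0; i < lines.size(); i++) {
			if (countQuotes(lines.get(i)) % 2 != 0) {
				result.add(i + 1);
			}
		}
		return result;
	}

	public static void main(String[] args) {
		// Made-up sample; the second line has an unbalanced quote.
		List<String> sample = Arrays.asList("1,a,b", "2,\"broken,c", "3,\"ok\",d");
		System.out.println(suspiciousLines(sample)); // prints [2]
	}
}
```

For a real file, the lines could be loaded with Files.readAllLines(path, StandardCharsets.UTF_8) from java.nio.file. A field that legitimately spans several lines would also be reported, so the result is a list of candidates to look at, not a list of errors.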


Because the program above could not get through all of my data, I switched to a second method:


package anhui;

import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.nio.charset.StandardCharsets;


public class Readmethod2 {

	public static void main(String[] args) throws IOException {
		String[] provinces = {"anhui","aomen","beijing","chongqing","fujian","ganshu","guangdong","guangxi","guizhou"
				,"hainan","hebei","heilongjiang","henan","huan","hubei","jiangsu","jiangxi",
				"jilin","liaoning","neimenggu","ningxia","qinghai","shan1xi","shan3xi","shandong","shanghai",
				"sicuan","*","tianjin","xianggang","*","xizang","yunnan","zhejiang"};
		for (String province : provinces) {

			String name_str = province + ".csv";
			String newName_str = province + "_new.csv";
			String readFilePath = "F:\\Porject\\data\\WeiboDataShare-master\\" + name_str;
			String writeFilePath = "F:\\Porject\\data\\deal_data\\" + newName_str;

			int count = 0; // lines kept (exactly 9 columns)
			int sum = 0;   // lines read
			// try-with-resources closes both files even if an exception is thrown.
			try (BufferedReader reader = new BufferedReader(new InputStreamReader(new FileInputStream(readFilePath), StandardCharsets.UTF_8));
					BufferedWriter writer = new BufferedWriter(new OutputStreamWriter(new FileOutputStream(writeFilePath), StandardCharsets.UTF_8))) {
				String string;
				while ((string = reader.readLine()) != null) {
					sum++;
					// Naive split: commas inside quoted fields also count as separators,
					// and trailing empty fields are dropped.
					String[] strs = string.split(",");
					if (strs.length == 9) {
						writer.write(string);
						writer.newLine();
						count++;
					}
				}
			}
			System.out.println(name_str + " Finished!!!! Kept " + count + " of " + sum + " lines.");
		}

	}

}
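
One caveat about split(",") in the second method: Java's String.split drops trailing empty strings, so a row whose last columns are empty reports fewer than 9 fields and gets filtered out. The sketch below shows that behavior next to the split(",", -1) variant, which keeps trailing empty fields. SplitDemo is a hypothetical class name and the sample row is made up; neither variant understands commas inside quoted fields.

```java
class SplitDemo {

	// Column count as computed in the program above: trailing empty fields are dropped.
	static int columnsDefault(String line) {
		return line.split(",").length;
	}

	// Column count with limit -1: trailing empty fields are kept.
	static int columnsKeepEmpty(String line) {
		return line.split(",", -1).length;
	}

	public static void main(String[] args) {
		String row = "1,2,3,4,5,6,7,,"; // 9 columns, the last two are empty
		System.out.println(columnsDefault(row));   // prints 7
		System.out.println(columnsKeepEmpty(row)); // prints 9
	}
}
```

Whether the -1 variant is the right choice depends on whether rows with empty trailing columns should count as valid 9-column rows in this data set.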



This program was able to batch-process all of my data! I hope it helps.