Java: garbled "?" on the first line when reading a UTF-8 file, and why UTF-8 files with a BOM come out garbled (causes and fixes)

Date: 2021-12-12 11:23:12

Test example:

Reading a UTF-8 txt file in Java: the first line shows garbled "?" characters, and how to fix it


Contents of test.txt:
1
00:00:06,000 --> 00:00:06,010
<b>Allerleirauh</b> (2012)
<i>dTV - Das Erste - 20. Januar 2013</i>

2
00:00:10,280 --> 00:00:12,680
Was gehört zu einer guten Suppe?

3
00:00:14,200 --> 00:00:15,839
Eine gute Suppe...

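test.txt was saved from WordPad in UTF-8 format (in this case, UTF-8 with a BOM).
After saving and closing, reopening the UTF-8 document in WordPad shows Chinese characters and letters correctly.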
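
To confirm that a file really starts with a BOM before blaming the reading code, the first three bytes can be inspected directly. The following is only a minimal sketch: the class name `BomCheck` and its methods are hypothetical helpers, not part of the original test. It relies on the fact that the UTF-8 BOM is the byte sequence EF BB BF.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper (not from the original post): checks for the UTF-8 BOM.
class BomCheck {

    /** Returns true if the given bytes begin with the UTF-8 BOM (EF BB BF). */
    static boolean hasUtf8Bom(byte[] head) {
        return head.length >= 3
                && (head[0] & 0xFF) == 0xEF
                && (head[1] & 0xFF) == 0xBB
                && (head[2] & 0xFF) == 0xBF;
    }

    /** Reads up to the first three bytes of a file and checks them for the BOM. */
    static boolean fileHasUtf8Bom(Path file) throws IOException {
        byte[] head = new byte[3];
        int total = 0;
        try (InputStream in = Files.newInputStream(file)) {
            // read() may return fewer bytes than requested, so loop until 3 or EOF
            while (total < head.length) {
                int n = in.read(head, total, head.length - total);
                if (n < 0) {
                    break;
                }
                total += n;
            }
        }
        return total == head.length && hasUtf8Bom(head);
    }

    public static void main(String[] args) throws IOException {
        // Pass the path to check as the first argument, e.g. test.txt
        Path file = Paths.get(args.length > 0 ? args[0] : "test.txt");
        System.out.println(file + " starts with UTF-8 BOM: " + fileHasUtf8Bom(file));
    }
}
```

Run against the test.txt described above, this should report a BOM if the editor really wrote one.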

Test code:

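import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.File;
import java.io.FileReader;
import java.io.FileWriter;
import java.io.IOException;

// Copies the input file line by line into a .txt file next to it.
// Assumes filename contains ".srt"; otherwise substring() throws.
public static String srt2Txt(String filename) {
    File infile = new File(filename);
    String realfile = filename.substring(0, filename.lastIndexOf(".srt")) + ".txt";
    String tempfile = realfile.replace('/', '\\'); // Windows path format for the output file
    File outfile = new File(tempfile);
    BufferedReader bufferedReader = null;
    BufferedWriter bufferedWriter = null;
    try {
        // FileReader/FileWriter use the default charset, which is not necessarily UTF-8
        bufferedReader = new BufferedReader(new FileReader(infile));
        bufferedWriter = new BufferedWriter(new FileWriter(outfile));
        String line; // holds the line read on each iteration
        while ((line = bufferedReader.readLine()) != null) {
            // Any character that ISO-8859-1 cannot represent is replaced with '?' here
            line = new String(line.getBytes("ISO-8859-1"), "ISO-8859-1");
            bufferedWriter.write(line);
            bufferedWriter.newLine(); // write a line break
            bufferedWriter.flush();
        }
    } catch (IOException e) {
        e.printStackTrace();
    } finally {
        if (null != bufferedReader) {
            try {
                bufferedReader.close();
            } catch (IOException e) {
                e.printStackTrace();
            }
        }
        if (null != bufferedWriter) {
            try {
                bufferedWriter.close();
            } catch (IOException e) {
                e.printStackTrace();
            }
        }
    }
    return realfile;
}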
Test result:

??
00:00:06,000 --> 00:00:06,010
<b>Allerleirauh</b> (2012)
<i>dTV - Das Erste - 20. Januar 2013</i>

2
00:00:10,280 --> 00:00:12,680
Was geh?rt zu einer guten Suppe?

3
00:00:14,200 --> 00:00:15,839
Eine gute Suppe...
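
One plausible reading of this result, offered as a sketch rather than a confirmed diagnosis: the BOM bytes at the start of the file are decoded into one or more characters that ISO-8859-1 cannot represent, and `getBytes("ISO-8859-1")` in the test code then replaces each of them with `?`. The demo below assumes the file is decoded as UTF-8 (the default charset from JDK 18 onward); with an older JDK the platform default charset is used instead, so the BOM bytes may decode to different characters, but the same replacement applies. The class name `BomQuestionMarkDemo` is hypothetical.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo (not from the original post): shows how a BOM turns into '?'.
class BomQuestionMarkDemo {

    /** Mirrors the round trip in the test code: encode to ISO-8859-1, then decode again. */
    static String latin1RoundTrip(String s) {
        byte[] latin1 = s.getBytes(StandardCharsets.ISO_8859_1); // unmappable chars become '?'
        return new String(latin1, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        // First line of the test file as raw bytes: UTF-8 BOM followed by the digit '1'
        byte[] raw = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF, '1'};

        // Java's UTF-8 decoder keeps the BOM as the character U+FEFF
        String decoded = new String(raw, StandardCharsets.UTF_8);
        System.out.println(decoded.length());         // prints 2: U+FEFF plus '1'

        // U+FEFF is not representable in ISO-8859-1, so it is replaced
        System.out.println(latin1RoundTrip(decoded)); // prints ?1
    }
}
```

Lines after the first contain no BOM, which would explain why only the first line is affected in this way.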

Solution:

Use UltraEdit to save the txt file above as UTF-8 without a BOM; or

open the txt file above in Notepad++, choose "Format --> Encode in UTF-8 without BOM", and then save the file.
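
When re-saving the file is not an option, the BOM can also be handled in code. The following is a minimal sketch of one common approach, not the method used in the original test: read the file with an explicit UTF-8 charset and drop a leading U+FEFF from the first line. The class name `BomSafeReader` and its methods are hypothetical.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not from the original post): reads UTF-8 text, ignoring a BOM.
class BomSafeReader {

    private static final char BOM = '\uFEFF';

    /** Removes a single leading U+FEFF, if present; otherwise returns the line unchanged. */
    static String stripBom(String line) {
        return (!line.isEmpty() && line.charAt(0) == BOM) ? line.substring(1) : line;
    }

    /** Reads all lines as UTF-8, stripping the BOM from the first line only. */
    static List<String> readLines(Path file) throws IOException {
        List<String> lines = new ArrayList<>();
        // Explicit charset: does not depend on the platform default
        try (BufferedReader reader = Files.newBufferedReader(file, StandardCharsets.UTF_8)) {
            String line;
            boolean first = true;
            while ((line = reader.readLine()) != null) {
                if (first) {
                    line = stripBom(line);
                    first = false;
                }
                lines.add(line);
            }
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        Path file = Paths.get(args.length > 0 ? args[0] : "test.txt");
        for (String line : readLines(file)) {
            System.out.println(line);
        }
    }
}
```

Writing the output with an explicit charset as well (for example `Files.newBufferedWriter(path, StandardCharsets.UTF_8)`) keeps the result independent of the platform default, and the ISO-8859-1 round trip from the test code is then no longer needed.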