json_parse_exception when POSTing Chinese text to the Elasticsearch RESTful API

Date: 2022-08-18 16:20:38

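Our client program calls the Elasticsearch RESTful API directly and runs queries by POSTing JSON. When the POST body contains Chinese text, some Chinese characters trigger an exception and others do not: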

{"error":{"root_cause":[{"type":"json_parse_exception","reason":"Invalid UTF-8 middle byte 0x5c\n at [Source: org.elasticsearch.transport.netty4.ByteBufStreamInput@58cf272c; line: 1, column: 238]"}],"type":"json_parse_exception","reason":"Invalid UTF-8 middle byte 0x5c\n at [Source: org.elasticsearch.transport.netty4.ByteBufStreamInput@58cf272c; line: 1, column: 238]"},"status":500}

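POSTing the same JSON through the es head plugin works fine, so my first guess was that the problem lay in how the client writes the data. Here is the code: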

   

URL url = new URL(esURL);
HttpURLConnection connection = (HttpURLConnection) url.openConnection();
connection.setDoOutput(true);
connection.setDoInput(true);
connection.setRequestMethod("POST");
connection.setUseCaches(false);
//connection.setConnectTimeout(30000); // set the connect timeout to 30 seconds
connection.setInstanceFollowRedirects(true);
connection.setRequestProperty("Charsert", "UTF-8"); // misspelled, non-standard header; it does not change how the body is encoded
connection.setRequestProperty("Content-Type", "application/json; charset=UTF-8");
connection.setRequestProperty("Accept-Language", "zh-CN,zh;q=0.8,en-US;q=0.5,en;q=0.3");
connection.connect();
// Write the POST body
DataOutputStream out = new DataOutputStream(connection.getOutputStream());
out.writeBytes(query);

The problem is in the writeBytes() method. Let's look at the JDK source:

public final void writeBytes(String s) throws IOException {
    int len = s.length();
    for (int i = 0 ; i < len ; i++) {
        out.write((byte)s.charAt(i));
    }
    incCount(len);
}

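In UTF-8, a Chinese character usually takes 3 bytes, but writeBytes() casts each char straight to a single byte, keeping only its low 8 bits and discarding the rest. That explains why only some characters fail: if the low byte happens to be below 0x80, it passes as an ASCII character and the body may still parse (though the query text is now wrong); otherwise the parser will usually hit a byte that is not valid UTF-8, as in the error above.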
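To make the truncation concrete, here is a minimal sketch that compares the bytes produced by writeBytes() with real UTF-8 bytes for the sample string "中文". The class name WriteBytesDemo and its helper methods are made up for this illustration; only DataOutputStream.writeBytes() and String.getBytes() come from the JDK.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of the original client.
public class WriteBytesDemo {

    // Encode the way the buggy client did: one byte per char, high byte dropped.
    static byte[] viaWriteBytes(String s) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(buf);
        out.writeBytes(s);
        out.flush();
        return buf.toByteArray();
    }

    // Encode as real UTF-8.
    static byte[] viaUtf8(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    // Render bytes as space-separated lowercase hex.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        String sample = "\u4e2d\u6587"; // "中文" (U+4E2D, U+6587)
        System.out.println(hex(viaWriteBytes(sample))); // prints: 2d 87
        System.out.println(hex(viaUtf8(sample)));       // prints: e4 b8 ad e6 96 87
    }
}
```

Here U+4E2D collapses to 0x2d (an ASCII '-') while U+6587 collapses to 0x87, which is not a valid first byte of a UTF-8 sequence. Which of the two happens depends on the character, which matches the "some characters fail, some don't" symptom.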

Change the code to:

out.write(query.getBytes("UTF-8"));

Problem solved.
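For reference, the whole request with the fix applied might look like the sketch below. This is only one way to arrange it, not the original client: the class name EsPostSketch, the methods encodeBody and post, and the URL in the usage note are assumptions for illustration. It writes the UTF-8 bytes to a plain OutputStream, since DataOutputStream adds nothing here.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class sketching the corrected POST.
public class EsPostSketch {

    // Encode the JSON body explicitly as UTF-8 instead of using writeBytes().
    static byte[] encodeBody(String json) {
        return json.getBytes(StandardCharsets.UTF_8);
    }

    // POST the JSON to the given URL and return the response body as a string.
    static String post(String esUrl, String json) throws IOException {
        byte[] body = encodeBody(json);
        HttpURLConnection connection = (HttpURLConnection) new URL(esUrl).openConnection();
        connection.setRequestMethod("POST");
        connection.setDoOutput(true);
        connection.setUseCaches(false);
        connection.setRequestProperty("Content-Type", "application/json; charset=UTF-8");
        try (OutputStream out = connection.getOutputStream()) {
            out.write(body);
        }
        // For 4xx/5xx responses the body is on the error stream, which may be null.
        int status = connection.getResponseCode();
        InputStream in = status >= 400 ? connection.getErrorStream() : connection.getInputStream();
        if (in == null) {
            return "";
        }
        try (InputStream response = in) {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            byte[] chunk = new byte[4096];
            int n;
            while ((n = response.read(chunk)) != -1) {
                buf.write(chunk, 0, n);
            }
            return new String(buf.toByteArray(), StandardCharsets.UTF_8);
        }
    }
}
```

A call would then look like `EsPostSketch.post("http://localhost:9200/my_index/_search", query)`, where the host and index name are placeholders. Using StandardCharsets.UTF_8 instead of the string "UTF-8" also avoids the checked UnsupportedEncodingException.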