How do I convert UTF-8 to Unicode in Java?

Time: 2022-11-29 20:15:30

For example, in the emoji character set, U+1F601 is the Unicode code point for "GRINNING FACE WITH SMILING EYES", and \xF0\x9F\x98\x81 is the UTF-8 byte sequence for this character.

\xE2\x9D\xA4 is the UTF-8 byte sequence for HEAVY BLACK HEART, whose code point is U+2764.

So my question is: if I have a byte array with the values (0xF0, 0x9F, 0x98, 0x81, 0xE2, 0x9D, 0xA4), how can I convert it into Unicode code points?

For the input above, the result I want is a String array with the values "1F601" and "2764".

I know I could write a complex method to do this, but I hope there is already a library that does it.

3 Answers

#1


8  

So my question is, if I have a byte array with the values (0xF0, 0x9F, 0x98, 0x81), how can I convert it into a Unicode value?

Simply call the String constructor, specifying the data and the encoding:

String text = new String(bytes, "UTF-8");

You can specify a Charset instead of the name of the encoding. I like Guava's simple Charsets class, which allows you to write:

String text = new String(bytes, Charsets.UTF_8);

Or, from Java 7 onward, use StandardCharsets without even needing Guava:

String text = new String(bytes, StandardCharsets.UTF_8);
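
Decoding gives a String, but the question asks for the code points as hex strings such as "1F601". One way to get there, as a minimal sketch assuming Java 8 or later: decode the bytes, then walk the string by code point rather than by char. The class name Utf8CodePoints and the method name toCodePointHex are made up for this illustration.

```java
import java.nio.charset.StandardCharsets;

class Utf8CodePoints {
    // Hypothetical helper: decodes UTF-8 bytes and returns each code point
    // as an uppercase hex string without the "U+" prefix.
    static String[] toCodePointHex(byte[] utf8Bytes) {
        String text = new String(utf8Bytes, StandardCharsets.UTF_8);
        // codePoints() merges surrogate pairs, so U+1F601 comes out as one value.
        return text.codePoints()
                   .mapToObj(cp -> String.format("%X", cp))
                   .toArray(String[]::new);
    }

    public static void main(String[] args) {
        byte[] bytes = {
            (byte) 0xF0, (byte) 0x9F, (byte) 0x98, (byte) 0x81, // U+1F601
            (byte) 0xE2, (byte) 0x9D, (byte) 0xA4               // U+2764
        };
        System.out.println(String.join(", ", toCodePointHex(bytes))); // prints 1F601, 2764
    }
}
```

Iterating with charAt instead would report the two surrogate halves D83D and DE01 for the first character, which is why the sketch uses codePoints().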

#2


1  

Simply use the String class:

import java.nio.charset.Charset;

byte[] bytesArray = {(byte) 0xF0, (byte) 0x9F, (byte) 0x98, (byte) 0x81}; // UTF-8 bytes of U+1F601

String string = new String(bytesArray, Charset.forName("UTF-8")); // decode the byte array as UTF-8

System.out.println(string); // Test result

#3


0  

Here is an example using InputStreamReader:

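// Requires java.io.FileInputStream, java.io.InputStreamReader, java.io.Reader
// and java.nio.charset.Charset.
// try-with-resources closes the reader (and the underlying stream) automatically.
try (Reader reader = new InputStreamReader(new FileInputStream("utf-8-text.txt"),
                                           Charset.forName("UTF-8"))) {
    int data = reader.read();
    while (data != -1) {
        // Each read() returns one UTF-16 code unit, so a character outside the
        // BMP (such as U+1F601) arrives as two surrogate chars.
        char theChar = (char) data;
        data = reader.read();
    }
}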

Ref: Java I18N example
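
Because a Reader hands back UTF-16 code units, getting the code points asked for in the question takes one extra step: pairing up surrogates. The following is a minimal sketch of that idea, not part of the answer above; the class name ReaderCodePoints and the method name readCodePointHex are hypothetical, and a ByteArrayInputStream stands in for the file so the example is self-contained.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

class ReaderCodePoints {
    // Hypothetical helper: reads UTF-8 from the stream and returns every
    // code point as an uppercase hex string.
    static List<String> readCodePointHex(InputStream in) {
        List<String> result = new ArrayList<>();
        try (Reader reader = new InputStreamReader(in, StandardCharsets.UTF_8)) {
            int unit = reader.read();
            while (unit != -1) {
                int codePoint = unit;
                int next = reader.read(); // one code unit of lookahead
                if (Character.isHighSurrogate((char) unit)
                        && next != -1
                        && Character.isLowSurrogate((char) next)) {
                    // Combine the surrogate pair into a single code point.
                    codePoint = Character.toCodePoint((char) unit, (char) next);
                    next = reader.read();
                }
                result.add(String.format("%X", codePoint));
                unit = next;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return result;
    }

    public static void main(String[] args) {
        byte[] bytes = {
            (byte) 0xF0, (byte) 0x9F, (byte) 0x98, (byte) 0x81, // U+1F601
            (byte) 0xE2, (byte) 0x9D, (byte) 0xA4               // U+2764
        };
        System.out.println(readCodePointHex(new ByteArrayInputStream(bytes))); // prints [1F601, 2764]
    }
}
```

For data that already fits in memory, decoding to a String is simpler; the streaming form is mainly useful when the input is too large to load at once.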
