Decoding and encoding JSON with Unicode characters

Time: 2022-10-17 10:36:56

I have some json I need to decode, alter and then encode without messing up any characters.

If I have a Unicode character in a JSON string, it will not decode. I'm not sure why, since json.org says a string can contain any Unicode character except ", \, or a control character. But it doesn't work in Python either.

{"Tag":"Odómetro"}

I can use utf8_encode which will allow the string to be decoded with json_decode, however the character gets mangled into something else. This is the result from a print_r of the result array. Two characters.

[Tag] => OdÃ³metro

When I encode the array again, I get the character escaped to ASCII, which is correct according to the JSON spec:

"Tag"=>"Od\u00f3metro"

Is there some way I can un-escape this? json_encode gives no such option, utf8_encode does not seem to work either.

Edit: I see there is a JSON_UNESCAPED_UNICODE option for json_encode. However, it's not working as expected. Oh damn, it's only available in PHP 5.4. I will have to use some regex, as I only have 5.3.

$json = json_encode($array, JSON_UNESCAPED_UNICODE);
Warning: json_encode() expects parameter 2 to be long, string ...

7 answers

#1


12  

Judging from everything you've said, it seems like the original Odómetro string you're dealing with is encoded with ISO 8859-1, not UTF-8.

Here's why I think so:

  • json_encode produced parseable output after you ran the input string through utf8_encode, which converts from ISO 8859-1 to UTF-8.
  • You did say that you got "mangled" output when using print_r after doing utf8_encode, but the mangled output you got is exactly what you would see when UTF-8 text is interpreted as ISO 8859-1 (ó is \xc3\xb3 in UTF-8, but that byte sequence is Ã³ in ISO 8859-1).
  • Your htmlentities hackaround solution worked. htmlentities needs to know the encoding of the input string to work correctly. If you don't specify one, it assumes ISO 8859-1. (html_entity_decode, confusingly, defaults to UTF-8, so your method had the effect of converting from ISO 8859-1 to UTF-8.)
  • You said you had the same problem in Python, which would seem to exclude PHP from being the issue.
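
To make the byte-level argument above concrete, here is a minimal sketch. It assumes the input really is ISO 8859-1, and it uses iconv as a stand-in for utf8_encode (same ISO 8859-1 to UTF-8 conversion); the variable names are made up for illustration.

```php
<?php
// "Odómetro" as ISO 8859-1: the ó is the single byte 0xF3.
$latin1_json = "{\"Tag\":\"Od\xF3metro\"}";

// That byte sequence is not valid UTF-8, so json_decode() returns null.
$failed = json_decode($latin1_json, true);
var_dump($failed); // NULL

// Same conversion utf8_encode() performs: ISO 8859-1 -> UTF-8.
$utf8_json = iconv('ISO-8859-1', 'UTF-8', $latin1_json);
$decoded = json_decode($utf8_json, true);
echo bin2hex($decoded['Tag']), "\n"; // 4f64c3b36d6574726f (note the c3 b3 pair)

// Treating those UTF-8 bytes as ISO 8859-1 again yields the
// two-character result from the question.
$mojibake = iconv('ISO-8859-1', 'UTF-8', $decoded['Tag']);
echo $mojibake, "\n"; // OdÃ³metro on a UTF-8 terminal
```

So the decoded value is probably fine; it only looks wrong when whatever displays it assumes ISO 8859-1.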

PHP will use the \uXXXX escaping, but as you noted, this is valid JSON.
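
A quick way to convince yourself that the escaped form loses nothing: decode both spellings and compare. This is only a sketch with made-up variable names; the raw string is written with byte escapes so it does not depend on how the source file is saved.

```php
<?php
// Single quotes keep \u00f3 literal, so json_decode() sees the JSON escape.
$escaped_json = '{"Tag":"Od\u00f3metro"}';

// The same document with a raw UTF-8 ó (bytes C3 B3).
$raw_json = "{\"Tag\":\"Od\xC3\xB3metro\"}";

$from_escaped = json_decode($escaped_json, true);
$from_raw = json_decode($raw_json, true);

// Both decode to the same UTF-8 string.
var_dump($from_escaped === $from_raw); // bool(true)
```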

So, it seems like you need to configure your connection to Postgres so that it will give you UTF-8 strings. The PHP manual indicates you'd do this by appending options='--client_encoding=UTF8' to the connection string. There's also the possibility that the data currently stored in the database is in the wrong encoding. (You could simply use utf8_encode, but this will only support characters that are part of ISO 8859-1).
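
As a rough sketch of what that connection string could look like: the host, database name, and user below are placeholders I made up, not values from the question. pg_client_encoding() is used only to check what the connection ended up with.

```php
<?php
// Hypothetical connection details; replace with your own.
$conn_string = "host=localhost dbname=mydb user=me options='--client_encoding=UTF8'";

// Only attempt the connection where the pgsql extension is loaded.
if (function_exists('pg_connect')) {
    $db = @pg_connect($conn_string);
    if ($db !== false) {
        // Reports the client encoding in effect for this connection.
        echo pg_client_encoding($db), "\n";
    }
}
```

If changing the connection string is not an option, pg_set_client_encoding($db, 'UTF8') after connecting should have the same effect.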

Finally, as another answer noted, you do need to make sure that you're declaring the proper charset, with an HTTP header or otherwise (of course, this particular issue might have just been an artifact of the environment where you did your print_r testing).

#2


16  

JSON_UNESCAPED_UNICODE was added in PHP 5.4, so it looks like you need to upgrade your version of PHP to take advantage of it. 5.4 is not released yet though! :(

There is a 5.4 alpha release candidate on QA though if you want to play on your development machine.

#3


16  

I have found the following way to fix this issue... I hope this can help you.

json_encode($data,JSON_UNESCAPED_UNICODE|JSON_UNESCAPED_SLASHES);
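
For comparison, here is what the two flags change, on PHP 5.4 or later. The sample data (including the example.com URL) is made up for illustration.

```php
<?php
$data = array('Tag' => "Od\xC3\xB3metro", 'url' => 'http://example.com/a');

// Default: non-ASCII characters and forward slashes are escaped.
$default = json_encode($data);
echo $default, "\n";  // {"Tag":"Od\u00f3metro","url":"http:\/\/example.com\/a"}

// With both flags the output keeps the raw UTF-8 character and plain slashes.
$readable = json_encode($data, JSON_UNESCAPED_UNICODE | JSON_UNESCAPED_SLASHES);
echo $readable, "\n"; // {"Tag":"Odómetro","url":"http://example.com/a"}
```

Both outputs are valid JSON and decode to the same array; the flags only affect how it reads.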

#4


6  

A hacky way of doing JSON_UNESCAPED_UNICODE in PHP 5.3. Really disappointed by PHP's JSON support. Maybe this will help someone else.

$array = some_json();
// Encode all string children in the array to html entities.
array_walk_recursive($array, function(&$item, $key) {
    if(is_string($item)) {
        $item = htmlentities($item);
    }
});
$json = json_encode($array);

// Decode the html entities and end up with unicode again.
$json = html_entity_decode($json);
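
Since the question mentions falling back to a regex on 5.3, here is one possible sketch of that route, which avoids the round trip through HTML entities. json_unescape_unicode is a hypothetical helper name, not a PHP built-in. It assumes well-formed JSON input, only touches escapes above U+007F (so escaped quotes, backslashes, and control characters stay escaped), and leaves anything it cannot decode unchanged.

```php
<?php
// Hypothetical helper: rewrite \uXXXX escapes for non-ASCII characters
// in an already-encoded JSON string as raw UTF-8.
function json_unescape_unicode($json)
{
    // First alternative: an escaped backslash, matched only so it can be skipped.
    // Second alternative: a run of \uXXXX escapes outside 0000-007F.
    $pattern = '/\\\\\\\\|(?:\\\\u(?!00[0-7])[0-9a-fA-F]{4})+/';

    return preg_replace_callback($pattern, function ($m) {
        if ($m[0] === '\\\\') {
            return $m[0]; // escaped backslash: leave as-is
        }
        // Let json_decode turn the run (including surrogate pairs) into UTF-8.
        $decoded = json_decode('"' . $m[0] . '"');
        return is_string($decoded) ? $decoded : $m[0];
    }, $json);
}

$json = json_encode(array('Tag' => "Od\xC3\xB3metro"));
echo json_unescape_unicode($json), "\n"; // {"Tag":"Odómetro"}
```

Matching whole runs rather than single escapes is what lets surrogate pairs decode correctly; a lone surrogate simply stays escaped.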

#5


4  

Try setting the UTF-8 encoding in your page:

header('content-type:text/html;charset=utf-8');

This works for me:

$arr = array('tag' => 'Odómetro');
$encoded = json_encode($arr);
$decoded = json_decode($encoded);
echo $decoded->{'tag'};

#6


3  

$json = array('tag' => 'Odómetro'); // Original array
$json = json_encode($json); // {"tag":"Od\u00f3metro"}
$json = json_decode($json); // Od\u00f3metro becomes Odómetro (as UTF-8)
echo $json->{'tag'}; // UTF-8 bytes; shown as OdÃ³metro if the page is read as ISO 8859-1
echo utf8_decode($json->{'tag'}); // Odómetro, converted to ISO 8859-1

You were close; just use utf8_decode.

#7


2  

Try using:

utf8_decode() and utf8_encode()
