PHP: json_encode vs serialize for storing data in a MySQL database?

Time: 2021-11-03 17:00:17

I'm storing some "unstructured" data (a keyed array) in one field of my table, and I'm currently using serialize() / unserialize() to convert back and forth between array and string.

Every now and then, however, I get errors when unserializing the data. I believe these errors happen because of Unicode data in the strings inside the array I'm serializing, although some records with Unicode data work just fine. (The DB field is UTF-8.)

I'm wondering whether using json_encode instead of serialize will make a difference / make this more resilient. This is not trivial for me to test, since in my dev environment everything works well, but in production, every now and then (about 1% of records) I get an error.

Btw, I know I'm weaseling out of finding an actual explanation for the problem and just blindly trying something; I'm kind of hoping I can get rid of this without spending too much time on it.

Do you think using json_encode instead of serialize will make this more resilient to "serialization errors"? The data format does look more "forgiving" to me...

UPDATE: The actual error I'm getting is:

 Notice: unserialize(): Error at offset 401 of 569 bytes in C:\blah.php on line 20

Thanks! Daniel

6 Answers

#1


11  

JSON has one main advantage:

  • compatibility with languages other than PHP.

PHP's serialize has one main advantage:

  • it's specifically designed to store PHP data; most notably, it can store objects (instances of classes), which are re-instantiated as the right class when the string is unserialized.

(Yes, those advantages are the exact opposite of each other)
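
To make the trade-off concrete, here is a minimal sketch. The Point class is hypothetical, invented only for this illustration: serialize() brings back an instance of the original class, while a JSON round trip yields plain data.

```php
<?php
// Hypothetical class, used only for this illustration.
class Point
{
    public $x;
    public $y;

    public function __construct($x, $y)
    {
        $this->x = $x;
        $this->y = $y;
    }
}

$point = new Point(1, 2);

// serialize() records the class name, so unserialize() rebuilds a Point.
$fromSerialize = unserialize(serialize($point));

// json_encode() keeps only the public properties; passing true as the
// second argument of json_decode() yields a plain associative array.
$fromJson = json_decode(json_encode($point), true);

var_dump($fromSerialize instanceof Point); // bool(true)
var_dump(is_array($fromJson));             // bool(true)
```

For a keyed array of scalars, as in the question, both round trips return the same array, so this difference only matters once objects are involved.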

In your case, as you are storing data that's not really structured, both formats should work pretty well.

And the encoding problem you have should not be caused by serialize itself: as long as everything (DB, connection to the DB, PHP files, ...) is in UTF-8, serialization should work too.

#2


2  

The folks at FriendFeed opted for a similar solution using JSON. You should check out their blog post about it.

#3


1  

If the problem is (and I believe it is) in the UTF-8 encoding, there is no real difference between json_encode and serialize: neither converts your strings to a different character encoding.

You should make sure your database/connection is properly set up to handle all UTF-8 characters, or encode the whole record into a supported encoding before inserting it into the DB.

Also, please specify what "I get an error" means.
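
One way such corruption can arise, sketched under the assumption that some layer between PHP and MySQL re-encodes the text (the question does not confirm this): serialize() stores each string's length in bytes, so if the bytes change after serialization, the stored length no longer matches and unserialize() fails. The str_replace() call below only simulates a UTF-8 to Latin-1 conversion of one character.

```php
<?php
$data = ['name' => "Schr\xC3\xB6dinger"]; // "ö" is two bytes in UTF-8

$serialized = serialize($data);
// The length prefix counts bytes, not characters: s:12 for 11 characters.
var_dump(strpos($serialized, 's:12:"Schr') !== false); // bool(true)

// Simulate a lossy charset conversion: the two-byte UTF-8 "ö" becomes the
// single Latin-1 byte 0xF6, so the string is now 11 bytes but still says 12.
$corrupted = str_replace("\xC3\xB6", "\xF6", $serialized);

// unserialize() returns false and reports an "Error at offset ..." message;
// the @ operator only hides that message for this demonstration.
$result = @unserialize($corrupted);
var_dump($result); // bool(false)
```

This would also explain why only some Unicode records break: only strings whose byte length actually changes along the way are affected.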

#4


1  

Found this in the PHP docs...

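function mb_unserialize($serial_str) {
    // Recompute the byte length of every serialized string so it matches the actual content.
    // A callback replaces the /e modifier, which was removed in PHP 7.
    $out = preg_replace_callback('!s:(\d+):"(.*?)";!s', function ($m) {
        return 's:' . strlen($m[2]) . ':"' . $m[2] . '";';
    }, $serial_str);
    return unserialize($out);
}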

I don't quite understand it, but it successfully unserialized the data that I couldn't unserialize before. I've moved to JSON now, and I'll report back in a couple of weeks on whether this solved the problem of some records randomly getting "corrupted".

#5


1  

By default, json_encode() escapes non-ASCII characters (e.g., “Schrödinger” becomes “Schr\u00f6dinger”), but serialize() does not.

Source: https://www.toptal.com/php/10-most-common-mistakes-php-programmers-make#common-mistake-6--ignoring-unicodeutf-8-issues
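
A quick check of that behaviour, as a sketch with a made-up sample array: with default options the JSON text contains only ASCII bytes, so a later charset conversion has nothing to mangle.

```php
<?php
$data = ['name' => "Schr\xC3\xB6dinger"]; // UTF-8 input

$json = json_encode($data);
echo $json, "\n"; // {"name":"Schr\u00f6dinger"}

// Every byte of the default output is 7-bit ASCII.
var_dump(preg_match('/^[\x00-\x7F]*$/', $json) === 1); // bool(true)

// Decoding restores the original UTF-8 string.
$decoded = json_decode($json, true);
var_dump($decoded === $data); // bool(true)
```

Note that json_encode() expects valid UTF-8 input and returns false otherwise, so it is worth checking its return value before writing to the database.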

To leave UTF-8 characters untouched, you can use the option JSON_UNESCAPED_UNICODE as of PHP 5.4.

Source: https://*.com/a/804089/1438029
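
As a small sketch of the difference the flag makes (the sample array is made up for illustration):

```php
<?php
$data = ['name' => "Schr\xC3\xB6dinger"]; // UTF-8 input

// Default: non-ASCII characters are written as \uXXXX escapes.
$escaped = json_encode($data);

// With the flag: the raw UTF-8 bytes are kept in the output.
$unescaped = json_encode($data, JSON_UNESCAPED_UNICODE);

echo $escaped, "\n";   // {"name":"Schr\u00f6dinger"}
echo $unescaped, "\n"; // {"name":"Schrödinger"}

// Both forms decode to the same data.
var_dump(json_decode($escaped, true) === json_decode($unescaped, true)); // bool(true)
```

The unescaped form is shorter and easier to read in the table, but it relies on the column and the connection handling UTF-8 correctly end to end, which is exactly what seems to be in doubt here.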

#6


0  

As a design decision, I'd opt for storing JSON because it can only represent a data structure, whereas serialization is bound to a PHP data object signature.

The advantages I see are:

  • you are forced to separate the data storage from any logic layer on top.
  • you are independent of changes to the data object class (say, for example, that you want to add a field).
