Encoding problems when storing HTML in MySQL with PHP

Time: 2023-01-05 22:22:01

I have built a CMS that allows HTML to be stored in a database. It all started off very simple: I displayed the HTML in a textarea, using htmlspecialchars to prevent it from breaking the form, then saved it back using htmlspecialchars_decode. It all seemed to work fine until someone pasted some HTML into the system instead of typing it. At that point it stored fine but lost most of its whitespace, which meant all the lovely indentation had to be redone from scratch.

To fix it, I tried specifying utf-8 encoding everywhere, because any attempt to fiddle with it seemed to produce invalid characters.

I specify utf-8 in the PHP header:

header('Content-Type: text/html; charset=utf-8');

I specify utf-8 in my HTML page:

<meta http-equiv="Content-Type" content="text/html; charset=utf-8">

I specify utf-8 in the HTML form:

<form accept-charset="utf-8" 

Then I read the posted value (basically) like this:

$Val = $_POST[$SafeFieldName];

My understanding was that PHP did everything in utf-8 so I am a bit surprised at this stage that I get gobbledegook - unless I now do this:

$Val = utf8_decode($Val);

So, at this stage, it works, sort of. I lose all my lovely indentation but not all of my whitespace. It's as if some non-utf8 characters are being stripped out. Weirdly, this happens in Chrome, but in Firefox it seems fine.

I think I'm just tying myself in knots now. Any elegant suggestions? I need to get to the bottom of this rather than just hacking it until it works.

4 solutions

#1


3  

The connection to the DB and the DB tables themselves should support UTF-8. Make sure that your table's collation is utf8_general_ci and that all string fields within the table also have the utf8_general_ci collation.

The DB connection should be UTF-8 as well:

mysql_set_charset('utf8');

See http://akrabat.com/php/utf8-php-and-mysql/ for more info.

Update: some report that

mysql_query('SET NAMES utf8');

is required sometimes as well!
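
A note for readers on current PHP: the mysql_* functions used above were removed in PHP 7.0, so the same idea now has to be expressed through mysqli or PDO. The sketch below is one way to do it with PDO, by putting the character set in the DSN. The function names, host, database name, and credentials are all placeholders invented for illustration, and utf8mb4 is swapped in for the answer's utf8 on the assumption that full 4-byte UTF-8 is wanted (MySQL's utf8 stores at most 3 bytes per character).

```php
<?php
// Sketch only: buildUtf8Dsn() and connectUtf8() are hypothetical helpers,
// and every connection detail passed to them is a placeholder.

/**
 * Build a PDO DSN that asks MySQL to use utf8mb4 for the connection.
 * This plays the role that mysql_set_charset('utf8') played above.
 */
function buildUtf8Dsn(string $host, string $dbName): string
{
    return sprintf('mysql:host=%s;dbname=%s;charset=utf8mb4', $host, $dbName);
}

/**
 * Open a connection using that DSN and make errors throw exceptions.
 */
function connectUtf8(string $host, string $dbName, string $user, string $pass): PDO
{
    return new PDO(buildUtf8Dsn($host, $dbName), $user, $pass, [
        PDO::ATTR_ERRMODE => PDO::ERRMODE_EXCEPTION,
    ]);
}

// Show the DSN that would be used (no database is contacted here).
echo buildUtf8Dsn('localhost', 'cms'), PHP_EOL;
```

With mysqli the equivalent step would be calling set_charset('utf8mb4') on the connection object right after connecting.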

If making your tables and connection UTF-8 is not possible, you could of course save the HTML as BASE64 encoded data, and decode it back when you retrieve it from the DB again.
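
The Base64 fallback could look roughly like the sketch below. The two helper names are made up for illustration, and the sample HTML is invented. Because Base64 output is plain ASCII, the connection and column character sets can no longer alter the content, tabs and newlines included.

```php
<?php
// Sketch of the Base64 fallback; both helper names are hypothetical.

/** Encode HTML so that only ASCII characters reach the database. */
function encodeHtmlForStorage(string $html): string
{
    return base64_encode($html);
}

/** Decode a stored value, rejecting anything that is not valid Base64. */
function decodeHtmlFromStorage(string $stored): string
{
    $html = base64_decode($stored, true); // strict mode returns false on bad input
    if ($html === false) {
        throw new InvalidArgumentException('Stored value is not valid Base64.');
    }
    return $html;
}

// Invented sample with tabs, newlines, and a non-ASCII character.
$original = "<ul>\n\t<li>caf\u{00E9}</li>\n</ul>\n";
$stored   = encodeHtmlForStorage($original);
$restored = decodeHtmlFromStorage($stored);

var_dump($restored === $original);
```

The trade-off is that the stored value grows by about a third and can no longer be searched or compared meaningfully in SQL, so this is a workaround rather than a fix for the underlying encoding setup.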

#2


0  

Check your database connection encoding, and check the encoding of the table field where you store the HTML. Maybe their encoding is different from UTF-8.

#3


0  

If this is an issue going in and out of MySQL (as you suggested in the title), then you need to make sure the columns and tables use the utf8_bin collation and call mysql_set_charset('utf8'); after opening the connection to MySQL.

#4


0  

Sorted - and the answer is really embarrassing - but you never know, some day someone may need this :)

I noticed that it worked differently (but still fairly rubbish) in Firefox so I had a look at my style sheet and found this:

white-space: nowrap;

Someone (me) must have put that in there to try to get horizontal scrolling working in some browser. Without that, the HTML makes it all the way to the DB and back again.
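
For anyone chasing a similar problem, a quick way to rule out the PHP side is to make the whitespace visible and check that escaping for the textarea does not touch it. The sketch below does that; visibleWhitespace() is a hypothetical helper and the sample HTML is invented. If this check passes but indentation is still lost, the loss is more likely happening in the browser, as it was with the CSS rule above.

```php
<?php
// Diagnostic sketch; visibleWhitespace() is a hypothetical helper.

/** Replace whitespace characters with printable markers for debugging. */
function visibleWhitespace(string $text): string
{
    return strtr($text, ["\t" => '\t', "\n" => '\n', "\r" => '\r', ' ' => '_']);
}

// Invented sample containing tabs, newlines, and characters that need escaping.
$html = "<div>\n\t<p title=\"a & b\">Hi there</p>\n</div>";

// Escape for display inside a <textarea>, then reverse the escaping.
$escaped = htmlspecialchars($html, ENT_QUOTES, 'UTF-8');
$decoded = htmlspecialchars_decode($escaped, ENT_QUOTES);

echo visibleWhitespace($escaped), PHP_EOL;
var_dump($decoded === $html);
```

Logging visibleWhitespace($_POST[...]) on the receiving page shows directly whether the tabs and newlines ever left the browser.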

My only other question was why I needed this, since the whole thing should have been arriving in utf8:

$Val = utf8_decode($Val);

Magically - now I don't need it.
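
Some background on what that call does, as a sketch rather than a diagnosis of this particular case: utf8_decode() converts a UTF-8 string to ISO-8859-1 (and is deprecated as of PHP 8.2). If output only looks right after that conversion, it usually suggests that some later stage is treating the bytes as Latin-1. The example below assumes the mbstring extension is available; describeEncoding() is a hypothetical helper and the sample string is invented.

```php
<?php
// Sketch, assuming the mbstring extension; describeEncoding() is hypothetical.

/** Report whether a byte string is well-formed UTF-8. */
function describeEncoding(string $value): string
{
    return mb_check_encoding($value, 'UTF-8') ? 'valid UTF-8' : 'not valid UTF-8';
}

// "café" as UTF-8: the é is the two bytes c3 a9.
$utf8 = "caf\u{00E9}";

// The same conversion utf8_decode() performs, without the deprecated call:
// the é becomes the single Latin-1 byte e9.
$latin1 = mb_convert_encoding($utf8, 'ISO-8859-1', 'UTF-8');

echo bin2hex($utf8), ' ', describeEncoding($utf8), PHP_EOL;
echo bin2hex($latin1), ' ', describeEncoding($latin1), PHP_EOL;
```

Running the same check on the raw posted value shows whether the form data really arrives as UTF-8 before anything converts it.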
