Convert latin1 characters on a UTF8 table into UTF8

Time: 2023-01-06 14:47:08

Only today I realized that I was missing this in my PHP scripts:

mysql_set_charset('utf8');

All my tables are InnoDB, collation "utf8_unicode_ci", and all my VARCHAR columns are "utf8_unicode_ci" as well. I have mb_internal_encoding('UTF-8'); on my PHP scripts, and all my PHP files are encoded as UTF-8.

So, until now, every time I ran an "INSERT" with diacritics, for example:

mysql_query('INSERT INTO `table` SET `name`="Jáuò Iñe"');

The 'name' contents would be, in this case: JÃ¡uÃ² IÃ±e.
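
To make the garbling concrete, here is a minimal sketch that reproduces it in plain PHP without a database. It assumes the usual mechanism: the client sends UTF-8 bytes over a connection that MySQL believes is latin1, so each byte is treated as one character and re-encoded as UTF-8 on storage. The variable names are only illustrative.

```php
<?php
// What the script sends: "á" travels as the two UTF-8 bytes C3 A1.
$original = "Jáuò Iñe";

// Simulate a latin1 connection writing into a utf8 column: every byte is
// taken as one Latin-1 character and encoded again as UTF-8.
$stored = mb_convert_encoding($original, 'UTF-8', 'ISO-8859-1');

echo $stored, "\n";                                   // JÃ¡uÃ² IÃ±e
echo strlen($original), ' -> ', strlen($stored), " bytes\n"; // 11 -> 17 bytes
```

Each accented letter grows from two bytes to four, which is why the rows look fine through the old latin1 connection and garbled through a utf8 one.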

Since I fixed the charset between PHP and MySQL, new INSERTs are now stored correctly. However, I want to fix all the older rows that are currently messed up. I have tried many things already, but the string always breaks at the first "illegal" character. Here is my current code:

$m = mysql_real_escape_string('¿<?php echo "¬<b>\'PHP &aacute; (á)ţăriîş </b>"; ?> ă-ţi abcdd;//;ñç´พดแทฝใจคçăâξβψδπλξξςαยนñ ;');
mysql_set_charset('utf8');
mysql_query('INSERT INTO `table` SET `name`="'.$m.'"');
mysql_set_charset('latin1');
mysql_query('INSERT INTO `table` SET `name`="'.$m.'"');
mysql_set_charset('utf8');

$result = mysql_iquery('SELECT * FROM `table`');
while ($row = mysql_fetch_assoc($result)) {
    $message = $row['name'];
    $message = mb_convert_encoding($message, 'ISO-8859-15', 'UTF-8');
    //$message = iconv("UTF-8", "ISO-8859-1//IGNORE", $message);
    mysql_iquery('UPDATE `table` SET `name`="'.mysql_real_escape_string($message).'" WHERE `a1`="'.$row['a1'].'"');
}

It "UPDATE"s with the expected characters, except that the string gets truncated at the character "ă". I mean, that character and the following ones are not included in the string.
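
One possible explanation for the truncation, offered as an assumption rather than something confirmed in this thread: MySQL's latin1 is really cp1252, so the second byte of "ă" (0x83) was stored as "ƒ", a character that ISO-8859-1 and ISO-8859-15 do not contain. Converting back through either ISO charset cannot restore that byte, the result is not valid UTF-8, and MySQL would typically cut the value at that point. The sketch below only checks the PHP side of that claim.

```php
<?php
// "ă" is the UTF-8 bytes C4 83. Read as cp1252, those two bytes become
// "Ä" (U+00C4) and "ƒ" (U+0192), which is what the old rows would contain.
$stored = "\u{00C4}\u{0192}";

// ISO-8859-15 has no "ƒ", so the second byte cannot be restored and the
// result is not valid UTF-8.
$viaIso = mb_convert_encoding($stored, 'ISO-8859-15', 'UTF-8');
var_dump(mb_check_encoding($viaIso, 'UTF-8'));   // bool(false)

// Windows-1252 maps "ƒ" back to 0x83, giving the original bytes C4 83.
$viaCp1252 = mb_convert_encoding($stored, 'Windows-1252', 'UTF-8');
var_dump(bin2hex($viaCp1252));                   // string(4) "c483"
```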

Also, testing with "iconv()" (commented out in the code) gives the same result, even with //IGNORE and //TRANSLIT.

I also tested several charsets, between ISO-8859-1 and ISO-8859-15.

I really need help here! Thank you.

3 Solutions

#1


105  

From what you describe, it seems you have UTF-8 data that was originally stored as Latin-1 and then not converted correctly to UTF-8. The data is recoverable; you'll need a MySQL function like

convert(cast(convert(name using  latin1) as binary) using utf8)

It's possible that you may need to omit the inner conversion, depending on how the data was altered during the encoding conversion.
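
For anyone who wants to see what that expression does to a single value, here is a rough PHP equivalent. It is only a sketch: `fix_double_encoded` is a hypothetical helper, not part of the answer, and it uses Windows-1252 on the assumption that MySQL's latin1 behaves like cp1252. It also leaves a value untouched when the reversed bytes are not valid UTF-8, which matters if the table already mixes repaired and unrepaired rows.

```php
<?php
/**
 * Undo one round of "UTF-8 bytes read as latin1" double encoding.
 * Hypothetical helper for illustration only.
 */
function fix_double_encoded(string $value): string
{
    // Map each character back to the single byte it originally came from.
    $bytes = mb_convert_encoding($value, 'Windows-1252', 'UTF-8');

    // Keep the result only if those bytes form valid UTF-8; otherwise the
    // value was probably stored correctly and is returned unchanged.
    return mb_check_encoding($bytes, 'UTF-8') ? $bytes : $value;
}

echo fix_double_encoded("JÃ¡uÃ² IÃ±e"), "\n"; // Jáuò Iñe
echo fix_double_encoded("Jáuò Iñe"), "\n";     // Jáuò Iñe (left alone)
```

The validity check is a heuristic, so a correctly stored string that happens to look like double-encoded text would still be altered. The five byte values that cp1252 leaves undefined may also not round-trip in PHP, which is one reason to prefer the SQL expression for the real repair and to try it on a copy of the table first.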

#2


21  

I searched for about an hour or two for this answer. I needed to migrate an old tt_news db from typo into a new typo3 version. I had already tried converting the charset in the export file and importing it back, but couldn't get it working.

Then I tried the answer above from ABS and ran an update on the table:

UPDATE tt_news SET 
    title=convert(cast(convert(title using  latin1) as binary) using utf8), 
    short=convert(cast(convert(short using  latin1) as binary) using utf8), 
    bodytext=convert(cast(convert(bodytext using  latin1) as binary) using utf8)
WHERE 1

You can also convert imagecaption, imagealttext, imagetitletext and keywords if needed. Hope this will help somebody migrating tt_news to new typo3 version.
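
If the column list gets long, the statement can be generated instead of typed. This is a minimal sketch with a hypothetical helper, `build_repair_sql`; it only builds the SQL string and assumes the table and column names come from your own code, since identifiers cannot be passed as bound parameters.

```php
<?php
/**
 * Build the repair UPDATE for a list of columns.
 * Hypothetical helper; pass only trusted table and column names.
 */
function build_repair_sql(string $table, array $columns): string
{
    $assignments = [];
    foreach ($columns as $column) {
        // Backtick-quote the identifier, doubling any embedded backtick.
        $quoted = '`' . str_replace('`', '``', $column) . '`';
        $assignments[] = sprintf(
            '%1$s = CONVERT(CAST(CONVERT(%1$s USING latin1) AS BINARY) USING utf8)',
            $quoted
        );
    }
    $quotedTable = '`' . str_replace('`', '``', $table) . '`';

    return "UPDATE $quotedTable SET\n    " . implode(",\n    ", $assignments);
}

echo build_repair_sql('tt_news', ['title', 'short', 'bodytext', 'keywords']), "\n";
```

Running the generated statement against a backup copy first is a sensible precaution, because rows that were already stored correctly are not skipped by this statement.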

#3


0  

A better way is to connect to your database normally.

Then convert the rows with the code below. You must also declare your page encoding as UTF-8 with a meta tag in the HTML header (don't forget this):

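    // Read the rows over the default connection, before switching charsets.
    $result = mysql_query('SELECT * FROM shops');
    while ($row = mysql_fetch_assoc($result)) {
        // Re-encode the Windows-1256 bytes as UTF-8.
        $name = iconv("windows-1256", "UTF-8", $row['name']);

        // Switch the connection to UTF-8 before writing the value back.
        mysql_query("SET NAMES 'utf8'");
        mysql_query("UPDATE `shops` SET `name`='" . mysql_real_escape_string($name) . "' WHERE ID='" . $row['ID'] . "'");
    }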
