Encoding problem in PHP (UTF-8)

Date: 2023-01-06 21:46:46

I want to output the following string in PHP:

ä ö ü ß €

Therefore, I've encoded it to UTF-8 manually:

Ã¤ Ã¶ Ã¼ ÃŸ €

So my script is:

<?php
header('content-type: text/html; charset=utf-8');
echo 'Ã¤ Ã¶ Ã¼ ÃŸ €';
?>
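
To see where strings like "Ã¤" come from, here is a minimal sketch (assuming PHP 7+ with the mbstring extension) that takes the real UTF-8 bytes of each character and reads them as Windows-1252. That code page is assumed here to be what Notepad++'s "ANSI" mode means on a Western-European Windows system. The helper name `asAnsiView` is hypothetical.

```php
<?php
// Hypothetical helper: show what the UTF-8 bytes of $utf8 look like when an
// editor reads them as Windows-1252 (assumed to be the "ANSI" code page).
function asAnsiView(string $utf8): string
{
    // Treat every byte as one Windows-1252 character and re-encode the
    // result as UTF-8 so it can be printed on a UTF-8 page.
    return mb_convert_encoding($utf8, 'UTF-8', 'Windows-1252');
}

// Escapes keep this file pure ASCII, so its own encoding does not matter.
foreach (["\u{e4}", "\u{f6}", "\u{fc}", "\u{df}", "\u{20ac}"] as $char) {
    echo $char, ' -> ', asAnsiView($char), "\n";
}
```

Each umlaut (and ß) comes out as Ã plus one more symbol, because each is two bytes in UTF-8. The € sign comes out as the three characters â‚¬, because it is three bytes in UTF-8.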

The first four characters are correct (ä ö ü ß), but unfortunately the € sign isn't:

ä ö ü ß

Here you can see it.

Can you tell me what I've done wrong? My editor (Notepad++) has settings for Encoding (Ansi/UTF-8) and Format (Windows/Unix). Do I have to change them?

I hope you can help me. Thanks in advance!

6 answers

#1


8  

That last character just isn't in the file (try viewing the source), which is why you don't see it.
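
Viewing the page source shows characters, not bytes. A minimal sketch for checking which bytes a string (for example, a file read back with `file_get_contents()`) actually contains; `hexBytes` is a hypothetical helper name.

```php
<?php
// Hypothetical helper: return the bytes of $s as space-separated hex pairs.
function hexBytes(string $s): string
{
    // bin2hex() gives e.g. "c3a4"; split it into two-character chunks.
    return implode(' ', str_split(bin2hex($s), 2));
}

echo hexBytes("\u{e4}"), "\n";   // ä in UTF-8, prints: c3 a4
echo hexBytes("\u{20ac}"), "\n"; // € in UTF-8, prints: e2 82 ac
```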

I think you might be better off saving the PHP file as UTF-8 (in Notepad++ that option is available under Format -> Encode in UTF-8 without BOM) and inserting the actual characters in your PHP file (i.e. typing them in Notepad++), rather than hacking around with inserting Ã everywhere. You may find the Windows Character Map useful for inserting Unicode characters.
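
On "without BOM": a UTF-8 byte order mark is the three bytes EF BB BF at the very start of a file. PHP sends anything before `<?php` to the browser as output, so a BOM can cause "headers already sent" warnings when `header()` is called afterwards. A minimal sketch for detecting and stripping one; both function names are hypothetical.

```php
<?php
const UTF8_BOM = "\xEF\xBB\xBF";

// Hypothetical helper: does $bytes start with a UTF-8 byte order mark?
function hasUtf8Bom(string $bytes): bool
{
    return strncmp($bytes, UTF8_BOM, 3) === 0;
}

// Hypothetical helper: remove a leading BOM, leave everything else untouched.
function stripUtf8Bom(string $bytes): string
{
    return hasUtf8Bom($bytes) ? substr($bytes, 3) : $bytes;
}
```

For example, `hasUtf8Bom(file_get_contents('index.php'))` (file name hypothetical) would tell you whether the editor saved a BOM.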

#2


5  

The Euro sign (U+20AC) is encoded in UTF-8 as three bytes (E2 82 AC), not two, so your encoding is simply wrong.
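
The byte count follows from UTF-8's rules: code points up to U+007F take one byte, up to U+07FF two, up to U+FFFF three, and the rest four. U+20AC is above U+07FF, so it needs three. A minimal encoder sketch (hypothetical function name; it does not reject surrogates or out-of-range values), which can be cross-checked against mbstring's `mb_chr()` on PHP 7.2+:

```php
<?php
// Hypothetical encoder: turn a Unicode code point into its UTF-8 bytes.
// Surrogates and values above U+10FFFF are not rejected in this sketch.
function utf8FromCodePoint(int $cp): string
{
    if ($cp <= 0x7F) {            // 1 byte:  0xxxxxxx
        return chr($cp);
    }
    if ($cp <= 0x7FF) {           // 2 bytes: 110xxxxx 10xxxxxx
        return chr(0xC0 | ($cp >> 6))
             . chr(0x80 | ($cp & 0x3F));
    }
    if ($cp <= 0xFFFF) {          // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        return chr(0xE0 | ($cp >> 12))
             . chr(0x80 | (($cp >> 6) & 0x3F))
             . chr(0x80 | ($cp & 0x3F));
    }
    // 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    return chr(0xF0 | ($cp >> 18))
         . chr(0x80 | (($cp >> 12) & 0x3F))
         . chr(0x80 | (($cp >> 6) & 0x3F))
         . chr(0x80 | ($cp & 0x3F));
}

echo bin2hex(utf8FromCodePoint(0x20AC)), "\n"; // prints: e282ac
```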

#3


4  

If you want to output it properly as UTF-8, your script should be:

<?php
header('content-type: text/html; charset=utf-8');
echo "\xc3\xa4"."\xc3\xb6"."\xc3\xbc"."\xc3\x9f"."\xe2\x82\xac";
?>

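That way, even if your PHP script is saved in a non-UTF-8 encoding, it will still work: the escape sequences are plain ASCII, so any ASCII-compatible encoding (such as Windows-1252) stores them identically.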
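
Since PHP 7.0, double-quoted strings also accept the `\u{...}` escape, which takes the code point instead of the individual bytes and produces the same UTF-8 output. A minimal sketch, assuming PHP 7+; like the hex escapes, it keeps the source file pure ASCII.

```php
<?php
header('Content-Type: text/html; charset=utf-8');

// \u{...} takes a Unicode code point and emits its UTF-8 bytes,
// so "\u{20AC}" is exactly the same string as "\xe2\x82\xac".
$text = "\u{E4} \u{F6} \u{FC} \u{DF} \u{20AC}";
echo $text;
```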

#4


2  

You should always set your editor to the same encoding that the generated HTML instructs the browser to use. If the HTML page is intended to be interpreted as UTF-8, then set your text editor to UTF-8. PHP is completely unaware of the encoding settings of the editor used to create the file; it treats strings as a stream of bytes.
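
A quick way to see that PHP strings are plain byte sequences: `strlen()` counts bytes, while mbstring's `mb_strlen()` counts characters once it is told the encoding. A minimal sketch, assuming the mbstring extension is loaded:

```php
<?php
$euro = "\u{20AC}"; // the euro sign as UTF-8 bytes (e2 82 ac)

echo strlen($euro), "\n";             // bytes, prints: 3
echo mb_strlen($euro, 'UTF-8'), "\n"; // characters, prints: 1
```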

In other words, as long as the right bytes are in the file, everything will work. And the easiest way to ensure the right bytes are in the file is to save it in the same encoding the web page is supposed to be in. Anything else just makes life more difficult than it needs to be.
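
To check whether the right bytes are in the file, one option is to read it back and ask mbstring whether it is valid UTF-8. A minimal sketch; the helper name is hypothetical. Note that a Windows-1252 ("ANSI") file containing ä stores it as the single byte E4, which is not valid UTF-8 on its own.

```php
<?php
// Hypothetical helper: true if the file's raw bytes form valid UTF-8.
function fileIsValidUtf8(string $path): bool
{
    $bytes = file_get_contents($path);
    if ($bytes === false) {
        throw new RuntimeException("Cannot read $path");
    }
    return mb_check_encoding($bytes, 'UTF-8');
}

var_dump(mb_check_encoding("\xc3\xa4", 'UTF-8')); // ä saved as UTF-8: bool(true)
var_dump(mb_check_encoding("\xe4", 'UTF-8'));     // ä saved as Windows-1252: bool(false)
```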

But the best defence is to leave non-ASCII characters out of the code completely. You can pull them out of a database or localisation file instead. This means the code can be modified in essentially any editor without worrying about damaging the encoding.
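
One way to follow that advice is a JSON localisation file: `json_decode()` returns UTF-8 strings, and JSON's `\uXXXX` escapes mean even the translation file can stay ASCII. A minimal sketch, assuming PHP 7.3+ (for `JSON_THROW_ON_ERROR`); the keys and the `lang/de.json` path are hypothetical, and the JSON is inlined instead of being read from disk.

```php
<?php
// Hypothetical translation data; in practice this might come from
// file_get_contents('lang/de.json').
$json = '{"umlauts": "\u00e4 \u00f6 \u00fc \u00df", "euro": "\u20ac"}';

$strings = json_decode($json, true, 512, JSON_THROW_ON_ERROR);

header('Content-Type: text/html; charset=utf-8');
echo $strings['umlauts'], ' ', $strings['euro'];
```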

#5


0  

header('Content-Type: text/html; charset=UTF-8');

This just informs the browser what kind of content you're going to send and how it should treat it. It does not set the encoding of the actual content you're sending; it's completely up to you to fulfil your own promise. Your content is not going to magically transform from whatever encoding it is in to UTF-8 just because you set that header. If you tell the browser to treat the content as UTF-8 but send it Latin-1 encoded data, of course it will break.
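
If the data really is in a legacy encoding, it has to be converted before it is sent. A minimal sketch, assuming the source bytes are Windows-1252 (which, unlike ISO-8859-1/Latin-1, has the euro sign at 0x80) and that mbstring is available:

```php
<?php
// The five characters from the question as Windows-1252 bytes.
$legacy = "\xe4 \xf6 \xfc \xdf \x80";

// Re-encode the bytes; the header alone would not do this.
$utf8 = mb_convert_encoding($legacy, 'UTF-8', 'Windows-1252');

header('Content-Type: text/html; charset=UTF-8');
echo $utf8;
```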

I refer you to "What Every Programmer Absolutely, Positively Needs To Know About Encodings And Character Sets To Work With Text".

#6


0  

This worked for me:

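    // Convert only values that are NOT already valid UTF-8, assuming such
    // values are ISO-8859-1 (Latin-1). Converting a string that is already
    // UTF-8 would double-encode it (e.g. "ä" would become "Ã¤").
    // mb_convert_encoding() replaces utf8_encode(), deprecated as of PHP 8.2.
    if (!mb_check_encoding($value, 'UTF-8')) {
        return mb_convert_encoding($value, 'UTF-8', 'ISO-8859-1');
    }
    return $value;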

Source: https://github.com/jdorn/php-reports/issues/100
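
The `mb_check_encoding()` guard matters because strings that are already UTF-8 must not be converted again; converting valid UTF-8 as if it were Latin-1 is exactly what produces sequences like Ã¤. A minimal self-contained sketch (hypothetical function name `toUtf8`, assuming any non-UTF-8 input is ISO-8859-1) in which only invalid input is converted, so calling it twice is harmless:

```php
<?php
// Hypothetical helper: return $value as UTF-8, assuming any input that is
// not already valid UTF-8 is ISO-8859-1 (Latin-1).
function toUtf8(string $value): string
{
    if (mb_check_encoding($value, 'UTF-8')) {
        return $value; // already UTF-8: converting again would double-encode
    }
    return mb_convert_encoding($value, 'UTF-8', 'ISO-8859-1');
}

$latin1 = "\xe4";          // ä in ISO-8859-1
$once   = toUtf8($latin1); // "\xc3\xa4", i.e. ä in UTF-8
$twice  = toUtf8($once);   // unchanged, thanks to the guard
```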
