Force strings to UTF-8 from any encoding

Time: 2023-01-06 11:53:34

In my rails app I'm working with RSS feeds from all around the world, and some feeds have links that are not in UTF-8. The original feed links are out of my control, and in order to use them in other parts of the app, they need to be in UTF-8.

How can I detect encoding and convert to UTF-8?

3 solutions

#1


48  

Ruby 1.9

"Forcing" an encoding is easy; however, it won't convert the characters, it just changes the encoding label attached to the bytes:

str = str.force_encoding("UTF-8")

str.encoding.name # => 'UTF-8'
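
To make the difference concrete, here is a minimal sketch that assumes the incoming bytes are really ISO-8859-1 (the sample string and variable names are made up for illustration). It shows that force_encoding leaves the bytes untouched, while encode rewrites them:

```ruby
# A single byte 0xE9 is "é" in ISO-8859-1 but is not a valid UTF-8 sequence.
latin1 = "caf\xE9".dup.force_encoding("ISO-8859-1")

# force_encoding only changes the label; the bytes stay the same.
relabeled = latin1.dup.force_encoding("UTF-8")
relabeled.bytes.to_a == latin1.bytes.to_a   # => true
relabeled.valid_encoding?                   # => false

# encode transcodes the bytes from the source encoding to UTF-8.
converted = latin1.encode("UTF-8")
converted.bytes.to_a.last(2)                # => [195, 169]
converted.valid_encoding?                   # => true
```

In other words, force_encoding is only the right tool when the bytes are already UTF-8 and merely carry the wrong label.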

If you want to perform a conversion, use encode:

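begin
  str.encode("UTF-8")
rescue Encoding::UndefinedConversionError, Encoding::InvalidByteSequenceError
  # a character has no UTF-8 equivalent, or the source bytes are malformed
  # ...
end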
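
Ruby's core library has no reliable encoding detector, so the "detect" half of the question usually comes down to guessing. One possible sketch is to try a short list of likely source encodings in order and keep the first one that converts cleanly. The helper name `to_utf8` and the candidate list are assumptions for illustration, not part of any library, and the order matters: ISO-8859-1 accepts every byte, so it acts as a last-resort catch-all and can produce wrong characters for feeds in other encodings.

```ruby
# Hypothetical candidate list; adjust it to the feeds you actually see.
CANDIDATE_ENCODINGS = ["UTF-8", "Windows-1252", "ISO-8859-1"]

# Hypothetical helper: returns a UTF-8 copy of raw, or nil if no candidate fits.
def to_utf8(raw, candidates = CANDIDATE_ENCODINGS)
  candidates.each do |name|
    # Relabel a copy of the bytes as the candidate encoding.
    attempt = raw.dup.force_encoding(name)
    next unless attempt.valid_encoding?
    begin
      return attempt.encode("UTF-8")
    rescue Encoding::UndefinedConversionError, Encoding::InvalidByteSequenceError
      next # this candidate cannot be converted; try the next one
    end
  end
  nil
end

to_utf8("caf\xE9")    # bytes are invalid as UTF-8, so Windows-1252 is used
to_utf8("caf\u00E9")  # already valid UTF-8, returned as a UTF-8 copy
```

This is a heuristic, not detection: it only finds an encoding under which the bytes are legal, which is not always the encoding the feed author used.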

I would definitely read the following post for more information:
http://graysoftinc.com/character-encodings/ruby-19s-string

#2


22  

This ensures that, no matter what, you end up with a valid UTF-8 string, and it won't raise an error, because it replaces every invalid or undefined character with an empty string.

str.encode(Encoding.find('UTF-8'), {invalid: :replace, undef: :replace, replace: ''})
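
As a rough illustration of what those options do, the sketch below builds a deliberately broken string (the sample data and variable names are assumptions) and cleans it. Note the hedge: Ruby 1.9-era documentation describes converting a string to its own encoding as a no-op, so on very old interpreters a string already tagged UTF-8 may pass through unchanged; recent Rubies apply the replacement when invalid: :replace is given explicitly.

```ruby
# "\xFF" can never appear in UTF-8, so this string is tagged UTF-8 but invalid.
dirty = "abc\xFFdef".dup.force_encoding("UTF-8")
dirty.valid_encoding?   # => false

# Drop anything that is invalid in the source or unmappable in the target.
clean = dirty.encode("UTF-8", invalid: :replace, undef: :replace, replace: "")
clean.valid_encoding?   # => true

# Valid characters from another encoding are converted, not dropped.
latin1 = "caf\xE9".dup.force_encoding("ISO-8859-1")
converted = latin1.encode("UTF-8", invalid: :replace, undef: :replace, replace: "")
```

The trade-off is silent data loss: offending bytes simply disappear, so for links it may be worth logging when the cleaned string differs from the original.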

#3


4  

Iconv

require 'iconv'
i = Iconv.new('UTF-8','LATIN1')
a_with_hat = i.iconv("\xc2")

Summary: the iconv gem does all the work of converting encodings. Make sure it's installed with:

gem install iconv

Now, you need to know which encoding your string is currently in, because Ruby 1.8 treats strings as arrays of bytes with no intrinsic encoding. For example, say your string is in Latin-1 and you want to convert it to UTF-8:

require 'iconv'

# Iconv.conv takes the target encoding first, then the source encoding
string_in_utf8_encoding = Iconv.conv("UTF-8", "LATIN1", string_in_latin1_encoding)
