Ruby 1.9: Replace a set of characters in a string with specific cleaned-up characters

I'm looking for a way to do the following PHP code in Ruby in a succinct and efficient manner:

$normalizeChars = array('Š'=>'S', 'š'=>'s', 'Ð'=>'Dj','Ž'=>'Z', 'ž'=>'z', 'À'=>'A', 'Á'=>'A', 'Â'=>'A', 'Ã'=>'A', 'Ä'=>'A',
        'Å'=>'A', 'Æ'=>'A', 'Ç'=>'C', 'È'=>'E', 'É'=>'E', 'Ê'=>'E', 'Ë'=>'E', 'Ì'=>'I', 'Í'=>'I', 'Î'=>'I',
        'Ï'=>'I', 'Ñ'=>'N', 'Ò'=>'O', 'Ó'=>'O', 'Ô'=>'O', 'Õ'=>'O', 'Ö'=>'O', 'Ø'=>'O', 'Ù'=>'U', 'Ú'=>'U',
        'Û'=>'U', 'Ü'=>'U', 'Ý'=>'Y', 'Þ'=>'B', 'ß'=>'Ss','à'=>'a', 'á'=>'a', 'â'=>'a', 'ã'=>'a', 'ä'=>'a',
        'å'=>'a', 'æ'=>'a', 'ç'=>'c', 'è'=>'e', 'é'=>'e', 'ê'=>'e', 'ë'=>'e', 'ì'=>'i', 'í'=>'i', 'î'=>'i',
        'ï'=>'i', 'ð'=>'o', 'ñ'=>'n', 'ò'=>'o', 'ó'=>'o', 'ô'=>'o', 'õ'=>'o', 'ö'=>'o', 'ø'=>'o', 'ù'=>'u',
        'ú'=>'u', 'û'=>'u', 'ý'=>'y', 'ý'=>'y', 'þ'=>'b', 'ÿ'=>'y', 'ƒ'=>'f');
$cleanGenre = strtr($this->entryArray['genre'], $normalizeChars);

Here the strtr() function replaces each character on the left of the array with the one on the right. Pretty handy for a cleanup job. But I can't seem to find anything similar in Ruby, that is, a way to specify all the characters to replace in one array rather than with lengthy conditionals for each character.

Note that tr won't work because it can't replace one letter with two (Ð => Dj). Plus, it gives me an InvalidByteSequenceError: "\xC5" on US-ASCII for this line:

注意tr不能工作，因为你不能用两个字母替换一个字母(D => Dj)。另外，它给了我一个InvalidByteSequenceError:“\xC5”，用于这一行:

    entry["genre"].tr('ŠšŽž', 'SsZz')
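
A minimal sketch of what tr can and cannot do here, assuming the source file is saved as UTF-8. In Ruby 1.9 the source encoding defaults to US-ASCII unless the file starts with a magic comment such as `# encoding: utf-8` (Ruby 2.0 and later default to UTF-8), so non-ASCII literals need that comment. The sample place names are made up for illustration.

```ruby
# encoding: utf-8
# Without the magic comment above, Ruby 1.9 treats this file as US-ASCII.

# tr substitutes one character for one character, so this part works:
puts "Šibenik Žabljak".tr('ŠšŽž', 'SsZz')   # prints Sibenik Zabljak

# ...but it cannot expand Ð into two letters: surplus characters in the
# replacement list are ignored, so Ð becomes just "D", not "Dj".
puts "Ðakovo".tr('Ð', 'Dj')                  # prints Dakovo
```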

Thanks.

3 answers

#1

I'll make it easy for you to implement

#encoding: UTF-8
t = 'ŠšÐŽžÀÁÂÃÄAÆAÇÈÉÊËÌÎÑNÒOÓOÔOÕOÖOØOUÚUUÜUÝYÞBßSàaáaâäaaæaçcèéêëìîðñòóôõöùûýýþÿƒ'
fallback = { 
  'Š'=>'S', 'š'=>'s', 'Ð'=>'Dj','Ž'=>'Z', 'ž'=>'z', 'À'=>'A', 'Á'=>'A', 'Â'=>'A', 'Ã'=>'A', 'Ä'=>'A',
  'Å'=>'A', 'Æ'=>'A', 'Ç'=>'C', 'È'=>'E', 'É'=>'E', 'Ê'=>'E', 'Ë'=>'E', 'Ì'=>'I', 'Í'=>'I', 'Î'=>'I',
  'Ï'=>'I', 'Ñ'=>'N', 'Ò'=>'O', 'Ó'=>'O', 'Ô'=>'O', 'Õ'=>'O', 'Ö'=>'O', 'Ø'=>'O', 'Ù'=>'U', 'Ú'=>'U',
  'Û'=>'U', 'Ü'=>'U', 'Ý'=>'Y', 'Þ'=>'B', 'ß'=>'Ss','à'=>'a', 'á'=>'a', 'â'=>'a', 'ã'=>'a', 'ä'=>'a',
  'å'=>'a', 'æ'=>'a', 'ç'=>'c', 'è'=>'e', 'é'=>'e', 'ê'=>'e', 'ë'=>'e', 'ì'=>'i', 'í'=>'i', 'î'=>'i',
  'ï'=>'i', 'ð'=>'o', 'ñ'=>'n', 'ò'=>'o', 'ó'=>'o', 'ô'=>'o', 'õ'=>'o', 'ö'=>'o', 'ø'=>'o', 'ù'=>'u',
  'ú'=>'u', 'û'=>'u', 'ý'=>'y', 'þ'=>'b', 'ÿ'=>'y', 'ƒ'=>'f'
  }

p t.encode('us-ascii', :fallback => fallback)
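
A short sketch of how the :fallback hash behaves, using a trimmed-down table for brevity (the full table above works the same way). Characters that already exist in US-ASCII pass through untouched; a character that is neither ASCII nor a key in the hash makes encode raise Encoding::UndefinedConversionError, so the table has to cover every non-ASCII character you expect. The sample strings are made up.

```ruby
# encoding: utf-8
# Trimmed-down table for illustration; use the full table in practice.
fallback = { 'Š' => 'S', 'š' => 's', 'Ð' => 'Dj', 'Ž' => 'Z', 'ž' => 'z' }

# ASCII characters are copied as-is; non-ASCII ones are looked up in the hash.
puts "Ðakovo Šaž".encode('us-ascii', :fallback => fallback)  # prints Djakovo Saz

# A non-ASCII character missing from the table is an error, not a silent skip.
begin
  "café".encode('us-ascii', :fallback => fallback)
rescue Encoding::UndefinedConversionError => e
  puts "not in table: #{e.error_char}"  # reports the offending character
end
```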

#2

In Ruby 1.9.3 you can use the :fallback option with encode:

"ŠšŽžÐ".encode('us-ascii', :fallback => { [your character table here] })
=> "SsZzDj"

It's also possible to do it with gsub as it accepts a conversion table as a hash argument in 1.9.x:

"ŠšŽžÐ".gsub(/[ŠšŽžÐ]/, [your character table here])
=> "SsZzDj"
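
One detail of the hash form of gsub worth knowing: every match is replaced by the hash's value for it, and a match with no entry in the hash becomes an empty string (the nil lookup result is converted with to_s). So the character class and the table must list the same characters, or unmapped ones silently disappear. A small sketch with a made-up table:

```ruby
# encoding: utf-8
table = { 'Š' => 'S', 'š' => 's', 'Ž' => 'Z', 'ž' => 'z', 'Ð' => 'Dj' }

# Pattern and table agree: every match has a replacement.
puts "ŠšŽžÐ".gsub(/[ŠšŽžÐ]/, table)   # prints SsZzDj

# 'é' is in the character class but not in the table, so it is deleted.
puts "Šé".gsub(/[Šé]/, table)          # prints S
```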

Or better yet (by @steenslag):

character_table = [your table here]
regexp_keys     = Regexp.union(character_table.keys) 
"ŠšŽžÐ".gsub(regexp_keys, character_table)
=> "SsZzDj"
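
Building the pattern with Regexp.union also lets the table contain multi-character keys, which is closer to PHP's strtr(): strtr tries longer keys first, whereas a regexp alternation takes the first alternative that matches. A sketch of a strtr-like helper under that assumption; the method name strtr is hypothetical and not part of Ruby's standard library.

```ruby
# encoding: utf-8
# Hypothetical helper mimicking PHP's strtr($str, $array).
def strtr(str, table)
  # Skip empty keys (they would match everywhere), then sort longest-first
  # so "ab" wins over "a" at the same position. Regexp.union escapes keys.
  keys = table.keys.reject(&:empty?).sort_by { |key| -key.length }
  str.gsub(Regexp.union(keys), table)
end

puts strtr("ŠšŽžÐ", 'Š' => 'S', 'š' => 's', 'Ž' => 'Z', 'ž' => 'z', 'Ð' => 'Dj')  # prints SsZzDj
puts strtr("ab a", "a" => "1", "ab" => "2")  # prints 2 1
```

Without the sort, the pattern for the second call would be /a|ab/, and "ab" would come out as "1b".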

This sort of character conversion is called transliteration, which is good to know if you wish to google for more solutions (there are many Ruby libraries that support transliteration, but none of the ones I tested supported your character set completely).
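
As an alternative technique (not from the answers here, and requiring Ruby 2.2 or later for String#unicode_normalize), many accented letters can be stripped generically: decompose to NFD, which splits é into e plus a combining accent, then delete the combining marks. Letters with no decomposition, such as ß, Æ, Ø, Ð, Þ and ƒ, survive unchanged, so a small table is still needed for those. A sketch, with a deliberately partial leftover table taken from the question's mappings:

```ruby
# encoding: utf-8
# Requires Ruby >= 2.2 (String#unicode_normalize).
# Partial table for letters that have no Unicode decomposition.
LEFTOVERS = { 'ß' => 'Ss', 'Æ' => 'A', 'æ' => 'a', 'Ø' => 'O', 'ø' => 'o',
              'Ð' => 'Dj', 'ð' => 'o', 'Þ' => 'B', 'þ' => 'b', 'ƒ' => 'f' }
LEFTOVER_PATTERN = Regexp.union(LEFTOVERS.keys)

def strip_accents(str)
  # NFD splits "é" into "e" + U+0301 COMBINING ACUTE ACCENT.
  decomposed = str.unicode_normalize(:nfd)
  # Drop nonspacing combining marks (Unicode category Mn).
  bare = decomposed.gsub(/\p{Mn}/, '')
  # Map the letters that have no decomposition.
  bare.gsub(LEFTOVER_PATTERN, LEFTOVERS)
end

puts strip_accents("Ðakovo, Zürich, Šibenik")  # prints Djakovo, Zurich, Sibenik
```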

#3

This works the way I suppose you'd like: it translates the characters that are in the table and leaves all others as they are:

# encoding: utf-8
lookup = {'Š'=>'S', 'š'=>'s', 'Ð'=>'Dj','Ž'=>'Z', 'ž'=>'z', 'À'=>'A', 'Á'=>'A', 'Â'=>'A', 'Ã'=>'A', 'Ä'=>'A',
        'Å'=>'A', 'Æ'=>'A', 'Ç'=>'C', 'È'=>'E', 'É'=>'E', 'Ê'=>'E', 'Ë'=>'E', 'Ì'=>'I', 'Í'=>'I', 'Î'=>'I',
        'Ï'=>'I', 'Ñ'=>'N', 'Ò'=>'O', 'Ó'=>'O', 'Ô'=>'O', 'Õ'=>'O', 'Ö'=>'O', 'Ø'=>'O', 'Ù'=>'U', 'Ú'=>'U',
        'Û'=>'U', 'Ü'=>'U', 'Ý'=>'Y', 'Þ'=>'B', 'ß'=>'Ss','à'=>'a', 'á'=>'a', 'â'=>'a', 'ã'=>'a', 'ä'=>'a',
        'å'=>'a', 'æ'=>'a', 'ç'=>'c', 'è'=>'e', 'é'=>'e', 'ê'=>'e', 'ë'=>'e', 'ì'=>'i', 'í'=>'i', 'î'=>'i',
        'ï'=>'i', 'ð'=>'o', 'ñ'=>'n', 'ò'=>'o', 'ó'=>'o', 'ô'=>'o', 'õ'=>'o', 'ö'=>'o', 'ø'=>'o', 'ù'=>'u',
        'ú'=>'u', 'û'=>'u', 'ý'=>'y', 'þ'=>'b', 'ÿ'=>'y', 'ƒ'=>'f'}

clean_genre = entry["genre"].chars.to_a.map { |x|
  if lookup.has_key?(x)
    lookup[x]
  else
    x
  end
}.join

For example, this:

'aŠšŽž'.chars.to_a.map { |x|
  if lookup.has_key?(x)
    lookup[x]
  else
    x
  end
}.join

gives you 'aSsZz'.
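
The if/else inside the block can be collapsed with Hash#fetch, whose second argument is returned when the key is missing; this keeps the same leave-unknown-characters-alone behaviour. A sketch with a trimmed-down lookup table:

```ruby
# encoding: utf-8
lookup = { 'Š' => 'S', 'š' => 's', 'Ž' => 'Z', 'ž' => 'z' }

# fetch(key, default) returns the default (here, the character itself)
# for characters that are not in the table.
clean = 'aŠšŽž'.each_char.map { |x| lookup.fetch(x, x) }.join
puts clean  # prints aSsZz
```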

Or move the block logic into the lookup table itself (thanks to steenslag for simplifying the default proc solution!):

lookup.default_proc = proc { |hash, key| key }

Then the call looks like this:

puts 'aŠšŽž'.chars.to_a.map { |x| lookup[x] }.join
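
The same fallback can be set when the hash is created, using Hash.new with a block and then merging the table in. Because the block only returns the key and never assigns hash[key], looking up an unknown character does not add it to the table. A sketch with a trimmed-down table:

```ruby
# encoding: utf-8
# The block is the default proc: it runs for missing keys and returns the key.
lookup = Hash.new { |_hash, key| key }
lookup.merge!({ 'Š' => 'S', 'š' => 's', 'Ž' => 'Z', 'ž' => 'z' })

puts 'aŠšŽž'.each_char.map { |x| lookup[x] }.join  # prints aSsZz
puts lookup.key?('a')                               # prints false
```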

Or even better (thanks again to steenslag for pointing this out):

puts 'aŠšŽž'.gsub(/./) { |x| lookup[x] }
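
Since the hash form of gsub looks values up with Hash#[], a hash that has a default proc like the one above can be passed directly instead of a block. Note that /./ does not match newlines without the m flag, but gsub leaves unmatched text untouched, so line breaks survive either way. A sketch with a trimmed-down table:

```ruby
# encoding: utf-8
lookup = Hash.new { |_hash, key| key }   # unknown characters map to themselves
lookup.merge!({ 'Š' => 'S', 'š' => 's', 'Ž' => 'Z', 'ž' => 'z' })

# Every character except "\n" is matched and replaced by lookup[char].
puts "aŠš\nŽž".gsub(/./, lookup)   # prints aSs and Zz on two lines
```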
