Ruby invalid byte sequence in UTF-8

Date: 2023-01-06 14:33:47

I have the following code, which gives me an invalid byte sequence error pointing to the scan method in initialize. Any ideas on how to fix this? For what it's worth, the error does not occur when the (.*) between the h1 tag and the closing > is not there.

#!/usr/bin/env ruby

class NewsParser

  def initialize
      Dir.glob("./**/index.htm") do |file|
        @file = IO.read file 
        parsed = @file.scan(/<h1(.*)>(.*?)<\/h1>(.*)<!-- InstanceEndEditable -->/im)
        self.write(parsed)
      end
  end

  def write output
    @contents = output
    open('output.txt', 'a') do |f| 
      f << @contents[0][0]+"\n\n"+@contents[0][1]+"\n\n\n\n" 
    end
  end

end

p = NewsParser.new

Edit: Here is the error message:

news_parser.rb:10:in 'scan': invalid byte sequence in UTF-8 (ArgumentError)

SOLVED: Combining @file = IO.read(file).force_encoding("ISO-8859-1").encode("utf-8", replace: nil) with the magic comment # encoding: UTF-8 solved the issue.

Thanks!

1 answer

#1


34  

Using @file = IO.read(file).force_encoding("ISO-8859-1").encode("utf-8", replace: nil) together with the # encoding: UTF-8 magic comment solved the issue.
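To make the fix concrete, here is a minimal sketch. It assumes the index.htm files are really ISO-8859-1 (Latin-1), which the question never confirms. The helper name read_latin1_as_utf8 and the sample file contents are made up for illustration. The sketch writes a small file containing the Latin-1 byte 0xE9, shows that reading it as UTF-8 yields an invalid string, and then transcodes it so a regex scan can run.

```ruby
# encoding: UTF-8
require "tempfile"

# Hypothetical helper: read the raw bytes, label them as ISO-8859-1
# (an assumption about the input files), then transcode to UTF-8.
def read_latin1_as_utf8(path)
  IO.binread(path).force_encoding("ISO-8859-1").encode("UTF-8")
end

# Build a sample file containing the Latin-1 byte 0xE9 ("e" with acute accent).
sample = Tempfile.new(["index", ".htm"])
sample.binmode
sample.write("<h1 class=\"t\">Caf\xE9</h1>".b)
sample.close

# Read as UTF-8: the lone 0xE9 byte is not a valid UTF-8 sequence.
raw = IO.read(sample.path, encoding: "UTF-8")
puts raw.valid_encoding?   # false

# Read as Latin-1 and transcode: the result is valid UTF-8.
fixed = read_latin1_as_utf8(sample.path)
puts fixed.valid_encoding? # true

# The scan now operates on a well-formed string.
p fixed.scan(/<h1(.*?)>(.*?)<\/h1>/im)
```

The magic comment only sets the encoding of string and regex literals in the script itself, and Ruby 2.0 and later already default to UTF-8 there. The transcoding step is what repairs the file contents. If the files are in some other encoding, such as Windows-1252, that name would need to replace "ISO-8859-1".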
