How do I encode/escape special characters in a URL in Ruby/Rails?

Time: 2021-03-28 00:26:31

How do I encode or 'escape' the URL before I use OpenURI to open(url)?

We're using OpenURI to open a remote url and return the xml:

getresult = open(url).read

The problem is that the URL contains user-input text that may include spaces and other characters such as "+", "&", and "?", so we need to escape the URL safely. I have seen lots of examples that use Net::HTTP, but have not found any for OpenURI.

We also need to be able to un-escape a similar string we receive in a session variable, so we need the reciprocal function.

4 solutions

#1


2  

Ruby has the built-in URI library, and there is also the Addressable gem, in particular Addressable::URI.

I prefer Addressable::URI. It is full-featured and handles the encoding for you when you use the query_values= method.

I've seen some discussions about URI going through growing pains, so I tend to avoid it for encoding/escaping until these things get sorted out.

#2


23  

Don't use URI.escape: it has been deprecated since Ruby 1.9 (and was removed entirely in Ruby 3.0).
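
For reference, here is a minimal sketch of standard-library calls that are commonly used in place of URI.escape for a single component. The sample string is made up for illustration, and which space encoding you want ("+" or "%20") depends on where the value ends up in the URL.

```ruby
require 'cgi'
require 'uri'
require 'erb'

text = 'a b&c'  # hypothetical user input

form_cgi = CGI.escape(text)                     # space becomes "+"
form_uri = URI.encode_www_form_component(text)  # space becomes "+"
percent  = ERB::Util.url_encode(text)           # space becomes "%20"

puts form_cgi, form_uri, percent
```

The first two produce form-style encoding, which suits query strings; the third is the safer choice if the value might end up in a path segment.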

Rails' Active Support adds Hash#to_query:

 {foo: 'asd asdf', bar: '"<#$dfs'}.to_query
 # => "bar=%22%3C%23%24dfs&foo=asd+asdf"

As the output shows, it also sorts the query parameters, so the same hash always produces the same query string, which is good for HTTP caching.
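
Outside Rails, a roughly similar result can be sketched with the standard library. This is only an approximation of Hash#to_query: it assumes a flat hash (no nested hashes or arrays) and sorts the pairs by key name by hand.

```ruby
require 'uri'

params = { foo: 'asd asdf', bar: '"<#$dfs' }

# Sort by key name so the same hash always yields the same query string.
query = URI.encode_www_form(params.sort_by { |key, _| key.to_s })

puts query
```

For this particular hash the string matches the to_query output shown above.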

#3


13  

Ruby Standard Library to the rescue:

require 'uri'
require 'open-uri'

# Note: URI.escape is obsolete since Ruby 1.9.2 and was removed in Ruby 3.0.
user_text = URI.escape(user_text)
url = "http://example.com/#{user_text}"
result = open(url).read

See the docs for the URI::Escape module for more. It also provides a method for the inverse operation (unescape).
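
Because URI.escape and URI.unescape are no longer available on Ruby 3.0 and later, here is a hedged sketch of the same escape/unescape round trip using URI.encode_www_form_component and its inverse. The input string is invented for illustration. Note that this is form-style encoding (spaces become "+"), which suits query strings rather than path segments.

```ruby
require 'uri'

user_text = 'rock & roll + jazz?'  # hypothetical user input

escaped   = URI.encode_www_form_component(user_text)
unescaped = URI.decode_www_form_component(escaped)

puts escaped
puts unescaped
```

The decode call is the reciprocal function the question asks for, for example to restore a string received in a session variable.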

#4


8  

The main thing you have to consider is that you have to escape the keys and values separately before you compose the full URL.

All the methods which get the full URL and try to escape it afterwards are broken, because they cannot tell whether any & or = character was supposed to be a separator, or maybe a part of the value (or part of the key).
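
A small sketch of that ambiguity, using only the standard library. The value and the parameter name q are made up for illustration.

```ruby
require 'uri'

value = 'a&b=c'  # hypothetical user input containing separator characters

# Unescaped interpolation: the "&" and "=" inside the value become separators.
naive_url = "http://example.com/?q=#{value}"
naive_params = URI.decode_www_form(URI(naive_url).query)

# Escaping the value before composing the URL keeps it in one piece.
safe_url = "http://example.com/?" + URI.encode_www_form('q' => value)
safe_params = URI.decode_www_form(URI(safe_url).query)

p naive_params
p safe_params
```

Once the naive URL has been built, nothing can tell that the second parameter was originally part of the first value, so escaping afterwards cannot repair it.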

The CGI library seems to do a good job, except for the space character, which was traditionally encoded as +, and nowadays should be encoded as %20. But this is an easy fix.
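
To illustrate the space handling, here is a short sketch of how CGI treats both spellings. The strings are arbitrary examples.

```ruby
require 'cgi'

plus_form    = CGI.unescape('a+b')    # "+" is read as a space
percent_form = CGI.unescape('a%20b')  # "%20" is read as a space too
literal_plus = CGI.escape('1+1')      # a real plus sign is escaped as "%2B"

puts plus_form, percent_form, literal_plus
```

Since CGI.unescape accepts both forms and CGI.escape never leaves a literal "+" unescaped, replacing "+" with "%20" after escaping does not break decoding.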

Consider the following:

require 'cgi'

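def encode_component(s)
  # CGI.escape encodes a space as "+"; switch to "%20" instead.
  # A literal "+" is already escaped as "%2B", so this gsub only touches spaces.
  CGI.escape(s).gsub('+', '%20')
end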

def url_with_params(path, args = {})
  return path if args.empty?
  path + "?" + args.map do |k,v|
    "#{encode_component(k.to_s)}=#{encode_component(v.to_s)}" 
  end.join("&")
end

def params_from_url(url)
  path, query = url.split('?', 2)
  return [path, {}] unless query
  q = query.split('&').each_with_object({}) do |pair, memo|
    k, v = pair.split('=', 2)
    # A parameter without "=" (e.g. "?flag") has no value; treat it as empty.
    memo[CGI.unescape(k)] = CGI.unescape(v.to_s)
  end
  [path, q]
end

u = url_with_params( "http://example.com",
                            "x[1]"  => "& ?=/",
                            "2+2=4" => "true" )

# "http://example.com?x%5B1%5D=%26%20%3F%3D%2F&2%2B2%3D4=true"

params_from_url(u)
# ["http://example.com", {"x[1]"=>"& ?=/", "2+2=4"=>"true"}]
