Ruby regex: extracting a list of URLs from a string

Date: 2022-09-13 11:06:06

I have a string of images' URLs and I need to convert it into an array.

http://rubular.com/r/E2a5v2hYnJ

How do I do this?

5 answers

#1


1  

The best answer will depend very much on exactly what input string you expect.

If your test string is accurate, I would not use a regex. Do this instead (as suggested by Marnen Laibow-Koser):

mystring.split('?v=3')
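
As a rough illustration of the split approach, here is a minimal sketch. The input string is hypothetical (the example.com URLs are made up) and assumes every image URL is followed by the same "?v=3" suffix:

```ruby
# Hypothetical input: image URLs run together, each followed by "?v=3".
mystring = "http://example.com/a.jpg?v=3http://example.com/b.png?v=3"

# String#split removes the separator; with no limit argument it also
# drops the trailing empty string left after the final "?v=3".
urls = mystring.split('?v=3')

p urls
```

If the suffix varies from one URL to the next, a fixed separator like this will not be enough.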

If the filler between the useful parts of your string is not constant, a regex might be better. Your regex is greedy. This will get you part of the way:

mystring.scan(/https?:\/\/[\w.\/-]*?\.(?:jpe?g|gif|png)/)

Note the '?' after the '*' in the part that matches the server and path pieces of the URL; it makes that quantifier non-greedy. The extension group is written as non-capturing, (?:...), because scan returns only the captured groups when the pattern contains capturing groups.

The problem with this is that if a server name or path itself contains .jpg, .jpeg, .gif or .png, the match stops at that first occurrence and the result for that URL is truncated.
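
To make that limitation concrete, here is a small sketch. The pattern follows the one above, with the extension group written as non-capturing so that scan returns whole matches; the constant name IMAGE_URL and both input strings are made up for illustration:

```ruby
# Non-greedy image-URL pattern; IMAGE_URL is a name chosen for this sketch.
IMAGE_URL = /https?:\/\/[\w.\/-]*?\.(?:jpe?g|gif|png)/

# Hypothetical well-behaved input: both URLs are found in full.
ok = "http://example.com/a.jpg?v=3https://example.com/b.png?v=3"
ok_urls = ok.scan(IMAGE_URL)
p ok_urls

# Hypothetical tricky input: the path contains ".png" before the real
# extension, so the non-greedy match stops early and the URL is truncated.
tricky = "http://example.com/thumbs.png.d/c.gif?v=3"
tricky_urls = tricky.scan(IMAGE_URL)
p tricky_urls
```

The second call returns only the part up to "thumbs.png", which is the wrong result described above.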

Working out what is best needs more information about your input string. For example, you might find it better to pattern-match the filler between the URLs you want.
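
One way to match the filler rather than the URLs is to pass a regex to split. This is only a sketch: the input is hypothetical, and it assumes the filler is always "?v=" followed by digits, which may not hold for your data:

```ruby
# Hypothetical input: the version suffix differs between URLs.
mystring = "http://example.com/a.jpg?v=3http://example.com/b.png?v=12"

# Split on the filler itself ("?v=" plus one or more digits) instead of
# trying to describe what a URL looks like.
urls = mystring.split(/\?v=\d+/)

p urls
```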

#2


5  

URI.extract(your_string)

That's all you need if you already have it in a string. I can't remember, but you may have to put require 'uri' in there first. Gotta love that standard library!

See the documentation for URI.extract.
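
A minimal sketch of this approach follows. The sample text is made up and assumes the URLs are separated by whitespace; URLs that run straight into each other may not be split the way you expect. Note also that URI.extract has been marked obsolete in newer Ruby versions and prints a warning when Ruby runs in verbose mode:

```ruby
require 'uri'

# Hypothetical input: URLs embedded in ordinary text, separated by spaces.
text = "See http://example.com/a.jpg and https://example.com/b.png for details"

# The optional second argument restricts the schemes that are extracted.
urls = URI.extract(text, %w[http https])

p urls
```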

#3


4  

String#scan returns an array:

myarray = mystring.scan(/regex/)

See here on regular-expressions.info
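
What the array contains depends on whether the pattern has capturing groups. The short sketch below uses a made-up input string to show the difference:

```ruby
# Hypothetical input: three file names separated by spaces.
mystring = "a.jpg b.png c.txt"

# No capturing groups: scan returns an array of whole matches.
whole = mystring.scan(/\w+\.(?:jpg|png)/)
p whole

# One capturing group: scan returns an array of arrays holding only the captures.
exts = mystring.scan(/\w+\.(jpg|png)/)
p exts

# Two capturing groups: each inner array holds both captures.
pairs = mystring.scan(/(\w+)\.(jpg|png)/)
p pairs
```

If you want full URLs back from scan, keep the groups non-capturing with (?:...).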

#4


1  

Use String#split (see the docs for details).

#5


-1  

Part of the problem is that in Rubular you are using https instead of http. This gets you closer to what you want if the other answers don't work for you:

http://rubular.com/r/cIjmjxIfz5
