Ruby regex: extracting a list of URLs from a string

I have a string containing image URLs and I need to convert it into an array.

http://rubular.com/r/E2a5v2hYnJ

How do I do this?

5 Solutions

#1

The best answer will depend very much on exactly what input string you expect.

If your test string is accurate, I would not use a regex; do this instead (as suggested by Marnen Laibow-Koser):

mystring.split('?v=3')
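
As a minimal sketch of the split approach: the input string below is made up (the original test string is only available through the Rubular link), and it assumes every URL is followed by the literal suffix '?v=3'. The helper name split_on_suffix is hypothetical.

```ruby
# Hypothetical input: image URLs glued together, each followed by '?v=3'.
SAMPLE = 'http://example.com/a.jpg?v=3http://example.com/b.png?v=3'

# Split on the literal suffix. String#split with a String argument does a
# plain-text split, and trailing empty strings are dropped by default.
def split_on_suffix(str, suffix = '?v=3')
  str.split(suffix)
end

p split_on_suffix(SAMPLE)
# => ["http://example.com/a.jpg", "http://example.com/b.png"]
```

This only works while the separator is exactly the same everywhere; if the version number varies, a regex separator is the safer choice.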

If the fluff between your useful strings is not constant, a regex might be better. Your regex is greedy. This will get you part way:

mystring.scan(/https?:\/\/[\w.\/-]*?\.(?:jpe?g|gif|png)/)

Note the '?' after the '*' in the part that matches the server and path pieces of the URL; it makes that quantifier non-greedy, so each match stops at the first image extension it finds.

The problem with this is that if your server name or path contains any of .jpg, .jpeg, .gif or .png, the match stops early at that point and the result for that URL will be wrong.
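
Here is a rough sketch of the non-greedy scan and of the failure case just described. The URLs, the constant names and the helper extract_image_urls are all invented for illustration. It also shows one detail of String#scan worth knowing: when the pattern contains a capture group, scan returns the captured groups rather than the whole match, so a non-capturing group (?:...) is used to get full URLs back.

```ruby
# Hypothetical pattern: scheme, then a lazily matched host/path, then an
# image extension. The hyphen sits last in the class so it is a literal.
IMAGE_URL = %r{https?://[\w./-]*?\.(?:jpe?g|gif|png)}

# Same pattern, but with a capturing group around the extension.
IMAGE_URL_CAPTURING = %r{https?://[\w./-]*?\.(jpe?g|gif|png)}

def extract_image_urls(str)
  str.scan(IMAGE_URL)
end

glued = 'http://example.com/a.jpg?v=3https://img.example.com/b-1.png?v=3'

p extract_image_urls(glued)
# => ["http://example.com/a.jpg", "https://img.example.com/b-1.png"]

# With a capture group, scan yields only what the group captured.
p glued.scan(IMAGE_URL_CAPTURING)
# => [["jpg"], ["png"]]

# Failure case: an extension inside the host name ends the match too early.
p extract_image_urls('http://my.jpg.example.com/pic.png')
# => ["http://my.jpg"]
```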

Figuring out what is best needs more information about your input string. For example, you might find it better to pattern match the fluff between your desired URLs instead of the URLs themselves.
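
One way to match the fluff instead of the URLs is to pass a regex to String#split. The sketch below assumes the fluff always looks like '?v=' followed by digits; that shape, the sample string and the helper name split_on_fluff are guesses made for illustration, not something taken from the original input.

```ruby
# Assumed separator shape: a literal '?v=' followed by one or more digits.
FLUFF = /\?v=\d+/

def split_on_fluff(str)
  # Everything between separators is kept; trailing empty strings are dropped.
  str.split(FLUFF)
end

p split_on_fluff('http://example.com/a.jpg?v=3http://example.com/b.png?v=12')
# => ["http://example.com/a.jpg", "http://example.com/b.png"]
```

Because the URLs themselves are never pattern matched here, an extension appearing in a host name or path does not cut a result short.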

#2

URI.extract(your_string)

That's all you need if you already have it in a string. I can't remember, but you may have to put require 'uri' in there first. Gotta love that standard library!

Here's the link to the docs: URI.extract
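
A small sketch of the URI.extract approach, assuming the URLs are separated by whitespace or other non-URL text (back-to-back URLs with nothing between them may not come apart the way you want). The sample text and the helper name urls_in are made up. Note also that newer Ruby versions mark URI.extract as obsolete and print a warning when run in verbose mode, so treat this as a quick convenience rather than a long-term solution.

```ruby
require 'uri'

# Return every http/https URL found in the text. The second argument
# restricts matching to the listed schemes.
def urls_in(text)
  URI.extract(text, %w[http https])
end

p urls_in('first http://example.com/a.jpg then https://example.com/b.png')
# => ["http://example.com/a.jpg", "https://example.com/b.png"]
```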

#3

String#scan returns an array:

myarray = mystring.scan(/regex/)

See here on regular-expressions.info

#4

Use String#split (see the docs for details).

#5

Part of the problem is that in Rubular you are using https instead of http. If the other answers don't work for you, this gets you closer to what you want:

http://rubular.com/r/cIjmjxIfz5
