Get the URL from a string in Groovy

Time: 2023-02-05 20:16:44

I am working with a Grails app. I need to extract only the part of a URL up to and including .com (or .gov, .edu, .mil, .org, .net, etc.) from a string.

For example:

Input: https://*.com/questions?=34354#es4
Output: https://*.com/

Input: https://code.google.com/p/crawler4j/issues/detail?id=174
Output: https://code.google.com/

Can anyone suggest how it can be done? Also, if it can be done, I need to change https to http in the resulting string. Please help. Thanks.

Edit: I apologize to all the downvoters for not including what I tried. This is what I tried:

URL url = new URL(website);
String webUrl = url.getprotocol()+"://"+url.getAuthority()

But I got the following error: MissingPropertyException occurred when processing request: [POST] /mypackage/resource/crawl

3 solutions

#1


3  

Something like this satisfies the 2 examples given:

def url = new URL('http://*.com/questions?=34354#es4')
def result = 'http://' + url.host +'/'
assert result == 'http://*.com/'

def url2 = new URL('https://code.google.com/p/crawler4j/issues/detail?id=174')
def result2 = 'http://' + url2.host +'/'
assert result2 == 'http://code.google.com/'

EDIT:

Of course you can abbreviate the concatenation with something like this:

def url = new URL('http://*.com/questions?=34354#es4')
def result = "http://${url.host}/"
assert result == 'http://*.com/'

def url2 = new URL('https://code.google.com/p/crawler4j/issues/detail?id=174')
def result2 = "http://${url2.host}/"
assert result2 == 'http://code.google.com/'

#2


0  

I found the error in my code as well. I mistyped getProtocol as getprotocol and it evaded my observation again and again. It should have been:

URL url = new URL(website);
String webUrl = url.getProtocol()+"://"+url.getAuthority()

Thanks everyone for helping.
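
Combining the corrected call with the https-to-http change asked for in the question, a minimal sketch might look like the following. The helper name `toHttpBase` and the example.com URL are hypothetical, not from the original post. Note that `getAuthority()` also returns any port or user-info in the URL, whereas `getHost()` returns the host name only.

```groovy
// Hypothetical helper: reduce a URL string to "http://<authority>/".
String toHttpBase(String website) {
    URL url = new URL(website)
    // authority = host plus optional user-info and port, e.g. "example.com:8443"
    return "http://${url.authority}/"
}

assert toHttpBase('https://code.google.com/p/crawler4j/issues/detail?id=174') == 'http://code.google.com/'
// The port survives because it is part of the authority
assert toHttpBase('https://example.com:8443/a/b?c=d') == 'http://example.com:8443/'
```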

#3


0  

You can try

String text = 'http://*.com/questions?=34354#es4'
// split() takes a regular expression, so escape the dot to match a literal ".com"
def parts = text.split(/\.com/)
return parts[0] + ".com"

This should solve your problem
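
Since the question also mentions .gov, .edu, .org and other endings, a string-based variant that does not hard-code ".com" may be useful. The sketch below is only one possible approach, and the helper name `baseOf` is hypothetical: it captures everything between "://" and the first "/", "?" or "#", and it assumes the input starts with http:// or https://.

```groovy
// Hypothetical helper: keep only the host part and force the http scheme.
String baseOf(String text) {
    // Group 1 is everything after "://" up to the first '/', '?' or '#'
    def m = text =~ '^https?://([^/?#]+)'
    // Return the input unchanged when it does not look like an http(s) URL
    return m.find() ? "http://${m.group(1)}/" : text
}

assert baseOf('https://code.google.com/p/crawler4j/issues/detail?id=174') == 'http://code.google.com/'
assert baseOf('http://*.com/questions?=34354#es4') == 'http://*.com/'
```

Unlike splitting on a fixed suffix, this does not depend on which top-level domain the URL uses.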
