How can I get everything after the last slash in a URL?

Time: 2022-10-20 09:54:47

How can I extract whatever follows the last slash in a URL in Python? For example, these URLs should return the following:

URL: http://www.test.com/TEST1
returns: TEST1

URL: http://www.test.com/page/TEST2
returns: TEST2

URL: http://www.test.com/page/page/12345
returns: 12345

I've tried urlparse, but that gives me the full path, such as /page/page/12345.
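For reference, here is a quick check of what `urlparse` returns for one of the example URLs. The `path` attribute keeps the leading slash and every segment, which is why it does not answer the question by itself:

```python
from urllib.parse import urlparse

# urlparse splits a URL into scheme, netloc, path, params, query and fragment
parts = urlparse("http://www.test.com/page/page/12345")

print(parts.netloc)  # www.test.com
print(parts.path)    # /page/page/12345
```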

10 answers

#1 (score: 156)

You don't need anything fancy: the string methods in the standard library let you easily split your URL into the 'filename' part and the rest:

url.rsplit('/', 1)

So you can get the part you're interested in simply with:

url.rsplit('/', 1)[-1]
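A small sketch of what the two expressions above return for the question's URLs. `rsplit('/', 1)` splits at most once, starting from the right, so the result has at most two items, and `[-1]` picks the last one. The helper name `last_segment` is hypothetical, used only for illustration. Note that a URL with a trailing slash yields an empty string.

```python
def last_segment(url: str) -> str:
    """Return everything after the last '/' in url (hypothetical helper)."""
    # maxsplit=1 makes rsplit cut only at the right-most slash
    return url.rsplit('/', 1)[-1]

urls = [
    'http://www.test.com/TEST1',
    'http://www.test.com/page/TEST2',
    'http://www.test.com/page/page/12345',
]

print('http://www.test.com/page/TEST2'.rsplit('/', 1))
# ['http://www.test.com/page', 'TEST2']

# Prints TEST1, TEST2 and 12345, one per line
for url in urls:
    print(last_segment(url))

# A trailing slash leaves nothing after the last '/', so the result is ''
print(repr(last_segment('http://www.test.com/page/')))  # ''
```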

#2 (score: 45)

One more (idio(ma)tic) way:

url.split("/")[-1]
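To illustrate the difference from the `rsplit` answer: `split('/')` breaks the URL at every slash and `[-1]` takes the final piece, so the result is the same, at the cost of building the full list of segments.

```python
url = 'http://www.test.com/page/page/12345'

# split('/') cuts at every slash; note the empty string between the two slashes of 'http://'
segments = url.split('/')
print(segments)  # ['http:', '', 'www.test.com', 'page', 'page', '12345']

# The last element is what follows the last slash, same as rsplit('/', 1)[-1]
print(segments[-1])  # 12345
```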

#3 (score: 12)

rsplit should be up to the task:

In [1]: 'http://www.test.com/page/TEST2'.rsplit('/', 1)[1]
Out[1]: 'TEST2'
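One caveat with indexing `[1]` rather than `[-1]`: if the string contains no slash at all, `rsplit` returns a one-element list, so `[1]` raises `IndexError` while `[-1]` returns the whole string. A quick check:

```python
text = 'TEST2'  # no slash anywhere

parts = text.rsplit('/', 1)
print(parts)      # ['TEST2']
print(parts[-1])  # TEST2

try:
    parts[1]
except IndexError:
    print('index 1 does not exist')
```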

#4 (score: 6)

urlparse is fine to use if you need it, for example to strip any query string parameters before taking the last segment.

import urllib.parse

urls = [
    'http://www.test.com/TEST1',
    'http://www.test.com/page/TEST2',
    'http://www.test.com/page/page/12345',
    'http://www.test.com/page/page/12345?abc=123'
]

for url in urls:
    url_parts = urllib.parse.urlparse(url)
    # .path is the same field as url_parts[2]; rpartition('/')[2] is its last segment
    path_parts = url_parts.path.rpartition('/')
    print('URL: {}\nreturns: {}\n'.format(url, path_parts[2]))

Output:

URL: http://www.test.com/TEST1
returns: TEST1

URL: http://www.test.com/page/TEST2
returns: TEST2

URL: http://www.test.com/page/page/12345
returns: 12345

URL: http://www.test.com/page/page/12345?abc=123
returns: 12345
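To make the benefit concrete: plain string splitting treats the query string and fragment as part of the last segment, and a slash inside the query even changes which slash counts as "last". Running the URL through `urlparse` first avoids both. A short comparison, using a made-up URL with `?next=/a/b` and a `#top` fragment as the assumed input:

```python
from urllib.parse import urlparse

url = 'http://www.test.com/page/12345?next=/a/b#top'

# Naive approach: the last slash is inside the query string
print(url.rsplit('/', 1)[-1])  # b#top

# urlparse isolates the path ('/page/12345') before taking the last segment
print(urlparse(url).path.rpartition('/')[2])  # 12345
```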

#5 (score: 4)

You can do it like this:

import os
head, tail = os.path.split(url)

Here, tail holds your file name.
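`os.path` is meant for filesystem paths, so its behaviour depends on the platform: on Windows it is `ntpath`, which also treats backslashes as separators. A sketch that pins the POSIX rules explicitly through the standard `posixpath` module (importable on every platform), so only forward slashes count. Like the string-method answers, this still keeps any query string in the tail.

```python
import posixpath

url = 'http://www.test.com/page/TEST2'

# posixpath.split always splits on '/', regardless of the OS running the code
head, tail = posixpath.split(url)
print(head)  # http://www.test.com/page
print(tail)  # TEST2

# basename returns just the tail part
print(posixpath.basename(url))  # TEST2
```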

#6 (score: 2)

last_part = url[url.rfind("/") + 1:]
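Why the slice works even when there is no slash: `str.rfind` returns the index of the last `/`, or `-1` if there is none, so `+ 1` gives either the position just after the slash or `0` (the whole string).

```python
url = 'http://www.test.com/page/TEST2'

idx = url.rfind('/')  # index of the last '/'
print(url[idx + 1:])  # TEST2

# No slash: rfind returns -1, so the slice starts at 0 and keeps everything
no_slash = 'TEST2'
print(no_slash.rfind('/'))                 # -1
print(no_slash[no_slash.rfind('/') + 1:])  # TEST2
```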

#7 (score: 0)

partition and rpartition are also handy for such things:

url.rpartition('/')[2]
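`rpartition` always returns a 3-tuple `(head, separator, tail)`. When the separator is missing, the first two items are empty strings and the whole input lands in the tail, so `[2]` never raises:

```python
url = 'http://www.test.com/page/TEST2'

print(url.rpartition('/'))      # ('http://www.test.com/page', '/', 'TEST2')
print('TEST2'.rpartition('/'))  # ('', '', 'TEST2')

# Unpacking makes the intent explicit
head, sep, tail = url.rpartition('/')
print(tail)  # TEST2
```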

#8 (score: 0)

Split the URL and pop the last element:

url.split('/').pop()

#9 (score: 0)

Here's a regex-based way of doing this:

import re
re.sub(r'^.+/([^/]+)$', r'\1', url)
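Worth knowing about the pattern above: the greedy `.+` pushes the match to the last slash, and `[^/]+` needs at least one non-slash character after it. For a URL with a trailing slash nothing matches, so `re.sub` returns the input unchanged rather than an empty string:

```python
import re

pattern = r'^.+/([^/]+)$'

print(re.sub(pattern, r'\1', 'http://www.test.com/page/TEST2'))  # TEST2

# Trailing slash: no match, so the whole URL comes back untouched
print(re.sub(pattern, r'\1', 'http://www.test.com/page/'))  # http://www.test.com/page/
```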

#10 (score: -1)

url = 'http://www.test.com/page/TEST2'.split('/')[4]
print(url)

Output: TEST2.
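The hard-coded index `4` only works for URLs with exactly one path segment before the target. Checking it against the question's other examples shows why `[-1]` is the safer choice:

```python
urls = [
    'http://www.test.com/TEST1',
    'http://www.test.com/page/TEST2',
    'http://www.test.com/page/page/12345',
]

for url in urls:
    segments = url.split('/')
    # Index 4 exists only when there are at least 5 segments
    fourth = segments[4] if len(segments) > 4 else None
    print(fourth, segments[-1])
# None TEST1
# TEST2 TEST2
# page 12345
```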
