How do I read the contents of a URL with Python?

Date: 2022-10-31 13:16:51

The following URL works when I paste it into the browser:

http://www.somesite.com/details.pl?urn=2344

But when I try to read the URL with Python, nothing happens:

 link = 'http://www.somesite.com/details.pl?urn=2344'
 f = urllib.urlopen(link)           
 myfile = f.readline()  
 print myfile

Do I need to encode the URL, or is there something I'm not seeing?

6 answers

#1


91  

To answer your question:

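import urllib

# Python 2 only: urllib.urlopen() does not exist in Python 3
link = "http://www.somesite.com/details.pl?urn=2344"
f = urllib.urlopen(link)
myfile = f.read()  # read() returns the whole body, not just the first line
print myfile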

You need to call read(), not readline(): readline() returns only the first line of the response, while read() returns the whole body.
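
To illustrate why readline() can appear to print nothing, here is a minimal sketch in Python 3. It uses io.BytesIO as a stand-in for the object returned by urlopen(), so it runs without network access. The stand-in and the HTML body, which starts with a blank line, are assumptions made for illustration, not the actual response from the site in the question.

```python
import io

# Stand-in for the response object; the body is invented for illustration
# and deliberately begins with a blank line.
fake_response = io.BytesIO(b"\n<html>\n<body>details</body>\n</html>\n")

first_line = fake_response.readline()  # stops at the first newline
fake_response.seek(0)                  # rewind so read() sees the whole body
whole_body = fake_response.read()      # everything up to the end of the stream

print(repr(first_line))
print(repr(whole_body))
```

If the real page begins with a blank line in the same way, printing the result of readline() would show only an empty line, which is one possible reason it looks as if nothing happened.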

EDIT (2018-06-25): In Python 3, the legacy urllib.urlopen() was replaced by urllib.request.urlopen() (see the notes at https://docs.python.org/3/library/urllib.request.html#urllib.request.urlopen for details).

If you're using Python 3, see the answers by Martin Thoma or i.n.n.m within this question: https://*.com/a/28040508/158111 (Python 2/3 compat) https://*.com/a/45886824/158111 (Python 3)
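
For Python 3, the same idea might look roughly like the sketch below. The helper name read_url and the 10-second default timeout are assumptions chosen for illustration; only urllib.request.urlopen() itself comes from the standard library.

```python
from urllib.request import urlopen


def read_url(url, timeout=10):
    """Return the raw body of `url` as bytes (hypothetical helper)."""
    # The with-block closes the connection when it exits.
    with urlopen(url, timeout=timeout) as response:
        return response.read()
```

Calling read_url("http://www.somesite.com/details.pl?urn=2344") would then return the page body as a bytes object, assuming the site is reachable.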

Or just get the requests library here: http://docs.python-requests.org/en/latest/ and seriously, use it :)

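import requests

link = "http://www.somesite.com/details.pl?urn=2344"
f = requests.get(link)

# f.text is the response body decoded to a string;
# print() with parentheses works in both Python 2 and 3
print(f.text)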

#2


8  

A solution that works with both Python 2.x and Python 3.x uses the Python 2 and 3 compatibility library six:

from six.moves.urllib.request import urlopen
link = "http://www.somesite.com/details.pl?urn=2344"
response = urlopen(link)
content = response.read()
print(content)

#3


2  

For Python 3 users, to save time, use the following code:

from urllib.request import urlopen

link = "https://docs.scipy.org/doc/numpy/user/basics.broadcasting.html"

f = urlopen(link)
myfile = f.read()  # in Python 3 this is a bytes object
print(myfile)

I know there are other threads about the error NameError: name 'urlopen' is not defined, but I thought this might save time.
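
Note that in Python 3, read() returns bytes, so print() shows something like b'...' rather than readable text. A minimal sketch of decoding the body follows. The helper name read_url_text and the UTF-8 fallback are assumptions for illustration; the charset is taken from the Content-Type header when the server provides one.

```python
from urllib.request import urlopen


def read_url_text(url, default_encoding="utf-8", timeout=10):
    """Return the body of `url` decoded to str (hypothetical helper)."""
    with urlopen(url, timeout=timeout) as response:
        raw = response.read()
        # Use the charset declared in the Content-Type header, if any;
        # otherwise fall back to the assumed default.
        charset = response.headers.get_content_charset() or default_encoding
    return raw.decode(charset)
```

If the declared charset is wrong, decode() can raise UnicodeDecodeError, so this is a starting point rather than a robust solution.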

#4


0  

The URL should be a string:

import urllib

link = "http://www.somesite.com/details.pl?urn=2344"
f = urllib.urlopen(link)           
myfile = f.readline()  
print myfile
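
On the encoding part of the question: the example URL contains only characters that are already safe in a query string, so it should not need extra encoding. When parameter values contain spaces or other special characters, a sketch like the following (Python 3) can build the query string. The base URL is the one from the question, and the second parameter is invented for illustration.

```python
from urllib.parse import urlencode

base = "http://www.somesite.com/details.pl"

# urlencode() percent-encodes each key and value and joins pairs with "&".
query = urlencode({"urn": 2344})
link = base + "?" + query
print(link)

# A value containing spaces and an ampersand (invented for illustration).
special = urlencode({"q": "tom & jerry"})
print(special)
```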

#5


0  

I used the following code:

import urllib

def read_text():
    quotes = urllib.urlopen("https://s3.amazonaws.com/udacity-hosted-downloads/ud036/movie_quotes.txt")
    contents_file = quotes.read()
    print contents_file

read_text()

#6


0  

We can read a website's HTML content as follows:

from urllib.request import urlopen
response = urlopen('http://google.com/')
html = response.read()
print(html)
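
When a request seems to do nothing, it can help to surface failures explicitly. The sketch below is one possible approach in Python 3, not a definitive one. The helper name try_read and the choice to return None on failure are assumptions made for illustration.

```python
from urllib.error import URLError
from urllib.request import urlopen


def try_read(url, timeout=10):
    """Return the body as bytes, or None if the request fails (hypothetical helper)."""
    try:
        with urlopen(url, timeout=timeout) as response:
            return response.read()
    except URLError as exc:  # HTTPError is a subclass of URLError
        print("request failed:", exc)
        return None
```

With this helper, an unreachable host or an HTTP error status prints a message instead of failing silently or raising an exception.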
