How do I unit test a module that relies on urllib2?

Date: 2022-12-07 18:09:39

I've got a piece of code that I can't figure out how to unit test! The module pulls content from external XML feeds (twitter, flickr, youtube, etc.) with urllib2. Here's some pseudo-code for it:

from urllib import urlencode
from urllib2 import Request, urlopen

params = (url, urlencode(data),) if data else (url,)
req = Request(*params)
response = urlopen(req)
# check headers, content-length, etc...
# parse the response XML with lxml...

My first thought was to pickle the response and load it for testing, but apparently urllib's response object is unserializable (it raises an exception).

Just saving the XML from the response body isn't ideal, because my code uses the header information too. It's designed to act on a response object.

And of course, relying on an external source for data in a unit test is a horrible idea.

So how do I write a unit test for this?

7 answers

#1


26  

urllib2 has two functions, build_opener() and install_opener(), which you can use to mock the behaviour of urlopen():

import urllib2
from StringIO import StringIO

def mock_response(req):
    # Build a canned response for the one URL the test expects;
    # for any other URL this function returns None.
    if req.get_full_url() == "http://example.com":
        resp = urllib2.addinfourl(StringIO("mock file"), "mock message", req.get_full_url())
        resp.code = 200
        resp.msg = "OK"
        return resp

class MyHTTPHandler(urllib2.HTTPHandler):
    # Takes the place of the default HTTP handler, so no connection is made
    def http_open(self, req):
        print "mock opener"
        return mock_response(req)

# Install the opener globally: later urllib2.urlopen() calls go through it
my_opener = urllib2.build_opener(MyHTTPHandler)
urllib2.install_opener(my_opener)

response = urllib2.urlopen("http://example.com")
print response.read()
print response.code
print response.msg
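
For readers on Python 3, where urllib2's functionality lives in urllib.request, the same handler technique could be sketched roughly as below. This is only a sketch under assumptions, not the answer's original code: CannedHTTPHandler, CANNED_BODY, and the example URL are made-up names, and it calls opener.open() directly instead of install_opener() so that no global state is changed. It also attaches a real header object, which matters when the code under test reads headers as well as the body.

```python
import io
import urllib.request
from http.client import HTTPMessage

CANNED_BODY = b"<feed><title>mock</title></feed>"  # hypothetical fixture data


class CannedHTTPHandler(urllib.request.HTTPHandler):
    """Hypothetical handler that answers every http:// request locally."""

    def http_open(self, req):
        # Header object with case-insensitive lookup, like a real response has
        headers = HTTPMessage()
        headers["Content-Type"] = "text/xml"
        headers["Content-Length"] = str(len(CANNED_BODY))
        resp = urllib.request.addinfourl(
            io.BytesIO(CANNED_BODY), headers, req.get_full_url(), 200
        )
        resp.msg = "OK"
        return resp


# Passing an HTTPHandler subclass makes build_opener() use it in place of
# the default HTTPHandler, so no socket is opened for http:// URLs.
opener = urllib.request.build_opener(CannedHTTPHandler)
response = opener.open("http://example.com/feed.xml")

status = response.getcode()
content_type = response.headers["content-type"]
body = response.read()
```

Code that only needs getcode(), headers, and read() should not be able to tell this response from a live one.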

#2


8  

It would be best to write a mock urlopen (and possibly a mock Request) that provides the minimum interface needed to behave like urllib2's version. The function or method that uses it then needs some way to accept this mock urlopen, falling back to urllib2.urlopen otherwise.

This is a fair amount of work, but worthwhile. Remember that Python is very friendly to duck typing, so you just need to provide some semblance of the response object's properties to mock it.

For example:

class MockResponse(object):
    def __init__(self, resp_data, code=200, msg='OK'):
        self.resp_data = resp_data
        self.code = code
        self.msg = msg
        self.headers = {'content-type': 'text/xml; charset=utf-8'}

    def read(self):
        return self.resp_data

    def getcode(self):
        return self.code

    # Define other members and properties you want

def mock_urlopen(request):
    return MockResponse(r'<xml document>')
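
One way to let the function "accept this mock urlopen somehow" is to take the opener as a parameter that defaults to the real one. A minimal Python 3 sketch of that idea follows; fetch_feed, FakeResponse, and fake_urlopen are hypothetical names, and urllib.request.urlopen stands in for urllib2.urlopen:

```python
import urllib.request


class FakeResponse:
    """Hypothetical stand-in exposing only what fetch_feed() uses."""

    def __init__(self, body, code=200, headers=None):
        self._body = body
        self.code = code
        self.headers = headers or {}

    def read(self):
        return self._body

    def getcode(self):
        return self.code


def fetch_feed(url, opener=urllib.request.urlopen):
    # Production callers omit `opener`; tests pass a fake one.
    response = opener(url)
    if response.getcode() != 200:
        raise ValueError("unexpected status %d" % response.getcode())
    return response.headers.get("content-type"), response.read()


def fake_urlopen(url):
    return FakeResponse(b"<feed/>", headers={"content-type": "text/xml"})


content_type, body = fetch_feed("http://example.com/feed.xml", opener=fake_urlopen)
```

The default argument keeps production call sites unchanged, while a test can hand in any callable that returns a response-like object.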

Granted, some of these are difficult to mock, because for example I believe the normal "headers" is an HTTPMessage which implements fun stuff like case-insensitive header names. But, you might be able to simply construct an HTTPMessage with your response data.
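
Assuming Python 3, the corresponding class is http.client.HTTPMessage, and it can be created empty and filled like a dictionary. A short sketch with made-up header values:

```python
from http.client import HTTPMessage

headers = HTTPMessage()
headers["Content-Type"] = "text/xml; charset=utf-8"
headers["Content-Length"] = "14"

# Lookups ignore the case of the header name
lower = headers["content-type"]
upper = headers.get("CONTENT-LENGTH")
missing = headers.get("X-Not-Set")  # absent headers give None
charset = headers.get_content_charset()
```

An object like this can be assigned to the headers attribute of a mock response in place of a plain dict.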

#3


5  

Build a separate class or module responsible for communicating with your external feeds.

Make this class able to be a test double. You're using Python, so you're pretty golden there; if you were using C#, I'd suggest either an interface or virtual methods.

In your unit test, insert a test double of the external feed class. Test that your code uses the class correctly, assuming that the class does the work of communicating with your external resources correctly. Have your test double return fake data rather than live data; test various combinations of the data and of course the possible exceptions urllib2 could throw.
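
A rough Python 3 sketch of that arrangement is shown here. All names (FeedClient, FakeFeedClient, feed_titles) are hypothetical, xml.etree from the standard library is used in place of lxml to keep the example self-contained, and urllib.error.URLError stands in for the exception urllib2 would raise:

```python
import urllib.error
import urllib.request
import xml.etree.ElementTree as ET


class FeedClient:
    """The only class that talks to the network (not exercised in unit tests)."""

    def fetch(self, url):
        with urllib.request.urlopen(url) as response:
            return response.read()


class FakeFeedClient:
    """Test double: returns canned data or raises a canned error."""

    def __init__(self, body=None, error=None):
        self.body = body
        self.error = error
        self.requested = []  # records the URLs the code under test asked for

    def fetch(self, url):
        self.requested.append(url)
        if self.error is not None:
            raise self.error
        return self.body


def feed_titles(client, url):
    """Code under test: depends only on the client's fetch() method."""
    try:
        body = client.fetch(url)
    except urllib.error.URLError:
        return []
    return [el.text for el in ET.fromstring(body).iter("title")]


ok = FakeFeedClient(body=b"<feed><title>a</title><title>b</title></feed>")
down = FakeFeedClient(error=urllib.error.URLError("unreachable"))

titles = feed_titles(ok, "http://example.com/feed.xml")
fallback = feed_titles(down, "http://example.com/feed.xml")
```

The two doubles cover the happy path and a network failure without any request leaving the machine.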

Aand... that's it.

You can't effectively automate unit tests that rely on external sources, so you're best off not doing it. Run an occasional integration test on your communication module, but don't include those tests as part of your automated tests.

Edit:

Just a note on the difference between my answer and @Crast's answer. Both are essentially correct, but they involve different approaches. In Crast's approach, you use a test double on the library itself. In my approach, you abstract the use of the library away into a separate module and test double that module.

Which approach you use is entirely subjective; there's no "correct" answer there. I prefer my approach because it allows me to build more modular, flexible code, something I value. But it comes at a cost in terms of additional code to write, something that may not be valued in many agile situations.

#4


5  

You can use pymox to mock the behavior of anything and everything in the urllib2 (or any other) package. It's 2010, you shouldn't be writing your own mock classes.
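
pymox itself is not shown here. As a sketch of the same idea with a mocking library, the following uses unittest.mock from the Python 3 standard library instead (a swapped-in tool, not the one this answer names); fetch_feed and the URL are hypothetical:

```python
import urllib.request
from unittest import mock

URL = "http://example.com/feed.xml"  # hypothetical feed URL


def fetch_feed(url):
    """Code under test: looks up urllib.request.urlopen at call time."""
    response = urllib.request.urlopen(url)
    return response.getcode(), response.headers["Content-Type"], response.read()


# A generic mock configured to look like a response object
fake = mock.Mock()
fake.getcode.return_value = 200
fake.headers = {"Content-Type": "text/xml"}
fake.read.return_value = b"<feed/>"

# patch() swaps urlopen for the duration of the block and restores it after
with mock.patch("urllib.request.urlopen", return_value=fake) as patched:
    result = fetch_feed(URL)

patched.assert_called_once_with(URL)
```

The library takes care of replacing and restoring urlopen, so no hand-written mock class is needed.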

#5


1  

I think the easiest thing to do is to actually create a simple web server in your unit test. When you start the test, create a new thread that listens on some arbitrary port and when a client connects just returns a known set of headers and XML, then terminates.

I can elaborate if you need more info.

Here's some code:

import threading, SocketServer, time

# a request handler that replies with the contents of a fixture file
# (the file should hold the raw response: status line, headers and XML)
class SimpleRequestHandler(SocketServer.BaseRequestHandler):
    def handle(self):
        data = self.request.recv(102400) # token receive
        with open(self.server.datafile, 'rb') as f:
            senddata = f.read() # read data from unit test file
        self.request.sendall(senddata)
        time.sleep(0.1) # make sure it finishes receiving request before closing
        self.request.close()

def serve_data(datafile):
    server = SocketServer.TCPServer(('127.0.0.1', 12345), SimpleRequestHandler)
    server.datafile = datafile
    # pass the method itself (no parentheses) so it runs inside the thread
    http_server_thread = threading.Thread(target=server.handle_request)
    http_server_thread.start()
    return http_server_thread

To run your unit test, call serve_data() then call your code that requests a URL that looks like http://localhost:12345/anythingyouwant.
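
On Python 3 the same one-shot server could be sketched with http.server, which writes the status line and headers for you. This is a sketch under assumptions: CannedHandler and CANNED_BODY are made-up names, port 0 asks the operating system for any free port instead of the fixed 12345, and proxies are switched off so the request stays on the loopback interface.

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

CANNED_BODY = b"<feed><title>hello</title></feed>"  # hypothetical fixture


class CannedHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Always answer with the same headers and XML, whatever the path
        self.send_response(200)
        self.send_header("Content-Type", "text/xml")
        self.send_header("Content-Length", str(len(CANNED_BODY)))
        self.end_headers()
        self.wfile.write(CANNED_BODY)

    def log_message(self, format, *args):
        pass  # keep test output quiet


server = HTTPServer(("127.0.0.1", 0), CannedHandler)
server.timeout = 5  # give up if no request arrives
port = server.server_address[1]

# handle_request serves exactly one request, then the thread ends
thread = threading.Thread(target=server.handle_request)
thread.start()

opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
response = opener.open("http://127.0.0.1:%d/anythingyouwant" % port, timeout=5)
status = response.getcode()
content_type = response.headers["Content-Type"]
body = response.read()
response.close()

thread.join()
server.server_close()
```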

#6


0  

Why not just mock a website that returns the response you expect? Start the server in a thread in setup and kill it in teardown. I ended up doing this to test code that sends email, by mocking an SMTP server, and it works great. Surely something more trivial could be done for HTTP...

from smtpd import SMTPServer
from threading import Thread
from time import sleep
import asyncore
SMTP_PORT = 6544

class MockSMTPServer(SMTPServer):
    def __init__(self, localaddr, remoteaddr, cb = None):
        self.cb = cb
        SMTPServer.__init__(self, localaddr, remoteaddr)

    def process_message(self, peer, mailfrom, rcpttos, data):
        print (peer, mailfrom, rcpttos, data)
        if self.cb:
            self.cb(peer, mailfrom, rcpttos, data)
        # stop listening after the first message so asyncore.loop() can return
        self.close()

def start_smtp(cb, port=SMTP_PORT):

    def smtp_thread():
        _smtp = MockSMTPServer(("127.0.0.1", port), (None, 0), cb)
        asyncore.loop()

    # return the thread unstarted; the caller calls start()
    return Thread(None, smtp_thread)


# excerpt from a test case class; `res` comes from the snipped part
def test_stuff(self):
    #.......snip noise
    # a list, because the nested callback cannot rebind a local of this function
    email_result = []

    def email_back(*args):
        email_result.append(args)

    t = start_smtp(email_back)
    t.start()
    sleep(1)

    res.form["email"] = self.admin_email
    res = res.form.submit()
    assert res.status_int == 302, "should've redirected"

    sleep(1)
    assert email_result, "didn't get an email"
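
For HTTP, the setup/teardown lifecycle described in this answer might look roughly like the Python 3 sketch below. OkHandler and FeedServerTest are hypothetical names, and the test simply fetches from the local server where real code under test would be called:

```python
import io
import threading
import unittest
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class OkHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        body = b"<ok/>"
        self.send_response(200)
        self.send_header("Content-Type", "text/xml")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep test output quiet


class FeedServerTest(unittest.TestCase):
    def setUp(self):
        # Port 0 lets the OS pick a free port for each test
        self.server = HTTPServer(("127.0.0.1", 0), OkHandler)
        self.thread = threading.Thread(target=self.server.serve_forever)
        self.thread.start()

    def tearDown(self):
        self.server.shutdown()  # stops serve_forever()
        self.thread.join()
        self.server.server_close()

    def test_fetch(self):
        url = "http://127.0.0.1:%d/feed" % self.server.server_address[1]
        # Empty ProxyHandler: never route the loopback request via a proxy
        opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
        with opener.open(url, timeout=5) as response:
            self.assertEqual(response.getcode(), 200)
            self.assertEqual(response.read(), b"<ok/>")


suite = unittest.defaultTestLoader.loadTestsFromTestCase(FeedServerTest)
result = unittest.TextTestRunner(stream=io.StringIO()).run(suite)
```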

#7


0  

Trying to improve a bit on @john-la-rooy's answer, I've made a small class that allows simple mocking in unit tests.

It should work with both Python 2 and 3.

try:
    import urllib.request as urllib
except ImportError:
    import urllib2 as urllib

from io import BytesIO


class MockHTTPHandler(urllib.HTTPHandler):

    def mock_response(self, req):
        url = req.get_full_url()

        print("incoming request:", url)

        if url.endswith('.json'):
            resdata = b'[{"hello": "world"}]'
            headers = {'Content-Type': 'application/json'}

            resp = urllib.addinfourl(BytesIO(resdata), headers, url, 200)
            resp.msg = "OK"

            return resp
        raise RuntimeError('Unhandled URL', url)
    # serve http:// requests from mock_response instead of the network
    http_open = mock_response

    @classmethod
    def install(cls):
        # remember the current global opener so it can be restored later
        previous = urllib._opener
        urllib.install_opener(urllib.build_opener(cls))
        return previous

    @classmethod
    def remove(cls, previous=None):
        urllib.install_opener(previous)

Used like this:

import unittest


class TestOther(unittest.TestCase):

    def setUp(self):
        # install the mock opener and restore the previous one after each test
        previous = MockHTTPHandler.install()
        self.addCleanup(MockHTTPHandler.remove, previous)
