eventlet.spawn doesn't work as expected

Time: 2022-04-05 12:57:11

I'm writing a web UI for data analysis tasks.

Here's the way it's supposed to work:

After a user specifies parameters such as the dataset and learning rate, I create a new task record. An executor for that task is then started asynchronously (the executor may take a long time to run), and the user is redirected to another page.

After searching for an async library for Python, I started with eventlet. Here's what I wrote in a Flask view function:

db.save(task)
eventlet.spawn(executor, task)
return redirect("/show_tasks")

With the code above, the executor didn't execute at all.

What might be wrong with my code? Or should I try something else?

3 answers

#1


3  

You'll need to patch some system libraries in order to make eventlet work. Here is a minimal working example (also as gist):

#!/usr/bin/env python 

from flask import Flask 
import time 
import eventlet 

eventlet.monkey_patch() 

app = Flask(__name__) 
app.debug = True 

def background(): 
    """ do something in the background """ 
    print('[background] working in the background...') 
    time.sleep(2) 
    print('[background] done.') 
    return 42 

def callback(gt, *args, **kwargs): 
    """ this function is called when results are available """ 
    result = gt.wait() 
    print("[cb] %s" % result) 

@app.route('/') 
def index(): 
    greenth = eventlet.spawn(background) 
    greenth.link(callback) 
    return "Hello World" 

if __name__ == '__main__': 
    app.run() 

More on that:

One of the challenges of writing a library like Eventlet is that the built-in networking libraries don’t natively support the sort of cooperative yielding that we need.

#2


4  

While you have already been given direct solutions, I will try to answer your first question and explain why your code does not work as expected.

Disclosure: I currently maintain Eventlet. This comment contains a number of simplifications to keep it to a reasonable size.

Brief introduction to cooperative multithreading

There are two ways to do multithreading, preemptive and cooperative, and Eventlet uses the cooperative approach. At its core is the Greenlet library, which lets you create independent "execution contexts". You can think of such a context as the frozen state of all local variables plus a pointer to the next instruction. Basically, multithreading = contexts + scheduler. Greenlet provides the contexts, so we still need a scheduler: something that decides which context should occupy the CPU right now. It turns out that making those decisions also requires running some code, which means a separate context (green thread). This special green thread is called the Hub in the Eventlet code base. The scheduler maintains two collections: a run queue, which is an ordered set of contexts that need to run as soon as possible, and a set of contexts that are waiting for something to finish (e.g. network I/O or a time-limited sleep).

But since we are doing cooperative multitasking, one context will execute indefinitely unless it explicitly yields to another. Requiring explicit yields everywhere would be a very sad style of programming, and it is by definition incompatible with existing libraries (pointing at they-know-who). So Eventlet provides green versions of common modules, changed so that they switch to the Hub instead of blocking everything. After such a switch, some time may be spent in other green threads or in the Hub's wait-for-external-events implementation; when the awaited event arrives, the Hub switches back to the green thread that was waiting for it, and that thread continues execution.
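
To make the "contexts + scheduler" idea concrete, here is a toy sketch that uses plain Python generators as contexts and a deque as the run queue. It is only an analogy under my own assumptions: it is not how Greenlet or the Eventlet Hub is implemented, and the names worker and run are made up for illustration.

```python
from collections import deque


def worker(name, steps, log):
    """A cooperative task: do one unit of work, then give up control."""
    for i in range(steps):
        log.append(f"{name}:{i}")
        yield  # explicit yield back to the scheduler


def run(tasks):
    """Toy round-robin scheduler; the deque plays the role of the run queue."""
    run_queue = deque(tasks)
    while run_queue:
        task = run_queue.popleft()
        try:
            next(task)  # switch into the context until it yields
        except StopIteration:
            continue  # the context finished, so drop it
        run_queue.append(task)  # still alive: back of the queue


log = []
run([worker("a", 2, log), worker("b", 3, log)])
print(log)
```

If a worker never reached its yield, no other worker would ever run, which is the situation the paragraph above describes.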

End. Now back to your problem.


What eventlet.spawn actually does: it creates a new execution context, which basically means allocating an object in memory. It also tells the scheduler to put this context into the run queue, so that at the first possible moment the Hub will switch to the newly spawned function. Your code does not provide such a moment. There is no place where you explicitly give up execution to other green threads; in Eventlet this is usually done via eventlet.sleep(). And since you don't use green versions of common modules, there is no chance to yield implicitly while other code waits. The most appropriate (if not the only) place would be your WSGI server's accept loop: it should give other green threads a chance to run while waiting for the next request. The eventlet.monkey_patch() call mentioned in the first answer is just a convenient way to replace all (or a subset of) common modules with their corresponding green versions.


Unwanted opinion on overall design

This is in a separate section so that it is easy to skip. If you are building error-resistant software, you usually want to limit the execution time of spawned threads (including but not limited to "green" ones) and processes, and to at least report (log) or react to their unhandled errors. In the provided code, your spawned green thread may technically run the next moment or five minutes later (again, because nobody yields the CPU), or it may fail with an unhandled exception. Luckily, Eventlet provides solutions for both problems: Timeout and with_timeout() let you limit waiting time (remember, if the code does not yield, you can't possibly limit it), and GreenThread.link() lets you catch all exceptions. It may be tempting (it was for me) to re-raise exceptions in the "main" code, and link() allows that easily, but consider that those exceptions would be raised from sleep and I/O calls, the places where you yield to the Hub. This may produce some really counterintuitive tracebacks.

#3


1  

Eventlet may indeed be suitable for your purposes, but it doesn't just fit in with any old application; Eventlet requires that it be in control of all your application's I/O.

You may be able to get away with either

  1. Starting Eventlet's main loop in another thread, or even

  2. Not using Eventlet and just spawning your task in another thread.

Celery may be another option.
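
Option 2 above (no Eventlet, just a thread) can be sketched with the standard library alone. The executor body, the start_task helper and the results dict are hypothetical stand-ins for the real task code, and the short sleep only simulates work.

```python
import threading
import time

results = {}


def executor(task):
    """Stand-in for the long-running analysis job."""
    time.sleep(0.1)  # simulate work
    results[task] = "done"


def start_task(task):
    """Start the executor in a background thread and return immediately."""
    worker = threading.Thread(target=executor, args=(task,), daemon=True)
    worker.start()
    return worker


worker = start_task("task-1")
# A view function would return its redirect here without waiting.
worker.join()  # joined only so this demo can show the result
print(results)
```

A daemon thread is killed when the process exits, so this only suits tasks that may be lost on restart; a task queue such as Celery is the heavier alternative.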
