Node.js and CPU-intensive requests

Time: 2022-01-05 14:58:50

I've started tinkering with the Node.js HTTP server and I really like writing server-side JavaScript, but something is keeping me from starting to use Node.js for my web application.

I understand the whole async I/O concept, but I'm somewhat concerned about the edge cases where procedural code is very CPU intensive, such as image manipulation or sorting large data sets.

As I understand it, the server will be very fast for simple web page requests such as viewing a listing of users or viewing a blog post. However, if I want to write very CPU-intensive code (in the admin back end, for example) that generates graphics or resizes thousands of images, the request will be very slow (a few seconds). Since this code is not async, every request coming to the server during those few seconds will be blocked until my slow request is done.
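To make the concern concrete, here is a minimal sketch (my own illustration, not from the question) showing that synchronous CPU work starves the event loop: a zero-delay timer cannot fire until the busy loop returns.

```javascript
// Demo: synchronous CPU work blocks Node's event loop.
function busyWait(ms) {
  const end = Date.now() + ms;
  while (Date.now() < end) {} // spin synchronously; nothing else can run
}

let timerFired = false;
setTimeout(() => { timerFired = true; }, 0); // due "immediately"

busyWait(100); // stands in for image resizing, sorting, etc.

// The timer is long overdue, yet its callback still hasn't run,
// because the event loop never got control back.
console.log(timerFired); // false
```

In a real server, that 100 ms spin would delay every other pending request by the same amount.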

One suggestion was to use Web Workers for CPU-intensive tasks. However, I'm afraid Web Workers will make it hard to write clean code, since they work by including a separate JS file. What if the CPU-intensive code is located in an object's method? It kind of sucks to write a JS file for every method that is CPU intensive.

Another suggestion was to spawn a child process, but that makes the code even less maintainable.

Any suggestions to overcome this (perceived) obstacle? How do you write clean object-oriented code with Node.js while making sure CPU-heavy tasks are executed asynchronously?

5 Answers

#1


45  

What you need is a task queue! Moving your long-running tasks out of the web server is a GOOD thing. Keeping each task in a "separate" JS file promotes modularity and code reuse. It forces you to think about how to structure your program in a way that will make it easier to debug and maintain in the long run. Another benefit of a task queue is that the workers can be written in a different language. Just pop a task, do the work, and write the response back.
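To illustrate the shape of the pattern, here is an in-process sketch with made-up names (not Resque's actual API); a real deployment would put the queue in something like Redis so workers can run as separate processes, possibly in other languages:

```javascript
// Minimal in-process sketch of the task-queue pattern.
// In production the queue lives outside the web server (e.g. in Redis),
// and workers are separate processes popping tasks from it.
class TaskQueue {
  constructor() {
    this.tasks = [];      // pending jobs
    this.handlers = {};   // task type -> worker function
  }
  register(type, handler) { this.handlers[type] = handler; }
  push(type, payload) { this.tasks.push({ type, payload }); }
  // a worker pops one task, does the work, and returns the result
  workOne() {
    const task = this.tasks.shift();
    if (!task) return undefined;
    return this.handlers[task.type](task.payload);
  }
}

const queue = new TaskQueue();
queue.register('sum', (nums) => nums.reduce((a, b) => a + b, 0));
queue.push('sum', [1, 2, 3, 4]); // the web server only enqueues...
console.log(queue.workOne());    // ...a worker does the heavy part: 10
```

The key point is the split: the request handler only calls `push` and returns immediately, while `workOne` runs somewhere the event loop doesn't care about.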

Something like this: https://github.com/resque/resque

Here is an article from GitHub about why they built it: http://github.com/blog/542-introducing-resque

#2


267  

This is a misunderstanding of what a web server is for -- it should only be used to "talk" with clients. Heavy tasks should be delegated to standalone programs (which, of course, can also be written in JS).
You'd probably say that is dirty, but I assure you that a web server process stuck resizing images is just worse (even for, let's say, Apache, where it at least does not block other queries). Still, you may use a common library to avoid code redundancy.

EDIT: I have come up with an analogy: a web application should be like a restaurant. You have waiters (the web server) and cooks (the workers). Waiters are in contact with clients and do simple tasks like providing the menu or explaining whether some dish is vegetarian. They delegate the harder tasks to the kitchen. Because the waiters do only simple things, they respond quickly, and the cooks can concentrate on their job.

Node.js here would be a single but very talented waiter that can process many requests at a time, while Apache would be a gang of dumb waiters that each process just one request. If this one Node.js waiter began to cook, it would be an immediate catastrophe. Still, cooking could also exhaust even a large supply of Apache waiters, not to mention the chaos in the kitchen and the progressive decrease in responsiveness.

#3


9  

You don't want your CPU-intensive code to execute async; you want it to execute in parallel. You need to get the processing work out of the thread that's serving HTTP requests. That's the only way to solve this problem. With Node.js the answer is the cluster module, which spawns child processes to do the heavy lifting. (AFAIK Node doesn't have any concept of threads/shared memory; it's processes or nothing.) You have two options for how you structure your application. You can get the 80/20 solution by spawning 8 HTTP servers and handling compute-intensive tasks synchronously in the child processes. Doing that is fairly simple. You could take an hour to read about it in the cluster documentation; in fact, if you just lift the example code at the top of that page, you'll get 95% of the way there.

The other way to structure this is to set up a job queue and send big compute tasks over the queue. Note that there is a lot of overhead associated with the IPC for a job queue, so this is only useful when the tasks are appreciably larger than the overhead.

I'm surprised that none of these other answers even mention cluster.

Background: Asynchronous code is code that suspends until something happens somewhere else, at which point the code wakes up and continues execution. One very common case where something slow must happen somewhere else is I/O.

Asynchronous code isn't useful if it's your processor that is responsible for doing the work. That is precisely the case with "compute intensive" tasks.

Now, it might seem that asynchronous code is niche, but in fact it's very common. It just happens not to be useful for compute intensive tasks.

Waiting on I/O is a pattern that always happens in web servers, for example. Every client who connects to your server gets a socket. Most of the time the sockets are empty. You don't want to do anything until a socket receives some data, at which point you want to handle the request. Under the hood an HTTP server like Node is using an eventing library (libev) to keep track of the thousands of open sockets. The OS notifies libev, and then libev notifies Node.js when one of the sockets gets data; Node.js then puts an event on the event queue, and your HTTP code kicks in at this point and handles the events one after the other. Events don't get put on the queue until the socket has some data, so events are never waiting on data - it's already there for them.

Single-threaded, event-based web servers make sense as a paradigm when the bottleneck is waiting on a bunch of mostly empty socket connections, you don't want a whole thread or process for every idle connection, and you don't want to poll your 250k sockets to find the next one that has data on it.

#4


7  

A couple of approaches you can use:

As @Tim notes, you can create an asynchronous task that sits outside of, or parallel to, your main serving logic. It depends on your exact requirements, but even cron can act as a queueing mechanism.

WebWorkers can work for your async processes but they are currently not supported by node.js. There are a couple of extensions that provide support, for example: http://github.com/cramforce/node-worker

You can still reuse modules and code through the standard `require` mechanism. You just need to ensure that the initial dispatch to the worker passes all the information needed to process the results.

#5


0  

Using child_process is one solution, but each child process spawned may consume a lot of memory compared to Go's goroutines.

You can also use a queue-based solution such as kue.
