Iterating over a list from the front and the back with multiprocessing

Date: 2021-09-08 21:14:43

I want to iterate over a list with two functions using multiprocessing: one function walks the main_list from the front, the other from the back. Each time a function takes an element from the sample list (g), it should put that element into the main list, until one of them finds a duplicate in the list; at that point I want to terminate both processes and return the elements seen so far.

I expect the first process to return:

['a', 'b', 'c', 'd', 'e', 'f']

And the second to return:

['l', 'k', 'j', 'i', 'h', 'g']

This is my code, which raises an error:

from multiprocessing import Process, Manager

manager = Manager()
d = manager.list()

# Fn definitions and such
def a(main_path,g,l=[]):
  for i in g:
    l.append(i)
    print 'a'
    if i in main_path:
      return l
    main_path.append(i)

def b(main_path,g,l=[]):
  for i in g:
    l.append(i)
    print 'b'
    if i in main_path:
      return l
    main_path.append(i)

g=['a','b','c','d','e','f','g','h','i','j','k','l']
g2=g[::-1]

p1 = Process(target=a, args=(d,g))
p2 = Process(target=b, args=(d,g2))
p1.start()
p2.start()

And this is the Traceback:


a
Process Process-2:
Traceback (most recent call last):
  File "/usr/lib/python2.7/multiprocessing/process.py", line 258, in _bootstrap
    self.run()
  File "/usr/lib/python2.7/multiprocessing/process.py", line 114, in run
    self._target(*self._args, **self._kwargs)
  File "/home/bluebird/Desktop/persiantext.py", line 17, in a
    if i in main_path:
  File "<string>", line 2, in __contains__
  File "/usr/lib/python2.7/multiprocessing/managers.py", line 755, in _callmethod
    self._connect()
  File "/usr/lib/python2.7/multiprocessing/managers.py", line 742, in _connect
    conn = self._Client(self._token.address, authkey=self._authkey)
  File "/usr/lib/python2.7/multiprocessing/connection.py", line 169, in Client
b
    c = SocketClient(address)
  File "/usr/lib/python2.7/multiprocessing/connection.py", line 304, in SocketClient
    s.connect(address)
  File "/usr/lib/python2.7/socket.py", line 224, in meth
    return getattr(self._sock,name)(*args)
error: [Errno 2] No such file or directory
Process Process-3:
Traceback (most recent call last):
  File "/usr/lib/python2.7/multiprocessing/process.py", line 258, in _bootstrap
    self.run()
  File "/usr/lib/python2.7/multiprocessing/process.py", line 114, in run
    self._target(*self._args, **self._kwargs)
  File "/home/bluebird/Desktop/persiantext.py", line 27, in b
    if i in main_path:
  File "<string>", line 2, in __contains__
  File "/usr/lib/python2.7/multiprocessing/managers.py", line 755, in _callmethod
    self._connect()
  File "/usr/lib/python2.7/multiprocessing/managers.py", line 742, in _connect
    conn = self._Client(self._token.address, authkey=self._authkey)
  File "/usr/lib/python2.7/multiprocessing/connection.py", line 169, in Client
    c = SocketClient(address)
  File "/usr/lib/python2.7/multiprocessing/connection.py", line 304, in SocketClient
    s.connect(address)
  File "/usr/lib/python2.7/socket.py", line 224, in meth
    return getattr(self._sock,name)(*args)
error: [Errno 2] No such file or directory

Note that I have no idea how to terminate both processes once one of them finds a duplicate element!

1 Answer

#1


There are all kinds of other problems in your code, but since I already explained them on your other question, I won't get into them here.


The new problem is that you're not joining your child processes. In your threaded version, this wasn't an issue just because your main thread accidentally had a "block forever" before the end. But here, you don't have that, so the main process reaches the end of the script while the background processes are still running.


When this happens, it's not entirely defined what your code will do.* But basically, you're destroying the manager object, which shuts down the manager server while the background processes are still using it, so they're going to raise exceptions the next time they try to access a managed object.

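Here's a minimal sketch of that failure mode; it shuts the manager down explicitly rather than waiting for it to be garbage-collected, but the proxy should fail with the same kind of connection error as in your traceback:

from multiprocessing import Manager

manager = Manager()
d = manager.list()
manager.shutdown()  # stands in for the manager object being destroyed
print 'x' in d      # raises a socket error: the manager server is gone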

The solution is to add p1.join() and p2.join() to the end of your script.

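In other words, the tail of the script would look something like this (a sketch; everything above it unchanged):

p1 = Process(target=a, args=(d,g))
p2 = Process(target=b, args=(d,g2))
p1.start()
p2.start()
p1.join()  # wait for both children before the manager is torn down
p2.join()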

But that really only gets you back to the same situation as your threaded code (except not blocking forever at the end). You've still got code that's completely serialized, and a big race condition, and so on.



If you're curious why this happens:


At the end of the script, all of your module's globals go out of scope.** Since those variables are the only reference you have to the manager and process objects, those objects get garbage-collected, and their destructors get called.


For a manager object, the destructor shuts down the server.


For a process object, I'm not entirely sure, but I think the destructor does nothing (rather than joining it and/or interrupting it). Instead, there's an atexit function that runs after all of the destructors and joins any still-running processes.***

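For example, this sketch (my reading of CPython 2.7's behavior, which isn't documented) doesn't exit as soon as it falls off the end of the script; the atexit hook joins the still-running non-daemon child first:

from multiprocessing import Process
import time

def worker():
  time.sleep(5)

p = Process(target=worker)
p.start()
# the script ends here, but the interpreter waits ~5 seconds:
# multiprocessing's atexit handler joins the non-daemon child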

So, first the manager goes away, then the main process starts waiting for the children to finish; the next time each one tries to access a managed object, it fails and exits. Once all of them do that, the main process finishes waiting and exits.



* The multiprocessing changes in 3.2 and the shutdown changes in 3.4 make things a lot cleaner, so if we weren't talking about 2.7, there would be less "here's what usually happens but not always" and "here's what happens in one particular implementation on one particular platform".


** This isn't actually guaranteed by 2.7, and garbage-collecting all of the modules' globals doesn't always happen. But in this particular simple case, I'm pretty sure it will always work this way, at least in CPython, although I don't want to try to explain why.


*** That's definitely how it works with threads, at least on CPython 2.7 on Unix… again, this isn't at all documented in 2.x, so you can only tell by reading the source or experimenting on the platforms/implementations/versions that matter to you… And I don't want to track this through the source unless there's likely to be something puzzling or interesting to find.
