
Date: 2023-02-13 19:28:16

I want to iterate over a list with two functions using multiprocessing: one function iterates over the main list from the front and the other from the back. Each time a function steps through its sample list (g), it should put the element into the main list, until one of them finds a duplicate in the list. At that point I want to terminate both processes and return the elements seen so far.


I expect the first process to return:


['a', 'b', 'c', 'd', 'e', 'f']

And the second to return:


['l', 'k', 'j', 'i', 'h', 'g']

This is my code, which raises an error:


from multiprocessing import Process, Manager

manager = Manager()
d = manager.list()

# Fn definitions and such
def a(main_path,g,l=[]):
  for i in g:
    print 'a'
    if i in main_path:
      return l

def b(main_path,g,l=[]):
  for i in g:
    print 'b'
    if i in main_path:
      return l


p1 = Process(target=a, args=(d,g))
p2 = Process(target=b, args=(d,g2))

And this is the Traceback:


Process Process-2:
Traceback (most recent call last):
  File "/usr/lib/python2.7/multiprocessing/process.py", line 258, in _bootstrap
  File "/usr/lib/python2.7/multiprocessing/process.py", line 114, in run
    self._target(*self._args, **self._kwargs)
  File "/home/bluebird/Desktop/persiantext.py", line 17, in a
    if i in main_path:
  File "<string>", line 2, in __contains__
  File "/usr/lib/python2.7/multiprocessing/managers.py", line 755, in _callmethod
  File "/usr/lib/python2.7/multiprocessing/managers.py", line 742, in _connect
    conn = self._Client(self._token.address, authkey=self._authkey)
  File "/usr/lib/python2.7/multiprocessing/connection.py", line 169, in Client
    c = SocketClient(address)
  File "/usr/lib/python2.7/multiprocessing/connection.py", line 304, in SocketClient
  File "/usr/lib/python2.7/socket.py", line 224, in meth
    return getattr(self._sock,name)(*args)
error: [Errno 2] No such file or directory
Process Process-3:
Traceback (most recent call last):
  File "/usr/lib/python2.7/multiprocessing/process.py", line 258, in _bootstrap
  File "/usr/lib/python2.7/multiprocessing/process.py", line 114, in run
    self._target(*self._args, **self._kwargs)
  File "/home/bluebird/Desktop/persiantext.py", line 27, in b
    if i in main_path:
  File "<string>", line 2, in __contains__
  File "/usr/lib/python2.7/multiprocessing/managers.py", line 755, in _callmethod
  File "/usr/lib/python2.7/multiprocessing/managers.py", line 742, in _connect
    conn = self._Client(self._token.address, authkey=self._authkey)
  File "/usr/lib/python2.7/multiprocessing/connection.py", line 169, in Client
    c = SocketClient(address)
  File "/usr/lib/python2.7/multiprocessing/connection.py", line 304, in SocketClient
  File "/usr/lib/python2.7/socket.py", line 224, in meth
    return getattr(self._sock,name)(*args)
error: [Errno 2] No such file or directory

Note that I have no idea how to terminate both processes once one of them finds a duplicate element.


1 Answer


There are all kinds of other problems in your code, but since I already explained them on your other question, I won't get into them here.


The new problem is that you're not joining your child processes. In your threaded version, this wasn't an issue just because your main thread accidentally had a "block forever" before the end. But here, you don't have that, so the main process reaches the end of the script while the background processes are still running.


When this happens, it's not entirely defined what your code will do.* But basically, you're destroying the manager object, which shuts down the manager server while the background processes are still using it, so they're going to raise exceptions the next time they try to access a managed object.


The solution is to add p1.join() and p2.join() to the end of your script.
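
As a minimal sketch of that fix (not the asker's full program): the function names `record` and `run_with_join` and the sample letters are made up for illustration, and the code is written for Python 3 rather than the 2.7 used in the question. The point it shows is only the ordering: both children are joined while the manager server is still alive, and the manager is shut down afterwards.

```python
from multiprocessing import Manager, Process


def record(shared, items):
    """Append each item to the managed list (one proxy call per item)."""
    for item in items:
        shared.append(item)


def run_with_join():
    manager = Manager()
    shared = manager.list()

    p1 = Process(target=record, args=(shared, ["a", "b", "c"]))
    p2 = Process(target=record, args=(shared, ["l", "k", "j"]))
    p1.start()
    p2.start()

    # Wait for both children while the manager server is still running.
    p1.join()
    p2.join()

    # Copy the data out of the proxy before the server goes away.
    result = list(shared)
    manager.shutdown()
    return result


if __name__ == "__main__":
    # The interleaving of the two children is not deterministic,
    # so the result is sorted before printing.
    print(sorted(run_with_join()))
```

Without the two `join()` calls, the main process could reach the end of the script and tear down the manager while the children are still appending.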


But that really only gets you back to the same situation as your threaded code (except not blocking forever at the end). You've still got code that's completely serialized, and a big race condition, and so on.
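
One possible way to deal with the race condition and the "stop both once a duplicate is found" requirement is sketched below. This is an assumption about what a fix could look like, not the approach from the other question: the names `scan` and `run_scan` are hypothetical, the input is assumed to contain no repeated letters, and the code targets Python 3. A manager `Lock` makes the check-then-append step atomic, and a manager `Event` lets whichever worker sees a duplicate ask the other to stop, so neither process has to be killed.

```python
from multiprocessing import Manager, Process


def scan(seen, lock, stop, items, out):
    """Walk items, claiming each one in `seen` until a duplicate shows up."""
    for item in items:
        if stop.is_set():          # the other worker already hit a duplicate
            break
        with lock:                 # check and append as one atomic step
            if item in seen:
                stop.set()         # signal the other worker to stop too
                break
            seen.append(item)
        out.append(item)           # remember what this worker claimed


def run_scan(letters):
    manager = Manager()
    seen = manager.list()
    lock = manager.Lock()
    stop = manager.Event()
    front = manager.list()
    back = manager.list()

    p1 = Process(target=scan, args=(seen, lock, stop, letters, front))
    p2 = Process(target=scan, args=(seen, lock, stop, letters[::-1], back))
    p1.start()
    p2.start()
    p1.join()
    p2.join()

    result = (list(front), list(back))
    manager.shutdown()
    return result


if __name__ == "__main__":
    front, back = run_scan(list("abcdefghijkl"))
    print(front)
    print(back)
```

Where the two workers meet depends on scheduling, so an even six/six split like the one the question expects is possible but not guaranteed. Every access to the shared list is still a round trip to the manager process, so this does not make the work any less serialized.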


If you're curious why this happens:


At the end of the script, all of your module's globals go out of scope.** Since those variables are the only reference you have to the manager and process objects, those objects get garbage-collected, and their destructors get called.


For a manager object, the destructor shuts down the server.


For a process object, I'm not entirely sure, but I think the destructor does nothing (rather than join it and/or interrupt it). Instead, there's an atexit function, that runs after all of the destructors, that joins any still-running processes.***


So, first the manager goes away, then the main process starts waiting for the children to finish; the next time each one tries to access a managed object, it fails and exits. Once all of them do that, the main process finishes waiting and exits.
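
The failure mode can be reproduced in a single process, as a rough illustration. The sketch below assumes Python 3 and uses a hypothetical helper name, `access_after_shutdown`. It calls `manager.shutdown()` explicitly to stand in for the shutdown that happens when the manager object is finalized, then tries to use the proxy. The exact exception type varies by platform and version, so the code only reports its name.

```python
from multiprocessing import Manager


def access_after_shutdown():
    manager = Manager()
    shared = manager.list(["a", "b"])

    # Works: the manager's server process is running.
    before = "a" in shared

    # Stand-in for the manager being finalized at the end of the script.
    manager.shutdown()

    try:
        _ = "a" in shared          # the proxy has no server to talk to
    except Exception as exc:       # exact type depends on platform/version
        return before, type(exc).__name__
    return before, None


if __name__ == "__main__":
    print(access_after_shutdown())
```

In the question's traceback the same kind of failure appears inside the child processes, as `[Errno 2] No such file or directory` raised while the proxy tries to connect to the manager's socket.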


* The multiprocessing changes in 3.2 and the shutdown changes in 3.4 make things a lot cleaner, so if we weren't talking about 2.7, there would be less "here's what usually happens but not always" and "here's what happens in one particular implementation on one particular platform".

** This isn't actually guaranteed by 2.7, and garbage-collecting all of the modules' globals doesn't always happen. But in this particular simple case, I'm pretty sure it will always work this way, at least in CPython, although I don't want to try to explain why.


*** That's definitely how it works with threads, at least on CPython 2.7 on Unix… again, this isn't at all documented in 2.x, so you can only tell by reading the source or experimenting on the platforms/implementations/versions that matter to you… And I don't want to track this through the source unless there's likely to be something puzzling or interesting to find.
