
Time: 2022-12-12 22:00:26

I am running a Flask app with gunicorn on an EC2 server. I use supervisord to monitor and restart the app server. Yesterday, the server stopped responding to HTTP requests. supervisorctl reported the process as running, but the supervisor logs showed the following error:


CRIT uncaptured python exception, closing channel <POutputDispatcher at 34738328
for <Subprocess at 34314576 with name flask in state RUNNING> (stdout)>
(<type 'exceptions.OSError'>:[Errno 2] No such file or directory


Restarting supervisord fixed the issue for us. Below are the relevant parts of our supervisor config:


childlogdir = /var/log/supervisord/
logfile = /var/log/supervisord/supervisord.log
logfile_maxbytes = 50MB
logfile_backups = 10
loglevel = info
pidfile = /var/log/supervisord/supervisord.pid
umask = 022
nodaemon = false
nocleanup = false

command=newrelic-admin run-program gunicorn app:app -c gunicorn_conf.py

What's strange is that we have 2 servers running behind an ELB, and both of them hit the same issue within 10 minutes of each other. My guess is that the logs on both reached the size limit at around the same time (plausible, since they see about the same amount of traffic) and the rollover failed. Any ideas as to why that could have happened?


1 Solution



AFAIK supervisor uses its own logging implementation, not the one in the Python stdlib - although the class and method names are pretty similar.


There is a potential race condition when deleting files during rollover - you will need to check the source code of your specific supervisor version and compare that with the latest supervisor version, if different. Here is an excerpt from the supervisor code on my system (in the doRollover() method):


except OSError, why:
    # catch race condition (already deleted)
    if why[0] != errno.ENOENT:
        raise

If your rollover code doesn't do this, you might need to upgrade your supervisor version.


Update: If the error happens on the rename, then it might be a race condition which hasn't yet been caught. Consider asking on the supervisor mailing list.
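To illustrate the idea, here is a minimal sketch of a rollover that tolerates a file disappearing during both the delete and the rename. It is not supervisor's actual code: the names remove_if_exists, rename_if_exists and rotate_backups are made up for this example, and it uses Python 3 syntax rather than the Python 2 syntax in the excerpt above. It also assumes the error really does come from the rollover, which the traceback in the question does not confirm.

```python
import errno
import os


def remove_if_exists(path):
    """Delete path, ignoring the case where it is already gone."""
    try:
        os.remove(path)
    except OSError as why:
        # Someone else may have deleted the file first (race condition).
        if why.errno != errno.ENOENT:
            raise


def rename_if_exists(src, dst):
    """Rename src to dst; return False if src vanished in the meantime."""
    try:
        os.rename(src, dst)
    except OSError as why:
        # Same race as above, but on the rename step.
        if why.errno != errno.ENOENT:
            raise
        return False
    return True


def rotate_backups(base_filename, backup_count):
    """Shift base.1 -> base.2, ..., then move base -> base.1."""
    if backup_count <= 0:
        return
    # Walk from the oldest backup down so nothing is overwritten early.
    for i in range(backup_count - 1, 0, -1):
        src = "%s.%d" % (base_filename, i)
        dst = "%s.%d" % (base_filename, i + 1)
        if os.path.exists(src):
            remove_if_exists(dst)
            rename_if_exists(src, dst)
    # Finally move the live log file to the first backup slot.
    first_backup = base_filename + ".1"
    remove_if_exists(first_backup)
    rename_if_exists(base_filename, first_backup)
```

The point of the sketch is only that ENOENT is swallowed on both steps while any other OSError is still re-raised, so permission or disk problems are not hidden.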



