一次nginx文件打开数的问题排查处理

时间:2024-05-18 10:00:48

 

现象:nginx域名配置合并之后,发现consul-template无法完成nginx重载,然后发现需要重启nginx,才能让配置生效。

注意:下次哪个服务有报错,就看重启时所有日志输出,各种情况日志输出。不要忽略细节。很多时候其实已经看到了问题,却没有深入查看问题。

 

 

查看进程最大打开文件个数

#cat /proc/31146/limits|grep "Max open files"

Max open files            1024                 4096                 files  

# cat /usr/lib/systemd/system/openresty.service
[Unit]

[Service]
LimitNOFILE=655350

[Install]

#

 

consul-template无法重载,是因为进程本身无法重载,进程无法打开文件了

[root@vm-nginx003.mm.machangwei.com mcw]# ps -ef|grep nginx
root      3114     1  0 02:19 ?        00:00:00 nginx: master process /usr/local/openresty/nginx/sbin/nginx
nobody    3115  3114  0 02:19 ?        00:00:00 nginx: worker process
nobody    3116  3114  0 02:19 ?        00:00:00 nginx: worker process
root      3132  9036  0 02:19 pts/1    00:00:00 grep --color=auto nginx
[root@vm-nginx003.mm.machangwei.com mcw]# 
[root@vm-nginx003.mm.machangwei.com mcw]# systemctl reload openresty
[root@vm-nginx003.mm.machangwei.com mcw]# ps -ef|grep nginx
root      3114     1  2 02:19 ?        00:00:00 nginx: master process /usr/local/openresty/nginx/sbin/nginx
nobody    3115  3114  0 02:19 ?        00:00:00 nginx: worker process is shutting down
nobody    3116  3114  0 02:19 ?        00:00:00 nginx: worker process is shutting down
nobody    3171  3114  4 02:20 ?        00:00:00 nginx: worker process
nobody    3172  3114  4 02:20 ?        00:00:00 nginx: worker process
root      3175  9036  0 02:20 pts/1    00:00:00 grep --color=auto nginx
[root@vm-nginx003.mm.machangwei.com mcw]# 

 

查看日志报错:有打开太多文件数

[root@vm-nginx003.mm.machangwei.com mcw]# ls /data/logs/nginx/nginx_error.log
/data/logs/nginx/nginx_error.log
[root@vm-nginx003.mm.machangwei.com mcw]# vim /data/logs/nginx/nginx_error.log
2024/05/18 00:59:57 [notice] 15124#15124: exit
2024/05/18 00:59:57 [notice] 1586#1586: signal 17 (SIGCHLD) received from 15123
2024/05/18 00:59:57 [notice] 1586#1586: signal 14 (SIGALRM) received
2024/05/18 00:59:57 [notice] 1586#1586: worker process 15123 exited with code 0
2024/05/18 00:59:57 [notice] 1586#1586: worker process 15124 exited with code 0
2024/05/18 00:59:57 [notice] 1586#1586: exit
2024/05/18 01:00:15 [notice] 15493#15493: using the "epoll" event method
2024/05/18 01:00:15 [notice] 15493#15493: openresty/1.19.3.1
2024/05/18 01:00:15 [notice] 15493#15493: built by gcc 8.3.1 20190311 (Red Hat 8.3.1-3) (GCC) 
2024/05/18 01:00:15 [notice] 15493#15493: OS: Linux 5.4.65-200.el7.x86_64
2024/05/18 01:00:15 [notice] 15493#15493: getrlimit(RLIMIT_NOFILE): 1024:4096
2024/05/18 01:00:15 [notice] 15496#15496: start worker processes
2024/05/18 01:00:15 [notice] 15496#15496: start worker process 15497
2024/05/18 01:00:15 [notice] 15496#15496: start worker process 15498
2024/05/18 01:00:29 [notice] 15496#15496: signal 1 (SIGHUP) received from 15545, reconfiguring
2024/05/18 01:00:29 [notice] 15496#15496: reconfiguring
2024/05/18 01:00:29 [warn] 15496#15496: conflicting server name "test-content-review.mcwcn.com" on 0.0.0.0:80, ignored
2024/05/18 01:00:29 [warn] 15496#15496: could not build optimal server_names_hash, you should increase either server_names_hash_max_size: 51
2 or server_names_hash_bucket_size: 128; ignoring server_names_hash_bucket_size
2024/05/18 01:00:29 [emerg] 15496#15496: open() "/data/logs/nginx/dev-account-mcwapps_error.log" failed (24: Too many open files)
2024/05/18 01:01:03 [notice] 15496#15496: signal 1 (SIGHUP) received from 15723, reconfiguring
2024/05/18 01:01:03 [notice] 15496#15496: reconfiguring
2024/05/18 01:01:03 [warn] 15496#15496: conflicting server name "test-content-review.mcwcn.com" on 0.0.0.0:80, ignored
2024/05/18 01:01:03 [warn] 15496#15496: could not build optimal server_names_hash, you should increase either server_names_hash_max_size: 51
2 or server_names_hash_bucket_size: 128; ignoring server_names_hash_bucket_size
2024/05/18 01:01:03 [emerg] 15496#15496: open() "/data/logs/nginx/dev-account-mcwapps_error.log" failed (24: Too many open files)
2024/05/18 01:01:14 [notice] 15496#15496: signal 1 (SIGHUP) received from 15750, reconfiguring
2024/05/18 01:01:14 [notice] 15496#15496: reconfiguring
2024/05/18 01:00:29 [warn] 15496#15496: conflicting server name "test-content-review.mcwcn.com" on 0.0.0.0:80, ignored
2024/05/18 01:00:29 [warn] 15496#15496: could not build optimal server_names_hash, you should increase either server_names_hash_max_size: 51
2 or server_names_hash_bucket_size: 128; ignoring server_names_hash_bucket_size
2024/05/18 01:00:29 [emerg] 15496#15496: open() "/data/logs/nginx/dev-account-mcwapps_error.log" failed (24: Too many open files)
2024/05/18 01:01:03 [notice] 15496#15496: signal 1 (SIGHUP) received from 15723, reconfiguring
2024/05/18 01:01:03 [notice] 15496#15496: reconfiguring
2024/05/18 01:01:03 [warn] 15496#15496: conflicting server name "test-content-review.mcwcn.com" on 0.0.0.0:80, ignored
2024/05/18 01:01:03 [warn] 15496#15496: could not build optimal server_names_hash, you should increase either server_names_hash_max_size: 51
2 or server_names_hash_bucket_size: 128; ignoring server_names_hash_bucket_size
2024/05/18 01:01:03 [emerg] 15496#15496: open() "/data/logs/nginx/dev-account-mcwapps_error.log" failed (24: Too many open files)
2024/05/18 01:01:14 [notice] 15496#15496: signal 1 (SIGHUP) received from 15750, reconfiguring
2024/05/18 01:01:14 [notice] 15496#15496: reconfiguring
2024/05/18 01:01:14 [warn] 15496#15496: conflicting server name "test-content-review.mcwcn.com" on 0.0.0.0:80, ignored
2024/05/18 01:01:14 [warn] 15496#15496: could not build optimal server_names_hash, you should increase either server_names_hash_max_size: 51
2 or server_names_hash_bucket_size: 128; ignoring server_names_hash_bucket_size
2024/05/18 01:01:14 [emerg] 15496#15496: open() "/data/logs/nginx/dev-account-mcwapps_error.log" failed (24: Too many open files)
2024/05/18 01:02:46 [notice] 15496#15496: signal 1 (SIGHUP) received from 16163, reconfiguring
2024/05/18 01:02:46 [notice] 15496#15496: reconfiguring
2024/05/18 01:02:46 [warn] 15496#15496: conflicting server name "test-content-review.mcwcn.com" on 0.0.0.0:80, ignored
2024/05/18 01:02:46 [warn] 15496#15496: could not build optimal server_names_hash, you should increase either server_names_hash_max_size: 51
2 or server_names_hash_bucket_size: 128; ignoring server_names_hash_bucket_size
2024/05/18 01:02:46 [emerg] 15496#15496: open() "/data/logs/nginx/dev-account-mcwapps_error.log" failed (24: Too many open files)
2024/05/18 01:03:54 [notice] 15496#15496: signal 3 (SIGQUIT) received from 16526, shutting down
2024/05/18 01:03:54 [notice] 15498#15498: gracefully shutting down
2024/05/18 01:03:54 [notice] 15498#15498: signal 15 (SIGTERM) received from 1, exiting
2024/05/18 01:03:54 [notice] 15496#15496: signal 15 (SIGTERM) received from 1, exiting
2024/05/18 01:03:54 [notice] 15497#15497: signal 15 (SIGTERM) received from 1, exiting

 

这里查看的,不是那个进程所能打开的最大个数

[root@vm-nginx003.mm.machangwei.com mcw]# cat /etc/security/limits.conf
# End of file
* soft nproc 655350
* hard nproc 655350
* soft nofile 655350
* hard nofile 655350
[root@vm-nginx003.mm.machangwei.com mcw]#
[root@vm-nginx003.mm.machangwei.com mcw]# sysctl -a|grep fs.file-max
fs.file-max = 7930900
[root@vm-nginx003.mm.machangwei.com mcw]#

找到进程id,查看进程可以打开的最大个数,虽然nginx配置了worker可以打开很多个文件,但是也没有设置master进程打开文件个数

[root@vm-nginx003.mm.machangwei.com mcw]# ps -ef|grep nginx
root     31146     1  0 02:01 ?        00:00:00 nginx: master process /usr/local/openresty/nginx/sbin/nginx
nobody   31147 31146  0 02:01 ?        00:00:00 nginx: worker process
nobody   31148 31146  0 02:01 ?        00:00:00 nginx: worker process
root     32633  9036  0 02:07 pts/1    00:00:00 grep --color=auto nginx
[root@vm-nginx003.mm.machangwei.com mcw]# 

[root@vm-nginx003.mm.machangwei.com mcw]# cat /proc/31146/limits|grep "Max open files"
Max open files            1024                 4096                 files     
[root@vm-nginx003.mm.machangwei.com mcw]# 

master进程是systemd启动,systemd启动的进程需要设置打开文件大小个数,新增配置项LimitNOFILE=655350,把数弄大点

[root@vm-nginx003.mm.machangwei.com mcw]# cat /usr/lib/systemd/system/openresty.service
[Unit]
Description=The OpenResty Application Platform
After=syslog.target network-online.target remote-fs.target nss-lookup.target
Wants=network-online.target

[Service]
LimitNOFILE=655350
Type=forking
PIDFile=/usr/local/openresty/nginx/logs/nginx.pid
ExecStartPre=/usr/local/openresty/nginx/sbin/nginx -t
ExecStart=/usr/local/openresty/nginx/sbin/nginx
#ExecStart=/bin/openresty
ExecReload=/bin/kill -s HUP $MAINPID
ExecStop=/bin/kill -s QUIT $MAINPID
PrivateTmp=true

[Install]
WantedBy=multi-user.target
[root@vm-nginx003.mm.machangwei.com mcw]# 

重启之后,查看进程最大支持打开文件个数,已经被修改了

[root@vm-nginx003.mm.machangwei.com mcw]# ps -ef|grep nginx
root      3114     1  0 02:19 ?        00:00:00 nginx: master process /usr/local/openresty/nginx/sbin/nginx
nobody    3223  3114  0 02:20 ?        00:00:00 nginx: worker process
nobody    3224  3114  0 02:20 ?        00:00:00 nginx: worker process
root      3618  9036  0 02:21 pts/1    00:00:00 grep --color=auto nginx
[root@vm-nginx003.mm.machangwei.com mcw]# 
[root@vm-nginx003.mm.machangwei.com mcw]# cat /proc/3114/limits|grep "Max open files"
Max open files            655350               655350               files     
[root@vm-nginx003.mm.machangwei.com mcw]# 

多次重载nginx,可以看到旧的子进程在关闭,新的子进程在启动代替它

[root@vm-nginx003.mm.machangwei.com mcw]# ps -ef|grep nginx
root      3114     1  0 02:19 ?        00:00:00 nginx: master process /usr/local/openresty/nginx/sbin/nginx
nobody    3115  3114  0 02:19 ?        00:00:00 nginx: worker process
nobody    3116  3114  0 02:19 ?        00:00:00 nginx: worker process
root      3132  9036  0 02:19 pts/1    00:00:00 grep --color=auto nginx
[root@vm-nginx003.mm.machangwei.com mcw]# 
[root@vm-nginx003.mm.machangwei.com mcw]# systemctl reload openresty
[root@vm-nginx003.mm.machangwei.com mcw]# ps -ef|grep nginx
root      3114     1  2 02:19 ?        00:00:00 nginx: master process /usr/local/openresty/nginx/sbin/nginx
nobody    3115  3114  0 02:19 ?        00:00:00 nginx: worker process is shutting down
nobody    3116  3114  0 02:19 ?        00:00:00 nginx: worker process is shutting down
nobody    3171  3114  4 02:20 ?        00:00:00 nginx: worker process
nobody    3172  3114  4 02:20 ?        00:00:00 nginx: worker process
root      3175  9036  0 02:20 pts/1    00:00:00 grep --color=auto nginx
[root@vm-nginx003.mm.machangwei.com mcw]# 

 

systemd 启动配置openresty.service,配置的是master打开的文件个数,nginx.conf里面设置的是worker打开的文件个数

# ps -ef|grep nginx
root      3114     1  0 02:19 ?        00:00:00 nginx: master process /usr/local/openresty/nginx/sbin/nginx
nobody    3223  3114  0 02:20 ?        00:00:01 nginx: worker process
nobody    3224  3114  0 02:20 ?        00:00:01 nginx: worker process
root     14607  9036  0 03:07 pts/1    00:00:00 grep --color=auto nginx
# 
# cat /proc/3114/limits |grep "Max open files"
Max open files            655350               655350               files     
# cat /proc/3223/limits |grep "Max open files"
Max open files            51200                51200                files     
# 

 

 

=====

参考信息如下:

 

nginx重载,/bin/kill -s HUP 不生效

 

在Nginx中,/bin/kill -s HUP 命令通常用于重新加载Nginx的配置文件,而不是挂起(suspend)Nginx进程。如果你发现这个命令没有生效,可能是由以下几个原因造成的:

  1. 错误的进程ID:确保你发送HUP信号的进程ID(PID)是正确的。你可以通过 ps 命令查找Nginx的主进程ID。

    bash复制代码
      ps aux | grep nginx

    通常,你会看到类似 nginx master process 的行,后面跟着的就是主进程的PID。

  2. 权限问题:你可能没有足够的权限来发送信号给Nginx进程。确保你具有发送信号到该进程的权限,或者使用root用户来执行该命令。

  3. Nginx进程未运行:确保Nginx服务正在运行。你可以使用 systemctl status nginx(对于使用systemd的系统)或 service nginx status(对于使用SysVinit的系统)来检查Nginx的状态。

  4. Nginx配置文件有错误:如果Nginx的配置文件存在语法错误,nginx -s reload 或 /bin/kill -s HUP [nginx_pid] 可能不会生效,因为Nginx无法重新加载一个无效的配置。你可以使用 nginx -t 来测试配置文件的语法是否正确。

  5. Nginx不是通过init系统启动的:如果Nginx不是通过init系统(如systemd或SysVinit)启动的,那么它可能不会响应HUP信号。确保Nginx是通过标准的init系统服务来管理的。

  6. SELinux或AppArmor安全策略:如果你的系统启用了SELinux或AppArmor这样的安全模块,它们可能会阻止进程之间的信号传递。检查相关的安全策略或日志文件以确认是否有相关的限制。

  7. 其他系统级限制:有些系统级的限制(如cgroups、资源限制等)可能会阻止进程接收或处理信号。

为了解决这个问题,你可以尝试以下步骤:

  • 使用正确的Nginx进程ID重新发送HUP信号。
  • 确保你具有发送信号所需的权限。
  • 检查Nginx的状态和日志文件,确认它是否正在运行且没有错误。
  • 使用 nginx -t 测试Nginx配置文件的语法是否正确。
  • 如果可能的话,尝试通过init系统(如 systemctl reload nginx)来重新加载Nginx配置。
  • 检查SELinux或AppArmor的安全策略,确保它们没有阻止信号传递。
  • 如果以上步骤都无法解决问题,考虑重启Nginx服务(但请注意,这会导致短暂的服务中断)。

 

nginx报错 open() "_error.log" failed (24: Too many open files)

 

Nginx 报错 "open() "_error.log" failed (24: Too many open files)" 意味着 Nginx 进程尝试打开的文件数量超过了系统或用户级别的限制。在 Linux 系统中,每个进程和每个用户都有打开文件数量的限制。

要解决这个问题,你可以采取以下几个步骤:

  1. 检查当前限制:
    你可以使用 ulimit -n 命令来查看当前 shell 会话中打开文件的数量限制。然而,Nginx 进程的限制可能由系统或用户级别的限制设置,而不是由 shell 会话设置。

    为了查看 Nginx 进程的实际限制,你可以查看 /proc/[nginx_pid]/limits 目录中的文件,其中 [nginx_pid] 是 Nginx 主进程的 PID。或者,你可以使用 cat /proc/[nginx_pid]/limits | grep 'Max open files' 命令。

  2. 增加用户级别的限制:
    如果你确定问题是用户级别的限制,你可以修改 /etc/security/limits.conf 文件,添加或修改类似下面的行来增加打开文件的限制:

    复制代码
      * soft nofile 4096
      * hard nofile 65535

    或者,只为运行 Nginx 的用户(如 www-data)设置限制:

    复制代码
      www-data soft nofile 4096
      www-data hard nofile 65535

    注意:修改这些限制后,需要重启 Nginx 或重新登录用户才能生效。

  3. 增加系统级别的限制:
    如果你发现系统级别的限制也很低,你可能需要修改 /etc/sysctl.conf 文件(或相应的 sysctl 配置位置),增加 fs.file-max 的值。例如:

    复制代码
      fs.file-max = 100000

    修改后,需要运行 sysctl -p 或重启系统来应用新的设置。

  4. 优化 Nginx 配置:
    确保你的 Nginx 配置中没有导致打开过多文件的设置。例如,避免不必要的日志记录、优化缓存设置等。

  5. 检查文件描述符泄漏:
    如果 Nginx 进程持续打开新的文件描述符而没有关闭它们,这可能是资源泄漏的迹象。使用工具如 lsoft 或 strace 可以帮助你诊断这种情况。

  6. 重启 Nginx:
    在修改限制或修复潜在的问题后,重启 Nginx 以确保更改生效。

  7. 监控和日志:
    监控 Nginx 进程的文件描述符使用情况,并定期检查日志文件以获取可能的线索。