【Centos】Docker停止容器服务后,磁盘I/O仍然迟迟不下降。(未解决)

时间:2022-12-09 19:57:42

前言

安装了docker的服务器只运行了一个容器服务,停止容器服务后,磁盘I/O仍然迟迟不下降。对该现象进行问题复现,故障演练

问题复现,故障演练

通过压测环境kafka造数据,在manyun-ops-14服务器上单独部署dcim-local进行问题复现
PS:虽然该现象由程序缺少依赖jar包引起,希望通过运维角度解决后续引发的问题。

准备工作

docker开启debug日志

主机 ip docker版本(至2022年12月07日位置20.10.21为最新版本)
manyun-ops-14 172.16.0.14 20.10.21

dcim-local演练过程

15:56 启动dcim-local,日志文件不断进行写入,Total DISK WRITE 20~45M/s,Actual DISK WRITE: 约3~4M/s
15:02 执行docker stop manyun.dcim-local主动停止容器运行,Total DISK WRITE 几K/s Actual DISK WRITE: 约3~4M/s
【Centos】Docker停止容器服务后,磁盘I/O仍然迟迟不下降。(未解决)

15:03 dcim-local日志停止滚动,Total DISK WRITE 几K/s Actual DISK WRITE: 约3~4 M/s
【Centos】Docker停止容器服务后,磁盘I/O仍然迟迟不下降。(未解决)
15:34 此时容器还存活,系统日志中已经KILL,iotop -oP,ps -ef|grep java ,ps -alf查看已经无相关父子java进程,但仍然有IO
【Centos】Docker停止容器服务后,磁盘I/O仍然迟迟不下降。(未解决)
【Centos】Docker停止容器服务后,磁盘I/O仍然迟迟不下降。(未解决)
【Centos】Docker停止容器服务后,磁盘I/O仍然迟迟不下降。(未解决)
【Centos】Docker停止容器服务后,磁盘I/O仍然迟迟不下降。(未解决)
15:46 此时docker stop manyun.dcim-local仍然hold on
16:04 此时docker stop manyun.dcim-local仍然hold on
【Centos】Docker停止容器服务后,磁盘I/O仍然迟迟不下降。(未解决)
17:36 至17:35时,Total DISK WRITE 几0K/s Actual DISK WRITE:恢复至几K/s,无系统异常日志

系统日志message

更新docker版本后 忘记开debug了,情况就是和之前的日志一样。请谅解

Dec  2 15:57:05 manyun14 dockerd[492597]: time="2022-12-02T15:57:05.037823027+08:00" level=debug msg="Sending KILL signal to container process" container=8318526b545cf13a17b73dc55e38ce0b0d8a4c2ca58cc313c46474bfd05e0afe exec=218ea1e6e721a177c062d1116763609b6a706db051c9939aa5a50ea2aba06ba7
Dec  2 15:58:01 manyun14 dockerd[492597]: time="2022-12-02T15:58:01.411023387+08:00" level=debug msg="Sending KILL signal to container process" container=8318526b545cf13a17b73dc55e38ce0b0d8a4c2ca58cc313c46474bfd05e0afe exec=6009b3103c88576ccf9bf77f2bb6291fd5ce281b84b3b368318007ec2660cdf5
Dec  2 16:00:01 manyun14 dockerd[492597]: time="2022-12-02T16:00:01.685971090+08:00" level=debug msg="Sending KILL signal to container process" container=8318526b545cf13a17b73dc55e38ce0b0d8a4c2ca58cc313c46474bfd05e0afe exec=de74e567cd85a6994db297c725931459bcb73b53513a24ec9f72ba1bc747011b
Dec  2 16:05:07 manyun14 dockerd[492597]: time="2022-12-02T16:05:07.962224598+08:00" level=debug msg="Sending KILL signal to container process" container=8318526b545cf13a17b73dc55e38ce0b0d8a4c2ca58cc313c46474bfd05e0afe exec=88e19de7ab7f2a64e28b5b0dd8fad480ca0821506aa7e55a83da0bb6c7a1b51d
Dec  2 16:08:37 manyun14 dockerd[492597]: time="2022-12-02T16:08:37.956801923+08:00" level=debug msg="Sending KILL signal to container process" container=8318526b545cf13a17b73dc55e38ce0b0d8a4c2ca58cc313c46474bfd05e0afe exec=a521b74e19557c45f44b39153c2a783af26a5d845922b466fd8efb9e28a78f59
Dec  2 16:11:28 manyun14 dockerd[492597]: time="2022-12-02T16:11:28.825107143+08:00" level=debug msg="Sending KILL signal to container process" container=8318526b545cf13a17b73dc55e38ce0b0d8a4c2ca58cc313c46474bfd05e0afe exec=8445d94623b80d540eb902d97f9d8f08bc621238fd807c2ab634b99fe6a8332c
Dec  2 16:13:08 manyun14 dockerd[492597]: time="2022-12-02T16:13:08.604719208+08:00" level=debug msg="Sending KILL signal to container process" container=8318526b545cf13a17b73dc55e38ce0b0d8a4c2ca58cc313c46474bfd05e0afe exec=6269d994c5e39b1251787b6685c382455495705417dccf586b6742fb34b0d51c
Dec  2 16:14:03 manyun14 dockerd[492597]: time="2022-12-02T16:14:03.372082113+08:00" level=debug msg="Sending KILL signal to container process" container=8318526b545cf13a17b73dc55e38ce0b0d8a4c2ca58cc313c46474bfd05e0afe exec=1df09dc6b5c215953e3079c483faeb0ab86850d2f7f7de06010259a5da7ab995
Dec  2 16:18:00 manyun14 dockerd[492597]: time="2022-12-02T16:18:00.068046693+08:00" level=debug msg="Sending KILL signal to container process" container=8318526b545cf13a17b73dc55e38ce0b0d8a4c2ca58cc313c46474bfd05e0afe exec=b65eae0202dcdc56dd2cf1abd8a81cb3c983e8d34e148875be75e9f4a4b127f3
Dec  2 16:22:37 manyun14 dockerd[492597]: time="2022-12-02T16:22:37.282649671+08:00" level=debug msg="Sending KILL signal to container process" container=8318526b545cf13a17b73dc55e38ce0b0d8a4c2ca58cc313c46474bfd05e0afe exec=ceb8ee90813ef707e97d65477b73a22f0390e0cc2fc2f564c1ec2732cad1d13c
Dec  2 16:24:03 manyun14 dockerd[492597]: time="2022-12-02T16:24:03.445710365+08:00" level=debug msg="Sending KILL signal to container process" container=8318526b545cf13a17b73dc55e38ce0b0d8a4c2ca58cc313c46474bfd05e0afe exec=12f87974f6fecd964764b4c35766689fbcbc619af83adba409ed36393c8e8093
Dec  2 16:24:27 manyun14 dockerd[492597]: time="2022-12-02T16:24:27.702268379+08:00" level=debug msg="Sending KILL signal to container process" container=8318526b545cf13a17b73dc55e38ce0b0d8a4c2ca58cc313c46474bfd05e0afe exec=f60b0ea7034163baea942ece5c6899dd416806eb58653743d1d26a7edfb14fc0
Dec  2 16:27:09 manyun14 dockerd[492597]: time="2022-12-02T16:27:09.001485343+08:00" level=debug msg="Sending KILL signal to container process" container=8318526b545cf13a17b73dc55e38ce0b0d8a4c2ca58cc313c46474bfd05e0afe exec=d73d12d0fea073520d49654e9c5827f2dfd2b039df218fb4e0a5262ffad89bc5
Dec  2 16:30:05 manyun14 dockerd[492597]: time="2022-12-02T16:30:05.762413955+08:00" level=debug msg="Sending KILL signal to container process" container=8318526b545cf13a17b73dc55e38ce0b0d8a4c2ca58cc313c46474bfd05e0afe exec=b4acad3f29cc9ad48ee7ccd653feef91bebc2d04f969d3298379b981b31dde12
Dec  2 16:35:07 manyun14 dockerd[492597]: time="2022-12-02T16:35:07.239211818+08:00" level=debug msg="Sending kill signal 15 to container 8318526b545cf13a17b73dc55e38ce0b0d8a4c2ca58cc313c46474bfd05e0afe"
Dec  2 16:36:22 manyun14 dockerd[492597]: time="2022-12-02T16:36:22.518400483+08:00" level=debug msg="Sending kill signal 9 to container 8318526b545cf13a17b73dc55e38ce0b0d8a4c2ca58cc313c46474bfd05e0afe"
Dec  2 16:36:47 manyun14 dockerd[492597]: time="2022-12-02T16:36:47.459962841+08:00" level=debug msg="container kill failed because of 'container not found' or 'no such process'" action=kill container=8318526b545cf13a17b73dc55e38ce0b0d8a4c2ca58cc313c46474bfd05e0afe error="process already finished: not found"
Dec  2 16:36:57 manyun14 dockerd[492597]: time="2022-12-02T16:36:57.465102093+08:00" level=error msg="Container failed to exit within 10 seconds of kill - trying direct SIGKILL" 

#PS:之后无KILL相关日志,截取至17:36
container=8318526b545cf13a17b73dc55e38ce0b0d8a4c2ca58cc313c46474bfd05e0afe error="context deadline exceeded"
Dec  2 17:33:57 manyun14 dockerd[860413]: time="2022-12-02T17:33:57.824866281+08:00" level=debug msg="Registering GET, /pprof/cmdline"
Dec  2 17:33:57 manyun14 dockerd[860413]: time="2022-12-02T17:33:57.824968478+08:00" level=debug msg="Registering GET, /pprof/profile"
Dec  2 17:33:57 manyun14 dockerd[860413]: time="2022-12-02T17:33:57.825066508+08:00" level=debug msg="Registering GET, /pprof/symbol"
Dec  2 17:33:57 manyun14 dockerd[860413]: time="2022-12-02T17:33:57.825183573+08:00" level=debug msg="Registering GET, /pprof/trace"
Dec  2 17:33:57 manyun14 dockerd[860413]: time="2022-12-02T17:33:57.825289703+08:00" level=debug msg="Registering GET, /pprof/{name}"
Dec  2 17:33:57 manyun14 dockerd[860413]: time="2022-12-02T17:33:57.825755748+08:00" level=info msg="API listen on /var/run/docker.sock"
Dec  2 17:33:57 manyun14 dockerd[860413]: time="2022-12-02T17:33:57.826907695+08:00" level=debug msg="Calling HEAD /_ping"
Dec  2 17:33:57 manyun14 dockerd[860413]: time="2022-12-02T17:33:57.827032116+08:00" level=debug msg="Calling HEAD /_ping"
Dec  2 17:33:57 manyun14 dockerd[860413]: time="2022-12-02T17:33:57.829081410+08:00" level=debug msg="Calling GET /v1.41/containers/json?all=1"
Dec  2 17:33:57 manyun14 dockerd[860413]: time="2022-12-02T17:33:57.829198539+08:00" level=debug msg="Calling GET /v1.41/containers/json?all=1"
Dec  2 17:36:12 manyun14 dockerd[860413]: time="2022-12-02T17:36:12.547148802+08:00" level=debug msg="Calling HEAD /_ping"
Dec  2 17:36:12 manyun14 dockerd[860413]: time="2022-12-02T17:36:12.548326094+08:00" level=debug msg="Calling GET /v1.41/containers/json?all=1"

结论

暂无结论。目前还没搞清楚为什么容器停止后,仍然有 Actual DISK WRITE: 约3~4 M/s,希望有大佬看到之后,评论区给小弟解个惑。万分感谢。