Analyze a docker instance start failure

时间:2023-03-09 06:01:01
Analyze a docker instance start failure
 

错误信息:Cannot start container xxxxxxxxxxx | Error getting container xxxxxxxxxxxxxxx  from driver devicemapper: Error mounting | invalid argument Error | failed to start containers

现象:4个Docker实例中,三个(基本没在使用)能正常启动,一个(内容最多的那个)不能正常启动。

触发诱因:服务器(Docker宿主机)意外断电。

[root@bogon ~]# docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
[root@bogon ~]# docker ps -a
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
a91dadc56996 b1c89dd2c773 "/bin/auto_service.s 7 weeks ago Exited (137) 23 hours ago mawen
91a542541bb1 b1c89dd2c773 "/bin/auto_service.s 8 weeks ago Exited (128) 28 hours ago rgq
fc0a891e1861 68a34cb5482c "/bin/auto_service.s 3 months ago Exited (0) 28 hours ago songheng
79177df3ddc2 b1c89dd2c773 "/bin/auto_service.s 5 months ago Exited (137) 23 hours ago guozhenya
[root@bogon ~]# docker start 91a542541bb1
Error response from daemon: Cannot start container 91a542541bb1: Error getting container 91a542541bb1478834df2c40796fbbbba4a0448063d4401871c7f2b63e5246f1 from driver devicemapper: Error mounting '/dev/mapper/docker-253:2-13369361-91a542541bb1478834df2c40796fbbbba4a0448063d4401871c7f2b63e5246f1' on '/home/docker/images/devicemapper/mnt/91a542541bb1478834df2c40796fbbbba4a0448063d4401871c7f2b63e5246f1': invalid argument
Error: failed to start containers: [91a542541bb1]
Error response from daemon: Cannot start container 91a542541bb1:
Error getting container 91a542541bb1478834df2c40796fbbbba4a0448063d4401871c7f2b63e5246f1 from driver devicemapper:
Error mounting '/dev/mapper/docker-253:2-13369361-91a542541bb1478834df2c40796fbbbba4a0448063d4401871c7f2b63e5246f1'
on '/home/docker/images/devicemapper/mnt/91a542541bb1478834df2c40796fbbbba4a0448063d4401871c7f2b63e5246f1':
invalid argument Error: failed to start containers: [91a542541bb1]

早期查到的原因及方案,并未解决此问题。

https://access.redhat.com/solutions/1565673

https://segmentfault.com/q/1010000003003635

https://www.lsproc.com/post/docker-faq/

https://blog.****.net/wangjia184/article/details/43151041

报错mount错误,无论是用GUI的磁盘管理工具,还是用如下命令行,都会报错。

[root@bogon mapper]# cd /dev/mapper/
[root@bogon mapper]# ll
total
crw-rw----. root root , Oct : control
lrwxrwxrwx. root root Oct : docker-:--91a542541bb1478834df2c40796fbbbba4a0448063d4401871c7f2b63e5246f1 -> ../dm-
lrwxrwxrwx. root root Oct : docker-:--pool -> ../dm-
lrwxrwxrwx. root root Oct : VolGroup-lv_home -> ../dm-
lrwxrwxrwx. root root Oct : VolGroup-lv_root -> ../dm-
lrwxrwxrwx. root root Oct : VolGroup-lv_swap -> ../dm-
[root@bogon mapper]# sudo mkdir -p /mnt/base
[root@bogon mapper]# ll
total
crw-rw----. root root , Oct : control
lrwxrwxrwx. root root Oct : docker-:--91a542541bb1478834df2c40796fbbbba4a0448063d4401871c7f2b63e5246f1 -> ../dm-
lrwxrwxrwx. root root Oct : docker-:--pool -> ../dm-
lrwxrwxrwx. root root Oct : VolGroup-lv_home -> ../dm-
lrwxrwxrwx. root root Oct : VolGroup-lv_root -> ../dm-
lrwxrwxrwx. root root Oct : VolGroup-lv_swap -> ../dm-
[root@bogon mapper]# mount docker-253:2-13369361-91a542541bb1478834df2c40796fbbbba4a0448063d4401871c7f2b63e5246f1 /mnt/base
mount: wrong fs type, bad option, bad superblock on /dev/mapper/docker-253:2-13369361-91a542541bb1478834df2c40796fbbbba4a0448063d4401871c7f2b63e5246f1,
missing codepage or helper program, or other error
In some cases useful info is found in syslog - try
dmesg | tail or so [root@bogon mapper]# dmesg | tail
EXT4-fs (dm-): bad geometry: block count exceeds size of device ( blocks)
EXT4-fs (dm-): bad geometry: block count exceeds size of device ( blocks)
EXT4-fs (dm-): bad geometry: block count exceeds size of device ( blocks)

device mapper这个驱动的详细解释,很科普的一篇文章:

https://coolshell.cn/articles/17200.html

帖子留言更精彩。

http://www.cnblogs.com/GarfieldEr007/p/5424629.html

结论:DeviceMapper这种东西问题太多了,我们应该把其加入黑名单。

尝试各种方式修复Docker的硬盘文件,结果还是失败了。

[root@bogon mapper]# fsck.ext4 docker-\:--91a542541bb1478834df2c40796fbbbba4a0448063d4401871c7f2b63e5246f1
e2fsck 1.41. (-May-)
The filesystem size (according to the superblock) is blocks
The physical size of the device is blocks
Either the superblock or the partition table is likely to be corrupt!
Abort<y>? yes [root@bogon mapper]# e2fsck docker-\:--91a542541bb1478834df2c40796fbbbba4a0448063d4401871c7f2b63e5246f1
e2fsck 1.41. (-May-)
The filesystem size (according to the superblock) is blocks
The physical size of the device is blocks
Either the superblock or the partition table is likely to be corrupt!
Abort<y>? yes [root@bogon mapper]# resize2fs docker-\:--91a542541bb1478834df2c40796fbbbba4a0448063d4401871c7f2b63e5246f1
resize2fs 1.41. (-May-)
resize2fs: New size smaller than minimum () [root@bogon mapper]# resize2fs docker-\:--91a542541bb1478834df2c40796fbbbba4a0448063d4401871c7f2b63e5246f1
resize2fs 1.41. (-May-)
The containing partition (or device) is only (4k) blocks.
You requested a new size of blocks.

各种尝试的方案:

https://access.redhat.com/solutions/55010

最后只能格式化了:

https://unix.stackexchange.com/questions/115698/fix-ext4-fs-bad-geometry-block-count-exceeds-size-of-device

https://serverfault.com/questions/548237/cant-mount-home-after-trying-to-resize-bad-geometry-block-count-exceeds-size

https://www.linuxquestions.org/questions/linux-hardware-18/size-in-superblock-is-different-from-the-physical-size-of-the-partition-298175/

mke2fs -t ext4 docker-\:--91a542541bb1478834df2c40796fbbbba4a0448063d4401871c7f2b63e5246f1

另外的一些收获:

https://engineeringblog.yelp.com/2016/01/dumb-init-an-init-for-docker.html

https://bugzilla.redhat.com/show_bug.cgi?id=1121736

https://docs.docker.com/install/linux/docker-ee/rhel/#prerequisites

On Red Hat Enterprise Linux, Docker EE supports storage drivers, overlay2 and devicemapper. In Docker EE 17.06.2-ee-5 and higher, overlay2 is the recommended storage driver. The following limitations apply:

  • OverlayFS: If selinux is enabled, the overlay2 storage driver is supported on RHEL 7.4 or higher. If selinux is disabled, overlay2 is supported on RHEL 7.2 or higher with kernel version 3.10.0-693 and higher.

  • Device Mapper: On production systems using devicemapper, you must use direct-lvm mode, which requires one or more dedicated block devices. Fast storage such as solid-state media (SSD) is recommended. Do not start Docker until properly configured per the storage guide.

再分别聊聊Docker storage drivers

执行docker info:

  Win10+Hyper-V

C:\Users\RenGuoQiang>docker info
Containers:
Running:
Paused:
Stopped:
Images:
Server Version: 18.06.-ce
Storage Driver: overlay2
Backing Filesystem: extfs
Supports d_type: true
Native Overlay Diff: true
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
Volume: local
Network: bridge host macvlan null overlay
Log: awslogs fluentd gcplogs gelf journald json-file logentries splunk syslog
Swarm: inactive
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: d64c661f1d51c48782c9cec8fda7604785f93587
runc version: 69663f0bd4b60df09991c08812a60108003fa340
init version: fec3683
Security Options:
seccomp
Profile: default
Kernel Version: 4.9.-linuxkit-aufs
Operating System: Docker for Windows
OSType: linux
Architecture: x86_64
CPUs:
Total Memory: .934GiB
Name: linuxkit-00155df70119
ID: TPBZ:PK4T:IR52:NNN6:X4BI:2P4W:QBXD:T5ZH:4UAZ:HCPC:5QZY:HJ23
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): true
File Descriptors:
Goroutines:
System Time: --17T05::.8681549Z
EventsListeners:
Registry: https://index.docker.io/v1/
Labels:
Experimental: false
Insecure Registries:
127.0.0.0/
Live Restore Enabled: false

Ubuntu xx:

Storage Driver: aufs

centos6/7

[root@bogon mapper]# docker info
Containers:
Images:
Storage Driver: devicemapper
Pool Name: docker-:--pool
Pool Blocksize: 65.54 kB
Backing Filesystem: extfs
Data file: /dev/loop0
Metadata file: /dev/loop1
Data Space Used: 14.24 GB
Data Space Total: 107.4 GB
Data Space Available: 93.13 GB
Metadata Space Used: 26.58 MB
Metadata Space Total: 2.147 GB
Metadata Space Available: 2.121 GB
Udev Sync Supported: true
Deferred Removal Enabled: false
Data loop file: /home/docker/images/devicemapper/devicemapper/data
Metadata loop file: /home/docker/images/devicemapper/devicemapper/metadata
Library Version: 1.02.-RHEL6 (--)
Execution Driver: native-0.2
Logging Driver: json-file
Kernel Version: 4.4.-.el6.elrepo.x86_64
Operating System: <unknown>
CPUs:
Total Memory: 31.43 GiB
Name: bogon
ID: KU2E:PTFN:25CJ:234F:LTHQ:7IEB:JMT6:T4NQ:UPB7:BOCV:LKQF:6QKX

https://*.com/questions/27800340/error-starting-docker-containers

This is known bug occuring with devicemapper driver only.

Here is the reference of the bug: https://github.com/docker/docker/issues/4036

Best solution is to switch either to aufs or overlayfs drivers.

Note that this question seems to be a duplicate from this one: Docker building fails randomly with Error mounting

用centos就容易被坑。