[QA] Analysis of Errors When Installing a Ceph Environment with ceph-deploy

Date: 2022-01-01 02:26:21

Errors and Analysis

[ceph_deploy][ERROR ] RuntimeError: NoSectionError: No section: 'ceph'

Operation:
ceph-deploy install ceph1 ceph2 ceph3
ERROR:
[ceph1][INFO ] Running command: rpm -Uvh --replacepkgs https://download.ceph.com/rpm-jewel/el7/noarch/ceph-release-1-0.el7.noarch.rpm
......
[ceph_deploy][ERROR ] RuntimeError: NoSectionError: No section: 'ceph'
Fix:
The package conflicts with one that is already installed. Remove the installed rpm (on every node):
yum remove ceph-release -y
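
The message has the form of Python's configparser.NoSectionError, which is raised when code looks up an INI section that the parsed file does not define. The log above does not show which file ceph-deploy was parsing, so the repo text below is hypothetical, as is the helper name section_error. This is only a minimal sketch of how such an exception arises, not ceph-deploy's actual code.

```python
import configparser

# Hypothetical repo file content (an assumption, not taken from the log):
# it defines [Ceph] sections but no lowercase [ceph] section.
# configparser section names are case-sensitive.
REPO_TEXT = """
[Ceph]
name=Ceph packages
enabled=1

[Ceph-noarch]
name=Ceph noarch packages
enabled=1
"""


def section_error(text, section):
    """Return the NoSectionError message if `section` is missing, else None."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    try:
        # Setting an option on a section that does not exist raises NoSectionError.
        parser.set(section, "priority", "1")
    except configparser.NoSectionError as exc:
        return str(exc)
    return None


if __name__ == "__main__":
    print(section_error(REPO_TEXT, "ceph"))
    print(section_error(REPO_TEXT, "Ceph"))
```

If a leftover package has put the repo file in a state the tool does not expect, a lookup like this would fail in the same way, which would be consistent with the fix of removing ceph-release first.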

[ceph_deploy][ERROR ] RuntimeError: Failed to execute command: ceph --version

Operation:
ceph-deploy install ceph1 ceph2 ceph3
ERROR:
[ceph1][INFO ] Running command: yum -y install ceph ceph-radosgw
......
[ceph1][WARNIN] No data was received after 300 seconds, disconnecting...
[ceph1][INFO ] Running command: ceph --version
......
[ceph_deploy][ERROR ] RuntimeError: Failed to execute command: ceph --version
Fix:
The installation on ceph1 was too slow and timed out. Install the packages manually on that node:
yum -y install ceph ceph-radosgw

[ceph1][ERROR ] "ceph auth get-or-create for keytype admin returned 1

Operation:
ceph-deploy mon create-initial
ERROR:
[ceph1][INFO ] Running command: /usr/bin/ceph --connect-timeout=25 --cluster=ceph --admin-daemon=/var/run/ceph/ceph-mon.ceph1.asok mon_status
[ceph1][INFO ] Running command: /usr/bin/ceph --connect-timeout=25 --cluster=ceph --name mon. --keyring=/var/lib/ceph/mon/ceph-ceph1/keyring auth get client.admin
[ceph1][ERROR ] "ceph auth get-or-create for keytype admin returned 1
[ceph1][DEBUG ] 2017-05-17 16:23:57.333371 7fafc0106700 0 -- :/1111132495 >> 192.168.122.18:6789/0 pipe(0x7fafbc061830 sd=6 :0 s=1 pgs=0 cs=0 l=1 c=0x7fafbc05d6e0).fault
......
[ceph1][DEBUG ] Traceback (most recent call last):
[ceph1][DEBUG ] File "/usr/bin/ceph", line 948, in <module>
[ceph1][DEBUG ] retval = main()
[ceph1][DEBUG ] File "/usr/bin/ceph", line 852, in main
[ceph1][DEBUG ] prefix='get_command_descriptions')
[ceph1][DEBUG ] File "/usr/lib/python2.7/site-packages/ceph_argparse.py", line 1300, in json_command
[ceph1][DEBUG ] raise RuntimeError('"{0}": exception {1}'.format(argdict, e))
[ceph1][DEBUG ] RuntimeError: "None": exception "['{"prefix": "get_command_descriptions"}']": exception You cannot perform that operation on a Rados object in state configuring.
[ceph1][ERROR ] Failed to return 'admin' key from host ceph1
[ceph_deploy.gatherkeys][ERROR ] Failed to connect to host:ceph1
[ceph_deploy.gatherkeys][INFO ] Destroy temp directory /tmp/tmpWajR2W
[ceph_deploy][ERROR ] RuntimeError: Failed to connect any mon
Fix:
public network in ceph.conf was misconfigured: it was not on the same subnet as mon_host.
The correct configuration is:
mon_host = 192.168.122.18
public network = 192.168.122.18/24
The previous configuration was:
mon_host = 192.168.122.18
public network = 172.16.34.253/24

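This is the interesting part. When ceph-deploy starts the monitor, it prefers the network configured in public network (172.16.34.253 on ceph1) for the listening port (6789), but connections to the monitor use the mon_host IP (192.168.122.18), so the client keeps trying 192.168.122.18:6789, which obviously cannot be reached. That is the cause of the error. It is a very basic mistake, but the behavior still seems rather odd.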
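
A mismatch like this can be caught before running ceph-deploy mon create-initial. The following is a minimal sketch, assuming the two values have already been read from ceph.conf as strings. The function name mon_hosts_outside_public_network is hypothetical, and only Python's standard ipaddress module is used.

```python
import ipaddress


def mon_hosts_outside_public_network(mon_host, public_network):
    """Return the mon_host addresses that are not inside public_network.

    mon_host may be a single IP or a comma-separated list of IPs.
    """
    # strict=False accepts a value with host bits set, such as 192.168.122.18/24.
    network = ipaddress.ip_network(public_network.strip(), strict=False)
    hosts = [h.strip() for h in mon_host.split(",") if h.strip()]
    return [h for h in hosts if ipaddress.ip_address(h) not in network]


if __name__ == "__main__":
    # Corrected configuration from this article: nothing is reported.
    print(mon_hosts_outside_public_network("192.168.122.18", "192.168.122.18/24"))
    # Previous configuration from this article: the monitor address is reported.
    print(mon_hosts_outside_public_network("192.168.122.18", "172.16.34.253/24"))
```

An empty list means every monitor address falls inside the public network. Any address that is returned points to the kind of mismatch described above.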