Troubleshooting a VIP That Cannot Fail Over in a Kolla 4.0.0 Environment

Time: 2022-06-01 13:02:00

Analysis of why VRRP cannot move the VIP:

Keepalived configuration file:
/etc/kolla/keepalived/keepalived.conf
In this file, the nopreempt option is one of the factors that affect failover. The other factor is:
vrrp_instance kolla_internal_vip_51 {
    ...
    track_script {
        check_alive
    }
}
and check_alive is defined as:
vrrp_script check_alive {
    script "/check_alive.sh"
    interval 2
    fall 2
    rise 10
}
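A rough worked example of what these numbers mean: the script runs every interval seconds, fall consecutive failures mark it as failed, and rise consecutive successes mark it as healthy again. The figures are approximate, since the first failing or passing run can land anywhere inside an interval:

```latex
\begin{aligned}
t_{\text{fail}}    &\approx \text{fall} \times \text{interval} = 2 \times 2\,\text{s} = 4\,\text{s} \\
t_{\text{recover}} &\approx \text{rise} \times \text{interval} = 10 \times 2\,\text{s} = 20\,\text{s}
\end{aligned}
```

So even after HAProxy comes back, this node needs roughly 20 seconds of consecutive successful checks before the script counts as healthy again.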
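In other words, Keepalived decides whether to become the master router based on the exit status of the health-check script /check_alive.sh.
/check_alive.sh checks whether the local HAProxy is running. The local HAProxy was not running, so the health check failed.
That is why the VIP could not be moved.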
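The contents of /check_alive.sh are not shown in this note. A minimal sketch of a check with the same contract (exit status 0 when the local HAProxy process is alive, non-zero otherwise) might look like this. The pidfile-based approach and the default path /run/haproxy.pid are assumptions for illustration; this is not Kolla's actual script.

```shell
# Hypothetical health check in the spirit of /check_alive.sh (NOT Kolla's
# actual script). Assumes HAProxy records its PID in a pidfile; the default
# path below is an assumption.
check_alive() {
    pidfile=${1:-/run/haproxy.pid}
    # A missing or unreadable pidfile is treated as "HAProxy not running".
    [ -r "$pidfile" ] || return 1
    pid=$(head -n 1 "$pidfile")
    # Reject an empty or non-numeric PID before handing it to kill.
    case $pid in
        ''|*[!0-9]*) return 1 ;;
    esac
    # kill -0 delivers no signal; it only tests that the process exists
    # (and that we may signal it, which holds when run as root).
    kill -0 "$pid" 2>/dev/null
}
```

Keepalived only looks at the exit status of the tracked script, so a wrapper that runs check_alive and exits with its status would fit the vrrp_script definition above.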

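The local HAProxy was not running because the local haproxy.cfg tells it to listen on the VIP, while the VIP was not configured on the local NIC at that moment, so HAProxy could not bind to it. This is a deadlock: Keepalived will not bring up the VIP until HAProxy is healthy, and HAProxy cannot start until the VIP is present.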
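To confirm this situation on a node, one can check whether the VIP is actually present on the NIC. The sketch below parses the one-line output of `ip -o -4 addr show`, read from standard input so it can be tried without a real interface; the function name has_vip is hypothetical.

```shell
# Hypothetical helper: succeed if the given IPv4 address appears in
# `ip -o -4 addr show` output read from stdin.
has_vip() {
    vip=$1
    # In `ip -o -4 addr show` output, field 3 is "inet" and field 4 is
    # "address/prefix", e.g. "172.24.9.198/32".
    awk -v vip="$vip" '
        $3 == "inet" {
            split($4, a, "/")
            if (a[1] == vip) found = 1
        }
        END { exit (found ? 0 : 1) }
    '
}
```

Usage: `ip -o -4 addr show dev enp2s0f0 | has_vip 172.24.9.198 && echo "VIP is local"`. The address is compared exactly, so 172.24.9.19 does not match 172.24.9.198.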
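The VIP therefore had to be moved by hand first (the ip addr del is run on the node that currently holds the VIP, the ip addr add on the node that should take it over):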
docker stop keepalived
docker stop haproxy
ip addr del 172.24.9.198/32 dev enp2s0f0
ip addr add 172.24.9.198/32 dev enp2s0f0
docker start haproxy
Remove the nopreempt option from keepalived.conf, then start Keepalived:
docker start keepalived
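Removing nopreempt can be scripted. The sketch below deletes lines that contain only that keyword and keeps a .bak copy; the function name remove_nopreempt is hypothetical. Since Kolla-Ansible generates this file from its templates, a manual edit may be overwritten the next time the service is deployed or reconfigured.

```shell
# Hypothetical helper: drop lines consisting solely of "nopreempt" from a
# keepalived.conf, keeping the original as <file>.bak.
remove_nopreempt() {
    conf=${1:-/etc/kolla/keepalived/keepalived.conf}
    cp -p "$conf" "$conf.bak" || return 1
    # Match the keyword alone on its line (optional surrounding blanks);
    # other lines, including comments that mention it, are left alone.
    # Writing through ">" keeps the original inode, which matters if the
    # file is bind-mounted into the container.
    sed '/^[[:space:]]*nopreempt[[:space:]]*$/d' "$conf.bak" > "$conf"
}
```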


A side note on how to remove all IP addresses from a NIC:
ip addr flush dev eth0



***** ***** ***** ***** ***** ***** ***** ***** ***** ***** ***** ***** ***** ***** ***** *****
Check HAProxy's startup log:
[root@node2 ~]# docker logs haproxy
...
[ALERT] 123/132247 (11) : Starting proxy rabbitmq_management: cannot bind socket [172.24.9.198:15672]
[ALERT] 123/132247 (11) : Starting proxy keystone_internal: cannot bind socket [172.24.9.198:5000]
[ALERT] 123/132247 (11) : Starting proxy keystone_admin: cannot bind socket [172.24.9.198:35357]
[ALERT] 123/132247 (11) : Starting proxy glance_registry: cannot bind socket [172.24.9.198:9191]
[ALERT] 123/132247 (11) : Starting proxy glance_api: cannot bind socket [172.24.9.198:9292]
[ALERT] 123/132247 (11) : Starting proxy nova_api: cannot bind socket [172.24.9.198:8774]
[ALERT] 123/132247 (11) : Starting proxy nova_metadata: cannot bind socket [172.24.9.198:8775]
[ALERT] 123/132247 (11) : Starting proxy placement_api: cannot bind socket [172.24.9.198:8780]
[ALERT] 123/132247 (11) : Starting proxy nova_novncproxy: cannot bind socket [172.24.9.198:6080]
[ALERT] 123/132247 (11) : Starting proxy neutron_server: cannot bind socket [172.24.9.198:9696]
[ALERT] 123/132247 (11) : Starting proxy horizon: cannot bind socket [172.24.9.198:80]
[ALERT] 123/132247 (11) : Starting proxy cinder_api: cannot bind socket [172.24.9.198:8776]
[ALERT] 123/132247 (11) : Starting proxy heat_api: cannot bind socket [172.24.9.198:8004]
[ALERT] 123/132247 (11) : Starting proxy heat_api_cfn: cannot bind socket [172.24.9.198:8000]
[ALERT] 123/132247 (11) : Starting proxy mariadb: cannot bind socket [172.24.9.198:3306]
[ALERT] 123/132247 (11) : Starting proxy rabbitmq: cannot bind socket [172.24.9.198:5672]
The log shows that HAProxy cannot bind to the VIP.
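With many frontends, it helps to reduce the log to the distinct addresses HAProxy could not bind. The sketch relies only on the "cannot bind socket [ADDR:PORT]" wording in the log above; failed_binds is a hypothetical helper name.

```shell
# Hypothetical helper: print the unique ADDR:PORT pairs from
# "cannot bind socket [ADDR:PORT]" lines read on stdin.
failed_binds() {
    # grep -o keeps only the matched text; sed strips everything but ADDR:PORT.
    grep -o 'cannot bind socket \[[^]]*]' |
        sed 's/^cannot bind socket \[\(.*\)]$/\1/' |
        LC_ALL=C sort -u
}
```

Usage: `docker logs haproxy 2>&1 | failed_binds`. docker logs replays the container's stderr on stderr, hence the 2>&1.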

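Further investigation confirmed that it is the kernel parameter net.ipv4.ip_nonlocal_bind=1 that allows HAProxy to listen on an IP address not assigned to the local host.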

A check shows that this kernel parameter is already configured to be applied at boot:
[root@node2 ~]# grep ip_nonlocal_bind /etc/sysctl.conf
net.ipv4.ip_nonlocal_bind=1

Check the value of this parameter in the running kernel:
[root@node2 ~]# cat /proc/sys/net/ipv4/ip_nonlocal_bind
0
The parameter is 0; it is unknown what changed it.
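The mismatch between /etc/sysctl.conf and the running kernel can be checked in one step. The sketch below compares the last assignment of a key in a single config file with the value under /proc/sys; it deliberately ignores other sources such as /etc/sysctl.d/*.conf, and the function names are hypothetical. (Running `sysctl net.ipv4.ip_nonlocal_bind` also prints the runtime value.)

```shell
# Hypothetical helpers: compare a sysctl value persisted in ONE config file
# with the value the running kernel reports under /proc/sys.

# Map a dotted key to its /proc path:
# net.ipv4.ip_nonlocal_bind -> /proc/sys/net/ipv4/ip_nonlocal_bind
sysctl_proc_path() {
    printf '/proc/sys/%s\n' "$(printf '%s' "$1" | tr . /)"
}

check_sysctl_drift() {
    key=$1
    conf=${2:-/etc/sysctl.conf}
    runtime_file=${3:-$(sysctl_proc_path "$key")}
    # Escape the dots so they match literally in the sed pattern.
    key_re=$(printf '%s' "$key" | sed 's/\./\\./g')
    # Take the first token after "key =" on each matching line; the last
    # matching line wins, as later assignments override earlier ones.
    persisted=$(sed -n "s/^[[:space:]]*$key_re[[:space:]]*=[[:space:]]*\([^[:space:]]*\).*/\1/p" "$conf" | tail -n 1)
    runtime=$(cat "$runtime_file") || return 2
    echo "persisted=$persisted runtime=$runtime"
    # Non-zero exit status means the two values differ.
    [ "$persisted" = "$runtime" ]
}
```

On the node described here, `check_sysctl_drift net.ipv4.ip_nonlocal_bind` would print persisted=1 runtime=0 and return non-zero.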

Set it to 1:
[root@node2 ~]# echo 1 >/proc/sys/net/ipv4/ip_nonlocal_bind

Restart the haproxy container:
[root@node2 ~]# docker stop haproxy;docker start haproxy

Verify that HAProxy can now listen on the VIP:
[root@node2 ~]# ss -lnp|grep '\.198:'
tcp    LISTEN     0      128    172.24.9.198:3306                  *:*                   users:(("haproxy",pid=10784,fd=21))
tcp    LISTEN     0      128    172.24.9.198:8780                  *:*                   users:(("haproxy",pid=10784,fd=14))
tcp    LISTEN     0      128    172.24.9.198:9292                  *:*                   users:(("haproxy",pid=10784,fd=11))
tcp    LISTEN     0      128    172.24.9.198:80                    *:*                   users:(("haproxy",pid=10784,fd=17))
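Checking the ss output by eye gets tedious with 16 frontends. A sketch that extracts the ports bound on the VIP from `ss -lnt` (or `ss -lnp`) output read on stdin; vip_listen_ports is a hypothetical name.

```shell
# Hypothetical helper: print, numerically sorted, the ports on which
# something listens on the given IPv4 address, from ss output on stdin.
vip_listen_ports() {
    vip=$1
    # Scan each line for the first field that starts with "VIP:";
    # the port is the part after the last ":".
    awk -v vip="$vip" '
        {
            for (i = 1; i <= NF; i++) {
                if (index($i, vip ":") == 1) {
                    n = split($i, a, ":")
                    print a[n]
                    break
                }
            }
        }
    ' | sort -n -u
}
```

Usage: `ss -lnt | vip_listen_ports 172.24.9.198`. Comparing this list with the ports in the HAProxy ALERT lines shows whether every frontend is now bound.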