Configuration file reference

global_defs section

global_defs {
   notification_email {
     acassen@firewall.loc
     failover@firewall.loc
     sysadmin@firewall.loc
   }
   notification_email_from Alexandre.Cassen@firewall.loc
   smtp_server 192.168.200.1
   smtp_connect_timeout 30
   router_id LVS_DEVEL
}

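notification_email: the addresses that receive a notification e-mail when a failure occurs.
notification_email_from: the sender address of the notification e-mail.
smtp_server: the address of the SMTP server used to send notifications.
smtp_connect_timeout: the timeout, in seconds, for connecting to the SMTP server.
enable_traps: enables SNMP (Simple Network Management Protocol) traps. It is not set in the example above.
router_id: a string that identifies this node, usually the hostname, although it does not have to be. It is used in the notification e-mails sent when a failure occurs.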

vrrp_instance section

vrrp_instance VI_1 {
    state MASTER
    interface eth0
    virtual_router_id 51
    priority 100
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass 1111
    }
    virtual_ipaddress {
        192.168.200.16
    }
}

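state: MASTER or BACKUP. This is only the initial state: once keepalived is running on the other nodes, the node with the highest priority is elected MASTER, so the setting has little lasting effect.
interface: the network interface that carries the node's own (non-VIP) address and is used to send VRRP packets.
virtual_router_id: distinguishes the VRRP multicast traffic of different instances. VRRP defines the valid range as 1-255 (older keepalived documentation lists 0-255). Nodes that belong to the same instance share one value, but two different instances on the same network segment must not reuse it, otherwise they conflict.
priority: used to elect the master; the highest value wins. To make a node the master, set its value about 50 points higher than on the other machines. The valid range is 1-255 (values outside it are treated as the default, 100).
advert_int: the interval, in seconds, between VRRP advertisements. The backups rely on these advertisements to decide whether the master is still alive, so this is effectively the health-check interval.
authentication: the authentication block. The types are PASS and AH (IPsec); PASS is recommended (only the first 8 characters of the password are used).
virtual_ipaddress: the virtual IP address (VIP) held by whichever node is currently the master.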
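
The priority rule described above can be sketched as a small shell function. This is only a toy model under stated assumptions: elect_master and the name:priority argument format are hypothetical and not part of keepalived, and real VRRP also has a tie-breaking rule that this sketch ignores.

```shell
#!/bin/bash
# Toy model of VRRP master election: the node with the highest priority wins.
# elect_master and the "name:priority" format are illustrative only.
elect_master() {
    # Sort by the second colon-separated field, numerically, highest first,
    # then print the name of the first entry.
    printf '%s\n' "$@" | sort -t: -k2,2nr | head -n 1 | cut -d: -f1
}

# Hypothetical nodes: node_b has the highest priority and is printed.
elect_master "node_a:100" "node_b:150" "node_c:50"
```

Calling the function again with a changed priority list is a quick way to see which node would take over after a node drops out.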

virtual_server section

virtual_server 10.10.10.2 1358 {
    delay_loop 6
    lb_algo rr
    lb_kind NAT
    persistence_timeout 50
    protocol TCP
    sorry_server 192.168.200.200 1358

    real_server 192.168.200.2 1358 {
        weight 1
        TCP_CHECK {
            connect_timeout 10
            nb_get_retry 3
            delay_before_retry 3
            connect_port 80
        }
    }

    real_server 192.168.200.3 1358 {
        weight 1
        TCP_CHECK {
            connect_timeout 10
            nb_get_retry 3
            delay_before_retry 3
            connect_port 80
        }
    }
}

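delay_loop: the interval between health-check polls, in seconds.
lb_algo: the scheduling (load balancing) algorithm used to pick a real server.
lb_kind: the LVS forwarding method: NAT, DR, or TUN.
persistence_timeout: the session persistence time, in seconds. This option matters for dynamic websites: when a user logs in with an account, persistence ensures that the user's requests keep going to the same application server. Suppose an LVS environment in DR mode with three real servers and persistence disabled. On the first visit, the load balancer forwards the request to one of the real servers and the user sees a login page. The user then fills in the user name and password and submits the form. This is where the problem can appear: the login may fail, because without persistence the load balancer may forward the second request to a different server, one that does not hold the session created by the first.
sorry_server: the server that takes over when all real servers are down.
connect_port: the port used by the health check; if a connection to it succeeds, the server is considered healthy.
connect_timeout, nb_get_retry, delay_before_retry: the connection timeout, the number of retries, and the delay before the next retry, respectively.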
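
The login scenario described for persistence_timeout can be illustrated with a toy simulation. It is a sketch under stated assumptions, not how IPVS is implemented: the server names rs1 to rs3, the client address, and the functions pick_rr and pick_persistent are hypothetical, and the expiry of the persistence entry after the timeout is not modelled. It needs bash 4 or later for associative arrays.

```shell
#!/bin/bash
# Toy model of round-robin scheduling with and without persistence.
servers=(rs1 rs2 rs3)   # three hypothetical real servers
next=0                  # index of the server the next rr pick will use
declare -A sticky       # persistence table: client IP -> real server
picked=""               # result of the last pick (a global, to keep state)

# Plain round robin: every request moves on to the next server.
pick_rr() {
    picked=${servers[next]}
    next=$(( (next + 1) % ${#servers[@]} ))
}

# With persistence: a client seen before is sent to the same server again.
pick_persistent() {
    local client=$1
    if [ -z "${sticky[$client]:-}" ]; then
        pick_rr
        sticky[$client]=$picked
    fi
    picked=${sticky[$client]}
}

# Two requests (login page, then form submit) from the same client.
for step in "login page" "form submit"; do
    pick_rr
    echo "no persistence, $step -> $picked"
done
for step in "login page" "form submit"; do
    pick_persistent 10.0.0.7
    echo "persistence,    $step -> $picked"
done
```

Without persistence the two requests land on different servers, which is the failed-login case; with persistence both go to the same one.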

Setting up a keepalived + LVS environment

Environment

Role	IP	Installed software	OS
master	192.168.5.200	keepalived, ipvsadm	CentOS 6
slave	192.168.5.228	keepalived, ipvsadm	CentOS 6
node1	192.168.5.229	httpd	CentOS 6
node2	192.168.5.230	httpd	CentOS 6

Test pages served by the real servers (RS)

# curl http://192.168.5.229
192.168.5.229
# curl http://192.168.5.230
192.168.5.230

Real server (RS) node configuration

# cat rs.sh
#!/bin/bash
#
# Script to start LVS DR real server.
# description: LVS DR real server
#
. /etc/rc.d/init.d/functions

VIP=192.168.5.188   # change this to your VIP
host=`/bin/hostname`

case "$1" in
start)
        # Start LVS-DR real server on this machine.
        /sbin/ifconfig lo down
        /sbin/ifconfig lo up
        # Stop this host from answering or announcing ARP for the VIP.
        echo 1 > /proc/sys/net/ipv4/conf/lo/arp_ignore
        echo 2 > /proc/sys/net/ipv4/conf/lo/arp_announce
        echo 1 > /proc/sys/net/ipv4/conf/all/arp_ignore
        echo 2 > /proc/sys/net/ipv4/conf/all/arp_announce
        # Bind the VIP to a loopback alias and route it locally.
        /sbin/ifconfig lo:0 $VIP broadcast $VIP netmask 255.255.255.255 up
        /sbin/route add -host $VIP dev lo:0
        ;;
stop)
        # Stop LVS-DR real server loopback device(s).
        /sbin/ifconfig lo:0 down
        echo 0 > /proc/sys/net/ipv4/conf/lo/arp_ignore
        echo 0 > /proc/sys/net/ipv4/conf/lo/arp_announce
        echo 0 > /proc/sys/net/ipv4/conf/all/arp_ignore
        echo 0 > /proc/sys/net/ipv4/conf/all/arp_announce
        ;;
status)
        # Status of LVS-DR real server.
        islothere=`/sbin/ifconfig lo:0 | grep $VIP`
        # Match "lo" so the route is found whether netstat reports lo or lo:0.
        isrothere=`netstat -rn | grep "lo" | grep $VIP`
        # The original tested the literal string "isrothere" (missing $).
        if [ ! "$islothere" -o ! "$isrothere" ]; then
            # Either the route or the lo:0 device
            # not found.
            echo "LVS-DR real server Stopped."
        else
            echo "LVS-DR real server Running."
        fi
        ;;
*)
        # Invalid entry.
        echo "$0: Usage: $0 {start|status|stop}"
        exit 1
        ;;
esac

Apply this configuration on both node1 and node2.
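
After running the script with start on a real server, the ARP settings it writes can be checked with a small helper. This is a minimal sketch: arp_settings_ok is a hypothetical function, not part of rs.sh, and its optional directory argument exists only so that the check can be pointed at a copy of the settings instead of the live /proc tree.

```shell
#!/bin/bash
# arp_settings_ok [CONF_DIR]
# Succeeds only if arp_ignore=1 and arp_announce=2 for both "lo" and "all",
# the values written by "rs.sh start". CONF_DIR defaults to the live kernel
# settings under /proc.
arp_settings_ok() {
    local conf_dir=${1:-/proc/sys/net/ipv4/conf} iface
    for iface in lo all; do
        [ "$(cat "$conf_dir/$iface/arp_ignore" 2>/dev/null)" = "1" ] || return 1
        [ "$(cat "$conf_dir/$iface/arp_announce" 2>/dev/null)" = "2" ] || return 1
    done
}

if arp_settings_ok; then
    echo "ARP settings match what rs.sh start writes"
else
    echo "ARP settings differ from what rs.sh start writes"
fi
```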

keepalived configuration on the VS (virtual server) machines

Master configuration:

global_defs {
   notification_email {
     xxxxxx@xxx.com
   }
   notification_email_from xxxxxx@xxx.com
   smtp_server 127.0.0.1
   smtp_connect_timeout 30
   router_id LVS_MASTER_5.200
}

vrrp_instance VI_1 {
    state MASTER
    interface eth0
    virtual_router_id 51
    priority 188
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass 1111
    }
    virtual_ipaddress {
        192.168.5.188
    }
}

virtual_server 192.168.5.188 80 {
    delay_loop 6
    lb_algo rr
    lb_kind DR
    nat_mask 255.255.255.0
    #persistence_timeout 50
    protocol TCP

    real_server 192.168.5.229 80 {
        weight 1
        HTTP_GET {
            url {
              path /
              status_code 200
            }
            connect_timeout 3
            nb_get_retry 3
            delay_before_retry 3
        }
    }

    real_server 192.168.5.230 80 {
        weight 1
        HTTP_GET {
            url {
              path /
              status_code 200
            }
            connect_timeout 3
            nb_get_retry 3
            delay_before_retry 3
        }
    }

    sorry_server 127.0.0.1 80
}

Backup configuration:

global_defs {
   notification_email {
     xxxxxx@xxx.com
   }
   notification_email_from xxxxxx@xxx.com
   smtp_server 127.0.0.1
   smtp_connect_timeout 30
   router_id LVS_BACKUP_5.228
}

vrrp_instance VI_1 {
    state BACKUP
    interface eth0
    virtual_router_id 51
    priority 100
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass 1111
    }
    virtual_ipaddress {
        192.168.5.188
    }
}

virtual_server 192.168.5.188 80 {
    delay_loop 6
    lb_algo rr
    lb_kind DR
    nat_mask 255.255.255.0
    #persistence_timeout 50
    protocol TCP

    real_server 192.168.5.229 80 {
        weight 1
        HTTP_GET {
            url {
              path /
              status_code 200
            }
            connect_timeout 3
            nb_get_retry 3
            delay_before_retry 3
        }
    }

    real_server 192.168.5.230 80 {
        weight 1
        HTTP_GET {
            url {
              path /
              status_code 200
            }
            connect_timeout 3
            nb_get_retry 3
            delay_before_retry 3
        }
    }
}

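Note: the configuration files on the keepalived master and backup differ. The state and priority options must be changed (router_id also differs in the examples above).
Besides HTTP_GET, the real servers can also be checked with other methods such as TCP_CHECK: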

    real_server 192.168.5.229 80 {
        weight 3
        TCP_CHECK {
            connect_timeout 10
            nb_get_retry 3
            delay_before_retry 3
            connect_port 80
        }
    }
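
The way connect_timeout, nb_get_retry and delay_before_retry interact in a TCP_CHECK can be approximated by hand in shell. This is only a rough sketch, not keepalived's implementation: check_with_retry and tcp_probe are hypothetical helpers, and the assumption that one initial attempt is followed by up to nb_get_retry retries is this example's own reading of the parameters. tcp_probe relies on bash's /dev/tcp feature and the coreutils timeout command.

```shell
#!/bin/bash
# check_with_retry RETRIES DELAY CMD...
# Run CMD; if it fails, retry up to RETRIES more times, sleeping DELAY
# seconds between attempts. Succeeds as soon as one attempt succeeds.
check_with_retry() {
    local retries=$1 delay=$2
    shift 2
    local attempt=0
    while true; do
        "$@" && return 0
        [ "$attempt" -ge "$retries" ] && return 1
        attempt=$((attempt + 1))
        sleep "$delay"
    done
}

# tcp_probe HOST PORT TIMEOUT
# One probe: succeed if a TCP connection to HOST:PORT opens within TIMEOUT s.
tcp_probe() {
    local host=$1 port=$2 timeout_s=$3
    timeout "$timeout_s" bash -c "exec 3<>/dev/tcp/$host/$port" 2>/dev/null
}

# Usage with the values from the TCP_CHECK block above (needs network access):
#   check_with_retry 3 3 tcp_probe 192.168.5.229 80 10 && echo "RS healthy"
```

Keeping the retry loop separate from the probe makes it possible to swap in another check, for example a curl request that mimics HTTP_GET.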

Further reading:
http://cuchadanfan.blog.51cto.com/9940284/1696588
https://github.com/chenzhiwei/linux/tree/master/keepalived

keepalived+nginx

keepalived configuration

Master configuration

global_defs {
   notification_email {
     xxxxxx@xxx.com
   }
   notification_email_from xxxxxx@xxx.com
   smtp_server 127.0.0.1
   smtp_connect_timeout 30
   router_id LVS_BACKUP_5.228
}

vrrp_script chk_nginx {
    script "/etc/keepalived/nginx_check.sh"
    interval 2
    weight -5
    fall 3
    rise 2
}

vrrp_instance VI_1 {
    state MASTER
    interface eth0
    virtual_router_id 146
    mcast_src_ip 192.168.5.228
    priority 100
    # Note: keepalived honors nopreempt only when the initial state is BACKUP.
    nopreempt
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass 1111
    }
    track_script {
        chk_nginx
    }
    virtual_ipaddress {
        192.168.5.188
    }
}

Backup configuration

global_defs {
   notification_email {
     xxxxxx@xxx.com
   }
   notification_email_from xxxxxx@xxx.com
   smtp_server 127.0.0.1
   smtp_connect_timeout 30
   router_id LVS_BACKUP_5.200
}

vrrp_script chk_nginx {
    script "/etc/keepalived/nginx_check.sh"
    interval 2
    weight -5
    fall 3
    rise 2
}

vrrp_instance VI_1 {
    state BACKUP
    interface eth0
    virtual_router_id 146
    mcast_src_ip 192.168.5.200
    priority 90
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass 1111
    }
    track_script {
        chk_nginx
    }
    virtual_ipaddress {
        192.168.5.188
    }
}

nginx monitoring script

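# cat /etc/keepalived/nginx_check.sh
#!/bin/bash
# Count running nginx processes.
counter=$(ps -C nginx --no-heading|wc -l)
if [ "${counter}" = "0" ]; then
    # nginx is not running: try to start it once.
    /usr/sbin/nginx
    sleep 2
    counter=$(ps -C nginx --no-heading|wc -l)
    if [ "${counter}" = "0" ]; then
        # Still not running: stop keepalived so the VIP moves to the peer.
        /etc/init.d/keepalived stop
    fi
fi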

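Parameter description:

mcast_src_ip: the source IP address for multicast packets, that is, the address from which VRRP advertisements are sent. This is important: choose an address on a stable network interface, because it plays the same role as the heartbeat port in heartbeat. If it is not set, the default is the IP address bound to the interface named by interface.
virtual_ipaddress: the VIP, that is, the virtual IP address. It is added and removed as the node's state changes: added when the node becomes master and removed when it becomes backup. The state is determined mainly by priority and has little to do with the configured state value. Several IP addresses can be listed here.
track_script: references VRRP scripts by the names defined in vrrp_script sections. They run periodically to adjust the priority and can eventually trigger a master/backup switchover.
script: the check script you write yourself. It can also be a one-line command such as killall -0 nginx.
interval 2: run the check every 2 seconds.
weight -5: when the check fails (the script returns non-zero), lower the priority by 5. With the priorities used above (100 and 90), this alone leaves the master at 95, still above the backup; the switchover in this setup happens because nginx_check.sh stops keepalived when nginx cannot be restarted.
fall 3: the check must fail 3 times in a row before the failure is confirmed; the priority (which stays within 1-255) is then reduced by weight.
rise 2: 2 consecutive successes are required before the check counts as passing again. Success does not raise the priority above its configured value.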
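
The effect of weight on the election can be worked through with a little arithmetic. The sketch below is illustrative only: effective_priority is a hypothetical helper, and it encodes the usual reading of weight (a negative weight is applied while the check is failing, a positive weight while it is passing). The numbers come from the master and backup configurations above.

```shell
#!/bin/bash
# effective_priority BASE WEIGHT CHECK_OK
# CHECK_OK is 1 while the tracked script is passing and 0 while it is failing.
effective_priority() {
    local base=$1 weight=$2 check_ok=$3
    if [ "$weight" -lt 0 ] && [ "$check_ok" -eq 0 ]; then
        echo $((base + weight))     # negative weight: lowered on failure
    elif [ "$weight" -gt 0 ] && [ "$check_ok" -eq 1 ]; then
        echo $((base + weight))     # positive weight: raised on success
    else
        echo "$base"                # otherwise the configured priority applies
    fi
}

master=$(effective_priority 100 -5 0)   # master with chk_nginx failing
backup=$(effective_priority 90 -5 1)    # backup with chk_nginx passing
echo "master=$master backup=$backup"    # prints: master=95 backup=90
```

Because 95 is still higher than 90, a weight of -5 on its own would not move the VIP with these priorities; a weight whose magnitude exceeds the gap between the two priorities would.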

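nginx configuration

Install nginx on both 192.168.5.200 and 192.168.5.228, configure an upstream in nginx, and pass incoming requests to the back ends 192.168.5.229 and 192.168.5.230. Then test with curl. The details are not covered here.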
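
For the curl test, a small helper can summarize which back end answered each request. This is a minimal sketch: tally_backends is a hypothetical function, and it assumes that each back end returns its own IP address as the page body (as the test pages shown earlier do) and that nginx listens on port 80 of the VIP.

```shell
#!/bin/bash
# tally_backends: read one response body per line on stdin and print
# "<body> <count>" for each distinct body.
tally_backends() {
    sort | uniq -c | awk '{print $2, $1}'
}

# Demo with canned responses. Against a live setup the input could come from:
#   for i in $(seq 1 10); do curl -s http://192.168.5.188; done | tally_backends
printf '%s\n' 192.168.5.229 192.168.5.230 192.168.5.229 192.168.5.230 | tally_backends
```

A roughly even split between the two addresses suggests that requests are being distributed across both back ends.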

Further reading
https://segmentfault.com/a/1190000002881132
http://noodle.blog.51cto.com/2925423/1794734
