Keepalived,即存活检测机制,是 Linux 下一个轻量级别的高可用解决方案;起初针对 LVS 进行研发,通过心跳检测检查系统中各个服务节点的健康状态,支持故障自动切换。

部署

1
yum install -y keepalive

修改配置文件 vi /etc/keepalived/keepalived.conf,state 可以设置为:MASTER 或 BACKUP

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
global_defs {
router_id ka01 #标识信息
}

vrrp_instance VI_1 {
state MASTER
priority 150 #优先级
interface eth0 #绑定的网卡
virtual_router_id 50 #同一个虚拟的路由
advert_int 1 #心跳的间隔时间
authentication {
auth_type PASS #两个主机之间的密语
auth_pass 1111 #心跳密码
}
virtual_ipaddress {
10.0.0.3 #虚拟IP地址(可以绑定多个虚拟IP地址)
}
}
1
2
3
systemctl  enable keepalive
systemctl start keepalive
systemctl stop keepalive

方式

Keepalive 高可用分为:抢占式和非抢占式

抢占式(默认)

BACKUP挂掉,BACKUP上台,MASTER重新启动则将 IP 抢占过去。

非抢占式

两台均为BACKUP,在优先级上做区分,如MASTER挂掉,BACKUP上台,则BACKUP变成MASTER,MASTER变为BACKUP。

  • 两个节点的state均为BACKUP(官方建议)
  • 两个节点都在vrrp_instance中添加nopreempt
  • 其中一个节点的优先级要高于另外一个节点

PS.两台服务器角色都启用了 nopreempt 后,必须修改角色状态统一为 BACKUP,唯一的区别就是优先级不同。

1
2
3
4
5
6
7
8
9
10
11
12
13
#Master
vrrp_instance VI_1 {
state BACKUP
priority 150
nopreempt
}

#Backup
vrrp_instance VI_1 {
state BACKUP
priority 100
nopreempt
}

故障

脑裂

当两台高可用服务器在指定的时间内,无法互相检测到对方心跳而各自启动故障转移功能,取得了资源以及服务的所有权,而此时的两台高可用服务器对都还活着并作正常运行,这样就会导致同一个服务在两端同时启动而发生冲突的严重问题,最严重的就是两台主机同时占用一个VIP的地址(类似双端导入概念),当用户写入数据的时候可能会分别写入到两端,这样可能会导致服务器两端的数据不一致或造成数据的丢失,这种情况就称为裂脑,也有的人称之为分区集群或者大脑垂直分隔。

  • 服务器网线松动等网络故障
  • 服务器硬件故障发生损坏现象而奔溃
  • 主备服务器都开启了firewalld防火墙

    解决方法

    如果 Nginx 宕机,会导致用户请求失败,但 Keepalived 并不会进行地址漂移;所以需要编写一个脚本检测 Nginx 的存活状态,如果不存活则 kill nginx 和 keepalived
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
vim check_nginx.sh
#!/bin/sh
nginxpid=$(ps -C nginx --no-header|wc -l)
#1.判断 Nginx 是否存活,如果不存活则尝试启动 Nginx
if [ $nginxpid -eq 0 ];then
systemctl start nginx
sleep 3
#2.等待 3 秒后再次获取一次 Nginx 状态
nginxpid=$(ps -C nginx --no-header|wc -l)
#3.再次进行判断,如 Nginx 还不存活则停止 Keepalived,让地址进行漂移,并退出脚本
if [ $nginxpid -eq 0 ];then
systemctl stop keepalived
fi
fi
#添加执行权限
chmod +x check_ nginx.sh

配置 Keepalived 使用

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
vi /etc/keepalived/keepalived.conf
global_defs {
router_id 01
}

#定义脚本所在的位置,以及执行时间
vrrp_script check_nginx {
script "/root/check_nginx.sh"
interval 5
}

vrrp_instance VI_1 {
state BACKUP
priority 150
nopreempt
interface eth0
virtual_router_id 50
advert_int 1
authentication {
auth_type PASS
auth_pass 1111
}
virtual_ipaddress {
10.0.0.3
}

#调用脚本
track_script {
check_nginx
}
}

Nginx

至少两台Nginx 服务器、一个 VIP(虚拟 IP),服务器都安装好 Keepalived,并设置好 VIP;当主 Nginx 发生故障时,VIP 自动换绑到备用 Nginx,并运行相应的脚本。

编译安装

[[Nginx 入门配置]]

YUM 安装

1
2
3
4
5
yum install -y nginx
yum install -y keepalived
rpm -q -a keepalived
#开机启动
systemctl enable keepalived

Keepalived

MASTER 和 BACKUP 各一台,编译安装 Keepalived,因为设置绑定网卡等操作,需要使用 root 用户

1
2
3
4
tar -zxvf keepalived-2.2.2.tar.gz -C /root && cd /root/keepalived-2.2.2
./configure
#*** WARNING - this build will not support IPVS with IPv6. Please install libnl/libnl-3 dev libraries to support IPv6 with IPVS.
make && make install

配置文件

安装完成以后,拷贝并修改配置文件

1
2
3
4
5
6
7
#修改配置之前查看本机 IP 所在网卡
ip addr
mkdir /etc/keepalived
vi /etc/keepalived/keepalived.conf
# 删除以下两行,否则 VIP 绑定失败
vrrp_skip_check_adv_addr
vrrp_strict

PS.通过配置 priority 设置优先级,通过增加 nopreempt 配置非抢占式,vrrp_script 设置切换机制

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
! Configuration File for keepalived

global_defs {
router_id keep_xxx
}

vrrp_script check_nginx {
script "/etc/keepalived/nginx_check.sh"
interval 2
weight -20
}

vrrp_instance VI_1 {
state MASTER
interface eth0
virtual_router_id 51
priority 100
advert_int 1
authentication {
auth_type PASS
auth_pass 1111
}

track_script {
check_nginx
}

virtual_ipaddress {
192.168.200.16
}
}

检查脚本

创建 Nginx 检查脚本:vi /etc/keepalived/nginx_check.sh

1
2
3
4
5
6
7
8
9
10
11
#!/bin/bash

# 判断 nginx 是否宕机并尝试重启
if [ `ps -C nginx --no-header |wc -l` -eq 0 ];then
/app/midware/nginx/sbin/nginx
# 等待 5s 再次检查 nginx,如果没能启动成功,则停止 keepalived 切换备机
sleep 5
if [ `ps -C nginx --no-header |wc -l` -eq 0 ];then
killall keepalived
fi
fi

增加可执行权限:chmod +x /etc/keepalived/nginx_check.sh

服务命令

1
2
cp /usr/local/etc/sysconfig/keepalived /etc/sysconfig/
cp /usr/local/sbin/keepalived /usr/sbin/

创建系统服务文件:/etc/init.d/keepalived,并赋予执行权限:chmod +x /etc/init.d/keepalived

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
#!/bin/sh
#
# Startup script for the Keepalived daemon
#
# processname: keepalived
# pidfile: /var/run/keepalived.pid
# config: /etc/keepalived/keepalived.conf
# chkconfig: - 21 79
# description: Start and stop Keepalived

# Source function library
. /etc/rc.d/init.d/functions

# Source configuration file (we set KEEPALIVED_OPTIONS there)
. /etc/sysconfig/keepalived

RETVAL=0

prog="keepalived"

start() {
echo -n $"Starting $prog: "
daemon keepalived ${KEEPALIVED_OPTIONS}
RETVAL=$?
echo
[ $RETVAL -eq 0 ] && touch /var/lock/subsys/$prog
}

stop() {
echo -n $"Stopping $prog: "
killproc keepalived
RETVAL=$?
echo
[ $RETVAL -eq 0 ] && rm -f /var/lock/subsys/$prog
}

reload() {
echo -n $"Reloading $prog: "
killproc keepalived -1
RETVAL=$?
echo
}

# See how we were called.
case "$1" in
start)
start
;;
stop)
stop
;;
reload)
reload
;;
restart)
stop
start
;;
condrestart)
if [ -f /var/lock/subsys/$prog ]; then
stop
start
fi
;;
status)
status keepalived
RETVAL=$?
;;
*)
echo "Usage: $0 {start|stop|reload|restart|condrestart|status}"
RETVAL=1
esac

exit $RETVAL

然后 systemctl daemon-reload 重载

1
2
3
4
5
6
7
8
#查看服务状态
systemctl status keepalived
#启动服务
systemctl start keepalived
#重启服务
systemctl restart keepalived
#停止服务
systemctl stop keepalived