Redis Study Notes 4: Setting Up a Redis 3.2.1 Cluster

Date: 2023-03-09 17:37:20

Setting up a Redis cluster on CentOS 6.7 Linux

1. Download the latest Redis release

I downloaded Redis 3.2.1, then unpacked it and compiled it with make. For the detailed steps, see my earlier post: Redis Study Notes 1: Installing Redis on CentOS 6.7.

The compiled Redis directory is /usr/local/redis-3.2.1.

2. Create six directories

[root@itcast01 local]# mkdir 7000 7001 7002 7003 7004 7005

Copy the Redis files under /usr/local/redis-3.2.1 into each of the directories 7000 through 7005.
[root@itcast01 local]# cp -rf redis-3.2.1/* 7000
[root@itcast01 local]# cp -rf redis-3.2.1/* 7001
[root@itcast01 local]# cp -rf redis-3.2.1/* 7002
[root@itcast01 local]# cp -rf redis-3.2.1/* 7003
[root@itcast01 local]# cp -rf redis-3.2.1/* 7004
[root@itcast01 local]# cp -rf redis-3.2.1/* 7005

3. The redis.conf configuration file

Create a file named redis.conf with the following content:
port 7000
cluster-enabled yes
cluster-config-file nodes.conf
cluster-node-timeout 5000
appendonly yes
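Copy this file into each of the six directories, overwriting the existing redis.conf, and change the port in each copy: port 7000 in directory 7000, port 7001 in directory 7001, and so on.
cluster-node-timeout is the maximum time, in milliseconds, that a node may be unreachable to the other nodes in the cluster. It is set to 5 seconds above: a node that cannot be reached for longer than that is considered failed.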
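The per-port copies of redis.conf can also be generated in a loop instead of being edited by hand. The following is a minimal sketch, assuming all node directories sit under one base directory; the function name write_cluster_confs is hypothetical and not part of Redis.

```shell
# Hypothetical helper: write one redis.conf per port under BASE_DIR/<port>/.
# The settings are the five lines shown above; only the port differs.
# The body runs in a subshell, so its variables do not leak to the caller.
write_cluster_confs() (
    base_dir=$1
    shift
    for port in "$@"; do
        mkdir -p "$base_dir/$port"
        cat > "$base_dir/$port/redis.conf" <<EOF
port $port
cluster-enabled yes
cluster-config-file nodes.conf
cluster-node-timeout 5000
appendonly yes
EOF
    done
)

# Example: write_cluster_confs /usr/local 7000 7001 7002 7003 7004 7005
```

Because each file is written from the same template, the only way the copies can differ is the port, which avoids the easy mistake of forgetting to change it in one directory.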

4. Start each Redis server

In the src directory under each of the six directories, run ./redis-server ../redis.conf:
[root@itcast01 src]# ./redis-server ../redis.conf 

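That is the startup of the 7000 instance, which listens on port 7000. The Redis servers in the other directories listen on the ports set in their own configuration files.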

5. Install the Ruby tools needed for Redis Cluster

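Step 4 started six Redis servers, but they are still completely independent of each other. A separate tool is needed to join them into a cluster: redis-trib.rb, a Ruby script shipped with Redis. The script needs Ruby and the redis gem; without them, creating the cluster in the next step fails.
With a network connection available, install Ruby by running the following commands in order: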
      
[root@itcast01 src]# yum install ruby
[root@itcast01 src]# yum install rubygems
[root@itcast01 src]# gem install redis

6. Create the Redis cluster

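Still in the src subdirectory of one of the node directories, run the command below to create the cluster. --replicas 1 means that one replica (slave) is created for each master, so of the six nodes 127.0.0.1:7000 through 127.0.0.1:7005, three become masters and the other three become slaves.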
[root@itcast01 src]# ./redis-trib.rb  create --replicas 1 127.0.0.1:7000 127.0.0.1:7001 127.0.0.1:7002 127.0.0.1:7003 127.0.0.1:7004 127.0.0.1:7005
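Note: creating the cluster with redis-trib is a one-time operation. If the system is shut down and all six nodes stop, they come back up in cluster mode automatically when restarted; there is no need to run redis-trib.rb create again.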
Listing the Redis processes with ps shows that each one now has the word cluster appended.
To find out which ports are masters and which are slaves, use the following command:
[root@itcast01 src]# ./redis-trib.rb check 127.0.0.1:7000
The output is as follows:
[root@itcast01 src]# ./redis-trib.rb  check 127.0.0.1:7000
>>> Performing Cluster Check (using node 127.0.0.1:7000)
S: 9d81b0624b5080e5304165b07c2ef69a011ec28e 127.0.0.1:7000
   slots: (0 slots) slave
   replicates 88d693578dd0bdaca9e32422565c624790961bc9
S: ebbb23e3206c46eb64035af1a4381e2bb20a0a20 127.0.0.1:7004
   slots: (0 slots) slave
   replicates 20fbccf06841f7aa699b97bff72ece2f96599236
S: b62090a2fd65e1aa4d7053e78c1ff192bd152eb9 127.0.0.1:7005
   slots: (0 slots) slave
   replicates fbcce8fbcf22bb2d6b6f6f56e27b864210087213
M: fbcce8fbcf22bb2d6b6f6f56e27b864210087213 127.0.0.1:7002
   slots:10923-16383 (5461 slots) master
   1 additional replica(s)
M: 20fbccf06841f7aa699b97bff72ece2f96599236 127.0.0.1:7001
   slots:5461-10922 (5462 slots) master
   1 additional replica(s)
M: 88d693578dd0bdaca9e32422565c624790961bc9 127.0.0.1:7003
   slots:0-5460 (5461 slots) master
   1 additional replica(s)
[OK] All nodes agree about slots configuration.
>>> Check for open slots...
>>> Check slots coverage...
[OK] All 16384 slots covered.
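To pull just the role and address of each node out of check output like the listing above, a short awk filter is enough. This is a sketch that assumes the output format shown here (lines starting with M: or S:, followed by the node ID and ip:port); the function name list_roles is hypothetical.

```shell
# Hypothetical helper: read `redis-trib.rb check` output on stdin and print
# one "<role> <ip:port>" line per node. Relies on the M:/S: line layout above.
list_roles() {
    awk '$1 == "M:" { print "master", $3 }
         $1 == "S:" { print "slave", $3 }'
}

# Example: ./redis-trib.rb check 127.0.0.1:7000 | list_roles
```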
The output shows that 7002, 7001, and 7003 are masters, while 7000, 7004, and 7005 are slaves. Besides check, another commonly used subcommand is info.
[root@itcast01 src]# ./redis-trib.rb  info  127.0.0.1:7000
127.0.0.1:7002 (fbcce8fb...) -> 10 keys | 5461 slots | 1 slaves.
127.0.0.1:7001 (20fbccf0...) -> 6 keys | 5462 slots | 1 slaves.
127.0.0.1:7003 (88d69357...) -> 6 keys | 5461 slots | 1 slaves.
[OK] 22 keys in 3 masters.

It lists every master, including how many keys and how many slaves each master has, the total number of keys across all masters, and the average number of keys per slot.


To learn more about using redis-trib, run its help subcommand:
[root@itcast01 src]# ./redis-trib.rb help

The output is as follows:

[root@itcast01 src]# ./redis-trib.rb help
Usage: redis-trib <command> <options> <arguments ...>

  call            host:port command arg arg .. arg
  del-node        host:port node_id
  set-timeout     host:port milliseconds
  rebalance       host:port
                  --threshold <arg>
                  --use-empty-masters
                  --simulate
                  --auto-weights
                  --weight <arg>
                  --pipeline <arg>
                  --timeout <arg>
  help            (show this help)
  reshard         host:port
                  --slots <arg>
                  --from <arg>
                  --to <arg>
                  --pipeline <arg>
                  --timeout <arg>
                  --yes
  create          host1:port1 ... hostN:portN
                  --replicas <arg>
  info            host:port
  import          host:port
                  --from <arg>
                  --copy
                  --replace
  fix             host:port
                  --timeout <arg>
  add-node        new_host:new_port existing_host:existing_port
                  --master-id <arg>
                  --slave
  check           host:port

For check, fix, reshard, del-node, set-timeout you can specify the host and port of any working node in the cluster.

The word slot has come up several times above; here is a brief explanation:

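Redis Cluster divides the keyspace of the whole cluster into 16384 slots (hash slots). With six nodes arranged as three masters and three slaves, the cluster effectively has three HA groups, and the three masters share the slots roughly evenly. For every operation on a key (for example, reading or writing a cached value), Redis computes the CRC16 checksum of the key and takes the result modulo 16384; the remainder is the slot the key belongs to.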
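The slot calculation can be reproduced outside Redis. The sketch below assumes the CRC16 variant is the XMODEM one (polynomial 0x1021, initial value 0) and handles plain ASCII keys only; it ignores hash tags (the {...} syntax), and the function names crc16 and key_slot are hypothetical.

```shell
# CRC16, XMODEM variant (polynomial 0x1021, initial value 0), of an ASCII
# string. The body runs in a subshell so its variables stay local.
crc16() (
    s=$1
    crc=0
    while [ -n "$s" ]; do
        c=${s%"${s#?}"}                 # first character of s
        s=${s#?}                        # remainder of s
        byte=$(printf '%d' "'$c")       # character code of c
        crc=$(( crc ^ (byte << 8) ))
        j=0
        while [ "$j" -lt 8 ]; do
            if [ $(( crc & 0x8000 )) -ne 0 ]; then
                crc=$(( ((crc << 1) ^ 0x1021) & 0xFFFF ))
            else
                crc=$(( (crc << 1) & 0xFFFF ))
            fi
            j=$(( j + 1 ))
        done
    done
    echo "$crc"
)

# Slot for a key: CRC16 of the key modulo 16384.
key_slot() {
    echo $(( $(crc16 "$1") % 16384 ))
}
```

Running key_slot cc prints 700, which matches the slot reported for the key cc in the redis-cli session in step 7.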

7. Using the redis-cli client

[root@itcast01 src]# ./redis-cli -c -h localhost -p 7000

Note the -c option, which puts redis-cli in cluster mode. Try adding a key:

[root@itcast01 src]# ./redis-cli -c -h localhost -p 7000
localhost:7000> set cc 123
-> Redirected to slot [700] located at 127.0.0.1:7003
OK
127.0.0.1:7003>

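Note the second line of the output: the key cc hashes to slot 700, which is served by the node on port 7003. (Note: 7000 is a slave and 7003 is its master, which owns slots 0-5460; only masters accept writes.) If the same command is run while connected to 7003, no redirect appears.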

8. Failover test

First, use redis-trib check to view the current master/slave layout:

[root@itcast01 src]# ./redis-trib.rb  check 127.0.0.1:7000
>>> Performing Cluster Check (using node 127.0.0.1:7000)
S: 9d81b0624b5080e5304165b07c2ef69a011ec28e 127.0.0.1:7000
slots: (0 slots) slave
replicates 88d693578dd0bdaca9e32422565c624790961bc9
S: ebbb23e3206c46eb64035af1a4381e2bb20a0a20 127.0.0.1:7004
slots: (0 slots) slave
replicates 20fbccf06841f7aa699b97bff72ece2f96599236
S: b62090a2fd65e1aa4d7053e78c1ff192bd152eb9 127.0.0.1:7005
slots: (0 slots) slave
replicates fbcce8fbcf22bb2d6b6f6f56e27b864210087213
M: fbcce8fbcf22bb2d6b6f6f56e27b864210087213 127.0.0.1:7002
slots:10923-16383 (5461 slots) master
1 additional replica(s)
M: 20fbccf06841f7aa699b97bff72ece2f96599236 127.0.0.1:7001
slots:5461-10922 (5462 slots) master
1 additional replica(s)
M: 88d693578dd0bdaca9e32422565c624790961bc9 127.0.0.1:7003
slots:0-5460 (5461 slots) master
1 additional replica(s)
[OK] All nodes agree about slots configuration.
>>> Check for open slots...
>>> Check slots coverage...
[OK] All 16384 slots covered.

The output shows that 7000 is the slave of 7003 (88d693578dd0bdaca9e32422565c624790961bc9). Now kill the Redis process for 7003 by hand and watch the terminal output of 7000:

3342:S 21 Jul 09:43:39.831 * Connecting to MASTER 127.0.0.1:7003
3342:S 21 Jul 09:43:39.831 * MASTER <-> SLAVE sync started
3342:S 21 Jul 09:43:39.831 # Error condition on socket for SYNC: Connection refused
3342:S 21 Jul 09:43:40.135 * Marking node 88d693578dd0bdaca9e32422565c624790961bc9 as failing (quorum reached).
3342:S 21 Jul 09:43:40.135 # Start of election delayed for 720 milliseconds (rank #0, offset 2241).
3342:S 21 Jul 09:43:40.135 # Cluster state changed: fail
3342:S 21 Jul 09:43:40.841 * Connecting to MASTER 127.0.0.1:7003
3342:S 21 Jul 09:43:40.841 * MASTER <-> SLAVE sync started
3342:S 21 Jul 09:43:40.841 # Error condition on socket for SYNC: Connection refused
3342:S 21 Jul 09:43:40.942 # Starting a failover election for epoch 10.
3342:S 21 Jul 09:43:40.965 # Failover election won: I'm the new master.
3342:S 21 Jul 09:43:40.965 # configEpoch set to 10 after successful failover
3342:M 21 Jul 09:43:40.965 * Discarding previously cached master state.
3342:M 21 Jul 09:43:40.966 # Cluster state changed: ok
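In a longer log, the failover milestones are easy to miss among the repeated reconnect attempts. A small grep filter can keep only the lines that matter. This is a sketch that assumes the log wording shown above and that the node's output was saved to a file; the function name failover_events and the file name 7000.log are hypothetical.

```shell
# Hypothetical helper: keep only the log lines that mark failover milestones
# (node marked as failing, cluster state changes, election start and result).
failover_events() {
    grep -E 'Marking node .* as failing|Cluster state changed|[Ff]ailover election'
}

# Example: failover_events < 7000.log
```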

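Line 6 shows that the cluster state switched to fail because 7003 went down, line 5 shows an election being scheduled, and line 11 shows the node on port 7000 winning the election and becoming the new master. Run redis-trib check to confirm: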

[root@itcast01 src]# ./redis-trib.rb  check 127.0.0.1:7000
>>> Performing Cluster Check (using node 127.0.0.1:7000)
M: 9d81b0624b5080e5304165b07c2ef69a011ec28e 127.0.0.1:7000
slots:0-5460 (5461 slots) master
0 additional replica(s)
S: ebbb23e3206c46eb64035af1a4381e2bb20a0a20 127.0.0.1:7004
slots: (0 slots) slave
replicates 20fbccf06841f7aa699b97bff72ece2f96599236
S: b62090a2fd65e1aa4d7053e78c1ff192bd152eb9 127.0.0.1:7005
slots: (0 slots) slave
replicates fbcce8fbcf22bb2d6b6f6f56e27b864210087213
M: fbcce8fbcf22bb2d6b6f6f56e27b864210087213 127.0.0.1:7002
slots:10923-16383 (5461 slots) master
1 additional replica(s)
M: 20fbccf06841f7aa699b97bff72ece2f96599236 127.0.0.1:7001
slots:5461-10922 (5462 slots) master
1 additional replica(s)
[OK] All nodes agree about slots configuration.
>>> Check for open slots...
>>> Check slots coverage...

9. Scaling out the cluster

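As the business grows, the cluster needs to scale out. The following shows how to add two more nodes. As before, create two directories, 7006 and 7007, copy the Redis files into each of them, and change the port in each redis.conf file.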
[root@itcast01 local]# mkdir 7006 7007
[root@itcast01 local]# cp -rf redis-3.2.1/* 7006
[root@itcast01 local]# cp -rf redis-3.2.1/* 7007
After that, start the 7006 and 7007 Redis nodes. At this point the two new nodes have nothing to do with the cluster. Use the following command to add 7006 to the cluster as a master:
[root@itcast01 src]# ./redis-trib.rb add-node 127.0.0.1:7006 127.0.0.1:7000

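Note: the first argument is the ip:port of the new node, and the second is any working node already in the cluster. If all goes well, the output is as follows: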

[root@itcast01 src]# ./redis-trib.rb add-node 127.0.0.1:7006 127.0.0.1:7000
>>> Adding node 127.0.0.1:7006 to cluster 127.0.0.1:7000
>>> Performing Cluster Check (using node 127.0.0.1:7000)
M: 9d81b0624b5080e5304165b07c2ef69a011ec28e 127.0.0.1:7000
slots:0-5460 (5461 slots) master
1 additional replica(s)
S: ebbb23e3206c46eb64035af1a4381e2bb20a0a20 127.0.0.1:7004
slots: (0 slots) slave
replicates 20fbccf06841f7aa699b97bff72ece2f96599236
S: b62090a2fd65e1aa4d7053e78c1ff192bd152eb9 127.0.0.1:7005
slots: (0 slots) slave
replicates fbcce8fbcf22bb2d6b6f6f56e27b864210087213
M: fbcce8fbcf22bb2d6b6f6f56e27b864210087213 127.0.0.1:7002
slots:10923-16383 (5461 slots) master
1 additional replica(s)
M: 20fbccf06841f7aa699b97bff72ece2f96599236 127.0.0.1:7001
slots:5461-10922 (5462 slots) master
1 additional replica(s)
S: 88d693578dd0bdaca9e32422565c624790961bc9 127.0.0.1:7003
slots: (0 slots) slave
replicates 9d81b0624b5080e5304165b07c2ef69a011ec28e
[OK] All nodes agree about slots configuration.
>>> Check for open slots...
>>> Check slots coverage...
[OK] All 16384 slots covered.
>>> Send CLUSTER MEET to node 127.0.0.1:7006 to make it join the cluster.
[OK] New node added correctly.

Use redis-trib check to confirm the state:

[root@itcast01 src]#  ./redis-trib.rb check 127.0.0.1:7000
>>> Performing Cluster Check (using node 127.0.0.1:7000)
M: 9d81b0624b5080e5304165b07c2ef69a011ec28e 127.0.0.1:7000
slots:0-5460 (5461 slots) master
1 additional replica(s)
S: ebbb23e3206c46eb64035af1a4381e2bb20a0a20 127.0.0.1:7004
slots: (0 slots) slave
replicates 20fbccf06841f7aa699b97bff72ece2f96599236
S: b62090a2fd65e1aa4d7053e78c1ff192bd152eb9 127.0.0.1:7005
slots: (0 slots) slave
replicates fbcce8fbcf22bb2d6b6f6f56e27b864210087213
M: fbcce8fbcf22bb2d6b6f6f56e27b864210087213 127.0.0.1:7002
slots:10923-16383 (5461 slots) master
1 additional replica(s)
M: 8e35ebeb7325c79b81e1beee03cc5e56e8334fdc 127.0.0.1:7006
slots: (0 slots) master
0 additional replica(s)
M: 20fbccf06841f7aa699b97bff72ece2f96599236 127.0.0.1:7001
slots:5461-10922 (5462 slots) master
1 additional replica(s)
S: 88d693578dd0bdaca9e32422565c624790961bc9 127.0.0.1:7003
slots: (0 slots) slave
replicates 9d81b0624b5080e5304165b07c2ef69a011ec28e
[OK] All nodes agree about slots configuration.
>>> Check for open slots...
>>> Check slots coverage...
[OK] All 16384 slots covered.

The output shows that 7006 is now a master. Next, add 7007 as a slave of 7006 with the following command:

[root@itcast01 src]# ./redis-trib.rb add-node --slave --master-id 8e35ebeb7325c79b81e1beee03cc5e56e8334fdc 127.0.0.1:7007 127.0.0.1:7000

Run check again to confirm:
[root@itcast01 src]#  ./redis-trib.rb check 127.0.0.1:7000
>>> Performing Cluster Check (using node 127.0.0.1:7000)
M: 9d81b0624b5080e5304165b07c2ef69a011ec28e 127.0.0.1:7000
slots:0-5460 (5461 slots) master
1 additional replica(s)
S: ebbb23e3206c46eb64035af1a4381e2bb20a0a20 127.0.0.1:7004
slots: (0 slots) slave
replicates 20fbccf06841f7aa699b97bff72ece2f96599236
S: b62090a2fd65e1aa4d7053e78c1ff192bd152eb9 127.0.0.1:7005
slots: (0 slots) slave
replicates fbcce8fbcf22bb2d6b6f6f56e27b864210087213
M: fbcce8fbcf22bb2d6b6f6f56e27b864210087213 127.0.0.1:7002
slots:10923-16383 (5461 slots) master
1 additional replica(s)
S: bf84939e7e6b066d3d9caf7aae1e1b8e7ca2522c 127.0.0.1:7007
slots: (0 slots) slave
replicates 8e35ebeb7325c79b81e1beee03cc5e56e8334fdc
M: 8e35ebeb7325c79b81e1beee03cc5e56e8334fdc 127.0.0.1:7006
slots: (0 slots) master
1 additional replica(s)
M: 20fbccf06841f7aa699b97bff72ece2f96599236 127.0.0.1:7001
slots:5461-10922 (5462 slots) master
1 additional replica(s)
S: 88d693578dd0bdaca9e32422565c624790961bc9 127.0.0.1:7003
slots: (0 slots) slave
replicates 9d81b0624b5080e5304165b07c2ef69a011ec28e
[OK] All nodes agree about slots configuration.
>>> Check for open slots...
>>> Check slots coverage...
[OK] All 16384 slots covered.

This shows that 7007 is now a slave of 7006.

10. reshard: redistributing slots

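Adding nodes raises a problem: all 16384 slots are already owned by the other three master/slave groups, so the new node has no slots and cannot store any keys. The slots therefore need to be redistributed.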
[root@itcast01 src]#  ./redis-trib.rb info 127.0.0.1:7000
127.0.0.1:7000 (9d81b062...) -> 8 keys | 5461 slots | 1 slaves.
127.0.0.1:7002 (fbcce8fb...) -> 11 keys | 5461 slots | 1 slaves.
127.0.0.1:7006 (8e35ebeb...) -> 0 keys | 0 slots | 1 slaves.
127.0.0.1:7001 (20fbccf0...) -> 7 keys | 5462 slots | 1 slaves.
[OK] 26 keys in 4 masters.
0.00 keys per slot on average.

Use the following command to reassign slots:

[root@itcast01 src]# ./redis-trib.rb reshard 127.0.0.1:7000

The ip:port after reshard can be any working node in the cluster.

[root@itcast01 src]# ./redis-trib.rb reshard 127.0.0.1:7000
>>> Performing Cluster Check (using node 127.0.0.1:7000)
....
M: 8e35ebeb7325c79b81e1beee03cc5e56e8334fdc 127.0.0.1:7006
slots: (0 slots) master
1 additional replica(s)
....
>>> Check for open slots...
>>> Check slots coverage...
[OK] All 16384 slots covered.
How many slots do you want to move (from 1 to 16384)? 1000 # enter the number of slots to move
What is the receiving node ID? 8e35ebeb7325c79b81e1beee03cc5e56e8334fdc # enter the ID of the target node
Please enter all the source node IDs.
Type 'all' to use all the nodes as source nodes for the hash slots.
Type 'done' once you entered all the source nodes IDs.
Source node #1:all # use all nodes as source nodes
Moving slot 6455 from 20fbccf06841f7aa699b97bff72ece2f96599236
Moving slot 6456 from 20fbccf06841f7aa699b97bff72ece2f96599236
Moving slot 6457 from 20fbccf06841f7aa699b97bff72ece2f96599236
Moving slot 6458 from 20fbccf06841f7aa699b97bff72ece2f96599236
Moving slot 6459 from 20fbccf06841f7aa699b97bff72ece2f96599236
Moving slot 6460 from 20fbccf06841f7aa699b97bff72ece2f96599236
Do you want to proceed with the proposed reshard plan (yes/no)? yes # confirm and execute
....
Moving slot 12191 from 127.0.0.1:7002 to 127.0.0.1:7006:
Moving slot 12192 from 127.0.0.1:7002 to 127.0.0.1:7006:
Moving slot 12193 from 127.0.0.1:7002 to 127.0.0.1:7006:
Moving slot 12194 from 127.0.0.1:7002 to 127.0.0.1:7006:

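Note: think carefully before answering the first prompt (how many slots to move). Entering 16384 moves every slot onto a single node, which makes the distribution even more unbalanced. Moving 500 to 1000 slots at a time is recommended, since that has less impact on a live system.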

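reshard can be run repeatedly until the desired distribution is reached. (Note: I find reshard a bit cumbersome here, since the number of slots to move has to be worked out by hand; an option that spreads the 16384 slots evenly on its own would be welcome.) After the adjustment, check the distribution again: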

[root@itcast01 src]#  ./redis-trib.rb info 127.0.0.1:7000
127.0.0.1:7000 (9d81b062...) -> 12 keys | 7005 slots | 1 slaves.
127.0.0.1:7002 (fbcce8fb...) -> 8 keys | 4189 slots | 1 slaves.
127.0.0.1:7006 (8e35ebeb...) -> 3 keys | 1000 slots | 1 slaves.
127.0.0.1:7001 (20fbccf0...) -> 3 keys | 4190 slots | 1 slaves.
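The manual arithmetic for an even split is small enough to script. The sketch below computes how many slots a master still needs to reach an equal share, and prints (without running) a non-interactive reshard command built from the --from, --to, --slots and --yes options listed in the help output in step 6. The function names slots_needed and reshard_cmd are hypothetical. The help output also lists a rebalance subcommand with --use-empty-masters, which may cover the automatic case; it is not exercised in these notes.

```shell
# Slots a master still needs to reach an even share of the 16384 slots.
# A negative result means the master holds more than its share.
slots_needed() (
    masters=$1
    current=$2
    echo $(( 16384 / masters - current ))
)

# Print, but do not run, a non-interactive reshard command.
# Arguments: source node ID, target node ID, slot count, any cluster ip:port.
reshard_cmd() {
    printf './redis-trib.rb reshard --from %s --to %s --slots %s --yes %s\n' \
        "$1" "$2" "$3" "$4"
}

# Example with the distribution above: 4 masters, 7006 holds 1000 slots.
# slots_needed 4 1000    (how many more slots 7006 should receive)
# slots_needed 4 7005    (7000 is over its share, so the result is negative)
```

With four masters an even share is 16384 / 4 = 4096 slots, so 7006 is 3096 slots short, while 7000 with 7005 slots is the natural source. Moving them in batches of 500 to 1000, as suggested above, keeps the impact on a live system small.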

11. Removing a node

Where there is scaling out there is also removal. The command to remove a node is:
[root@itcast01 src]# ./redis-trib.rb del-node 127.0.0.1:7006 8e35ebeb7325c79b81e1beee03cc5e56e8334fdc

The ip:port after del-node can be any working node in the cluster; the last argument is the ID of the node to remove.

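Note: only slave nodes and empty master nodes can be removed. If a master is not empty, first use reshard to move its slots to other nodes, then remove it. For a master-slave pair, once all of the master's slots have been moved away and the master is removed, the remaining slave finds a new master and becomes a slave of another master.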
If the master is not empty, del-node fails with an error.
Removing the slave node 7007 produces the following output:
[root@itcast01 src]# ./redis-trib.rb del-node 127.0.0.1:7006 bf84939e7e6b066d3d9caf7aae1e1b8e7ca2522c
>>> Removing node bf84939e7e6b066d3d9caf7aae1e1b8e7ca2522c from cluster 127.0.0.1:7006
>>> Sending CLUSTER FORGET messages to the cluster...
>>> SHUTDOWN the node.

Removing a node also shuts down the corresponding Redis server.