Postgresql流复制切换的时候遇到的一个小问题

时间:2021-09-02 06:49:56

Pg10搭建了流复制主备切换

配置
主库10.10.10.13 pghost4
备库10.10.10.14 pghost5

测试主库出现问题,停止后,备库自动切换成为新主库。老主库现在应该是备库了,启动它验证状态。发现备库启动不起来。通过以下方法,备库启动了。

  1. 新备库通过pg_ctl start启动,出现告警日志。

[[email protected] ~]# su - pg10
[email protected]>pg_ctl start
waiting for server to start....2019-12-20 21:44:10.004 CST [23416] LOG: listening on IPv4 address "0.0.0.0", port 1921
2019-12-20 21:44:10.004 CST [23416] LOG: listening on IPv6 address "::", port 1921
2019-12-20 21:44:10.006 CST [23416] LOG: listening on Unix socket "/tmp/.s.PGSQL.1921"
2019-12-20 21:44:10.014 CST [23417] LOG: database system was shut down in recovery at 2019-12-20 21:33:31 CST
2019-12-20 21:44:10.014 CST [23417] LOG: entering standby mode
2019-12-20 21:44:10.014 CST [23417] FATAL: requested timeline 2 is not a child of this server‘s history
2019-12-20 21:44:10.014 CST [23417] DETAIL: Latest checkpoint is at 0/11000028 on timeline 1, but in the history of the requested timeline, the server forked off from that timeline at 0/C000098.
2019-12-20 21:44:10.015 CST [23416] LOG: startup process (PID 23417) exited with exit code 1
2019-12-20 21:44:10.015 CST [23416] LOG: aborting startup due to startup process failure
2019-12-20 21:44:10.016 CST [23416] LOG: database system is shut down
stopped waiting
pg_ctl: could not start server
Examine the log output.

从上边的意思可以看到,新备库pg_wal目录下的00000002.history 这个跟新主库不一致。

[email protected]>ll
total 273M
-rw------- 1 pg10 pg10 16M Dec 18 22:40 000000010000000000000001
-rw------- 1 pg10 pg10 16M Dec 18 23:01 000000010000000000000002
-rw------- 1 pg10 pg10 16M Dec 18 23:02 000000010000000000000003
-rw------- 1 pg10 pg10 16M Dec 18 23:06 000000010000000000000004
-rw------- 1 pg10 pg10 16M Dec 18 23:39 000000010000000000000005
-rw------- 1 pg10 pg10 16M Dec 18 23:51 000000010000000000000006
-rw------- 1 pg10 pg10 16M Dec 18 23:55 000000010000000000000007
-rw------- 1 pg10 pg10 16M Dec 18 23:58 000000010000000000000008
-rw------- 1 pg10 pg10 16M Dec 19 00:15 000000010000000000000009
-rw------- 1 pg10 pg10 16M Dec 19 00:15 00000001000000000000000A
-rw------- 1 pg10 pg10 16M Dec 19 00:27 00000001000000000000000B
-rw------- 1 pg10 pg10 16M Dec 19 00:29 00000001000000000000000C
-rw------- 1 pg10 pg10 16M Dec 19 00:34 00000001000000000000000D
-rw------- 1 pg10 pg10 16M Dec 19 00:36 00000001000000000000000E
-rw------- 1 pg10 pg10 16M Dec 19 00:36 00000001000000000000000F
-rw------- 1 pg10 pg10 302 Dec 19 00:36 00000001000000000000000F.00000028.backup
-rw------- 1 pg10 pg10 16M Dec 19 00:38 000000010000000000000010
-rw------- 1 pg10 pg10 16M Dec 20 21:33 000000010000000000000011
-rw------- 1 pg10 pg10 41 Dec 19 00:30 00000002.history
drwx------ 2 pg10 pg10 4.0K Dec 20 21:31 archive_status
[email protected]>more 00000002.history
1 0/C000098 no recovery target specified
[email protected]>

然后看新主库该路径下的目录

[email protected]>ll
total 132K
-rw------- 1 pg10 pg10 206 Dec 19 00:36 backup_label.old
drwx------ 5 pg10 pg10 4.0K Dec 19 00:36 base
drwx------ 2 pg10 pg10 4.0K Dec 20 21:27 global
drwx------ 2 pg10 pg10 4.0K Dec 19 00:36 pg_commit_ts
drwx------ 2 pg10 pg10 4.0K Dec 19 00:36 pg_dynshmem
-rw------- 1 pg10 pg10 4.6K Dec 19 00:36 pg_hba.conf
-rw------- 1 pg10 pg10 1.6K Dec 19 00:36 pg_ident.conf
drwx------ 4 pg10 pg10 4.0K Dec 19 00:38 pg_logical
drwx------ 4 pg10 pg10 4.0K Dec 19 00:36 pg_multixact
drwx------ 2 pg10 pg10 4.0K Dec 19 00:37 pg_notify
drwx------ 2 pg10 pg10 4.0K Dec 19 00:36 pg_replslot
drwx------ 2 pg10 pg10 4.0K Dec 19 00:36 pg_serial
drwx------ 2 pg10 pg10 4.0K Dec 19 00:36 pg_snapshots
drwx------ 2 pg10 pg10 4.0K Dec 19 00:36 pg_stat
drwx------ 2 pg10 pg10 4.0K Dec 20 21:48 pg_stat_tmp
drwx------ 2 pg10 pg10 4.0K Dec 19 00:38 pg_subtrans
drwx------ 2 pg10 pg10 4.0K Dec 19 00:36 pg_tblspc
drwx------ 2 pg10 pg10 4.0K Dec 19 00:36 pg_twophase
-rw------- 1 pg10 pg10 3 Dec 19 00:36 PG_VERSION
drwx------ 3 pg10 pg10 4.0K Dec 19 00:38 pg_wal
drwx------ 2 pg10 pg10 4.0K Dec 19 00:36 pg_xact
-rw------- 1 pg10 pg10 88 Dec 19 00:36 postgresql.auto.conf
-rw------- 1 pg10 pg10 23K Dec 19 00:36 postgresql.conf
-rw------- 1 pg10 pg10 31 Dec 19 00:37 postmaster.opts
-rw------- 1 pg10 pg10 74 Dec 19 00:38 postmaster.pid
-rw-r--r-- 1 pg10 pg10 5.8K Dec 19 00:37 recovery.done
[email protected]>cd pg_wal/br/>[email protected]>ll
total 65M
-rw------- 1 pg10 pg10 16M Dec 19 00:36 00000001000000000000000F
-rw------- 1 pg10 pg10 16M Dec 19 00:38 000000010000000000000010
-rw------- 1 pg10 pg10 16M Dec 19 00:38 000000010000000000000011.partial
-rw------- 1 pg10 pg10 16M Dec 19 00:38 000000020000000000000011
-rw------- 1 pg10 pg10 42 Dec 19 00:38 00000002.history
drwx------ 2 pg10 pg10 4.0K Dec 19 00:38 archive_status
br/>[email protected]>pwd
/data/pg10/data/pg_wal
[email protected]>more 00000002.history
1 0/11000098 no recovery target specified

将新备库的00000002.history 备份,然后删除。
将新主库的传递到新备库。
然后启动。
[email protected]>scp 00000002.history 10.10.10.13:/data/pg10/data/pg_wal/
[email protected]‘s password:
00000002.history 100% 42 0.0KB/s 00:00
[email protected]>

[email protected]>pg_ctl start
waiting for server to start....2019-12-20 21:51:25.142 CST [23486] LOG: listening on IPv4 address "0.0.0.0", port 1921
2019-12-20 21:51:25.142 CST [23486] LOG: listening on IPv6 address "::", port 1921
2019-12-20 21:51:25.144 CST [23486] LOG: listening on Unix socket "/tmp/.s.PGSQL.1921"
2019-12-20 21:51:25.155 CST [23487] LOG: database system was shut down in recovery at 2019-12-20 21:33:31 CST
2019-12-20 21:51:25.155 CST [23487] LOG: entering standby mode
2019-12-20 21:51:25.171 CST [23487] LOG: consistent recovery state reached at 0/11000098
2019-12-20 21:51:25.171 CST [23487] LOG: invalid record length at 0/11000098: wanted 24, got 0
2019-12-20 21:51:25.171 CST [23486] LOG: database system is ready to accept read only connections
2019-12-20 21:51:25.179 CST [23491] LOG: started streaming WAL from primary at 0/11000000 on timeline 2
2019-12-20 21:51:25.225 CST [23487] LOG: redo starts at 0/11000098
done
server started
[email protected]>

发现已经启动。

[email protected]>pg_controldata|grep cluster
Database cluster state: in archive recovery
[email protected]>

可以看到新备库处于备库模式。