Rebuilding an Oracle Data Guard standby database with RMAN duplicate

Date: 2023-03-08 23:27:36

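Background

  • Applies to:

    Operating system: Red Hat 6.5

    Database: Oracle 11g R2

    Problem: after a failover, the original primary cannot be recovered or started, or the primary/standby relationship has been lost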

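  • Advantages
  1. No downtime is required on the primary database
  2. Simple to carry out
  • Preparation before implementation

  1. Test duplicate

  2. Rebuild the standby database with duplicate in a test environment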

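Implementation steps

  • Back up the new primary

Note that the backup script should write to the server's local disk, not to the tape library.

rman_backup.sh, the local backup script: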

#!/bin/sh

# Oracle environment

export ORACLE_BASE=/data/oracle/app

export ORACLE_HOME=$ORACLE_BASE/oracle/product/11.2.0/dbhome_1

export ORACLE_SID=orcl_stby

export PATH=$PATH:$HOME/bin:$ORACLE_HOME/bin

export LD_LIBRARY_PATH=$ORACLE_HOME/lib:/usr/lib

export NLS_LANG=AMERICAN_AMERICA.AL32UTF8

day=$(date -u +%Y%m%d)   # UTC date stamp used in the log file name

cd /data/bak/rman_backup

rman target / nocatalog log=/data/bak/rman_backup/rman_backup$day.log <<EOF

crosscheck archivelog all;

crosscheck backup;

delete noprompt expired archivelog all;

delete noprompt expired backup;

run{ allocate channel c1 type disk;

allocate channel c2 type disk;

backup database format '/data/bak/rman_backup/%d_full_%T%s%p.bck';

sql "alter system archive log current";

backup archivelog all format '/data/bak/rman_backup/%d_arc_%T%s%p.bck';

backup current controlfile format = '/data/bak/rman_backup/controlfile%T%s%p.bck';

release channel c1;

release channel c2;

}

exit;

EOF
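
Before relying on this backup, it helps to confirm that the RMAN log is clean. The following is a minimal sketch rather than part of the original script: the function name check_rman_log is hypothetical, and the check simply treats any line that starts with an RMAN- or ORA- message code as a problem, so it will also flag warnings.

```shell
#!/bin/sh
# check_rman_log LOGFILE (hypothetical helper, not in the original script)
# Returns 0 when the log exists and contains no RMAN-/ORA- message lines,
# 1 otherwise. Warnings with an RMAN- prefix are flagged as well.
check_rman_log() {
    log_file=$1
    if [ ! -f "$log_file" ]; then
        echo "log not found: $log_file" >&2
        return 1
    fi
    # RMAN prints its error stack as lines starting with RMAN-nnnnn or ORA-nnnnn
    if grep -E '^(RMAN|ORA)-[0-9]+' "$log_file" >/dev/null; then
        echo "RMAN/ORA messages found in $log_file" >&2
        return 1
    fi
    return 0
}
```

It could be called at the end of rman_backup.sh as check_rman_log /data/bak/rman_backup/rman_backup$day.log, stopping the procedure if it returns non-zero.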

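  • Drop the original primary

From this step on, the original primary is called the "standby" and the new primary is called the "primary".

1. Shut down the database:

SQL>shutdown immediate;

2. Restart the instance in restricted mode and bring it to the mount state:

sqlplus / as sysdba

SQL>startup restrict mount;    --> # Only users with SYSDBA privileges (more precisely, the RESTRICTED SESSION privilege) can log in; ordinary users cannot, which keeps other users off the database

3. Confirm the database name again to avoid dropping the wrong one; the database to drop this time is orcl:

SQL>select name from v$database;

4. Run the drop database statement:

SQL>drop database;  --> # (available in 10g and later)
                    # It deletes only the database files (control files, data files, log files, spfile). It does not delete the files under $ORACLE_BASE/admin/$ORACLE_SID, the initialization parameter file, or the password file, and archived logs are not removed either.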

SQL> shutdown immediate;

ORA-01109: database not open

Database dismounted.

ORACLE instance shut down.

SQL> exit

Disconnected from Oracle Database 11g Enterprise Edition Release 11.2.0.1.0 - 64bit Production

With the Partitioning, OLAP, Data Mining and Real Application Testing options

[oracle@uatecsdb ~]$ sqlplus / as sysdba

SQL*Plus: Release 11.2.0.1.0 Production on Wed Aug 23 14:52:03 2017

Copyright (c) 1982, 2009, Oracle.  All rights reserved.

Connected to an idle instance.

SQL> startup restrict mount;

ORACLE instance started.

Total System Global Area 6747725824 bytes

Fixed Size                 2213976 bytes

Variable Size        5100275624 bytes

Database Buffers  1610612736 bytes

Redo Buffers           34623488 bytes

Database mounted.

SQL> select name from v$database;

NAME

---------

ORCL

SQL> drop database;

Database dropped.

Disconnected from Oracle Database 11g Enterprise Edition Release 11.2.0.1.0 - 64bit Production

With the Partitioning, OLAP, Data Mining and Real Application Testing options

SQL> exit

[oracle@uatecsdb ~]$ sqlplus / as sysdba

SQL*Plus: Release 11.2.0.1.0 Production on Wed Aug 23 14:56:20 2017

Copyright (c) 1982, 2009, Oracle.  All rights reserved.

Connected to an idle instance.

SQL>
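
Since drop database cannot be undone, the name check in step 3 can be turned into an explicit guard when the procedure is scripted. This is only a sketch: confirm_db_name is a hypothetical helper, and its second argument is assumed to be whatever the select name from v$database query returned.

```shell
#!/bin/sh
# confirm_db_name EXPECTED ACTUAL (hypothetical helper)
# Compares the name we intend to drop with the name the instance reported.
# The comparison ignores case and surrounding whitespace.
confirm_db_name() {
    expected=$(printf '%s' "$1" | tr '[:lower:]' '[:upper:]')
    actual=$(printf '%s' "$2" | tr -d '[:space:]' | tr '[:lower:]' '[:upper:]')
    if [ -n "$actual" ] && [ "$expected" = "$actual" ]; then
        return 0
    fi
    echo "refusing to continue: expected $expected, got '$actual'" >&2
    return 1
}

# Example:
#   confirm_db_name orcl "ORCL" && echo "name matches, drop database may proceed"
```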

  • Start the standby in nomount

Prepare a pfile, preferably the one created when Data Guard was originally built.

Rename the pfile to the form init$ORACLE_SID.ora (initorcl.ora) and place it in /data/oracle/app/oracle/product/11.2.0/dbhome_1/dbs/:

SQL>startup nomount;
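
The naming rule above can be checked from the shell before running startup nomount. A small sketch, assuming ORACLE_HOME and ORACLE_SID are set as in the backup script; standby_pfile_path is a hypothetical helper name.

```shell
#!/bin/sh
# standby_pfile_path (hypothetical helper)
# Prints the pfile location implied by the naming rule:
#   $ORACLE_HOME/dbs/init<ORACLE_SID>.ora
standby_pfile_path() {
    if [ -z "${ORACLE_HOME:-}" ] || [ -z "${ORACLE_SID:-}" ]; then
        echo "ORACLE_HOME and ORACLE_SID must be set" >&2
        return 1
    fi
    printf '%s/dbs/init%s.ora\n' "$ORACLE_HOME" "$ORACLE_SID"
}

# Example:
#   pfile=$(standby_pfile_path) && [ -f "$pfile" ] || echo "pfile is missing"
```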

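  • Connect RMAN to the primary and the standby

Before connecting with RMAN, check the following:

1. The firewall is disabled

2. tnsnames.ora: each server must be able to reach the other's listener

3. The sys password is best kept identical on both sides

4. db_file_name_convert and log_file_name_convert: if the directories differ, the pfile must specify these two parameters

Because Data Guard had already been set up earlier, none of these items was a problem in production.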
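
Item 2 in the checklist above can be partly verified without connecting to anything, by confirming that both aliases exist in tnsnames.ora. The sketch below is an assumption-laden simplification: has_tns_alias is a hypothetical helper, it expects each alias to start at the beginning of a line followed by "=", and it does not handle domain-qualified or comma-separated alias lists. It also does not prove the listener is reachable.

```shell
#!/bin/sh
# has_tns_alias FILE ALIAS (hypothetical helper)
# Returns 0 if FILE contains an entry whose alias is ALIAS, ignoring case.
has_tns_alias() {
    tns_file=$1
    alias_name=$2
    # An entry is assumed to begin in column 1 as "<alias> =" or "<alias>="
    grep -i -E "^${alias_name}[[:space:]]*=" "$tns_file" >/dev/null 2>&1
}

# Example (the file is usually $ORACLE_HOME/network/admin/tnsnames.ora):
#   for a in orcl orcl_stby; do
#       has_tns_alias "$ORACLE_HOME/network/admin/tnsnames.ora" "$a" || echo "missing alias: $a"
#   done
```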

rman target sys/yourpassword@orcl_stby auxiliary sys/yourpassword@orcl

Rebuild the standby database with the duplicate command

Because the primary and the standby use the same paths, use the following command:

RMAN>duplicate target database for standby from active database nofilenamecheck;

  • Verify the databases

Open the standby:

SQL>alter database open;    # This step may fail; ignore it for now and test at the end whether the database can be opened

SQL>CREATE SPFILE FROM PFILE='/data/oracle/app/oracle/product/11.2.0/dbhome_1/dbs/initorcl.ora';

SQL>select status from v$instance;

SQL>select open_mode from v$database;

Check the primary:

SQL>select status from v$instance;

SQL>select open_mode from v$database;

Check GAP_STATUS:

SQL>SELECT STATUS, GAP_STATUS FROM V$ARCHIVE_DEST_STATUS WHERE DEST_ID = 2;

If the status is DEFERRED:

SQL>ALTER SYSTEM SET LOG_ARCHIVE_DEST_STATE_2='ENABLE' SCOPE=BOTH;

Start real-time apply:

SQL>alter database recover managed standby database using current logfile disconnect from session;

SQL>select process,thread#,status from v$managed_standby;

SQL>SELECT SEQUENCE#,APPLIED FROM V$ARCHIVED_LOG;

SQL>SELECT SWITCHOVER_STATUS FROM V$DATABASE;
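
The V$ARCHIVED_LOG query above returns one row per archived log, which is tedious to scan by eye. As a rough sketch, its spooled output can be filtered for sequences that are not yet applied. The helper name list_unapplied is hypothetical; it assumes the query was run on the standby, that each row has the sequence number in the first column and the APPLIED value in the second, and it treats anything other than YES (including IN-MEMORY) as not yet applied.

```shell
#!/bin/sh
# list_unapplied (hypothetical helper)
# Reads "SEQUENCE# APPLIED" rows on stdin and prints the sequence numbers
# whose APPLIED column is not YES. Header and separator lines are skipped
# because their first field is not numeric.
list_unapplied() {
    awk '$1 ~ /^[0-9]+$/ && $2 != "YES" { print $1 }'
}

# Example, with query output saved to a file named applied.lst (an assumed name):
#   list_unapplied < applied.lst
```

An empty result means every listed sequence is marked as applied.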

  • Restore the DGMGRL relationship

DGMGRL>show database verbose orcl;

The query still shows Database Status: SHUTDOWN

Log in to the standby and start dg_broker:

SQL> show parameter dg_broker_start;

NAME                                 TYPE                   VALUE

------------------------------------ ---------------------- ------------------------------

dg_broker_start                      boolean                FALSE

SQL> alter system set dg_broker_start = true scope=both;

System altered.

SQL>!ps -ef|grep dmon


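  • Open question

This test lasted only a little over 3 hours, during which 15 new archived logs were generated. After duplicate finished and LOG_ARCHIVE_DEST_STATE_2 was enabled, only 6 of them were applied. All the log checks look fine and the database can be opened, but could there still be a data consistency problem?

In production one archived log is generated per hour and the whole operation can be finished in about 3 hours, so missing logs are not a concern there.

  • Issues found and resolved during the production rollout

1. During the production rollout, log_archive_dest_2 on the primary was found to be INACTIVE. The previous failover probably did not complete fully, which left the primary without its log_archive_dest_2 setting.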

SQL> SELECT STATUS, GAP_STATUS FROM V$ARCHIVE_DEST_STATUS WHERE DEST_ID = 2;

STATUS     GAP_STATUS

--------- ------------------------

INACTIVE

Running the following SQL restores the log_archive_dest_2 parameter:

alter system set log_archive_dest_2='SERVICE=orcl LGWR SYNC VALID_FOR=(ONLINE_LOGFILES,PRIMARY_ROLE) DB_UNIQUE_NAME=orcl' scope=both;


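The gap status changes to RESOLVABLE GAP and, after a log switch, to NO GAP.

2. The broker state and configuration were wrong for both the primary and the standby, so the broker configuration had to be rebuilt

a. Remove the original configuration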

DISABLE FAST_START FAILOVER FORCE;

(1) On the observer

disable configuration;

remove database orcl;

remove database orcl_stby;

remove configuration;

(2) On both databases

alter system set dg_broker_start = false scope=both;

show parameter broker;

Rename the dr1orcl_stby.dat and dr2orcl_stby.dat files under /data/oracle/app/oracle/product/11.2.0/dbhome_1/dbs/

(3) On both databases

alter system set dg_broker_start = true scope=both;
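
The rename in step (2) can be scripted so that the old broker files are kept under a timestamped name instead of being overwritten. This is a sketch under stated assumptions: backup_broker_files is a hypothetical helper, the file names follow the dr1<name>.dat / dr2<name>.dat pattern shown above, and it should only be run while dg_broker_start is false.

```shell
#!/bin/sh
# backup_broker_files DBS_DIR DB_NAME (hypothetical helper)
# Renames dr1<DB_NAME>.dat and dr2<DB_NAME>.dat in DBS_DIR by appending a
# timestamp and ".bak". Files that do not exist are skipped.
backup_broker_files() {
    dbs_dir=$1
    db_name=$2
    stamp=$(date +%Y%m%d%H%M%S)
    for f in "$dbs_dir/dr1${db_name}.dat" "$dbs_dir/dr2${db_name}.dat"; do
        if [ -f "$f" ]; then
            mv "$f" "$f.$stamp.bak" || return 1
        fi
    done
}

# Example:
#   backup_broker_files /data/oracle/app/oracle/product/11.2.0/dbhome_1/dbs orcl_stby
```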

b. Recreate the configuration

DGMGRL> create configuration DG_orcl as primary database is orcl_stby connect identifier is orcl_stby;

DGMGRL> add database orcl as connect identifier is orcl maintained as physical;

DGMGRL> show database orcl_stby;

DGMGRL> show database orcl;

DGMGRL> show database verbose orcl_stby;

DGMGRL> edit database 'orcl' set property 'ArchiveLagTarget'='0';

DGMGRL> edit database 'orcl' set property 'LogArchiveMinSucceedDest'='1';

DGMGRL> edit database 'orcl_stby' set property 'DelayMins'='0';

DGMGRL> edit database 'orcl' set property 'DelayMins'='0';

DGMGRL> enable configuration;

DGMGRL> show configuration;

c. Enable FAST_START FAILOVER

DGMGRL> EDIT CONFIGURATION SET PROPERTY FastStartFailoverLagLimit=1800;

DGMGRL> EDIT CONFIGURATION SET PROPERTY FastStartFailoverThreshold = 15;

DGMGRL> EDIT DATABASE orcl_stby SET PROPERTY FastStartFailoverTarget='orcl';

Property "faststartfailovertarget" updated

DGMGRL> EDIT DATABASE orcl SET PROPERTY FastStartFailoverTarget='orcl_stby';

Property "faststartfailovertarget" updated

SHOW DATABASE ORCL LOGXPTMODE

SHOW DATABASE ORCL_STBY LOGXPTMODE

EDIT DATABASE ORCL SET PROPERTY LOGXPTMODE='SYNC';

EDIT DATABASE ORCL_STBY SET PROPERTY LOGXPTMODE='SYNC';

EDIT CONFIGURATION SET PROTECTION MODE AS MAXAVAILABILITY;

ENABLE FAST_START FAILOVER;

SHOW FAST_START FAILOVER;

SHOW CONFIGURATION VERBOSE;