Installing Oracle RAC 11.2.0.4 on CentOS 7.4: udev disk binding and errors when running the root scripts

Date: 2021-08-28 06:58:26

Errors encountered while installing Oracle RAC 11.2.0.4 on CentOS 7.4, and how I worked around them.

$ cat /etc/redhat-release 
CentOS Linux release 7.4.1708 (Core)

1 Binding the shared disks with udev

The /sbin/scsi_id command used on CentOS 6 does not exist on CentOS 7; use /usr/lib/udev/scsi_id instead.

-- Disks without partitions
for i in b c d e f g;
do
echo "KERNEL==\"sd*\", SUBSYSTEM==\"block\", PROGRAM==\"/usr/lib/udev/scsi_id --whitelisted --replace-whitespace --device=/dev/\$name\", RESULT==\"`/usr/lib/udev/scsi_id --whitelisted --replace-whitespace --device=/dev/sd$i`\", NAME=\"asm-disk$i\", OWNER=\"grid\", GROUP=\"asmadmin\", MODE=\"0660\""      >> /etc/udev/rules.d/99-oracle-asmdevices.rules
done
[root@rac01 ~]# cat /etc/udev/rules.d/99-oracle-asmdevices.rules
KERNEL=="sd*", SUBSYSTEM=="block", PROGRAM=="/usr/lib/udev/scsi_id --whitelisted --replace-whitespace --device=/dev/$name", RESULT=="36000c29ea85262d4a23086fbce428b09", NAME="asm-diskb", OWNER="grid", GROUP="asmadmin", MODE="0660"

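The rule syntax differs between CentOS 6 and 7. On CentOS 7, udev no longer renames kernel device nodes, so the rule has to create a symlink with SYMLINK+= instead of using NAME=; otherwise udev logs the errors shown below. An old-style BUS key is rejected as well, as the log shows.

CentOS 7: SYMLINK+=\"asm-disk$i\"

CentOS 6: NAME=\"asm-disk$i\"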
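The difference above can be sketched as a CentOS 7 version of the rule generator. This is a minimal sketch and not necessarily the exact command that was run: the helper name asm_rule is hypothetical, and the real loop is left commented out because it has to run as root on a node with the shared disks attached. With SYMLINK+=, the OWNER, GROUP and MODE keys apply to the underlying /dev/sdX node, not to the symlink.

```shell
# Print one udev rule line for CentOS 7 (SYMLINK+= instead of NAME=).
# $1 = disk letter (b, c, ...), $2 = WWID as reported by scsi_id.
# asm_rule is a hypothetical helper name used only for this sketch.
asm_rule() {
  printf 'KERNEL=="sd*", SUBSYSTEM=="block", PROGRAM=="/usr/lib/udev/scsi_id --whitelisted --replace-whitespace --device=/dev/$name", RESULT=="%s", SYMLINK+="asm-disk%s", OWNER="grid", GROUP="asmadmin", MODE="0660"\n' "$2" "$1"
}

# Usage on a real node (as root), commented out so the sketch has no side effects:
# for i in b c d e f g; do
#   wwid=$(/usr/lib/udev/scsi_id --whitelisted --replace-whitespace --device=/dev/sd$i)
#   asm_rule "$i" "$wwid" >> /etc/udev/rules.d/99-oracle-asmdevices.rules
# done
```

Building the line with printf inside single quotes avoids the backslash escaping that the echo version needs, and keeps $name literal for udev to expand.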

[root@rac01 ~]# ls -l /dev/asm*

Jul 31 16:31:04 rac01 systemd-udevd[664]: unknown key 'BUS' in /etc/udev/rules.d/99-oracle-asmdevices.rules:11
Jul 31 16:31:04 rac01 systemd-udevd[664]: invalid rule '/etc/udev/rules.d/99-oracle-asmdevices.rules:11'
Jul 31 16:31:04 rac01 systemd-udevd[664]: unknown key 'BUS' in /etc/udev/rules.d/99-oracle-asmdevices.rules:12
Jul 31 16:31:04 rac01 systemd-udevd[664]: invalid rule '/etc/udev/rules.d/99-oracle-asmdevices.rules:12'
Jul 31 16:44:37 rac01 systemd-udevd[7121]: NAME="asm-diskb" ignored, kernel device nodes can not be renamed; please fix it in /etc/udev/rules.d/99-oracle-asmdevices.rules:1
Jul 31 16:44:41 rac01 systemd-udevd[7133]: NAME="asm-diskc" ignored, kernel device nodes can not be renamed; please fix it in /etc/udev/rules.d/99-oracle-asmdevices.rules:2
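The log messages above name the offending file and line number. A quick way to find such lines before reloading udev is a plain grep; the sketch below is a coarse check under the assumption that only BUS and NAME assignments matter here (the helper name find_legacy_keys is hypothetical).

```shell
# List rule lines that use keys systemd-udevd on CentOS 7 rejects (BUS) or ignores (NAME=).
# $1 = path to a udev rules file. Prints "lineno:rule" for each hit;
# returns non-zero when nothing is found (normal grep semantics).
# This is a coarse text match, not a udev syntax checker.
find_legacy_keys() {
  grep -nE '(^|[ ,])(BUS|NAME)=' "$1"
}

# Example (path taken from the log messages):
# find_legacy_keys /etc/udev/rules.d/99-oracle-asmdevices.rules
```

The line numbers printed by grep -n correspond to the ":11", ":12" suffixes in the systemd-udevd messages.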

Reload the partition table

/sbin/partprobe /dev/sdb
[root@rac01 ~]# /usr/lib/udev/scsi_id -g -u /dev/sdb
36000c29ea85262d4a23086fbce428b09

Reload the rules and trigger udev

/usr/sbin/udevadm control --reload-rules
systemctl status systemd-udevd.service
systemctl enable systemd-udevd.service
[root@rac01 ~]# /sbin/udevadm trigger --type=devices --action=change
[root@rac01 ~]# ll /dev/asm-disk*
lrwxrwxrwx. 1 root root 3 Jul 31 16:57 /dev/asm-diskb -> sdb
lrwxrwxrwx. 1 root root 3 Jul 31 16:57 /dev/asm-diskc -> sdc
lrwxrwxrwx. 1 root root 3 Jul 31 16:57 /dev/asm-diskd -> sdd
lrwxrwxrwx. 1 root root 3 Jul 31 16:57 /dev/asm-diske -> sde
lrwxrwxrwx. 1 root root 3 Jul 31 16:57 /dev/asm-diskf -> sdf
lrwxrwxrwx. 1 root root 3 Jul 31 16:57 /dev/asm-diskg -> sdg
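Since the same check has to be repeated on every node, it can be scripted. A minimal sketch, assuming the disk letters b to g used above (the helper name missing_asm_links is hypothetical):

```shell
# Report which expected ASM symlinks are missing.
# $1 = directory holding the links (normally /dev); remaining args = disk letters.
missing_asm_links() {
  dir=$1
  shift
  for i in "$@"; do
    # -L is true only if the path exists and is a symbolic link
    [ -L "$dir/asm-disk$i" ] || echo "missing: $dir/asm-disk$i"
  done
}

# Example on a node:
# missing_asm_links /dev b c d e f g
```

No output means all links are present.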

2 Errors running the root scripts during the Grid installation

-- Node 1

[root@rac01 ~]# /u01/app/oraInventory/orainstRoot.sh 
Changing permissions of /u01/app/oraInventory.
Adding read,write permissions for group.
Removing read,write,execute permissions for world.

Changing groupname of /u01/app/oraInventory to oinstall.
The execution of the script is complete.
[root@rac01 ~]# /u01/app/11.2.0/grid/root.sh 
Performing root user operation for Oracle 11g 

The following environment variables are set as:
    ORACLE_OWNER= grid
    ORACLE_HOME=  /u01/app/11.2.0/grid

Enter the full pathname of the local bin directory: [/usr/local/bin]: 
   Copying dbhome to /usr/local/bin ...
   Copying oraenv to /usr/local/bin ...
   Copying coraenv to /usr/local/bin ...


Creating /etc/oratab file...
Entries will be added to the /etc/oratab file as needed by
Database Configuration Assistant when a database is created
Finished running generic part of root script.
Now product-specific root actions will be performed.
Using configuration parameter file: /u01/app/11.2.0/grid/crs/install/crsconfig_params
Creating trace directory
User ignored Prerequisites during installation
Installing Trace File Analyzer
OLR initialization - successful
  root wallet
  root wallet cert
  root cert export
  peer wallet
  profile reader wallet
  pa wallet
  peer wallet keys
  pa wallet keys
  peer cert request
  pa cert request
  peer cert
  pa cert
  peer root cert TP
  profile reader root cert TP
  pa root cert TP
  peer pa cert TP
  pa peer cert TP
  profile reader pa cert TP
  profile reader peer cert TP
  peer user cert
  pa user cert
Adding Clusterware entries to inittab
ohasd failed to start
Failed to start the Clusterware. Last 20 lines of the alert log follow: 
2019-08-01 09:35:59.951: 
[client(14411)]CRS-2101:The OLR was formatted using version 3.

^CINT at /u01/app/11.2.0/grid/crs/install/crsconfig_lib.pm line 1446.
/u01/app/11.2.0/grid/perl/bin/perl -I/u01/app/11.2.0/grid/perl/lib -I/u01/app/11.2.0/grid/crs/install /u01/app/11.2.0/grid/crs/install/rootcrs.pl execution failed
Oracle root script execution aborted!

At first I suspected a permissions problem on the shared disks

[root@rac01 ~]# ll /dev/asm-disk*
lrwxrwxrwx 1 root root 3 Aug  1 09:07 /dev/asm-diskb -> sdb
## Change the ownership
[root@rac01 ~]# chown grid:asmadmin /dev/asm-disk*
## This made no difference
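The ll output above describes the symlink itself, which udev creates as root:root with lrwxrwxrwx, so it says nothing about the permissions of the disk behind it. A minimal sketch for inspecting the node the link points to, assuming GNU stat (the helper name target_perms is hypothetical):

```shell
# Show owner, group and octal mode of the device a symlink points to.
# stat -L dereferences the link; without -L the link's own attributes are shown.
target_perms() {
  stat -L -c '%U:%G %a %n' "$1"
}

# Example on a node:
# target_perms /dev/asm-diskb
```

If the udev rule applied, the output for an ASM disk should show grid:asmadmin and mode 660.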

Reference: https://blog.csdn.net/DBAngelica/article/details/85002591

[root@rac01 ~]# touch /usr/lib/systemd/system/ohasd.service
[root@rac01 ~]# vim /usr/lib/systemd/system/ohasd.service
[Unit]
Description=Oracle High Availability Services
After=syslog.target

[Service]
ExecStart=/etc/init.d/init.ohasd run >/dev/null 2>&1 Type=simple
Restart=always

[Install]
WantedBy=multi-user.target
[root@rac01 ~]# systemctl daemon-reload
[root@rac01 ~]# systemctl enable ohasd.service
Created symlink from /etc/systemd/system/multi-user.target.wants/ohasd.service to /usr/lib/systemd/system/ohasd.service.
[root@rac01 ~]# systemctl start ohasd.service
[root@rac01 ~]# systemctl status ohasd.service
● ohasd.service - Oracle High Availability Services
   Loaded: loaded (/usr/lib/systemd/system/ohasd.service; enabled; vendor preset: disabled)
   Active: active (running) since Thu 2019-08-01 10:53:38 CST; 6s ago
 Main PID: 18621 (init.ohasd)
   CGroup: /system.slice/ohasd.service
           └─18621 /bin/sh /etc/init.d/init.ohasd run >/dev/null 2>&1 Type=simple

Aug 01 10:53:38 rac01 systemd[1]: Started Oracle High Availability Services.
Aug 01 10:53:38 rac01 systemd[1]: Starting Oracle High Availability Services...
[root@rac01 ~]# /u01/app/11.2.0/grid/root.sh
CRS-4266: Voting file(s) successfully replaced
##  STATE    File Universal Id                File Name Disk group
--  -----    -----------------                --------- ---------
 1. ONLINE   61e053dbaca94f40bfa468e31c9c927f (/dev/asm-diskb) [OCR]
 2. ONLINE   6b25d06268b84fe9bfc6125298d94018 (/dev/asm-diskd) [OCR]
 3. ONLINE   b1fd0f59a3474f92bf0b2d3344fe91cc (/dev/asm-diskc) [OCR]
Located 3 voting disk(s).
CRS-2672: Attempting to start 'ora.asm' on 'rac01'
CRS-2676: Start of 'ora.asm' on 'rac01' succeeded
CRS-2672: Attempting to start 'ora.OCR.dg' on 'rac01'
CRS-2676: Start of 'ora.OCR.dg' on 'rac01' succeeded
Configure Oracle Grid Infrastructure for a Cluster ... succeeded
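In the unit file above, the redirection and Type=simple sit on the ExecStart line. systemd does not run ExecStart through a shell, so, as the status output shows, they are passed to init.ohasd as plain arguments. The service still starts, but a tidier unit could look like the sketch below. The StandardOutput and StandardError lines are an assumed substitute for the shell redirection and are not taken from the referenced article; the helper name write_ohasd_unit is hypothetical.

```shell
# Write a systemd unit for ohasd with one directive per line.
# $1 = destination file (the transcript uses /usr/lib/systemd/system/ohasd.service).
write_ohasd_unit() {
  cat > "$1" <<'EOF'
[Unit]
Description=Oracle High Availability Services
After=syslog.target

[Service]
Type=simple
ExecStart=/etc/init.d/init.ohasd run
StandardOutput=null
StandardError=null
Restart=always

[Install]
WantedBy=multi-user.target
EOF
}

# Example (as root), followed by the same daemon-reload / enable / start steps:
# write_ohasd_unit /usr/lib/systemd/system/ohasd.service
```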

-- Errors when running on node 2

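Note: to keep the remaining nodes from hitting this error, watch for the init.ohasd file to appear under /etc/init.d/ while root.sh is running, then run systemctl start ohasd.service to start the ohasd service.
If /etc/init.d/init.ohasd does not exist yet, systemctl start ohasd.service fails.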
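The timing described in the note can be automated from a second root shell. A minimal sketch: the helper name wait_for_file and the 600-second timeout are assumptions chosen for illustration, not values from the installation.

```shell
# Wait until a file exists, polling once per second.
# $1 = path, $2 = timeout in seconds. Returns 0 when the file appears, 1 on timeout.
wait_for_file() {
  waited=0
  while [ ! -e "$1" ]; do
    [ "$waited" -ge "$2" ] && return 1
    sleep 1
    waited=$((waited + 1))
  done
  return 0
}

# In a second root shell while root.sh is running on the node:
# wait_for_file /etc/init.d/init.ohasd 600 && systemctl start ohasd.service
```

Starting the service only after the file exists avoids the status=203/EXEC failure and the start-limit lockout shown in the transcript that follows.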
[root@rac02 ~]# systemctl status ohasd.service
● ohasd.service - Oracle High Availability Services
   Loaded: loaded (/usr/lib/systemd/system/ohasd.service; enabled; vendor preset: disabled)
   Active: failed (Result: start-limit) since Thu 2019-08-01 11:03:58 CST; 3s ago
  Process: 22754 ExecStart=/etc/init.d/init.ohasd run >/dev/null 2>&1 Type=simple (code=exited, status=203/EXEC)
 Main PID: 22754 (code=exited, status=203/EXEC)

Aug 01 11:03:57 rac02 systemd[1]: Unit ohasd.service entered failed state.
Aug 01 11:03:57 rac02 systemd[1]: ohasd.service failed.
Aug 01 11:03:58 rac02 systemd[1]: ohasd.service holdoff time over, scheduling restart.
Aug 01 11:03:58 rac02 systemd[1]: start request repeated too quickly for ohasd.service
Aug 01 11:03:58 rac02 systemd[1]: Failed to start Oracle High Availability Services.
Aug 01 11:03:58 rac02 systemd[1]: Unit ohasd.service entered failed state.
Aug 01 11:03:58 rac02 systemd[1]: ohasd.service failed.

Error logs

[root@rac02 ~]# ll /etc/init.d/init.ohasd
ls: cannot access /etc/init.d/init.ohasd: No such file or directory
[root@rac02 ~]# ll /etc/init.d/init.ohasd
-rwxr-xr-x 1 root root 8782 Aug  1 11:06 /etc/init.d/init.ohasd
[root@rac02 ~]# systemctl start ohasd.service
[root@rac02 ~]# systemctl status ohasd.service
● ohasd.service - Oracle High Availability Services
   Loaded: loaded (/usr/lib/systemd/system/ohasd.service; enabled; vendor preset: disabled)
   Active: active (running) since Thu 2019-08-01 11:06:20 CST; 4s ago
 Main PID: 24186 (init.ohasd)
   CGroup: /system.slice/ohasd.service
           ├─24186 /bin/sh /etc/init.d/init.ohasd run >/dev/null 2>&1 Type=simple
           └─24211 /bin/sleep 10

Aug 01 11:06:20 rac02 systemd[1]: Started Oracle High Availability Services.
Aug 01 11:06:20 rac02 systemd[1]: Starting Oracle High Availability Services...

[root@rac01 rac01]# tail -n 100 -f /u01/app/11.2.0/grid/log/rac01/alertrac01.log
2019-08-01 14:16:30.453: 
[cssd(21789)]CRS-1601:CSSD Reconfiguration complete. Active nodes are rac01 rac02 .

[root@rac02 ~]# tail -n 100 -f /u01/app/11.2.0/grid/log/rac02/alertrac02.log 
The execution of the script is complete.
2019-08-01 14:15:48.037: 
[ohasd(3604)]CRS-2112:The OLR service started on node rac02.
2019-08-01 14:15:48.059: 
[ohasd(3604)]CRS-1301:Oracle High Availability Service started on node rac02.
2019-08-01 14:15:48.060: 
[ohasd(3604)]CRS-8017:location: /etc/oracle/lastgasp has 2 reboot advisory log files, 0 were announced and 0 errors occurred
2019-08-01 14:15:48.545: 
[/u01/app/11.2.0/grid/bin/oraagent.bin(6497)]CRS-5011:Check of resource "+ASM" failed: details at "(:CLSN00006:)" in "/u01/app/11.2.0/grid/log/rac02/agent/ohasd/oraagent_grid/oraagent_grid.log"
2019-08-01 14:15:51.622: 
[/u01/app/11.2.0/grid/bin/orarootagent.bin(6501)]CRS-2302:Cannot get GPnP profile. Error CLSGPNP_NO_DAEMON (GPNPD daemon is not running). 
2019-08-01 14:15:53.823: 
[gpnpd(6592)]CRS-2328:GPNPD started on node rac02. 
2019-08-01 14:15:56.234: 
[cssd(6658)]CRS-1713:CSSD daemon is started in clustered mode
2019-08-01 14:15:58.006: 
[ohasd(3604)]CRS-2767:Resource state recovery not attempted for 'ora.diskmon' as its target state is OFFLINE
2019-08-01 14:15:58.006: 
[ohasd(3604)]CRS-2769:Unable to failover resource 'ora.diskmon'.
2019-08-01 14:16:21.832: 
[cssd(6658)]CRS-1707:Lease acquisition for node rac02 number 2 completed
2019-08-01 14:16:23.138: 
[cssd(6658)]CRS-1605:CSSD voting file is online: /dev/asm-diskc; details in /u01/app/11.2.0/grid/log/rac02/cssd/ocssd.log.
2019-08-01 14:16:23.140: 
[cssd(6658)]CRS-1605:CSSD voting file is online: /dev/asm-diskd; details in /u01/app/11.2.0/grid/log/rac02/cssd/ocssd.log.
2019-08-01 14:16:23.146: 
[cssd(6658)]CRS-1605:CSSD voting file is online: /dev/asm-diskb; details in /u01/app/11.2.0/grid/log/rac02/cssd/ocssd.log.
2019-08-01 14:16:29.466: 
[cssd(6658)]CRS-1601:CSSD Reconfiguration complete. Active nodes are rac01 rac02 .
2019-08-01 14:16:31.434: 
[ctssd(7290)]CRS-2407:The new Cluster Time Synchronization Service reference node is host rac01.
2019-08-01 14:16:31.435: 
[ctssd(7290)]CRS-2401:The Cluster Time Synchronization Service started on host rac02.
2019-08-01 14:16:33.170: 
[ohasd(3604)]CRS-2767:Resource state recovery not attempted for 'ora.diskmon' as its target state is OFFLINE
2019-08-01 14:16:33.171: 
[ohasd(3604)]CRS-2769:Unable to failover resource 'ora.diskmon'.
2019-08-01 14:17:30.167: 
[/u01/app/11.2.0/grid/bin/orarootagent.bin(6603)]CRS-5818:Aborted command 'start' for resource 'ora.ctssd'. Details at (:CRSAGF00113:) {0:0:2} in /u01/app/11.2.0/grid/log/rac02/agent/ohasd/orarootagent_root/orarootagent_root.log.
2019-08-01 14:17:34.169: 
[ohasd(3604)]CRS-2757:Command 'Start' timed out waiting for response from the resource 'ora.ctssd'. Details at (:CRSPE00111:) {0:0:2} in /u01/app/11.2.0/grid/log/rac02/ohasd/ohasd.log.
2019-08-01 14:17:34.183: 
[ohasd(3604)]CRS-2807:Resource 'ora.asm' failed to start automatically.
2019-08-01 14:17:34.183: 
[ohasd(3604)]CRS-2807:Resource 'ora.crsd' failed to start automatically.
2019-08-01 14:17:34.183: 
[ohasd(3604)]CRS-2807:Resource 'ora.evmd' failed to start automatically.
2019-08-01 14:17:51.734: 
[/u01/app/11.2.0/grid/bin/oraagent.bin(6568)]CRS-5011:Check of resource "+ASM" failed: details at "(:CLSN00006:)" in "/u01/app/11.2.0/grid/log/rac02/agent/ohasd/oraagent_grid/oraagent_grid.log"
2019-08-01 14:19:04.174: 
[ohasd(3604)]CRS-2765:Resource 'ora.ctssd' has failed on server 'rac02'.
2019-08-01 14:19:06.776: 
[ctssd(8408)]CRS-2401:The Cluster Time Synchronization Service started on host rac02.
2019-08-01 14:19:06.776: 
[ctssd(8408)]CRS-2407:The new Cluster Time Synchronization Service reference node is host rac01.
2019-08-01 14:19:07.533: 
[/u01/app/11.2.0/grid/bin/oraagent.bin(6568)]CRS-5011:Check of resource "+ASM" failed: details at "(:CLSN00006:)" in "/u01/app/11.2.0/grid/log/rac02/agent/ohasd/oraagent_grid/oraagent_grid.log"
2019-08-01 14:19:13.266: 
[/u01/app/11.2.0/grid/bin/oraagent.bin(6568)]CRS-5011:Check of resource "+ASM" failed: details at "(:CLSN00006:)" in "/u01/app/11.2.0/grid/log/rac02/agent/ohasd/oraagent_grid/oraagent_grid.log"
2019-08-01 14:19:36.864: 
[crsd(8918)]CRS-1012:The OCR service started on node rac02.
/u01/app/11.2.0/grid/log/rac01/agent/ohasd/oraagent_grid/oraagent_grid.log
2019-08-01 14:40:53.671: [ora.gipcd][4109874944]{0:0:156} [check] clsdmc_respget return: status=0, ecode=0
2019-08-01 14:41:18.893: [ CRSCOMM][4152755968] IpcC: IPC client connection 18 to member 0 has been removed
2019-08-01 14:41:18.893: [CLSFRAME][4152755968] Removing IPC Member:{Relative|Node:0|Process:0|Type:2}
2019-08-01 14:41:18.893: [CLSFRAME][4152755968] Disconnected from OHASD:rac01 process: {Relative|Node:0|Process:0|Type:2}
2019-08-01 14:41:18.894: [   AGENT][4142249728]{0:13:10} {0:13:10} Created alert : (:CRSAGF00117:) :  Disconnected from server, Agent is shutting down.
2019-08-01 14:41:18.894: [    AGFW][4142249728]{0:13:10} Agent is exiting with exit code: 1

/u01/app/11.2.0/grid/log/rac01/agent/ohasd/oracssdagent_root/oracssdagent_root.log
2019-08-01 15:02:53.928: [ USRTHRD][1509222144]{0:19:163} clsnomon_HangExit: no member
2019-08-01 15:02:58.928: [ USRTHRD][1509222144]{0:19:163} clsnomon_HangExit: no member
2019-08-01 15:03:03.928: [ USRTHRD][1509222144]{0:19:163} clsnomon_HangExit: no member
2019-08-01 15:03:08.929: [ USRTHRD][1509222144]{0:19:163} clsnomon_HangExit: no member
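The alert log excerpts above are long, and the repeated messages are easier to spot as a tally of CRS codes. A minimal sketch, assuming a grep that supports -o (the helper name crs_summary is hypothetical):

```shell
# Count how often each CRS message code appears in a Clusterware alert log.
# $1 = path to the alert log. Output: "count code", most frequent first.
crs_summary() {
  grep -oE 'CRS-[0-9]+' "$1" | sort | uniq -c | sort -rn | awk '{print $1, $2}'
}

# Example on node 2:
# crs_summary /u01/app/11.2.0/grid/log/rac02/alertrac02.log
```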

I applied the same method on node 2 as on node 1, but root.sh never completed successfully.

That included manually running:

# /bin/dd if=/var/tmp/.oracle/npohasd of=/dev/null bs=1024 count=1

It still did not work...