RAC 10g administration


See OCFS Oracle Cluster Filesystem, ASM, TNSnames configuration,
Oracle Database 11g New Features, Raw devices, Resource Manager, Dbca
See http://www.oracle.com/technology/support/metalink/index.html to view certification matrix
This is just a draft of basic RAC 10g administration
RAC benefit and characteristics
- does not protect from human errors
- increased availability from node/instance failures
- speeds up parallel DSS queries
- does not speed up parallel OLTP processes
- no availability increase on data failures
- no availability increase on network failures
- no availability increase on release upgrades
- scalability does not increase for all application workloads
RAC tuning - after migration to RAC, test:
 - interconnect latency
 - instance recovery time
 - applications strongly relying on table truncates, full table scans, sequences and
   non-sequence key generation, global context variables
RAC specific background processes and Clusterware daemons
Cluster Synchronization Service (CSS)
  ocssd daemon, manages the cluster configuration
Cluster Ready Services (CRS)
  crsd daemon, manages resources (listeners, VIPs, Global Service Daemon GSD, Oracle Notification
  Service ONS); backs up the OCR every four hours; its configuration is stored in the OCR
Event Manager (EVM)
  evmd daemon, publishes events
LMSn coordinate block updates (Cache Fusion)
LMON global enqueue service monitor, manages global enqueues and shared locks
LMDn manage requests for global enqueues
LCK0 handles resources not requiring Cache Fusion
DIAG collects diagnostic info
GSD: the 9i GSD is not compatible with 10g
FAN Fast Application Notification
- Must connect using a service
Events are logged to:
$ORA_CRS_HOME/racg/dump
$ORA_CRS_HOME/log/<nodename>/racg
FAN event format:
<event_type> VERSION=<n.n>
service=<service_name.db_domain_name>
[database=<db_unique_name> [instance=<instance_name>]]
[host=<hostname>]
status=<event_status> reason=<event_reason> [card=<n>]
timestamp=<event_date> <event_time>
event_type Description
SERVICE Primary application service event
SRV_PRECONNECT Preconnect application service event (TAF)
SERVICEMEMBER Application service on a specific instance event
DATABASE Database event
INSTANCE Instance event
ASM ASM instance event
NODE Cluster node event
#FAN events can control the workload per instance for each service
Oracle Notification Service ONS
- Transmits FAN events
- For every FAN event status change, all executables in $ORA_CRS_HOME/racg/usrco
  are launched (callout scripts)
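For illustration, a minimal callout sketch that simply appends every FAN event to a log file (the log path is an arbitrary example); make it executable, owned by the oracle user, and place it in $ORA_CRS_HOME/racg/usrco:
#!/bin/bash
# minimal FAN callout sketch: log the event argument string passed by the RACG layer
LOGFILE=/tmp/fan_events.log
echo "`date '+%Y-%m-%d %H:%M:%S'` $*" >> $LOGFILE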

The ONS process is $ORA_CRS_HOME/opmn/bin/ons
Arguments:
-d: Run in daemon mode
-a <command>: <command> can be [ping, shutdown, reload, or debug]
[$ORA_CRS_HOME/opmn/conf/ons.config]
localport=6100
remoteport=6200
loglevel=3
useocr=on
onsctl start/stop/ping/reconfig/debug/detailed
FCF Fast Connection Failover
- A JDBC application configured to use FCF automatically subscribes to FAN events
- A JDBC application must use service names to connect
- A JDBC application must use implicit connection cache
- $ORACLE_HOME/opmn/lib/ons.jar must be in classpath
- -Doracle.ons.oraclehome=<location of oracle home>
or
System.setProperty("oracle.ons.oraclehome", "/u01/app/oracle/product/10.2.0/db_1");
OracleDataSource ods = new OracleDataSource();
ods.setUser("USER1");
ods.setPassword("USER1");
ods.setConnectionCachingEnabled(true);
ods.setFastConnectionFailoverEnabled(true);
ods.setConnectionCacheName("MyCache");
java.util.Properties cp = new java.util.Properties(); // cache properties (e.g. MinLimit/MaxLimit)
ods.setConnectionCacheProperties(cp);
ods.setURL("jdbc:oracle:thin:@(DESCRIPTION=(LOAD_BALANCE=on)" +
  "(ADDRESS=(PROTOCOL=TCP)(HOST=london1-vip)(PORT=1521))" +
  "(ADDRESS=(PROTOCOL=TCP)(HOST=london2-vip)(PORT=1521))" +
  "(CONNECT_DATA=(SERVICE_NAME=SERVICE1)))");
Check for main Clusterware services up

#check Event Manager up
ps -ef | grep evmd
#check Cluster Synchronization Services up
ps -ef | grep ocssd
#check Cluster Ready Services up
ps -ef | grep crsd
#check Oracle Notification Service
ps -ef | grep ons
[/etc/inittab]
...
h1:35:respawn:/etc/init.d/init.evmd run >/dev/null 2>&1 </dev/null
h2:35:respawn:/etc/init.d/init.cssd fatal >/dev/null 2>&1 </dev/null
h3:35:respawn:/etc/init.d/init.crsd run >/dev/null 2>&1 </dev/null
crs_stat
#Tested, as root
#Lists the status of an application profile and resources
#crs_stat [resource_name [...]] [-v] [-l] [-q] [-c cluster_node]
$ORA_CRS_HOME/bin/crs_stat -t
Name Type Target State Host
------------------------------------------------------------
ora.e2.gsd application ONLINE ONLINE e2
ora.e2.ons application ONLINE ONLINE e2
ora.e2.vip application ONLINE ONLINE e2
VIP Normal
Name Type Target State Host
------------------------------------------------------------
ora.e2.vip application ONLINE ONLINE e2
ora.e2.vip application ONLINE ONLINE e3
VIP Node 2 is down
Name Type Target State Host
------------------------------------------------------------
ora.e2.vip application ONLINE ONLINE e2
ora.e2.vip application ONLINE ONLINE e2
crs_stat -p ...
AUTO_START = 2 #CRS will not start the resource automatically after a system boot
crs_stat
NAME=ora.RAC.RAC1.inst
TYPE=application
TARGET=ONLINE
STATE=ONLINE on london1
NAME=ora.RAC.SERVICE1.RAC1.srv
TYPE=application
TARGET=OFFLINE
STATE=OFFLINE
#use -v for verbose resource use
#use -p for a lot of details
#use -ls to view resources and relative owners
Voting disk
On shared storage, used by CSS; contains the nodes that are currently available within the cluster
If the voting disks are lost and no backup is available, Oracle Clusterware must be reinstalled
Three-way multiplexing is ideal
#backup a voting disk online
dd if=<fname> of=<out_fname>
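For example, assuming the voting disk paths listed further below (the backup destination is arbitrary):
#back up voting disk CSSFile1 to a regular file while the cluster is online
dd if=/u02/oradata/RAC/CSSFile1 of=/backup/CSSFile1.bak bs=1M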
crsctl
#Tested, as oracle
$ORA_CRS_HOME/bin/crsctl check crs
Cluster Synchronization Services appears healthy
Cluster Ready Services appears healthy
Event Manager appears healthy
#add online a new voting disk (10.2), use -force if Oracle Clusterware is not started
crsctl add css votedisk 'new votedisk path' -force
crsctl start/stop/enable/disable crs
#set/unset parameters on OCR
crsctl set/unset <parameter> <value>
You can list the currently configured voting disks:
crsctl query css votedisk
0. 0 /u02/oradata/RAC/CSSFile1
1. 1 /u03/oradata/RAC/CSSFile2
2. 2 /u04/oradata/RAC/CSSFile3
Dynamically add and remove voting disks in an existing Oracle Clusterware installation:
crsctl add/delete css votedisk <path> -force
CRS log and debug
#as root, enable extra debug for the running CRS daemons as well as those started in the future
#enable it when investigating node reboots
crsctl debug log crs
#Collect logs and traces to upload to Oracle Support
diagcollection.pl
OCR - Oracle Cluster Registry

[/etc/oracle/ocr.loc](10g) or [/etc/oracle/srvConfig.loc](9i, still exists in 10g for compatibility)
ocrconfig_loc=/dev/raw/raw1
ocrmirrorconfig_loc=/dev/raw/raw2
local_only=FALSE
OCRCONFIG - Command-line tool for managing the Oracle Cluster Registry
#recover OCR logically, must be done on all nodes
ocrconfig -import exp.dmp
#export OCR content logically
ocrconfig -export <file_name>
#recover OCR from OCR backup
ocrconfig -restore bck.ocr
#show backup status
#the crsd daemon backs up the OCR every four hours, the most recent backup file is backup00.ocr
ocrconfig -showbackup
london1 2005/08/04 11:15:29 /u01/app/oracle/product/10.2.0/crs/cdata/crs
london1 2005/08/03 22:24:32 /u01/app/oracle/product/10.2.0/crs/cdata/crs
#change the OCR autobackup location
ocrconfig -backuploc <directory>
#must be run on each affected node
ocrconfig -repair ocr <filename>
ocrconfig -repair ocrmirror <filename>
#force Oracle Clusterware to restart on a node, may lose recent OCR updates
ocrconfig -overwrite
CVU - Cluster Verification Utility, to get the status of CRS resources
dd : use it to safely back up voting disks when nodes are added/removed
#verify restore
cluvfy comp ocr -n all
ocrcheck
#OCR integrity check, validates the accessibility of the device and its block integrity
#logs to the current dir or to $ORA_CRS_HOME/log/<node>/client
ocrdump
#dump the OCR content to a text file, if it succeeds then the integrity of the backups is verified
OCRDUMP - Identify the interconnect being used
$ORA_CRS_HOME/bin/ocrdump.bin -stdout -keyname SYSTEM.css.misscount -xml
Pre install, prerequisite
(./run)cluvfy : run from install media or CRS_HOME, verifies prerequisites on all nodes
Post installation
- Backup root.sh
- Set up other user accounts
- Verify Enterprise Manager / Cluster Registry by running srvctl config database -d db_name
SRVCTL
Stores info in the OCR; manages:
Database, Instance, Service, Node applications, ASM, Listener
srvctl config database -d <db_name> : Verify Enterprise Manager / Cluster Registry
set the SRVM_TRACE=TRUE environment variable to create a trace/debug file for srvctl (a Java based tool)
#-v to check services
srvctl status database -d RAC -v SERVICE1
srvctl start database -d <name> [-o mount]
srvctl stop database -d <name> [-o stop_options]
#moves parameter file
srvctl modify database -d name -p /u03/oradata/RAC/spfileRAC.ora
srvctl remove database -d TEST
#Verify the OCR configuration
srvctl config database -d TEST
srvctl start instance -d RACDB -i "RAC3,RAC4"
srvctl stop instance -d <orcl> -i "orcl3,orcl4" -o immediate
srvctl add instance -d RACDB -i RAC3 -n london3
#move the instance to node london4
srvctl modify instance -d RAC -i RAC3 -n london4
#set a dependency of instance RAC3 to +ASM3
srvctl modify instance -d RAC -i RAC3 -s +ASM3
#removes an ASM dependency
srvctl modify instance -d RAC -i RAC3 -r
#stop all applications on the node
srvctl stop nodeapps -n londonl
#-a display the VIP configuration
srvctl config nodeapps -n londonl -a
srvctl add nodeapps -n london3 -o $ORACLE_HOME -A london3-vip/255.255.0.0/eth0
Services
Changes are recorded in the OCR only! Must use DBMS_SERVICE to update the dictionary (see the sketch after the commands below)
srvctl start service -d RAC -s "SERVICE1,SERVICE2"
srvctl status service -d RAC -s "SERVICE1,SERVICE2"
srvctl stop service -d RAC -s "SERVICE1,SERVICE2" -f
srvctl disable service -d RAC -s "SERVICE2" -i RAC4
srvctl remove service -d RAC -s "SERVICE2"
#relocate from RAC2 to RAC4
srvctl relocate service -d RAC -s "SERVICE2" -i RAC2 -t RAC4
#preferred RAC1,RAC2 and available RAC3,RAC4
#-P PRECONNECT automatically creates an additional SERVICE2_PRECONNECT service to use as BACKUP in tnsnames.ora
#See TNSnames configuration
#the service is NOT started, it must be started manually (dbca does it automatically)
srvctl add service -d ERP -s SERVICE2 -i "RAC1,RAC2" -a "RAC3,RAC4" -P PRECONNECT
#show configuration, -a shows the TAF configuration
srvctl config service -d RAC -a
#modify an existing service
srvctl modify service -d RACDB -s "SERVICE1" -i "RAC1,RAC2" -a "RAC3,RAC4"
srvctl stop service -d RACDB -s "SERVICE1"
srvctl start service -d RACDB -s "SERVICE1"
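Since srvctl changes only the OCR, the dictionary can be brought in line with DBMS_SERVICE; a minimal sketch, assuming service SERVICE1 and illustrative goal/notification settings:
sqlplus -s / as sysdba <<'EOF'
BEGIN
  -- keep the dictionary definition of SERVICE1 aligned with the OCR entry
  DBMS_SERVICE.MODIFY_SERVICE(
    service_name        => 'SERVICE1',
    goal                => DBMS_SERVICE.GOAL_THROUGHPUT,
    aq_ha_notifications => TRUE);
END;
/
EOF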
Views
GV$SERVICES
GV$ACTIVE_SERVICES
GV$SERVICEMETRIC
GV$SERVICEMETRIC_HISTORY
GV$SERVICE_WAIT_CLASS
GV$SERVICE_EVENT
GV$SERVICE_STATS
GV$SERV_MOD_ACT_STATS
SQL for RAC
select * from V$ACTIVE_INSTANCES;
Cache Fusion - GRD Global Resource Directory
GES (Global Enqueue Service)
GCS (Global Cache Service)
Data Guard & RAC
- Configuration files at the primary location can be stored in any shared ASM diskgroup, on shared raw devices,
  or on any shared cluster file system. They simply have to be shared
VIP virtual IP
- Both application VIPs and RAC VIPs fail over if the related application fails, and accept new connections
- Sharing a RAC VIP among database instances is recommended, but not among different applications, because...
- ...the VIP fails over if the application fails over
- A failed-over application VIP accepts new connections
- Each VIP requires an unused and resolvable IP address
- VIP addresses should be registered in DNS
- VIP addresses should be on the same subnet as the public network
- VIPs are used to prevent connection-request timeouts during client connection attempts
Changing a VIP
1- Stop VIP dependent cluster components on one node
2- Make changes on DNS
3- Change VIP using SRVCTL
4- Restart VIP dependent components
5- Repeat above on remaining nodes
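Step 3 is typically done with srvctl modify nodeapps; a sketch, assuming node london1 and an illustrative new address:
#as root, with the node applications stopped on that node
srvctl stop nodeapps -n london1
srvctl modify nodeapps -n london1 -A 147.43.1.210/255.255.0.0/eth0
srvctl start nodeapps -n london1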
oifcfg
allocates and deallocates network interfaces, gets values from the OCR
To display a list of networks:
oifcfg getif
eth1 192.168.1.0 global cluster_interconnect
eth0 192.168.0.0 global public
To display a list of current subnets:
oifcfg iflist
eth0 147.43.1.0
eth1 192.168.1.0
To include a description of the subnet, specify the -p option:
oifcfg iflist -p
eth0 147.43.1.0 UNKNOWN
eth1 192.168.1.0 PRIVATE
In 10.2 public interfaces are reported as UNKNOWN.
To include the subnet mask, append the -n option to the -p option:
oifcfg iflist -p -n
eth0 147.43.1.0 UNKNOWN 255.255.255.0
eth1 192.168.1.0 PRIVATE 255.255.255.0
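To allocate or deallocate an interface in the OCR, oifcfg setif/delif can be used; a sketch with illustrative subnet values:
#register eth1 as the global cluster interconnect interface
oifcfg setif -global eth1/192.168.1.0:cluster_interconnect
#remove the definition again
oifcfg delif -global eth1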
Db parameters with SAME VALUE across all instances
active_instance_count
archive_lag_target
compatible
cluster_database RAC param
cluster_database_instance RAC param
#Define the network interfaces that will be used for the interconnect
#it is not failover but load distribution: if one address does not work, all of them stop being used
#Overrides the OCR
cluster_interconnects RAC param = 192.168.0.10:192.168.0.11:...
control_files
db_block_size
db_domain
db_files
db_name
db_recovery_file_dest
db_recovery_file_dest_size
db_unique_name
dml_locks (when 0)
instance_type (rdbms or asm)
max_commit_propagation_delay RAC param
parallel_max_servers
remote_login_password_file
trace_enabled
#AUTO and MANUAL cannot be mixed in a RAC
undo_management
Db parameters with INSTANCE specific VALUE across all instances
instance_name
instance_number
thread
undo_tablespace #system param
Listener parameters
local_listener='(ADDRESS_LIST = (ADDRESS = (PROTOCOL = TCP)(HOST = 192.168.0.13) (PORT = 1521)))'
#allow pmon to register with local listener when not using 1521 port
remote_listener = '(ADDRESS_LIST = (ADDRESS = (PROTOCOL = TCP)(HOST = 192.168.2.9) (PORT = 1521))
(ADDRESS = (PROTOCOL = TCP)(HOST =192.168.2.10)(PORT = 1521)))'
#makes the listener aware of the load on the listeners of the other nodes
Important RAC parameters
gc_files_to_locks #any value other than the default disables Cache Fusion
recovery_parallelism #number of redo application server processes in instance or media recovery
RAC and standby parameters
dg_broker_config_file1 #shared between primary and standby instances
dg_broker_config_file2 #different from dg_broker_config_file1, shared between primary and standby instances


Shared contents
datafiles, controlfiles, spfiles, redo logs
Shared or local? (where given, the Y/N flags refer to: RAW_Dev File_Syst ASM NFS OCFS)
- Datafiles : shared mandatory
- Control files : shared mandatory
- Redo log : shared mandatory
- SPfile : shared mandatory
- OCR and vote : shared mandatory Y Y N
- Archived log : shared not mandatory N Y N Y
- Undo : local
- Flash Recovery : shared Y Y Y
- Data Guard broker conf.: shared (prim. & stdby) Y Y
Adding logfile thread groups for a new instance
#To support a new instance on your RAC
1) alter database add logfile thread 3 group 7;
   alter database add logfile thread 3 group 8;
#makes the thread available for use by any instance
2) alter database enable thread 3;
#if you want an existing instance to use the new thread
3) alter system set thread=3 scope=spfile sid='RAC01';
4) srvctl stop instance -d RACDB -i RAC01 (and start it again for the change to take effect)
Views and queries
select * from GV$CACHE_TRANSFER
An instance failed to start, what do we do?
1) Check the instance alert.log
2) Check the Oracle Clusterware software alert.log
3) Check the resource state using CRS_STAT

Install

See official Note 239998.1 for removing crs installation
See http://startoracle.com/2007/09/30/so-you-want-to-play-with-oracle-11gs-rac-heres-how/ to install 11g RAC on VMware
See http://www.oracle.com/technology/pub/articles/hunter_rac10gr2_iscsi.html to install on Linux with iSCSI disks
See http://www.oracle-base.com/articles/10g/OracleDB10gR2RACInstallationOnCentos4UsingVMware.php to install on VMware
#If using VMware remember to allow shared disks by disabling locking
#Obviously this is not required if you use OCFS or ASM
disk.locking = "FALSE"
See OCFS Oracle Cluster Filesystem

Prerequisites check
#check node connectivity and Clusterware integrity
./runcluvfy.sh stage -pre dbinst -n all
./runcluvfy.sh stage -post hwos -n "linuxes,linuxes1" -verbose
WARNING:
Package cvuqdisk not installed.
rpm -Uvh clusterware/rpm/cvuqdisk-1.0.1-1.rpm
WARNING:
Unable to determine the sharedness of /dev/sdf on nodes:
linuxes1,linuxes1,linuxes1,linuxes1,linuxes1,linuxes1,linuxes,linuxes,linuxes,linuxes,linuxes,linuxes
Safely ignore this error.
./runcluvfy.sh comp peer -n "linuxes,linuxes1" -verbose
./runcluvfy.sh comp nodecon -n "linuxes,linuxes1" -verbose
./runcluvfy.sh comp sys -n "linuxes,linuxes1" -p crs -verbose
./runcluvfy.sh comp admprv -n "linuxes,linuxes1" -verbose -o user_equiv
./runcluvfy.sh stage -pre crsinst -n "linuxes,linuxes1" -r 10gR2
Restart installation - Remove from each node
su -c "$ORA_CRS_HOME/install/rootdelete.sh; $ORA_CRS_HOME/install/rootdeinstall.sh"
#oracle user
export DISPLAY=192.168.0.1:0.0
/app/crs/oui/bin/runInstaller -removeHome -noClusterEnabled ORACLE_HOME=/app/crs LOCAL_NODE=linuxes
rm -rf $ORA_CRS_HOME/*
#root
su -c "chown oracle:dba /dev/raw/*; chmod 660 /dev/raw/*; rm -rf /var/tmp/.oracle; rm -rf /tmp/.oracle"
#Format rawdevices using
dd if=/dev/zero of=/dev/raw/raw6 bs=1M count=250
#If a related error message appears during installation, manually launch on the related node
/app/crs/oui/bin/runInstaller -attachHome -noClusterEnabled ORACLE_HOME=/app/crs ORACLE_HOME_NAME=OraCrsHome CLUSTER_NODES=linuxes,linuxes1 CRS=true "INVENTORY_LOCATION=/app/oracle/oraInventory" LOCAL_NODE=linuxes
runcluvfy.sh stage -pre crsinst -n linuxes -verbose


/etc/hosts example
# Do not remove the following line, or various programs
# that require network functionality will fail.
127.0.0.1 localhost
147.43.1.101 london1
147.43.1.102 london2
#VIPs are usable only after the VIPCA utility has been run,
#and should be created on the public interface. Remember that VIPCA is a GUI tool
147.43.1.201 london1-vip
147.43.1.202 london2-vip
192.168.1.1 london1-priv
192.168.1.2 london2-priv
Kernel Parameters(/etc/sysctl.conf) Recommended Values
kernel.sem (semmsl) 250
kernel.sem (semmns) 32000
kernel.sem (semopm) 100
kernel.sem (semmni) 128
kernel.shmall 2097152
kernel.shmmax Half the size of physical memory
kernel.shmmni 4096
fs.file-max 65536
net.core.rmem_default 262144
net.core.rmem_max 262144
net.core.wmem_default 262144
net.core.wmem_max 262144
net.ipv4.ip_local_port_range 1024 to 65000
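As a sketch, the recommendations above map to /etc/sysctl.conf entries such as the following (kernel.sem combines semmsl, semmns, semopm and semmni on one line; the shmmax value assumes a hypothetical 4 GB of physical memory):
# kernel.sem = semmsl semmns semopm semmni
kernel.sem = 250 32000 100 128
kernel.shmall = 2097152
kernel.shmmax = 2147483648
kernel.shmmni = 4096
fs.file-max = 65536
net.core.rmem_default = 262144
net.core.rmem_max = 262144
net.core.wmem_default = 262144
net.core.wmem_max = 262144
net.ipv4.ip_local_port_range = 1024 65000
Apply the values without rebooting with sysctl -p.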
RAC restrictions
- dbms_alert: publisher and subscriber must be on the same instance, AQ is the workaround
- dbms_pipe: only works within the same instance, AQ is the workaround
- UTL_FILE, directories, external tables and BFILEs need to be on shared storage
Implementing the HA High Availability Framework
Use srvctl to start/stop applications
#Manually create a script that Oracle Clusterware will use to start/stop/check the application
1. Create an application VIP.
#This command generates an application profile called hafdemovip.cap in the $ORA_CRS_HOME/crs/public directory.
$ORA_CRS_HOME/bin/crs_profile -create hafdemovip -t application -a $ORA_CRS_HOME/bin/usrvip
-o oi=eth0,ov=147.43.1.200,on=255.255.0.0

#As the oracle user, register the VIP with Oracle Clusterware:
$ORA_CRS_HOME/bin/crs_register hafdemovip
#As the root user, set the owner of the application VIP to root:
$ORA_CRS_HOME/bin/crs_setperm hafdemovip -o root
#As the root user, grant the oracle user permission to run the script:
$ORA_CRS_HOME/bin/crs_setperm hafdemovip -u user:oracle:r-x
#As the oracle user, start the application VIP:
$ORA_CRS_HOME/bin/crs_start hafdemovip
2. Create an application profile.
$ORA_CRS_HOME/bin/crs_profile -create hafdemo -t application -d "HAF Demo" -r hafdemovip
-a /tmp/HAFDemoAction -o ci=5,ra=60

3. Register the application profile with Oracle Clusterware.
$ORA_CRS_HOME/bin/crs_register hafdemo
$ORA_CRS_HOME/bin/crs_start hafdemo
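The action script referenced above (/tmp/HAFDemoAction) must handle start, stop and check arguments; a minimal sketch in which the "application" being controlled is just an illustrative marker file:
#!/bin/bash
# Oracle Clusterware invokes the action script with start|stop|check; exit 0 means success
FLAG=/tmp/hafdemo.running
case "$1" in
  start) touch $FLAG ;;
  stop)  rm -f $FLAG ;;
  check) [ -f $FLAG ] || exit 1 ;;
esac
exit 0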
CRS commands
crs_profile
crs_register
crs_unregister
crs_getperm
crs_setperm
crs_start
crs_stop
crs_stat
crs_relocate
Server side callouts
Oracle instance up(/down?)
Service member down(/up?)
Shadow application service up(/down?)
Adding a new node
- Configure hardware and OS
- With NETCA reconfigure listeners and add the new one
- $ORA_CRS_HOME/oui/bin/addnode.sh from one of the existing nodes to define the new one to all existing nodes
- $ASM_HOME/oui/bin/addnode.sh from one of the existing nodes (if using ASM)
- $ORACLE_HOME/oui/bin/addnode.sh from one of the existing nodes
- racgons -add_config to add the ONS metadata to the OCR, from one of the existing nodes
Removing a node from a cluster
- Remove node from clusterware
- Check that the ONS configuration has been updated on the other nodes
- Check that the database and instances are terminated on the node to remove
- Check that the node has been removed from the database and ASM repository
- Check that the software has been removed from the database and ASM homes on the node to remove
RAC contentions
- enq: HW - contention and gc current grant wait events
  Use a larger uniform extent size for the objects
- enq: TX - index contention
  Re-create the index as a global hash partitioned index.
  Increase the sequence cache size if retaining the sequence.
  Re-create the table using a natural key instead of a surrogate key.
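A sketch of the enq: TX index-contention remedies, with purely illustrative object names and values:
sqlplus -s / as sysdba <<'EOF'
-- spread right-hand index growth across hash partitions
create index orders_id_ix on orders(order_id)
  global partition by hash (order_id) partitions 16;
-- reduce sequence-related contention by enlarging the cache
alter sequence orders_seq cache 1000;
EOF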