Tuesday, October 29, 2013

How to Properly Shut Down an Oracle RAC Database and Its Clusterware Components in 11gR2

The first step in shutting down any database, whether RAC or single instance, is to stop all applications that access it. The DBA should also stop Oracle tools such as Enterprise Manager Grid Control and the database listener, which connect to the database for monitoring and client connections.

1. Stop Enterprise Manager/Grid Control
[oracle@rac1 ~]$ emctl status dbconsole
Environment variable ORACLE_UNQNAME not defined. Please set ORACLE_UNQNAME to database unique name.
[oracle@rac1 ~]$
[oracle@rac1 ~]$ export ORACLE_UNQNAME=myrac
[oracle@rac1 ~]$  emctl status dbconsole
Oracle Enterprise Manager 11g Database Control Release 11.2.0.1.0
Copyright (c) 1996, 2009 Oracle Corporation.  All rights reserved.
https://rac1.mydomain:1158/em/console/aboutApplication
Oracle Enterprise Manager 11g is running.
------------------------------------------------------------------
Logs are generated in directory /app/dbHome/rac1_myrac/sysman/log
[oracle@rac1 ~]$ emctl stop dbconsole
Oracle Enterprise Manager 11g Database Control Release 11.2.0.1.0
Copyright (c) 1996, 2009 Oracle Corporation.  All rights reserved.
https://rac1.mydomain:1158/em/console/aboutApplication
Stopping Oracle Enterprise Manager 11g Database Control ... ...  Stopped.
[oracle@rac1 ~]$ emctl status dbconsole
Oracle Enterprise Manager 11g Database Control Release 11.2.0.1.0
Copyright (c) 1996, 2009 Oracle Corporation.  All rights reserved.
https://rac1.mydomain:1158/em/console/aboutApplication
Oracle Enterprise Manager 11g is not running.
[oracle@rac1 ~]$

2. Stop the node listener
[root@rac1 ~]# srvctl status listener
Listener LISTENER is enabled
Listener LISTENER is running on node(s): rac1,rac2
To check the status on a specific node, use the following command:
[root@rac1 ~]# srvctl status listener -n rac1  
Listener LISTENER is enabled on node(s): rac1
Listener LISTENER is running on node(s): rac1
[root@rac1 ~]# ps -ef | grep tns
root        13     2  0 Oct28 ?        00:00:00 [netns]
grid     14547     1  0 09:32 ?        00:00:00 /app/gridHome/bin/tnslsnr LISTENER -inherit
oracle   14640 14218  0 09:34 pts/2    00:00:00 grep tns
grid     32208     1  0 Oct28 ?        00:00:01 /app/gridHome/bin/tnslsnr LISTENER_SCAN1 -inherit
[root@rac1 ~]# srvctl stop listener
The above command stops the node listener on all nodes in the cluster. To stop it on a specific node, use:
srvctl stop listener -n rac1
[root@rac1 ~]# srvctl status listener
Listener LISTENER is enabled
Listener LISTENER is not running
[root@rac1 ~]# ps -ef | grep tns
root        13     2  0 Oct28 ?        00:00:00 [netns]
oracle   14704 14218  0 09:34 pts/2    00:00:00 grep tns
grid     32208     1  0 Oct28 ?        00:00:01 /app/gridHome/bin/tnslsnr LISTENER_SCAN1 -inherit
[root@rac1 ~]#
Note: Only the SCAN listener is now running on this node.

3. Stop the database on all nodes
[root@rac1 ~]#  srvctl status database -d MYRAC  
Instance myrac1 is running on node rac1
Instance myrac2 is running on node rac2
[root@rac1 ~]# srvctl stop database -d MYRAC
The above command shuts down all instances in the cluster. To shut down a specific instance, use the following syntax:
srvctl stop database -d db_name -i instance_name
[root@rac1 ~]# srvctl status database -d MYRAC
Instance myrac1 is not running on node rac1
Instance myrac2 is not running on node rac2
[root@rac1 ~]# ps -ef  | grep pmon
oracle   15161 14218  0 09:53 pts/2    00:00:00 grep pmon
grid     31716     1  0 Oct28 ?        00:00:07 asm_pmon_+ASM1
[root@rac1 ~]#
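The listener and database steps above can be wrapped in a small dry-run helper that prints, rather than executes, the stop commands for each resource. This is only a sketch: the database name MYRAC and the node names rac1/rac2 are assumptions taken from this environment.

```shell
#!/bin/sh
# Dry-run sketch: print (do not execute) the srvctl commands that
# would stop the node listeners and all database instances.
# The DB name MYRAC and node names are assumptions from this setup.

print_stop_plan() {
    db="$1"; shift
    for node in "$@"; do
        echo "srvctl stop listener -n $node"
    done
    echo "srvctl stop database -d $db"
}

print_stop_plan MYRAC rac1 rac2
```

Pipe the output to a log for review, or replace `echo` with the real command once you have verified the plan on a test cluster.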

4. Stop the ASM instances and Clusterware components
Now we have to shut down the ASM instances and the Clusterware components. As of 11gR2, we no longer have to shut down the ASM instances and the node apps separately. Up to 11gR1, these processes had to be stopped individually using the following commands.
Use the command below to shut down the ASM instance on a node:
srvctl stop asm -n node
Use the command below to shut down the node apps on a node:
srvctl stop nodeapps -n node
Use the command below to shut down Oracle Clusterware on a node:
crsctl stop crs
Please note that the above command stops both Oracle High Availability Services (OHAS) and the Clusterware stack in a single command.

From 11gR2 onward, you can do this in two steps.
First, stop the Clusterware stack on all nodes in the cluster:
[root@rac1 ~]# crsctl stop cluster -all
CRS-2673: Attempting to stop 'ora.crsd' on 'rac1'
CRS-2673: Attempting to stop 'ora.crsd' on 'rac2'
CRS-2790: Starting shutdown of Cluster Ready Services-managed resources on 'rac2'
CRS-2673: Attempting to stop 'ora.LISTENER.lsnr' on 'rac2'
CRS-2673: Attempting to stop 'ora.LISTENER_SCAN3.lsnr' on 'rac2'
CRS-2673: Attempting to stop 'ora.LISTENER_SCAN2.lsnr' on 'rac2'
CRS-2673: Attempting to stop 'ora.OCR.dg' on 'rac2'
CRS-2673: Attempting to stop 'ora.DATA.dg' on 'rac2'
CRS-2673: Attempting to stop 'ora.FRA.dg' on 'rac2'
CRS-2790: Starting shutdown of Cluster Ready Services-managed resources on 'rac1'
CRS-2673: Attempting to stop 'ora.LISTENER.lsnr' on 'rac1'
CRS-2673: Attempting to stop 'ora.LISTENER_SCAN1.lsnr' on 'rac1'
CRS-2673: Attempting to stop 'ora.OCR.dg' on 'rac1'
CRS-2673: Attempting to stop 'ora.DATA.dg' on 'rac1'
CRS-2673: Attempting to stop 'ora.FRA.dg' on 'rac1'
CRS-2677: Stop of 'ora.LISTENER.lsnr' on 'rac1' succeeded
CRS-2673: Attempting to stop 'ora.rac1.vip' on 'rac1'
CRS-2677: Stop of 'ora.rac1.vip' on 'rac1' succeeded
CRS-2677: Stop of 'ora.LISTENER_SCAN1.lsnr' on 'rac1' succeeded
CRS-2673: Attempting to stop 'ora.scan1.vip' on 'rac1'
CRS-2677: Stop of 'ora.scan1.vip' on 'rac1' succeeded
CRS-2677: Stop of 'ora.LISTENER.lsnr' on 'rac2' succeeded
CRS-2673: Attempting to stop 'ora.rac2.vip' on 'rac2'
CRS-2677: Stop of 'ora.rac2.vip' on 'rac2' succeeded
CRS-2677: Stop of 'ora.LISTENER_SCAN2.lsnr' on 'rac2' succeeded
CRS-2677: Stop of 'ora.LISTENER_SCAN3.lsnr' on 'rac2' succeeded
CRS-2673: Attempting to stop 'ora.scan3.vip' on 'rac2'
CRS-2677: Stop of 'ora.scan3.vip' on 'rac2' succeeded
CRS-2677: Stop of 'ora.OCR.dg' on 'rac1' succeeded
CRS-2677: Stop of 'ora.FRA.dg' on 'rac1' succeeded
CRS-2677: Stop of 'ora.DATA.dg' on 'rac1' succeeded
CRS-2673: Attempting to stop 'ora.asm' on 'rac1'
CRS-2677: Stop of 'ora.asm' on 'rac1' succeeded
CRS-2673: Attempting to stop 'ora.eons' on 'rac1'
CRS-2673: Attempting to stop 'ora.ons' on 'rac1'
CRS-2677: Stop of 'ora.ons' on 'rac1' succeeded
CRS-2673: Attempting to stop 'ora.net1.network' on 'rac1'
CRS-2677: Stop of 'ora.net1.network' on 'rac1' succeeded
CRS-2677: Stop of 'ora.eons' on 'rac1' succeeded
CRS-2792: Shutdown of Cluster Ready Services-managed resources on 'rac1' has completed
CRS-2677: Stop of 'ora.crsd' on 'rac1' succeeded
CRS-2673: Attempting to stop 'ora.cssdmonitor' on 'rac1'
CRS-2673: Attempting to stop 'ora.ctssd' on 'rac1'
CRS-2673: Attempting to stop 'ora.evmd' on 'rac1'
CRS-2673: Attempting to stop 'ora.asm' on 'rac1'
CRS-2677: Stop of 'ora.OCR.dg' on 'rac2' succeeded
CRS-2677: Stop of 'ora.cssdmonitor' on 'rac1' succeeded
CRS-2677: Stop of 'ora.evmd' on 'rac1' succeeded
CRS-2677: Stop of 'ora.DATA.dg' on 'rac2' succeeded
CRS-2677: Stop of 'ora.FRA.dg' on 'rac2' succeeded
CRS-2673: Attempting to stop 'ora.asm' on 'rac2'
CRS-2677: Stop of 'ora.asm' on 'rac2' succeeded
CRS-2673: Attempting to stop 'ora.eons' on 'rac2'
CRS-2673: Attempting to stop 'ora.net1.network' on 'rac2'
CRS-2677: Stop of 'ora.net1.network' on 'rac2' succeeded
CRS-2677: Stop of 'ora.ctssd' on 'rac1' succeeded
CRS-2677: Stop of 'ora.asm' on 'rac1' succeeded
CRS-2673: Attempting to stop 'ora.cssd' on 'rac1'
CRS-2677: Stop of 'ora.cssd' on 'rac1' succeeded
CRS-2673: Attempting to stop 'ora.diskmon' on 'rac1'
CRS-2677: Stop of 'ora.diskmon' on 'rac1' succeeded
CRS-2675: Stop of 'ora.eons' on 'rac2' failed
CRS-2679: Attempting to clean 'ora.eons' on 'rac2'
CRS-2681: Clean of 'ora.eons' on 'rac2' succeeded
CRS-2792: Shutdown of Cluster Ready Services-managed resources on 'rac2' has completed
CRS-2677: Stop of 'ora.crsd' on 'rac2' succeeded
CRS-2673: Attempting to stop 'ora.cssdmonitor' on 'rac2'
CRS-2673: Attempting to stop 'ora.ctssd' on 'rac2'
CRS-2673: Attempting to stop 'ora.evmd' on 'rac2'
CRS-2673: Attempting to stop 'ora.asm' on 'rac2'
CRS-2677: Stop of 'ora.cssdmonitor' on 'rac2' succeeded
CRS-2677: Stop of 'ora.evmd' on 'rac2' succeeded
CRS-2677: Stop of 'ora.ctssd' on 'rac2' succeeded
CRS-2677: Stop of 'ora.asm' on 'rac2' succeeded
CRS-2673: Attempting to stop 'ora.cssd' on 'rac2'
CRS-2677: Stop of 'ora.cssd' on 'rac2' succeeded
CRS-2673: Attempting to stop 'ora.diskmon' on 'rac2'
CRS-2677: Stop of 'ora.diskmon' on 'rac2' succeeded
[root@rac1 ~]#
Note: To stop the Clusterware stack only on the local node, use the following command:

#crsctl stop cluster

[root@rac1 ~]# crsctl check crs
CRS-4638: Oracle High Availability Services is online
CRS-4535: Cannot communicate with Cluster Ready Services
CRS-4530: Communications failure contacting Cluster Synchronization Services daemon
CRS-4534: Cannot communicate with Event Manager
[root@rac1 ~]#
Here you can see that only Oracle High Availability Services is online. Next, stop the Oracle High Availability Services daemon on each node in the cluster.
[root@rac1 ~]# crsctl stop has
CRS-2791: Starting shutdown of Oracle High Availability Services-managed resources on 'rac1'
CRS-2673: Attempting to stop 'ora.mdnsd' on 'rac1'
CRS-2673: Attempting to stop 'ora.gpnpd' on 'rac1'
CRS-2677: Stop of 'ora.mdnsd' on 'rac1' succeeded
CRS-2677: Stop of 'ora.gpnpd' on 'rac1' succeeded
CRS-2673: Attempting to stop 'ora.gipcd' on 'rac1'
CRS-2677: Stop of 'ora.gipcd' on 'rac1' succeeded
CRS-2793: Shutdown of Oracle High Availability Services-managed resources on 'rac1' has completed
CRS-4133: Oracle High Availability Services has been stopped.
[root@rac1 ~]#
You have to execute the same command, 'crsctl stop has', on all nodes in the cluster. Here I executed it on node 2:
[root@rac2 ~]#  crsctl stop has
CRS-2791: Starting shutdown of Oracle High Availability Services-managed resources on 'rac2'
CRS-2673: Attempting to stop 'ora.gpnpd' on 'rac2'
CRS-2673: Attempting to stop 'ora.mdnsd' on 'rac2'
CRS-2677: Stop of 'ora.gpnpd' on 'rac2' succeeded
CRS-2673: Attempting to stop 'ora.gipcd' on 'rac2'
CRS-2677: Stop of 'ora.mdnsd' on 'rac2' succeeded
CRS-2677: Stop of 'ora.gipcd' on 'rac2' succeeded
CRS-2793: Shutdown of Oracle High Availability Services-managed resources on 'rac2' has completed
CRS-4133: Oracle High Availability Services has been stopped.

[root@rac2 ~]#
[root@rac1 ~]# crsctl check cluster -all
CRS-4639: Could not contact Oracle High Availability Services
CRS-4000: Command Check failed, or completed with errors.
[root@rac1 ~]#
Finally, we have cleanly shut down our two-node Oracle RAC. Hope these steps help somebody :)
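The whole sequence above can be condensed into a dry-run sketch that only prints each command in the order used. Nothing here executes anything; the database name, node names, and the use of ORACLE_UNQNAME are assumptions from this environment, and in a real shutdown each step runs as the user shown earlier (oracle for emctl, root for srvctl/crsctl).

```shell
#!/bin/sh
# Dry-run sketch of the full 11gR2 RAC shutdown order used above.
# Nothing is executed; each step is only printed.

DB=MYRAC            # assumed database unique name
NODES="rac1 rac2"   # assumed cluster nodes

plan_full_shutdown() {
    echo "emctl stop dbconsole            # as oracle, ORACLE_UNQNAME=$DB"
    echo "srvctl stop listener            # node listeners on all nodes"
    echo "srvctl stop database -d $DB     # all instances"
    echo "crsctl stop cluster -all        # Clusterware stack on all nodes"
    for n in $NODES; do
        echo "crsctl stop has             # run on $n"
    done
}

plan_full_shutdown
```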


Saturday, October 26, 2013

Managing Backups and Recovery of the OCR Using OCR Backup Files

There are two common methods for backing up OCR content and using it for recovery: the first uses automatically generated OCR file copies, and the second uses manually created OCR export files.

Backing up OCR
To take an OCR backup, we can use one of the following methods.
1. Backup from automatically generated OCR copies
Oracle Clusterware automatically creates OCR backups every four hours. At any one time, Oracle retains the last three backup copies of the OCR. The CRSD process that creates these backups also creates and retains an OCR backup for each full day and at the end of each week.

Note: You cannot customize the backup frequency or the number of files that Oracle retains. You can use any backup software to copy the automatically generated backup files, at least once daily, to a device different from the one where the primary OCR resides.
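A minimal sketch of such a daily copy job is below. The source and destination paths are assumptions (adjust them to your Grid home and cluster name, e.g. /app/gridHome/cdata/rac-cluster), and the demonstration at the end uses temporary directories so the sketch is self-contained.

```shell
#!/bin/sh
# Sketch of a simple job that copies Clusterware's automatic OCR
# backups (*.ocr) to a second device once a day. Paths are
# assumptions; adjust them to your installation.

copy_ocr_backups() {
    src="$1"    # e.g. /app/gridHome/cdata/rac-cluster
    dst="$2"    # ideally on a different device than the primary OCR
    mkdir -p "$dst" || return 1
    for f in "$src"/*.ocr; do
        [ -f "$f" ] && cp -p "$f" "$dst"/
    done
    return 0
}

# Self-contained demonstration with temporary directories:
tmp=$(mktemp -d)
mkdir -p "$tmp/src"
: > "$tmp/src/backup00.ocr"
: > "$tmp/src/day.ocr"
copy_ocr_backups "$tmp/src" "$tmp/dst"
ls "$tmp/dst"
```

In practice you would schedule a call like this from cron, pointing src at the real backup directory reported by ocrconfig -showbackup.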

The default location for the generated backups on UNIX-based systems is CRS_home/cdata/cluster_name, where cluster_name is the name of your cluster. The Windows-based default location uses the same path structure.
The ocrconfig utility can be used to view the backups generated automatically by Oracle Clusterware. To find the most recent OCR backup, run the following command on any cluster node:
[root@rac2 ~]# ocrconfig -showbackup

rac2     2013/10/19 14:26:26     /app/gridHome/cdata/rac-cluster/backup00.ocr

rac2     2013/10/19 10:26:16     /app/gridHome/cdata/rac-cluster/backup01.ocr

rac2     2013/10/19 06:26:14     /app/gridHome/cdata/rac-cluster/backup02.ocr

rac2     2013/10/18 02:26:02     /app/gridHome/cdata/rac-cluster/day.ocr

rac2     2013/10/17 22:26:00     /app/gridHome/cdata/rac-cluster/week.ocr
PROT-25: Manual backups for the Oracle Cluster Registry are not available

If you get no output, try the same command on the other nodes.
To make a backup, use the ocrconfig tool with the -showbackup option to identify the backup files, and then copy each file to a location that is redundant to the location of the original OCR backup file. This ensures that there are at least two copies of each OCR backup file.
Note: You must be logged in as the root user to run ocrconfig commands. If the ocrconfig command is not found for the root user, add the Grid home's bin directory to the PATH in root's .bash_profile.

2. Manually creating OCR export files
In addition to using the automatically created OCR backup files, you should also export the OCR contents to a file before and after making significant configuration changes, such as adding or deleting nodes from your environment, modifying Oracle Clusterware resources, or creating a database. Exporting the OCR contents to a file lets you import the OCR if your configuration changes cause errors. For example, if you have unresolvable configuration problems, or if you are unable to restart your cluster database after such changes, then you can restore your configuration by importing the saved OCR content from the valid configuration.
To perform manual backup of OCR:
You can use either ocrconfig -manualbackup or ocrconfig -export to force Oracle Clusterware to back up the OCR at any time, rather than waiting for the automatic backup that occurs at four-hour intervals.
The -manualbackup option is especially useful when you need a binary backup on demand, such as before you make changes to the OCR. You must be logged in as the root user to take manual backups.
To view the available OCR manual backups:
[root@rac1 ~]#ocrconfig -showbackup manual  
To perform a manual backup:
[root@rac1 ~]# ocrconfig -manualbackup
[root@rac1 ~]# ocrconfig -showbackup manual
rac1     2013/10/21 16:26:48     /app/gridHome/cdata/rac-cluster/backup_20131021_162648.ocr
You can also use the -export option of ocrconfig to take a logical backup of the OCR:
[root@rac1 ~]# ocrconfig -export /backup/ocrbkp.dmp
[root@rac1 ~]# ll /backup/
-rw-r--r-- 1 root root 95055 Oct 26 17:52 ocrbkp.dmp
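Since -export simply overwrites whatever file name you give it, a timestamped name keeps multiple logical backups around. A hedged sketch follows: the /backup directory and the ocrbkp_ naming pattern are assumptions, and the command is only printed here because ocrconfig requires root and a live cluster.

```shell
#!/bin/sh
# Sketch: build a timestamped target name for ocrconfig -export so
# repeated logical backups don't overwrite each other. The command
# is printed, not executed.

ocr_export_cmd() {
    dir="$1"
    stamp=$(date +%Y%m%d_%H%M%S)
    echo "ocrconfig -export $dir/ocrbkp_$stamp.dmp"
}

ocr_export_cmd /backup
```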

Recovering the OCR
If an application fails, try to restart it. If you suspect the error is related to the OCR, run ocrcheck; it normally returns a failure message if there is a problem with, or corruption in, the OCR. Before attempting to restore the OCR, verify definitively that both the primary OCR and the OCR mirror have failed, and attempt to correct the problem using one of the platform-specific OCR diagnostic procedures. If diagnosis does not help, you may have to restore the OCR from a backup.
There are two methods for recovering the OCR. The first method uses automatically generated OCR file copies and the second method uses manually created OCR export files.

1.Restoring the OCR from Automatically Generated OCR Backups
Before you attempt to restore the OCR, ensure that the OCR is actually unavailable. To check the status of the OCR, run the following command:
[root@rac1 ~]# ocrcheck
Status of Oracle Cluster Registry is as follows :
         Version                  :          3
         Total space (kbytes)     :     262120
         Used space (kbytes)      :       2696
         Available space (kbytes) :     259424
         ID                       : 1005378644
         Device/File Name         :       +OCR
                                    Device/File integrity check succeeded

                                    Device/File not configured

                                    Device/File not configured

                                    Device/File not configured

                                    Device/File not configured

         Cluster registry integrity check succeeded
         Logical corruption check succeeded
Note: 'Device/File not configured' means that an OCR mirror is not configured.
[root@rac1 ~]#
If both the primary OCR and the OCR mirror have failed, you must restore the OCR from a backup. If at least one copy of the OCR is still available, you can use that copy to restore the others. When restoring the OCR from the automatically generated backups, you first have to determine which backup file to use for the recovery.
Log in as the root user and identify the available OCR backups using the ocrconfig command:
[root@rac1 ~]#ocrconfig -showbackup
Review the contents of the backup using the following ocrdump command, where file_name is the name of the OCR backup file:
[root@rac1 ~]# ocrdump -backupfile file_name
Recovering the OCR from an automatic or manual backup requires that the entire cluster stack, the RAC instances, and the RAC database be brought down before performing the recovery:
[root@rac1 ~]# srvctl stop database -d MYRAC -o immediate
[root@rac1 ~]# srvctl status database -d MYRAC
Instance myrac1 is not running on node rac1
Instance myrac2 is not running on node rac2
[root@rac1 ~]# crsctl stop crs
Repeat this command on each node in your Oracle RAC cluster.
As the root user, restore the OCR using the following command:
[root@rac1 ~]#ocrconfig -restore file_name
As the root user, restart Oracle Clusterware and the database on each node in your cluster:
[root@rac1 ~]# crsctl start crs
[root@rac1 ~]#srvctl start database -d MYRAC
Repeat this command on each node in your Oracle RAC cluster.
Use the Cluster Verification Utility (CVU) to verify the OCR integrity with the following command. Log in as the grid user to run it:
[grid@rac1 ~]$  cluvfy comp ocr -n all 
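The restore steps above follow a strict order: stop everything, restore, restart, verify. They can be sketched as a dry-run script that only prints the sequence; the backup path passed in below is taken from the -showbackup output earlier and is an assumption for your cluster.

```shell
#!/bin/sh
# Dry-run sketch of the OCR restore order from an automatic backup.
# Each step is printed, not executed.

plan_ocr_restore() {
    db="$1"; bkp="$2"
    echo "srvctl stop database -d $db -o immediate"
    echo "crsctl stop crs            # repeat on every node"
    echo "ocrconfig -restore $bkp"
    echo "crsctl start crs           # repeat on every node"
    echo "srvctl start database -d $db"
    echo "cluvfy comp ocr -n all     # as the grid user"
}

plan_ocr_restore MYRAC /app/gridHome/cdata/rac-cluster/backup00.ocr
```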

2.Recovering the OCR from an OCR Export File
To restore the previous configuration stored in the OCR from an OCR export file:
Place the OCR export file that you created previously using the ocrconfig -export command in an accessible directory on disk. Then, as the root user, stop Oracle Clusterware and the RAC database on all nodes in your Oracle RAC cluster:
[root@rac1 ~]# srvctl stop database -d MYRAC -o immediate
[root@rac1 ~]# srvctl status database -d MYRAC
Instance myrac1 is not running on node rac1
Instance myrac2 is not running on node rac2
[root@rac1 ~]# crsctl stop crs
Repeat this command on each node in your Oracle RAC cluster.
As the root user, restore the OCR data by importing the contents of the OCR export file:
[root@rac1 ~]# ocrconfig -import file_name
As the root user, restart Oracle Clusterware and the database on each node in your cluster using the following commands:
[root@rac1 ~]# crsctl start crs
[root@rac1 ~]#srvctl start database -d MYRAC
Repeat this command on each node in your Oracle RAC cluster.
Verify the OCR integrity using the Cluster Verification Utility:
[grid@rac1 ~]$ cluvfy comp ocr -n all 

Things to remember:
  • Oracle takes a physical backup of the OCR automatically.
  • No cluster or RAC database downtime is required for the automatic physical backup of the OCR.
  • No cluster or RAC database downtime is required for a manual export backup of the OCR.
  • Recovering the OCR from any of the above backups requires bringing the entire stack down.
  • All of these procedures require the root login.
  • As of 11gR2, when you back up the OCR, the voting disk is backed up automatically as well, per the Oracle documentation.
  • ocrconfig -manualbackup was introduced in 11g.
  • The -showbackup option now accepts optional auto and manual flags to view the existing automatic or manual backup information.