
Saturday, September 17, 2016

Clusterware failure scenarios: a study note


Today I went through a very interesting video that demonstrates different clusterware failure scenarios with practical examples. Here is the link: https://www.youtube.com/watch?v=20qgRJEFC7w

Here is a summary of the video.

1. Node failure:-

CRS resources will go to offline mode.

Instance recovery will be performed by a surviving node.

The node VIP and SCAN VIP will fail over to a surviving node.

The SCAN listener will fail over to a surviving node.

Client connections will move to a surviving node if you use TAF on the client side.

Resources like the node listener, ASM, the database, etc. will be in offline mode for the failed node.
    
When you bring up the failed node, all resources that failed over to the surviving node will relocate back to the original node.

CRS Resources that were in offline mode might get started automatically based on the AUTO_START setting.

The possible values of AUTO_START are:
      
always— Causes the resource to restart when the node restarts regardless of the resource's state when the node stopped.

restore— Does not start the resource at restart time if it was in an offline state, such as STATE=OFFLINE, TARGET=OFFLINE, when the node stopped. The resource is restored to its state when the node went down. The resource is started only if it was online before and not otherwise.

never— Oracle Clusterware never restarts the resource regardless of the resource's state when the node stopped.
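
You can check how a particular resource is configured with crsctl. A minimal sketch, assuming a database resource named ora.orcl.db (replace it with your own resource name); note that changing attributes of Oracle-managed (ora.*) resources may require the -unsupported flag on some versions:

# show the AUTO_START attribute of the resource
crsctl stat res ora.orcl.db -p | grep AUTO_START

# change it, e.g. to "restore" (run as root)
crsctl modify resource ora.orcl.db -attr "AUTO_START=restore"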
      
2. Instance failure:-

Instance recovery will be performed by a surviving instance.
The surviving instance reads the online redo log files of the failed instance and ensures that committed transactions are recorded in the database.

If all instances fail, one instance (the first to come back up) will perform recovery for all failed instances.

Services will be moved to an available instance.

Client connections will be moved to a surviving instance if TAF is in place.

The failed instance may be restarted automatically by the clusterware.
    
You can see messages about the instance recovery in the alert log of the surviving node.
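
To see where things stand after an instance failure, you can query the database and service status with srvctl. A small sketch; the database name orcl and service name oltp_srv are placeholders:

# which instances of the database are currently running
srvctl status database -d orcl

# which instances the services are currently running on
srvctl status service -d orcl -s oltp_srv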

     
3. ASM instance failure:-

The ASM instance will go OFFLINE and will be restarted automatically by the clusterware.

Instance recovery will be performed.

Client connections will be moved to a surviving instance if TAF is in place.

Services will be moved to an available instance.
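
To confirm that the clusterware restarted the ASM instance, you can check the ASM resource state. A sketch, with rac1 as a placeholder node name:

# ASM resource state across the cluster
crsctl stat res ora.asm

# ASM status on a specific node
srvctl status asm -n rac1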
    
4. Local and SCAN Listener failure:-

A listener failure will be detected by CRSD and the listener will be restarted.
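
You can check (and, if ever needed, manually restart) the listeners with srvctl. A sketch, assuming the default listener and SCAN listener names:

# current status of the local listeners and the SCAN listeners
srvctl status listener
srvctl status scan_listener

# manual restart - normally not needed, since the CRSD agent restarts them automatically
srvctl start listener
srvctl start scan_listener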
    
5. Public network failure:-

The node VIP and SCAN VIP will fail over to a surviving node.
The DB instance will remain up, and the DB services will fail over to a surviving node.
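
To see where the node VIPs and SCAN VIPs are currently running, a quick sketch (rac1 is a placeholder node name):

# node VIP, local listener, ONS, etc. for one node
srvctl status nodeapps -n rac1

# SCAN VIPs and SCAN listeners
srvctl status scan
srvctl status scan_listener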
    
6. Private network failure:-

Up to 11g R2, CSSD detects a split-brain situation and the node with the lowest node number survives. For example, in a two-node cluster the second node will be evicted.

However, starting from Oracle Database 12.1.0.2, the node with the higher weight will survive split-brain resolution. If you want to know more about this, visit http://allthingsoracle.com/split-brain-whats-new-in-oracle-database-12-1-0-2c/
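
One factor that can raise a node's weight in this decision is marking the server as critical. This is only a hedged sketch, assuming 12.1.0.2 or later; the attribute change takes effect after the clusterware stack on that node is restarted, so check the documentation for your release before relying on it:

# as root: mark the local server as critical so it is favoured during split-brain resolution
crsctl set server css_critical yes

# verify - CSS_CRITICAL shows up in the full server attribute listing
crsctl status server -f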

The clusterware will try to perform reboot-less node fencing by cleaning up the CRSD resources; if that is not possible, a node reboot will happen.
    
OHASD will be up and running on the affected node, but CRSD, CSSD, EVMD, HAIP, etc. will be in OFFLINE mode.

As soon as the private network is back, CSSD, CRSD and EVMD start immediately on the evicted node and it rejoins the cluster.
    
Consider a two-node 11gR2 cluster where you pull the private interconnect cable from the first node. Which node will get evicted?
Obviously the second node, because in 11gR2 Oracle lets the node with the lowest node number survive.

That is, in 11g it does not matter whether the private network of the first or the second node fails; the second node is always the one evicted.
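
To see the node numbers (which decide the survivor in 11g) and the interfaces registered as the private interconnect, a quick sketch:

# node names with their node numbers
olsnodes -n

# interfaces registered with the clusterware (public and cluster_interconnect)
oifcfg getif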

7. OCR and Voting disk failure:-

OCR

As a best practice, make sure you use two different disk groups to store the OCR: one for the main copy and a second one for the OCR mirror.
  
Let's suppose you have two disk groups (DATA & OCR) storing the OCR, and each has two failgroups.
  
 
If you lose the entire OCR disk group and one failgroup within the DATA disk group, the clusterware will keep working as normal. But what happens if you lose the surviving failgroup in the main disk group too?

  You have two options:
  - Restore the OCR from a backup.
  - Restore the OCR without a backup.
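
If you have to restore the OCR from one of its automatic backups, the rough flow looks like the sketch below. The exact procedure differs by version (recent releases require starting the stack in exclusive mode first), so treat this only as an outline; run the commands as root and use a backup file actually listed by ocrconfig -showbackup:

# list the automatic OCR backups kept by the clusterware
ocrconfig -showbackup

# stop the clusterware stack on all nodes
crsctl stop crs

# on one node, start the stack in exclusive mode without CRSD (11.2.0.2+ syntax)
crsctl start crs -excl -nocrs

# restore from a chosen backup file
ocrconfig -restore <backup_file_from_showbackup>

# stop the exclusive-mode stack, then start the clusterware normally on all nodes and verify
crsctl stop crs -f
crsctl start crs
ocrcheck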
 
How to create an OCR mirror?

     1. Create the disk group with sufficient failgroups, set the disk group compatible attributes, etc. from the first node.
     2. Mount the disk group from the second node.
     3. Make a mirror copy of the OCR in the newly created disk group with - ocrconfig -add '+NEWOCR' (as root)
     4. Verify the OCR mirror copy with - ocrcheck
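
Put together, the steps might look like the sketch below. The disk group name NEWOCR is from the note above; the disk paths, failgroup names and compatible setting are placeholders for your environment:

-- from the first node, in SQL*Plus connected to the ASM instance as SYSASM
CREATE DISKGROUP NEWOCR NORMAL REDUNDANCY
  FAILGROUP fg1 DISK '/dev/asm-disk5'
  FAILGROUP fg2 DISK '/dev/asm-disk6'
  ATTRIBUTE 'compatible.asm' = '11.2';

-- from the second node, mount the new disk group
ALTER DISKGROUP NEWOCR MOUNT;

# then, as root, add the OCR mirror and verify it
ocrconfig -add +NEWOCR
ocrcheck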
     
 
Voting Disk

EXTERNAL REDUNDANCY - CSSD chooses one ASM disk and creates one (1) voting disk.

NORMAL REDUNDANCY - CSSD chooses 3 ASM disks from different failgroups and creates 3 voting disks, one in each ASM disk/failgroup.

HIGH REDUNDANCY - CSSD chooses 5 ASM disks from different failgroups and creates 5 voting disks, one in each ASM disk/failgroup.
 
In order to tolerate the failure of n voting disk files, you must have at least 2n+1 configured. For example, to survive the loss of one voting disk you need three, and to survive the loss of two you need five.
 
Move the voting disks to a new disk group

 1. Create the disk group with sufficient failgroups, set the disk group compatible attributes, etc. from the first node.
 2. Mount the disk group from the second node.
 3. Put the voting disks in the newly created disk group with - crsctl replace votedisk +NEWDG
 4. Verify the new location of the voting disks with - crsctl query css votedisk
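
The same steps as commands, a sketch only: NEWDG is the disk group name used above, and for NORMAL redundancy it needs at least three failgroups (and a suitable compatible.asm setting) before it can hold the voting files:

# move the voting files into the new disk group
crsctl replace votedisk +NEWDG

# confirm the new location
crsctl query css votedisk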
      
What happens if you drop an ASM disk that contains a voting disk? Oracle will silently move the voting disk to another ASM disk of that disk group (if one exists); otherwise the command will be ignored.
      
Reference:- https://www.youtube.com/watch?v=20qgRJEFC7w