Today I went through a very interesting video that demonstrates different clusterware failure scenarios with practical examples. Here is the link: https://www.youtube.com/watch?v=20qgRJEFC7w
Here is a summary of the video.
1. Node failure:-
CRS resources will go to OFFLINE mode.
Instance recovery will be performed by a surviving node.
The node VIP and SCAN VIP will fail over to a surviving node.
The SCAN listener will fail over to a surviving node.
Client connections will move to a surviving node if you use TAF (Transparent Application Failover) on the client side.
Resources like the node listener, ASM and database instance will be in OFFLINE mode for the failed node.
When you bring up the failed node, the resources that failed over to the surviving node will relocate back to the original node.
CRS resources that were in OFFLINE mode might get started automatically based on the AUTO_START setting (see the example after the list of values below).
Possible values of AUTO_START are:
always—
Causes the resource to restart when the node restarts regardless of the
resource's state when the node stopped.
restore—
Does not start the resource at restart time if it was in an offline state, such
as STATE=OFFLINE, TARGET=OFFLINE, when the node stopped. The resource is
restored to its state when the node went down. The resource is started only if
it was online before and not otherwise.
never—
Oracle Clusterware never restarts the resource regardless of the resource's
state when the node stopped.
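A quick way to check which resources are OFFLINE for the failed node, and to look at or change the AUTO_START attribute, is sketched below (ora.orcl.db is just a made-up resource name, use your own; on 12c the ora.* resources may additionally need the -unsupported flag):
crsctl stat res -t (shows the state and target of every CRS resource on each node)
crsctl stat res ora.orcl.db -p | grep AUTO_START (prints the current AUTO_START value of the resource)
crsctl modify resource ora.orcl.db -attr "AUTO_START=restore" (as root, changes the restart policy)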
2. Instance failure:-
Instance recovery will be performed by a surviving instance.
The surviving instance reads the online redo log files of the failed instance and ensures that committed transactions are recorded in the database.
If all nodes fail, one instance will perform the recovery for all instances.
Services will be moved to an available instance.
Client connections will be moved to a surviving instance if TAF is in place.
The failed instance may be restarted automatically by the clusterware.
You will see instance recovery messages in the alert log of the surviving node.
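To confirm which instances are up after such a failure, something like the following can be used (orcl is a made-up database name):
srvctl status database -d orcl (lists every instance of the database and whether it is running)
crsctl stat res ora.orcl.db -t (shows the state of the database resource on every node)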
3. ASM instance failure:-
The ASM instance will go OFFLINE and will be restarted automatically by the clusterware.
Instance recovery will be performed.
Client connections will be moved to a surviving instance if TAF is in place.
Services will be moved to an available instance.
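The ASM resource can be checked the same way (a quick sketch):
srvctl status asm (reports whether ASM is running on each node)
crsctl stat res ora.asm -t (shows the state of the ora.asm resource per node)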
4. Local and SCAN listener failure:-
A listener failure will be detected by CRSD and the listener will be restarted automatically.
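You can watch the restart by checking the listener resources before and after the failure (a quick sketch):
srvctl status listener (status of the local node listeners)
srvctl status scan_listener (status and current placement of the SCAN listeners)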
5. Public network failure:-
The node VIP and SCAN VIP will fail over to a surviving node.
The DB instance will stay up, and the DB service will fail over to a surviving node.
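After a public network failure you can verify where the VIPs and the SCAN ended up (node1 is a made-up node name):
srvctl status vip -n node1 (shows on which node that node VIP is currently running)
srvctl status scan (shows which node each SCAN VIP is running on)
srvctl status nodeapps (summarises the VIP, network and ONS status for all nodes)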
6. Private network failure:-
Up to 11gR2, CSSD detects the split-brain situation and the node with the lowest node number survives. For example, in a two-node cluster the second node will be evicted.
However, starting from Oracle Database 12.1.0.2, the node with the higher weight survives the split-brain resolution. If you want to know more about this, visit http://allthingsoracle.com/split-brain-whats-new-in-oracle-database-12-1-0-2c/
The clusterware will first try to perform reboot-less node fencing by cleaning up the CRSD resources; if that is not possible, a node reboot will happen.
OHASD will stay up and running on the affected node, but CRSD, CSSD, EVMD, HAIP etc. will be in OFFLINE mode.
As soon as the private network is back, the CSSD, CRSD and EVMD services start immediately on the evicted node and it rejoins the cluster.
Consider a two-node 11gR2 cluster where you pull the private interconnect cable from the first node. Which node will get evicted?
Obviously the second node, because in 11gR2 Oracle keeps the node with the lowest node number.
That is, in 11g it does not matter whether the private network of the first or the second node fails; the second node is always the one evicted.
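To see the node numbers (which decide the survivor in 11gR2) and the interface classification, you can use (a quick sketch):
olsnodes -n -s (lists each node with its node number and status)
oifcfg getif (shows which interface is public and which one is the cluster interconnect)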
7. OCR and Voting disk failure:-
OCR
As a best practice, make sure you configure two different disk groups to store the OCR: one as the main storage and a second one for the OCR mirror.
Suppose you have two disk groups (DATA and OCR) storing the OCR, and each of them has two failgroups.
If you lose the entire OCR disk group and one failgroup within the DATA disk group, the clusterware will keep working as normal. But what happens if you lose the surviving failgroup in the main disk group too?
You have two options:
- Restore the OCR from a backup.
- Restore the OCR without a backup.
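A rough outline of the restore-from-backup option (the backup path and cluster name below are placeholders, use what ocrconfig -showbackup reports on your system):
ocrconfig -showbackup (as root, lists the automatic and manual OCR backups)
crsctl stop crs -f (stop the clusterware on every node)
crsctl start crs -excl -nocrs (on one node only, start the stack in exclusive mode without CRSD)
ocrconfig -restore /u01/app/grid/cdata/mycluster/backup00.ocr (restore the chosen backup file)
crsctl stop crs -f (stop the exclusive-mode stack)
crsctl start crs (start the clusterware normally on all nodes, then verify with ocrcheck)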
How to create an OCR mirror? (a consolidated example follows the steps)
1. Create the disk group with sufficient failgroups, set the disk group compatible attribute, etc., from the first node.
2. Mount the disk group from the second node.
3. Make a mirror copy of the OCR in the newly created disk group with - ocrconfig -add '+NEWOCR' (as root)
4. Verify the OCR mirror copy with - ocrcheck
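Put together, the steps could look like this (the NEWOCR disk group name and the disk paths are made up for illustration):
Step 1, on node 1, from sqlplus / as sysasm:
CREATE DISKGROUP NEWOCR NORMAL REDUNDANCY FAILGROUP fg1 DISK '/dev/asm-disk5' FAILGROUP fg2 DISK '/dev/asm-disk6' ATTRIBUTE 'compatible.asm'='11.2';
Step 2, on node 2, from sqlplus / as sysasm:
ALTER DISKGROUP NEWOCR MOUNT;
Steps 3 and 4, as root:
ocrconfig -add +NEWOCR
ocrcheck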
Voting Disk
EXTERNAL REDUNDANCY - CSSD chooses one ASM disk and creates one (1) voting disk.
NORMAL REDUNDANCY - CSSD chooses 3 ASM disks from different failgroups and creates 3 voting disks, one in each ASM disk/failgroup.
HIGH REDUNDANCY - CSSD chooses 5 ASM disks from different failgroups and creates 5 voting disks, one in each ASM disk/failgroup.
In order to be able to tolerate the failure of n voting disk files, one must have at least 2n+1 of them configured.
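For example, with 3 voting disks (normal redundancy, n=1, 2n+1=3) the cluster can tolerate the loss of one of them, and with 5 voting disks (high redundancy, n=2, 2n+1=5) it can tolerate the loss of two.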
Move/put the voting disks in a new disk group
1. Create the disk group with sufficient failgroups, set the disk group compatible attribute, etc., from the first node.
2. Mount the disk group from the second node.
3. Put the voting disks in the newly created disk group with - crsctl replace votedisk +NEWDG
4. Verify the new location of the voting disks with - crsctl query css votedisk
What happens if you drop an ASM disk that contains a voting disk? Oracle will silently move the voting disk to another ASM disk of that disk group (if one exists); otherwise the command will be ignored.
Reference:- https://www.youtube.com/watch?v=20qgRJEFC7w