Nothing but my learning and experience with Oracle..!: September 2016

Tuesday, September 20, 2016

Recovering from the loss of a system tablespace to a differnet location in OMF

When you lose the SYSTEM tablespace,the instance could crash or hang: anyway the instance will alert you as soon as possible.

In this example we will consider the scenario where we need to restore the lost tablespace to a differnet location as the disk cotroller where we created our SYSTEM tablespace is not working.

Unlike recovery of non–system tablespaces that can be recovered with the database in the OPEN state, the database must be in the MOUNT state to recover either the SYSTEM or UNDO tablespace.

First take a full database backup and simulate this issue.

[oracle@mydb01]:[TTREC1] $ rman target /

Recovery Manager: Release 12.1.0.2.0 - Production on Wed Sep 21 15:39:29 2016

connected to target database: TTREC (DBID=4201939161)

RMAN> backup incremental level 0 database ;

Starting backup at 21-SEP-16

using target database control file instead of recovery catalog

allocated channel: ORA_DISK_1

channel ORA_DISK_1: SID=365 device type=DISK

allocated channel: ORA_DISK_2

channel ORA_DISK_2: SID=728 device type=DISK

allocated channel: ORA_DISK_3

channel ORA_DISK_3: SID=564 device type=DISK

allocated channel: ORA_DISK_4

channel ORA_DISK_4: SID=135 device type=DISK

channel ORA_DISK_1: starting incremental level 0 datafile backup set

channel ORA_DISK_1: specifying datafile(s) in backup set

input datafile file number=00004 name=/data1/TTREC/datafile/o1_mf_users_cvf5oss4_.dbf

channel ORA_DISK_1: starting piece 1 at 21-SEP-16

channel ORA_DISK_2: starting incremental level 0 datafile backup set

channel ORA_DISK_2: specifying datafile(s) in backup set

input datafile file number=00003 name=/data1/TTREC/datafile/o1_mf_undotbs1_cvf5npon_.dbf

input datafile file number=00008 name=/data1/TTREC/datafile/o1_mf_orion_da_cvf5npp4_.dbf

input datafile file number=00007 name=/data1/TTREC/datafile/o1_mf_tbs_test_cvf5ostw_.dbf

channel ORA_DISK_2: starting piece 1 at 21-SEP-16

channel ORA_DISK_3: starting incremental level 0 datafile backup set

channel ORA_DISK_3: specifying datafile(s) in backup set

input datafile file number=00002 name=/data1/TTREC/datafile/o1_mf_sysaux_cvf5jnh9_.dbf

input datafile file number=00006 name=/data1/TTREC/datafile/o1_mf_oraion_t_cvf5jnjb_.dbf

input datafile file number=00010 name=/data1/TTREC/datafile/o1_mf_testb_cxnhzbbf_.dbf

channel ORA_DISK_3: starting piece 1 at 21-SEP-16

channel ORA_DISK_4: starting incremental level 0 datafile backup set

channel ORA_DISK_4: specifying datafile(s) in backup set

input datafile file number=00001 name=/data1/TTREC/datafile/o1_mf_system_cy46np73_.dbf

input datafile file number=00005 name=/data1/TTREC/datafile/o1_mf_orinon_t_cvf5lomr_.dbf

input datafile file number=00009 name=/data1/TTREC/datafile/o1_mf_testa_cxf1h6kw_.dbf

channel ORA_DISK_4: starting piece 1 at 21-SEP-16

channel ORA_DISK_1: finished piece 1 at 21-SEP-16

piece handle=/t4_nfs/TTREC/database_07rgcdi4_1_1 tag=TAG20160921T153948 comment=NONE

channel ORA_DISK_1: backup set complete, elapsed time: 00:00:08

channel ORA_DISK_2: finished piece 1 at 21-SEP-16

piece handle=/t4_nfs/TTREC/database_08rgcdi4_1_1 tag=TAG20160921T153948 comment=NONE

channel ORA_DISK_2: backup set complete, elapsed time: 00:00:26

channel ORA_DISK_4: finished piece 1 at 21-SEP-16

piece handle=/t4_nfs/TTREC/database_0argcdi5_1_1 tag=TAG20160921T153948 comment=NONE

channel ORA_DISK_4: backup set complete, elapsed time: 00:01:15

channel ORA_DISK_3: finished piece 1 at 21-SEP-16

piece handle=/t4_nfs/TTREC/database_09rgcdi4_1_1 tag=TAG20160921T153948 comment=NONE

channel ORA_DISK_3: backup set complete, elapsed time: 00:04:15

Finished backup at 21-SEP-16

Starting Control File and SPFILE Autobackup at 21-SEP-16

piece handle=/data2/TTREC/autobackup/2016_09_21/o1_mf_s_923154244_cy47k48c_.bkp comment=NONE

Finished Control File and SPFILE Autobackup at 21-SEP-16

RMAN> exit

Recovery Manager complete.

[oracle@mydb01]:[TTREC1] $ sqlplus / as sysdba

SQL*Plus: Release 12.1.0.2.0 Production on Wed Sep 21 15:44:54 2016

Connected to:

Oracle Database 12c Enterprise Edition Release 12.1.0.2.0 - 64bit Production With the Partitioning, Real Application Clusters, OLAP, Advanced Analytics and Real Application Testing options

SQL> select file_name from dba_data_files where tablespace_name='SYSTEM';

FILE_NAME

-----------------------------------------------------

/data1/TTREC/datafile/o1_mf_system_cy46np73_.dbf

Lets remove the datafile.

SQL> host rm -f /data1/TTREC/datafile/o1_mf_system_cy46np73_.dbf

SQL> host ls -lrt /data1/TTREC/datafile/o1_mf_system_cy46np73_.dbf

ls: /data1/TTREC/datafile/o1_mf_system_cy46np73_.dbf: No such file or directory

SQL> alter system switch logfile;

System altered.

SQL>

SQL> /

System altered.

SQL>

SQL> /

alter system switch logfile

ERROR at line 1:

ORA-03113: end-of-file on communication channel

Process ID: 13353

Session ID: 232 Serial number: 41851

SQL> /

ERROR:

ORA-03114: not connected to ORACLE

SQL> exit

Disconnected from Oracle Database 12c Enterprise Edition Release 12.1.0.2.0 - 64bit Production

With the Partitioning, Real Application Clusters, OLAP, Advanced Analytics and Real Application Testing options

Could see that database got crashed.

[oracle@mydb01]:[TTREC1] $ cdtrace

[oracle@mydb01]:[TTREC1] $ tail -30f alert_TTREC1.log

Wed Sep 21 15:46:48 2016

Thread 1 cannot allocate new log, sequence 19

Checkpoint not complete

Current log# 3 seq# 18 mem# 0: /data1/TTREC/onlinelog/o1_mf_3_cwzoock3_.log

Current log# 3 seq# 18 mem# 1: /data2/TTREC/onlinelog/o1_mf_3_cwzoocvl_.log

Wed Sep 21 15:46:48 2016

Errors in file /u01/app/oracle/diag/rdbms/ttrec/TTREC1/trace/TTREC1_ckpt_22047.trc:

ORA-01243: system tablespace file suffered media failure

ORA-01116: error in opening database file 1

ORA-01110: data file 1: '/data1/TTREC/datafile/o1_mf_system_cy46np73_.dbf'

ORA-27041: unable to open file

Linux-x86_64 Error: 2: No such file or directory

Additional information: 3

Wed Sep 21 15:46:48 2016

Errors in file /u01/app/oracle/diag/rdbms/ttrec/TTREC1/trace/TTREC1_ckpt_22047.trc:

ORA-01243: system tablespace file suffered media failure

ORA-01116: error in opening database file 1

ORA-01110: data file 1: '/data1/TTREC/datafile/o1_mf_system_cy46np73_.dbf'

ORA-27041: unable to open file

Linux-x86_64 Error: 2: No such file or directory

Additional information: 3

Wed Sep 21 15:46:48 2016

USER (ospid: 22047): terminating the instance due to error 1243

Wed Sep 21 15:46:48 2016

System state dump requested by (instance=1, osid=22047 (CKPT)), summary=[abnormal instance termination].

System State dumped to trace file /u01/app/oracle/diag/rdbms/ttrec/TTREC1/trace/TTREC1_diag_21981_20160921154648.trc

Wed Sep 21 15:46:48 2016

Dumping diagnostic data in directory=[cdmp_20160921154648], requested by (instance=1, osid=22047 (CKPT)), summary=[abnormal instance termination].

Wed Sep 21 15:46:48 2016

Instance terminated by USER, pid = 22047

[oracle@mydb01]:[TTREC1] $ sqlplus / as sysdba

SQL*Plus: Release 12.1.0.2.0 Production on Wed Sep 21 15:47:34 2016

Connected to an idle instance.

SQL> startup mount;

ORACLE instance started.

Total System Global Area 2147483648 bytes

Fixed Size 3712904 bytes

Variable Size 1996490872 bytes

Database Buffers 134217728 bytes

Redo Buffers 13062144 bytes

Database mounted.

SQL> /

FILE# ONLINE ONLINE_ ERROR CHANGE# TIME CON_ID

---------- ------- ------- -------------------- ---------- --------- ----------

1 ONLINE ONLINE FILE NOT FOUND 0 0

SQL> exit

Disconnected from Oracle Database 12c Enterprise Edition Release 12.1.0.2.0 - 64bit Production

With the Partitioning, Real Application Clusters, OLAP, Advanced Analytics and Real Application Testing options

[oracle@mydb01]:[TTREC1] $ rman target /

Recovery Manager: Release 12.1.0.2.0 - Production on Wed Sep 21 15:48:49 2016

connected to target database: TTREC (DBID=4201939161, not open)

RMAN> report schema;

using target database control file instead of recovery catalog

Report of database schema for database with db_unique_name TTREC

List of Permanent Datafiles

===========================

File Size(MB) Tablespace RB segs Datafile Name

---- -------- -------------------- ------- ------------------------

1 0 SYSTEM *** /data1/TTREC/datafile/o1_mf_system_cy46np73_.dbf

2 1640 SYSAUX *** /data1/TTREC/datafile/o1_mf_sysaux_cvf5jnh9_.dbf

3 1265 UNDOTBS1 *** /data1/TTREC/datafile/o1_mf_undotbs1_cvf5npon_.dbf

4 4341 USERS *** /data1/TTREC/datafile/o1_mf_users_cvf5oss4_.dbf

5 50 ORINON_TS *** /data1/TTREC/datafile/o1_mf_orinon_t_cvf5lomr_.dbf

6 546 ORAION_TS *** /data1/TTREC/datafile/o1_mf_oraion_t_cvf5jnjb_.dbf

7 10 TBS_TEST *** /data1/TTREC/datafile/o1_mf_tbs_test_cvf5ostw_.dbf

8 1000 ORION_DATA *** /data1/TTREC/datafile/o1_mf_orion_da_cvf5npp4_.dbf

9 50 TESTA *** /data1/TTREC/datafile/o1_mf_testa_cxf1h6kw_.dbf

10 50 TESTB *** /data1/TTREC/datafile/o1_mf_testb_cxnhzbbf_.dbf

List of Temporary Files

=======================

File Size(MB) Tablespace Maxsize(MB) Tempfile Name

---- -------- -------------------- ----------- --------------------

1 50 TEMP 50 /data1/TTREC/datafile/o1_mf_temp_cy40hxov_.tmp

RMAN> exit

Recovery Manager complete.

[oracle@mydb01]:[TTREC1] $

Now we will restore our system tablespace to a different location. As I want to use OMF for storing my datafiles – I will set alter system set db_create_file_dest='/data3' followed by set newname for datafile 1 to new from rman run prompt,Where /data3 is the new storage location.

In case if you are not using OMF you can use following syntax.

set newname for datafile 1 to ‘/newlocatoin’;

Its important to do swith datafile all before doing recovery as its update the controlfile with the new location of datafile.

[oracle@mydb01]:[TTREC1] $ ls -ld /data3

drwxr-xr-x 3 oracle oinstall 4096 Sep 21 15:25 /data3

[oracle@mydb01]:[TTREC1] $ cd /data3

[oracle@mydb01]:[TTREC1] $ ll

total 0

[oracle@mydb01]:[TTREC1] $ rman target /

Recovery Manager: Release 12.1.0.2.0 - Production on Wed Sep 21 15:49:51 2016

connected to target database: TTREC (DBID=4201939161, not open)

RMAN> run

{

alter system set db_create_file_dest='/data3';

set newname for datafile 1 to new;

restore tablespace system;

switch datafile all;

recover tablespace system;

alter database open;

}

2> 3> 4> 5> 6> 7> 8> 9>

using target database control file instead of recovery catalog

Statement processed

executing command: SET NEWNAME

Starting restore at 21-SEP-16

allocated channel: ORA_DISK_1

channel ORA_DISK_1: SID=365 device type=DISK

allocated channel: ORA_DISK_2

channel ORA_DISK_2: SID=398 device type=DISK

allocated channel: ORA_DISK_3

channel ORA_DISK_3: SID=431 device type=DISK

allocated channel: ORA_DISK_4

channel ORA_DISK_4: SID=464 device type=DISK

channel ORA_DISK_1: starting datafile backup set restore

channel ORA_DISK_1: specifying datafile(s) to restore from backup set

channel ORA_DISK_1: restoring datafile 00001 to /data3/TTREC/datafile/o1_mf_system_%u_.dbf

channel ORA_DISK_1: reading from backup piece /t4_nfs/TTREC/database_0argcdi5_1_1

channel ORA_DISK_1: piece handle=/t4_nfs/TTREC/database_0argcdi5_1_1 tag=TAG20160921T153948

channel ORA_DISK_1: restored backup piece 1

channel ORA_DISK_1: restore complete, elapsed time: 00:00:36

Finished restore at 21-SEP-16

datafile 1 switched to datafile copy

input datafile copy RECID=4 STAMP=923154658 file name=/data3/TTREC/datafile/o1_mf_system_cy47wz7v_.dbf

Starting recover at 21-SEP-16

using channel ORA_DISK_1

using channel ORA_DISK_2

using channel ORA_DISK_3

using channel ORA_DISK_4

starting media recovery

media recovery complete, elapsed time: 00:00:00

Finished recover at 21-SEP-16

Statement processed

RMAN> exit

Recovery Manager complete.

[oracle@mydb01]:[TTREC1] $ sqlplus / as sysdba

SQL*Plus: Release 12.1.0.2.0 Production on Wed Sep 21 15:51:37 2016

Connected to:

Oracle Database 12c Enterprise Edition Release 12.1.0.2.0 - 64bit Production With the Partitioning, Real Application Clusters, OLAP, Advanced Analytics and Real Application Testing options

SQL> select name from v$datafile ;

NAME

------------------------------------------------------

/data3/TTREC/datafile/o1_mf_system_cy47wz7v_.dbf

/data1/TTREC/datafile/o1_mf_sysaux_cvf5jnh9_.dbf

/data1/TTREC/datafile/o1_mf_undotbs1_cvf5npon_.dbf

/data1/TTREC/datafile/o1_mf_users_cvf5oss4_.dbf

/data1/TTREC/datafile/o1_mf_orinon_t_cvf5lomr_.dbf

/data1/TTREC/datafile/o1_mf_oraion_t_cvf5jnjb_.dbf

/data1/TTREC/datafile/o1_mf_tbs_test_cvf5ostw_.dbf

/data1/TTREC/datafile/o1_mf_orion_da_cvf5npp4_.dbf

/data1/TTREC/datafile/o1_mf_testa_cxf1h6kw_.dbf

/data1/TTREC/datafile/o1_mf_testb_cxnhzbbf_.dbf

10 rows selected.

If our /data1 storage is recoverd , we can simpley move the datafile online(in 12c).

SQL> show parameter create;

NAME TYPE VALUE

------------------------------------ ----------- --------

create_bitmap_area_size integer 8388608

create_stored_outlines string

db_create_file_dest string /data3

db_create_online_log_dest_1 string /data1

db_create_online_log_dest_2 string /data2

db_create_online_log_dest_3 string

db_create_online_log_dest_4 string

db_create_online_log_dest_5 string

SQL>

SQL> alter session set db_create_file_dest='/data1';

Session altered.

SQL> alter database move datafile 1;

Database altered.

SQL> select name from v$datafile ;

NAME

----------------------------------------------------------

/data1/TTREC/datafile/o1_mf_system_cy4c4zg0_.dbf

/data1/TTREC/datafile/o1_mf_sysaux_cvf5jnh9_.dbf

/data1/TTREC/datafile/o1_mf_undotbs1_cvf5npon_.dbf

/data1/TTREC/datafile/o1_mf_users_cvf5oss4_.dbf

/data1/TTREC/datafile/o1_mf_orinon_t_cvf5lomr_.dbf

/data1/TTREC/datafile/o1_mf_oraion_t_cvf5jnjb_.dbf

/data1/TTREC/datafile/o1_mf_tbs_test_cvf5ostw_.dbf

/data1/TTREC/datafile/o1_mf_orion_da_cvf5npp4_.dbf

/data1/TTREC/datafile/o1_mf_testa_cxf1h6kw_.dbf

/data1/TTREC/datafile/o1_mf_testb_cxnhzbbf_.dbf

10 rows selected.

SQL>

Datafile is successfully moved back to its original location.

Saturday, September 17, 2016

Clusterware failure scenarios a study note

Today I have gone through a very interesting video that demonstrate different clusterware failure scenarios with practical example. Here is the link https://www.youtube.com/watch?v=20qgRJEFC7w

Here is the abstract of this video

1. Node failure :-

CRS Resource will go to offline mode.

Instance recovery fill be performed by surviving node.

Node VIP and SCAN VIP will fail over to surviving node.

SCAN Listener will fail over to surviving node.

Client connection will move to surviving node If you used TAF at client side.

Resource like node listner ,ASM , database etc will be in offline mode for failed node.

When you bring up the failed node all resource that were failed over to the surviving node will relocate back to the original node.

CRS Resources that were in offline mode might get started automatically based on the AUTO_START setting.

Possible values of AUTO_START are,

always— Causes the resource to restart when the node restarts regardless of the resource's state when the node stopped.

restore— Does not start the resource at restart time if it was in an offline state, such as STATE=OFFLINE, TARGET=OFFLINE, when the node stopped. The resource is restored to its state when the node went down. The resource is started only if it was online before and not otherwise.

never— Oracle Clusterware never restarts the resource regardless of the resource's state when the node stopped.

2. Instance failure:-

Instance recovery will be performed by surviving instance.

Surviving instance reads the online redo log files of the failed instance and ensure that committed transaction are recorded to the database.

If all nodes fail, one instance will performe the recovery of all instances.

Services will be moved to available instance.

Clients connection will be moved to surviving instance if TAF is in place.

Failed instance may be restarted by the clusterware automatically.

You could see something like following in alertlog file from the survining node.

3. ASM instance failure:-

ASM instance will be OFFLINE and will be automatically restarted by the clusterware.

Instance recovery will be performed.

Client connection will be moved to surviving instance if TAF is in place.

Services will be moved to available instance.

4. Local and SCAN Listener failure:-

Listener failure will be detected by CRSD and restarted.

5. Public network failure:-

Node VIP and SCAN VIP will fail over to surviving node.

DB instance will be up and DB service will fail to surviving node.

6. Private network failure.

Up-to 11g R2 CSSD will detects a split-brain situation and will survive the node with the lowest node number. For example In case of two node cluster,Second node will be evicted.

However, starting from Oracle Database 12.1.0.2c, the node with higher weight will survive during split brain resolution. If you want to know about this, visit http://allthingsoracle.com/split-brain-whats-new-in-oracle-database-12-1-0-2c/

Clusterware will try to perform a reboot-less node fencing by cleaning-up CRSD resources ,if not possible a node reboot will happen.

OHAS will be up and running on the affected node , but CRSD,CSSD,EVMD,HAIP etc will be in OFFLINE mode.

As soon as private network is started, CSSD, CRSD and EVMD services start immediately on second node and joins the cluster.

Consider you have a two node 11gr2 cluster and you removed private interconnect cable from the first node. Whic node will get evicted i?

obviously the second node as oracle will survive the node with loweset node number in 11gr2.

Ie, in 11g it doesnt matter private network of first or second node fails it always evict the second node.

7. OCR and Voting disk failure:-

OCR

As best practice make sure you set two different disk groups to store OCR- one for main storage and second for OCR mirror.

Lets suppose you have two disk group (DATA & OCR) for storing OCR and both have two failgroup.

If you lose entire OCR disk grop and one failgroup within the DATA disk group, Clusterware will works as normal,but what happen if you lose the surviving failgroup in the main disk group too?

You have two option

- Restore OCR from backup.

- Restore OCR without backup.

How to create a OCR mirror ?

1. create the disk group with sufficinet failgroup ,Set diskgroup compatible etc from first node..

2. mount the disk from the second node.

3. make a mirror copy of OCR in newly created disk group by - ocrconfig -add '+NEWOCR' (as root)

4. verify the OCR mirror copy by - ocrcheck

Voting Disk

EXTERNAL REDUNDANCY - CSSD choose one ASM DISK and create one (1) voting disk.

NORMAL REDUNCANCY - CSSD choose 3 ASM DISK from different FAILGROUP and create 3 VOTING DISK, one in each ASM DISK/FAILGROUP

HIGH REDUNDANCY - CSSD choose 5 ASM DISK from different FAILGROUP and create 5 VOTING DISK, one in each ASM DISK/FAILGROUP.

In order to be able to tolerate a failure of n voting disk files , one must have at least 2n+1 configured.

Move/put voting disk in a new disk group

1. create the disk group with sufficinet failgroup ,Set diskgroup compatible etc from first node..

2. mount the disk from the second node.

3. Put the voting diks in newly created diskgroup - crsctl replace votedisk +NEWDG

4. Verify the new locaton of voting disk by - crsctl query css votedisk

You drop an ASM Disk that contains a Voting Disk, what will happen ? Oracle will silently move the Voting Disk to another ASM Disk of that Diskgroup (if exists),otherwise the command will ignored.

Reference:- https://www.youtube.com/watch?v=20qgRJEFC7w

Nothing but my learning and experience with Oracle..!

Pages

Tuesday, September 20, 2016

Recovering from the loss of a system tablespace to a differnet location in OMF

Saturday, September 17, 2016

Clusterware failure scenarios a study note

Popular Posts