Tuesday, August 30, 2016

Recovering After Loss of All Members of the INACTIVE Redo Log Group


You’re attempting to open your database and receive error message like following.

Errors in file /u01/app/oracle/diag/rdbms/ttrec/TTREC1/trace/TTREC1_arc0_3389.trc:
ORA-00313: open failed for members of log group 3 of thread 1
ORA-00312: online log 3 thread 1: '/data2/TTREC/onlinelog/o1_mf_3_cw7ogdyv_.log'
ORA-27041: unable to open file
Linux-x86_64 Error: 2: No such file or directory
Additional information: 3
ORA-00312: online log 3 thread 1: '/data1/TTREC/onlinelog/o1_mf_3_cw7ogdtj_.log'
ORA-27041: unable to open file
Linux-x86_64 Error: 2: No such file or directory
Additional information: 3
Tue Aug 30 17:33:35 2016
Master archival failure: 313

Above message indicates that two members of an online redo log group 3 have experienced a media failure.

I am simulating this scenario by deleting all members of a inactive redo log group.

SQL> @redo.sql

MEMBER                                                                            GROUP# STATUS           STATUS
----------------------------------------------------------------------------- ---------- ---------------- -------
/data1/TTREC/onlinelog/o1_mf_1_cw7vjldp_.log                                           1 INACTIVE
/data2/TTREC/onlinelog/o1_mf_1_cw7vjljy_.log                                           1 INACTIVE
/data1/TTREC/onlinelog/o1_mf_2_cw7yyz4t_.log                                           2 CURRENT
/data2/TTREC/onlinelog/o1_mf_2_cw7yyz8w_.log                                           2 CURRENT
/data1/TTREC/onlinelog/o1_mf_3_cw7ogdtj_.log                                           3 INACTIVE
/data2/TTREC/onlinelog/o1_mf_3_cw7ogdyv_.log                                           3 INACTIVE

6 rows selected.

First of all you have to understand that an INACTIVE redo log group is not needed for crash recovery and it may or may not have been archived.So I just want to make sure that none of my ongoing operations are affected by losing an inactive redo log group – So I created a simple table and insert some entry into it.

SQL> create table test (id number );

Table created.


SQL> insert into test values (111);

1 row created.

SQL> commit;

Commit complete.

SQL> @redo

MEMBER                                                                            GROUP# STATUS           STATUS
----------------------------------------------------------------------------- ---------- ---------------- -------
/data1/TTREC/onlinelog/o1_mf_1_cw7vjldp_.log                                           1 INACTIVE
/data2/TTREC/onlinelog/o1_mf_1_cw7vjljy_.log                                           1 INACTIVE
/data1/TTREC/onlinelog/o1_mf_2_cw7yyz4t_.log                                           2 CURRENT
/data2/TTREC/onlinelog/o1_mf_2_cw7yyz8w_.log                                           2 CURRENT
/data1/TTREC/onlinelog/o1_mf_3_cw7ogdtj_.log                                           3 INACTIVE
/data2/TTREC/onlinelog/o1_mf_3_cw7ogdyv_.log                                           3 INACTIVE

6 rows selected.

SQL> exit
Disconnected from Oracle Database 12c Enterprise Edition Release 12.1.0.2.0 - 64bit Production
With the Partitioning, Real Application Clusters, OLAP, Advanced Analytics
and Real Application Testing options

I deleted my redo log group 3, which is in INACTIVE state.

[oracle@mydb01]:[TTREC1] $ rm /data1/TTREC/onlinelog/o1_mf_3_cw7ogdtj_.log
[oracle@mydb01]:[TTREC1] $ rm /data2/TTREC/onlinelog/o1_mf_3_cw7ogdyv_.log

At this moment alert log file won’t display anything about deleted redo log files and it will continue its operation until oracle tries to access the deleted files. I did a log switch.

[oracle@mydb01]:[TTREC1] $ sqlplus / as sysdba

SQL*Plus: Release 12.1.0.2.0 Production on Tue Aug 30 17:33:19 2016
Copyright (c) 1982, 2014, Oracle.  All rights reserved.
Connected to:
Oracle Database 12c Enterprise Edition Release 12.1.0.2.0 - 64bit Production
With the Partitioning, Real Application Clusters, OLAP, Advanced Analytics and Real Application Testing options

SQL>
SQL> alter system switch logfile ;

System altered.

SQL> /

System altered.

SQL> /

System altered.

SQL> exit
Disconnected from Oracle Database 12c Enterprise Edition Release 12.1.0.2.0 - 64bit Production
With the Partitioning, Real Application Clusters, OLAP, Advanced Analytics and Real Application Testing options

I could see following error in my alert log file

[oracle@mydb01]:[TTREC1] $ alert
Master archival failure: 313
Tue Aug 30 17:33:35 2016
Errors in file /u01/app/oracle/diag/rdbms/ttrec/TTREC1/trace/TTREC1_arc0_3389.trc:
ORA-00313: open failed for members of log group 3 of thread 1
ORA-00312: online log 3 thread 1: '/data2/TTREC/onlinelog/o1_mf_3_cw7ogdyv_.log'
ORA-27041: unable to open file
Linux-x86_64 Error: 2: No such file or directory
Additional information: 3
ORA-00312: online log 3 thread 1: '/data1/TTREC/onlinelog/o1_mf_3_cw7ogdtj_.log'
ORA-27041: unable to open file
Linux-x86_64 Error: 2: No such file or directory
Additional information: 3
Tue Aug 30 17:33:35 2016
Master archival failure: 313

Now oracle started complaining about deleted redo log group – Note that oracle still continue its operation without any fail. If all members of an online redo log group are damaged, you won’t be able to open your database. In this situation, Oracle will allow you to only mount your database.So I bounced my database to simulate that scenario.

[oracle@mydb01]:[TTREC1] $
[oracle@mydb01]:[TTREC1] $ sqlplus / as sysdba
SQL*Plus: Release 12.1.0.2.0 Production on Tue Aug 30 17:33:46 2016
Copyright (c) 1982, 2014, Oracle.  All rights reserved.
Connected to:
Oracle Database 12c Enterprise Edition Release 12.1.0.2.0 - 64bit Production
With the Partitioning, Real Application Clusters, OLAP, Advanced Analytics and Real Application Testing options

SQL>
SQL> select name ,open_mode from v$database;

NAME      OPEN_MODE
--------- --------------------
TTREC     READ WRITE

SQL> shut immediate ;
Database closed.
Database dismounted.
ORACLE instance shut down.
SQL> startup;
ORACLE instance started.

Total System Global Area 2147483648 bytes
Fixed Size                  3712904 bytes
Variable Size            1996490872 bytes
Database Buffers          134217728 bytes
Redo Buffers               13062144 bytes
Database mounted.
ORA-03113: end-of-file on communication channel
Process ID: 14470
Session ID: 232 Serial number: 16835


SQL> exit
Disconnected from Oracle Database 12c Enterprise Edition Release 12.1.0.2.0 - 64bit Production
With the Partitioning, Real Application Clusters, OLAP, Advanced Analytics and Real Application Testing options

And my alert log file shows

[oracle@mydb01]:[TTREC1] $ alert
Errors in file /u01/app/oracle/diag/rdbms/ttrec/TTREC1/trace/TTREC1_lgwr_14373.trc:
ORA-00313: open failed for members of log group 3 of thread 1
ORA-00312: online log 3 thread 1: '/data2/TTREC/onlinelog/o1_mf_3_cw7ogdyv_.log'
ORA-27037: unable to obtain file status
Linux-x86_64 Error: 2: No such file or directory
Additional information: 3
ORA-00312: online log 3 thread 1: '/data1/TTREC/onlinelog/o1_mf_3_cw7ogdtj_.log'
ORA-27037: unable to obtain file status
Linux-x86_64 Error: 2: No such file or directory
Additional information: 3
Tue Aug 30 17:35:55 2016

So I started my database in mount mode.

[oracle@mydb01]:[TTREC1] $ sqlplus / as sysdba

SQL*Plus: Release 12.1.0.2.0 Production on Tue Aug 30 17:36:13 2016

Copyright (c) 1982, 2014, Oracle.  All rights reserved.

Connected to an idle instance.

SQL> startup mount;
ORACLE instance started.
Total System Global Area 2147483648 bytes
Fixed Size                  3712904 bytes
Variable Size            1996490872 bytes
Database Buffers          134217728 bytes
Redo Buffers               13062144 bytes
Database mounted.
SQL> set lines 4444
SQL> col member for a77
SQL> select member, a.group#, a.status, b.status, a.archived  from v$log a, v$logfile b where  a.group# = b.group# order by a.group#, member;

MEMBER                                                                            GROUP# STATUS           STATUS  ARC
----------------------------------------------------------------------------- ---------- ---------------- ------- ---
/data1/TTREC/onlinelog/o1_mf_1_cw7vjldp_.log                                           1 INACTIVE                 NO
/data2/TTREC/onlinelog/o1_mf_1_cw7vjljy_.log                                           1 INACTIVE                 NO
/data1/TTREC/onlinelog/o1_mf_2_cw7yyz4t_.log                                           2 CURRENT                  NO
/data2/TTREC/onlinelog/o1_mf_2_cw7yyz8w_.log                                           2 CURRENT                  NO
/data1/TTREC/onlinelog/o1_mf_3_cw7ogdtj_.log                                           3 INACTIVE                 NO
/data2/TTREC/onlinelog/o1_mf_3_cw7ogdyv_.log                                           3 INACTIVE                 NO

6 rows selected.

As  the status of redo log group 3 is  INACTIVE, then this log group is no longer needed for crash Therefore, you can use the clear logfile command to re-create all members of a log group. The following example re-creates all log members of group 3.

SQL> alter database clear logfile group 3;
alter database clear logfile group 3
*
ERROR at line 1:
ORA-00350: log 3 of instance TTREC1 (thread 1) needs to be archived
ORA-00312: online log 3 thread 1: '/data1/TTREC/onlinelog/o1_mf_3_cw7ogdtj_.log'
ORA-00312: online log 3 thread 1: '/data2/TTREC/onlinelog/o1_mf_3_cw7ogdyv_.log'


SQL> alter database clear unarchived logfile group 3;

Database altered.

SQL>

You could see following message in the alert log file -

Tue Aug 30 17:37:46 2016
WARNING! CLEARING REDO LOG WHICH HAS NOT BEEN ARCHIVED. BACKUPS TAKEN
    BEFORE 08/30/2016 17:33:27 (CHANGE 131509533) CANNOT BE USED FOR RECOVERY.
Clearing online log 3 of thread 1 sequence number 54
Tue Aug 30 17:37:46 2016

Note:- 

1.As the warning makes clear, you are now missing a redo log file.  You will not be able to roll forward through redo from prior backups. You are advised to make a fresh backup of the database and archivelogs as soon as possible.ie, If the online redo log group has not been archived, then it may be required for media recovery and not for a crash recovery.

2. alter database clear logfile command will recreate the physical file at OS level.

SQL>  select member, a.group#, a.status, b.status, a.archived  from v$log a, v$logfile b where  a.group# = b.group# order by a.group#, member;

MEMBER                                                                            GROUP# STATUS           STATUS  ARC
----------------------------------------------------------------------------- ---------- ---------------- ------- ---
/data1/TTREC/onlinelog/o1_mf_1_cw7vjldp_.log                                           1 INACTIVE                 NO
/data2/TTREC/onlinelog/o1_mf_1_cw7vjljy_.log                                           1 INACTIVE                 NO
/data1/TTREC/onlinelog/o1_mf_2_cw7yyz4t_.log                                           2 CURRENT                  NO
/data2/TTREC/onlinelog/o1_mf_2_cw7yyz8w_.log                                           2 CURRENT                  NO
/data1/TTREC/onlinelog/o1_mf_3_cwbfybx9_.log                                           3 UNUSED                   YES
/data2/TTREC/onlinelog/o1_mf_3_cwbfyc2b_.log                                           3 UNUSED                   YES

6 rows selected.

SQL> !host ls -lrt /data1/TTREC/onlinelog/o1_mf_3_cwbfybx9_.log
host: couldn't get address for '-lrt': failure

SQL> host ls -lrt /data1/TTREC/onlinelog/o1_mf_3_cwbfybx9_.log
-rw-r----- 1 oracle asmadmin 104858112 Aug 30 17:37 /data1/TTREC/onlinelog/o1_mf_3_cwbfybx9_.log

SQL> host ls -lrt /data2/TTREC/onlinelog/o1_mf_3_cwbfyc2b_.log
-rw-r----- 1 oracle asmadmin 104858112 Aug 30 17:37 /data2/TTREC/onlinelog/o1_mf_3_cwbfyc2b_.log

SQL> alter database open;

Database altered.

SQL> select * from test;

        ID
----------
       111

SQL>

Our data is intact after recovering the lost redo log file group. :) 

Restoring After Losing One Member of the Multiplexed redo log Group - in a OMF and NON-OMF system


Restoring After Losing One Member of the Multiplexed Group

If you are seeing an ORA-00312 in your alert log it should be because of one of the member of the redo log group is experiencing media failure. You should see lines similar to these near the bottom of your alert.log file:


Mon Aug 29 18:28:02 2016
Errors in file /u01/app/oracle/diag/rdbms/ttrec/TTREC1/trace/TTREC1_arc1_1752.trc:
ORA-00313: open failed for members of log group 2 of thread 1
ORA-00312: online log 2 thread 1: '/data2/TTREC/onlinelog/o1_mf_2_cvf5qv5t_.log'
ORA-27041: unable to open file
Linux-x86_64 Error: 2: No such file or directory
Additional information: 3


If your online redo log file members are multiplexed, the log writer(lgwr) will continue to function as long as it can successfully write to one member of the current log group. If the problem is temporary, then as soon as the online redo log file becomes available, the log writer will start to write to the online redo log file as if there was never an issue.

If the media failure is permanent then you’ll need to replace the disk and drop and re-create the bad member to its original location. If you don’t have the option of
replacing the bad disk, then you’ll need to drop the bad member and re-create it in an alternate location.

For permanent media failures, here are the instructions for dropping and re-creating one member of an online redo log group:

1. Identify the online redo log file experiencing media failure.
2. Ensure that the online redo log file is not part of the current online log group.
3. Drop the damaged member.
4. Add a new member to the group.

Here I am going to delete one of the multiplexed redo log file and later I will recreate it in the same location.

SQL>select member, a.group#, a.status, b.status from v$log a, v$logfile b where  a.group# = b.group#  order by a.group#, member;

MEMBER                                                                     GROUP# STATUS           STATUS
---------------------------------------------------------------------- ---------- ---------------- -------
/data1/TTREC/onlinelog/o1_mf_1_cw7vjldp_.log                                    1 ACTIVE
/data2/TTREC/onlinelog/o1_mf_1_cw7vjljy_.log                                    1 ACTIVE
/data1/TTREC/onlinelog/o1_mf_2_cvf5qv2z_.log                                    2 CURRENT
/data2/TTREC/onlinelog/o1_mf_2_cvf5qv5t_.log                                    2 CURRENT
/data1/TTREC/onlinelog/o1_mf_3_cw7ogdtj_.log                                    3 ACTIVE
/data2/TTREC/onlinelog/o1_mf_3_cw7ogdyv_.log                                    3 ACTIVE

6 rows selected.

SQL>

I deleted one of the current redo log member.

[oracle@ehrsvpitoel02]:[TTREC1] $ rm /data2/TTREC/onlinelog/o1_mf_2_cvf5qv5t_.log

I did a log switch

SQL> alter system switch logfile;

System altered.

SQL> /

System altered.

SQL> exit

I could see following error message in my alert log file.

Mon Aug 29 18:28:02 2016
Errors in file /u01/app/oracle/diag/rdbms/ttrec/TTREC1/trace/TTREC1_arc1_1752.trc:
ORA-00313: open failed for members of log group 2 of thread 1
ORA-00312: online log 2 thread 1: '/data2/TTREC/onlinelog/o1_mf_2_cvf5qv5t_.log'
ORA-27041: unable to open file
Linux-x86_64 Error: 2: No such file or directory
Additional information: 3

I just checked the status of my redo log file -

[oracle@ehrsvpitoel02]:[TTREC1] $ sqlplus / as sysdba
SQL*Plus: Release 12.1.0.2.0 Production on Mon Aug 29 18:28:17 2016
Copyright (c) 1982, 2014, Oracle.  All rights reserved.
Connected to:
Oracle Database 12c Enterprise Edition Release 12.1.0.2.0 - 64bit Production
With the Partitioning, Real Application Clusters, OLAP, Advanced Analytics
and Real Application Testing options

SQL> @redo

MEMBER                                                                     GROUP# STATUS           STATUS
---------------------------------------------------------------------- ---------- ---------------- -------
/data1/TTREC/onlinelog/o1_mf_1_cw7vjldp_.log                                    1 CURRENT
/data2/TTREC/onlinelog/o1_mf_1_cw7vjljy_.log                                    1 CURRENT
/data1/TTREC/onlinelog/o1_mf_2_cvf5qv2z_.log                                    2 ACTIVE
/data2/TTREC/onlinelog/o1_mf_2_cvf5qv5t_.log                                    2 ACTIVE
/data1/TTREC/onlinelog/o1_mf_3_cw7ogdtj_.log                                    3 ACTIVE
/data2/TTREC/onlinelog/o1_mf_3_cw7ogdyv_.log                                    3 ACTIVE

6 rows selected.

Still the  views is  not giving any information regarding lost member(I should expect a INVALID status on v$logfile.status column. So I tried a database bounce as I want to see whether the column is reflecting properly after a bounce – this is not really required though.

SQL> shut immediate ;
Database closed.
Database dismounted.
ORACLE instance shut down.
SQL> startup;
ORACLE instance started.
Total System Global Area 2147483648 bytes
Fixed Size                  3712904 bytes
Variable Size            1996490872 bytes
Database Buffers          134217728 bytes
Redo Buffers               13062144 bytes
Database mounted.
Database opened.
SQL> @redo

MEMBER                                                                     GROUP# STATUS           STATUS
---------------------------------------------------------------------- ---------- ---------------- -------
/data1/TTREC/onlinelog/o1_mf_1_cw7vjldp_.log                                    1 CURRENT
/data2/TTREC/onlinelog/o1_mf_1_cw7vjljy_.log                                    1 CURRENT
/data1/TTREC/onlinelog/o1_mf_2_cvf5qv2z_.log                                    2 INACTIVE
/data2/TTREC/onlinelog/o1_mf_2_cvf5qv5t_.log                                    2 INACTIVE         INVALID
/data1/TTREC/onlinelog/o1_mf_3_cw7ogdtj_.log                                    3 INACTIVE
/data2/TTREC/onlinelog/o1_mf_3_cw7ogdyv_.log                                    3 INACTIVE

6 rows selected.

SQL>

After restart, its clear that database is complaining about missed redo log file – I am not sure why this view is not populating at run time. Now I will drop and recreate the INVALID redo log file.

SQL> alter database drop logfile member '/data2/TTREC/onlinelog/o1_mf_2_cvf5qv5t_.log';

Database altered.

Note:- You can drop and add online redo log file members while your database is in either a mounted state or an open state. If you attempt to drop a member of a current log group, Oracle will throw an ORA-01609 error specifying that the log is current and you cannot drop one of its members. If the failed member is in the current log group, then use the alter system switch logfile command to make the next group the current group.

SQL> @redo

MEMBER                                                                     GROUP# STATUS           STATUS
---------------------------------------------------------------------- ---------- ---------------- -------
/data1/TTREC/onlinelog/o1_mf_1_cw7vjldp_.log                                    1 CURRENT
/data2/TTREC/onlinelog/o1_mf_1_cw7vjljy_.log                                    1 CURRENT
/data1/TTREC/onlinelog/o1_mf_2_cvf5qv2z_.log                                    2 INACTIVE
/data1/TTREC/onlinelog/o1_mf_3_cw7ogdtj_.log                                    3 INACTIVE
/data2/TTREC/onlinelog/o1_mf_3_cw7ogdyv_.log                                    3 INACTIVE



SQL> alter database add logfile member  '/data2/TTREC/onlinelog/o1_mf_2_cvf5qv5t_.log' to group 2;
alter database add logfile member  '/data2/TTREC/onlinelog/o1_mf_2_cvf5qv5t_.log' to group 2
*
ERROR at line 1:
ORA-01276: Cannot add file /data2/TTREC/onlinelog/o1_mf_2_cvf5qv5t_.log.  File has an Oracle Managed Files file name.

SQL>



SO if your redo logs are configured using OMF,  you have to drop the entire redo log group and then recreate it .

SQL> ALTER DATABASE DROP LOGFILE GROUP 2;

Database altered.

SQL>  ALTER DATABASE add LOGFILE GROUP 2;

Database altered.

SQL> @redo

MEMBER                                                                     GROUP# STATUS           STATUS
---------------------------------------------------------------------- ---------- ---------------- -------
/data1/TTREC/onlinelog/o1_mf_1_cw7vjldp_.log                                    1 CURRENT
/data2/TTREC/onlinelog/o1_mf_1_cw7vjljy_.log                                    1 CURRENT
/data1/TTREC/onlinelog/o1_mf_2_cw7yyz4t_.log                                    2 UNUSED
/data2/TTREC/onlinelog/o1_mf_2_cw7yyz8w_.log                                    2 UNUSED
/data1/TTREC/onlinelog/o1_mf_3_cw7ogdtj_.log                                    3 INACTIVE
/data2/TTREC/onlinelog/o1_mf_3_cw7ogdyv_.log                                    3 INACTIVE

6 rows selected.

Once the newly created member becomes part of the CURRENT log group, its status should change to NULL. A NULL member status indicates that the database is using the online redo log file.

SQL> alter system switch logfile ;

System altered.

SQL>
SQL> /

System altered.

SQL>

SQL> @redo

MEMBER                                                                     GROUP# STATUS           STATUS
---------------------------------------------------------------------- ---------- ---------------- -------
/data1/TTREC/onlinelog/o1_mf_1_cw7vjldp_.log                                    1 ACTIVE
/data2/TTREC/onlinelog/o1_mf_1_cw7vjljy_.log                                    1 ACTIVE
/data1/TTREC/onlinelog/o1_mf_2_cw7yyz4t_.log                                    2 ACTIVE
/data2/TTREC/onlinelog/o1_mf_2_cw7yyz8w_.log                                    2 ACTIVE
/data1/TTREC/onlinelog/o1_mf_3_cw7ogdtj_.log                                    3 CURRENT
/data2/TTREC/onlinelog/o1_mf_3_cw7ogdyv_.log                                    3 CURRENT

6 rows selected.

SQL>

Recovering After Losing One Member of the Multiplexed Group (in non-omf)

Search for ORA-00312 in your database alert log file that identifies which member of the log group is experiencing media failure.

Once you’ve identified the bad online redo log file, execute the following query to check whether that online redo log file’s group has a CURRENT status:

SQL> select  a.group#, a.thread#,  a.status grp_status, b.member member, b.status mem_status from v$log a, v$logfile b where a.group# = b.group# order by a.group#, b.member;

If the failed member is in the current log group, then use the alter system switch
logfile command to make the next group the current group. Then drop the failed member as follows:
SQL> alter database drop logfile member '<\directory\member>';

Then re-create the online redo log file member:
SQL> alter database add logfile member '<\new directory\member>' to group <group#>;

If an unused log file already happens to exist in the target location, you can use the reuse parameter to overwrite and reuse that log file. The log file must be the same size as the other log files in the group.
SQL> alter database add logfile member '\directory\member>' reuse to group <group#>;


Thursday, August 18, 2016

How to invoke a shell script from a windows batch file

From  last few days I was  searching for a script that can invoke a shell script from windows batch file.After a prolonged search on google I made it. So thought of sharing here -

Create a text file with some name , for example Check_Service and save it as .bat. The content of my Check_Servic.bat is,

C:\Users\mahesh\Desktop\PT\putty.exe -ssh oracle@10.81.155.9 -pw Myoracle123 -m "C:\Users\mahesh\Desktop\PT\work3.txt" -t

And the content of  C:\Users\mahesh\Desktop\PT\work3.txt  is here ,


sh /home/oracle/mp/srvctl_status_db_DI.sh
/bin/bash 


And the content of my srvctl_status_db_DI.sh


for database in DIOES DIIAM DIHIM
do
export ORACLE_SID=${database}1
export ORAENV_ASK=NO
. oraenv >/dev/null 2>&1
srvctl status database -d $database
echo ------------------------------------------------------
done


Its a simple script to check the status of your cluster database . 

Double click the batch file -