
Saturday, March 26, 2016

Steps involved in node addition in Oracle RAC

At present:
A two-node RAC (nodes sa-alpha and sa-beta) hosting a database MYDB with instances MYDB1 and MYDB2.
Aim:
Add a new node, sa-gamma, to this two-node cluster and create instance MYDB3 on it.
Prerequisite:-
 
1. OS certification, compatibility and patchset levels should match the other nodes in the cluster.
2. All the packages required by Oracle should be installed on the new node.
3. If you are using ASMLIB, the appropriate binaries should be installed on the new node.
4. OS limits should match the other nodes (/etc/security/limits.conf).
5. OS kernel parameters should be in line with the other nodes (/etc/sysctl.conf).
6. DNS configuration should match the other nodes -
    check the contents of /etc/resolv.conf and compare with the other nodes.
    If you don't have a DNS for name resolution, make sure you have a similar /etc/hosts file across all nodes.
7. Configure password-less SSH connectivity for the grid user among all three nodes (see the verification sketch after this list).
8. Time should be synchronized across all nodes in the cluster; if there is a problem, try
    restarting ntpd on each node. From Oracle 11g onwards we can use CTSS
    instead of relying on operating system NTP. So if your cluster is configured
    with CTSS, you need to deconfigure ntpd on the new node:
    #service ntpd stop
    #chkconfig ntpd off
    #mv /etc/ntp.conf /etc/ntp.conf.original
    #rm /var/run/ntpd.pid
9. Create Job Role Separation operating system privilege groups, users, and directories.
    While creating the new users, make sure the UID and GID of oracle/grid are
    identical to those of the other RAC nodes,
    where oracle --> RDBMS software owner
    and grid --> Grid Infrastructure owner.
10. Configure network components:-
     While configuring network components, make sure your public/private NICs have
     the same names as on the other nodes in the cluster.
     For example, if eth0 is configured for the public network on the first and second nodes,
     you have to choose the same NIC for the public network on the new node.
     Note:- The public network is a routable network -- you give the default
     gateway address while configuring the NIC for the public network, but the private network should
     not be routable, i.e. we don't configure a default gateway for the private network.
             
11. Configure access to the shared storage:-
       Either ask your system administrator to get this done, or do it yourself -
       if you are using iSCSI (NAS), after configuring the storage do the following on the new node:-
                As root, execute the following commands -->
                 service iscsi restart
                 lsscsi            - should list all the available LUNs
                 /etc/init.d/oracleasm scandisks
                 /etc/init.d/oracleasm listdisks
                 it should list all ASM-labelled disks
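A quick way to sanity-check a few of these prerequisites from an existing node is shown below (a minimal sketch using the node and user names from this example; adjust to your environment):
$ id oracle; id grid                                         # compare UID/GID with the output on sa-alpha and sa-beta
$ ssh sa-gamma date                                          # should return the date without prompting for a password
$ cluvfy comp peer -refnode sa-alpha -n sa-gamma -verbose    # compares kernel parameters, packages and limits against a reference node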
 
Verify New Node using cluvfy 
Use CVU as the Oracle Grid Infrastructure owner  to determine the integrity of the cluster and whether it is ready for the new Oracle RAC node to be added.
From the GRID_HOME, as the grid owner, execute the following:
$cluvfy stage -pre nodeadd -n sa-gamma -verbose 
You can ignore the PRVF-5449 message if you have already configured and verified that the ASM shared disks are visible on the third node. The error is a result of having the voting disks stored in Oracle ASM storage, a new feature of Oracle 11g Release 2.
If you got only ignorable errors above, set the following before actually invoking the addNode.sh script:
export IGNORE_PREADDNODE_CHECKS=Y
Extend the GRID_HOME to the new node 
$export IGNORE_PREADDNODE_CHECKS=Y
$cd $GRID_HOME/oui/bin   
$ ./addNode.sh -silent "CLUSTER_NEW_NODES={sa-gamma}" "CLUSTER_NEW_VIRTUAL_HOSTNAMES={sa-gamma-vip}"
Basically it will copy and configure the Grid software on the remote node and finish by asking you to execute the following scripts as root --> orainstRoot.sh and root.sh,
where root.sh is the main script: it creates the OLR, updates the OCR, and starts all the clusterware daemons on the third node along with ASM.

#$GRID_HOME/root.sh
Running Oracle 11g root script...
The following environment variables are set as:
    ORACLE_OWNER= oracle
    ORACLE_HOME=  /u01/app/11.2.0.2/grid
Enter the full pathname of the local bin directory: [/usr/local/bin]:
   Copying dbhome to /usr/local/bin ...
   Copying oraenv to /usr/local/bin ...
   Copying coraenv to /usr/local/bin ...  
Creating /etc/oratab file...
Entries will be added to the /etc/oratab file as needed by
Database Configuration Assistant when a database is created
Finished running generic part of root script.
Now product-specific root actions will be performed.
Using configuration parameter file: /u01/app/11.2.0.2/grid/crs/install/crsconfig_params
Creating trace directory
LOCAL ADD MODE
Creating OCR keys for user 'root', privgrp 'root'..
Operation successful. OLR initialization - successful
Adding daemon to inittab
ACFS-9200: Supported
ACFS-9300: ADVM/ACFS distribution files found.
ACFS-9307: Installing requested ADVM/ACFS software.

ACFS-9308: Loading installed ADVM/ACFS drivers.
ACFS-9321: Creating udev for ADVM/ACFS.
ACFS-9323: Creating module dependencies - this may take some time.
ACFS-9327: Verifying ADVM/ACFS devices.
ACFS-9309: ADVM/ACFS installation correctness verified.  

CRS-4402: The CSS daemon was started in exclusive mode but found an active CSS daemon on node ayu2, number 2, and is terminating
An active cluster was found during exclusive startup, restarting to join the cluster

clscfg: EXISTING configuration version 5 detected.
clscfg: version 5 is 11g Release 2.
Successfully accumulated necessary OCR keys.
Creating OCR keys for user 'root', privgrp 'root'..
Operation successful.
Preparing packages for installation...
cvuqdisk-1.0.9-1
Configure Oracle Grid Infrastructure for a Cluster ... succeeded  
Here you can see the new node joining the cluster.

Post verification:-
$cluvfy stage -post nodeadd -n sa-gamma
------------------------------------------------------------------------------------------
Extend the RDBMS ORACLE_HOME to include the new node
From an existing node, as the database software owner, run the following command to extend the Oracle database software to the new node "sa-gamma".
As oracle user 
$cd $ORACLE_HOME/oui/bin    
$./addNode.sh -silent "CLUSTER_NEW_NODES={sa-gamma}" 

Note: remember, this time addNode.sh is invoked from $ORACLE_HOME/oui/bin.
It will copy and configure the RDBMS software on the remote node and finish by asking you to execute the following script as root:
    
 #$ORACLE_HOME/root.sh
   
 Add New Instance to Clustered Database--
Invoke DBCA from any node with a running instance, select Instance Management, then choose the following.
Welcome: Select Oracle Real Application Clusters (RAC) database as the database type that you would like to create or administer.
Operations: Select Instance Management.
Instance Management: Select Add an instance.
List of cluster databases: Enter the sys username and password.
List of cluster database instances: Verify the existing database instances.
Instance naming and node selection: Verify the instance name (MYDB3) to be added and the node (sa-gamma) on which to add the instance.
Instance Storage: Verify the storage configuration.
Summary: Verify the set up and click on OK. 
  Basically it will do the following tasks for you -
  -> add the instance-specific parameters to the spfile,
  -> add an undo tablespace for the third instance,
  -> add redo log groups for thread 3,
  -> update the metadata of the new instance in the OCR.
 
If you want to do it manually, execute the following commands from any of the running nodes:
   
SQL> alter database add logfile thread 3 group 5 ('+DATADG','+FRADG') size 50M, group 6 ('+DATADG','+FRADG') size 50M; 
 
Database altered. 
SQL> alter system set thread=3 scope=spfile sid='MYDB3'; 
System altered. 
SQL> alter database enable public thread 3; 
 
Database altered. 
SQL> create undo tablespace undotbs3 datafile '+DATADG' size 200M autoextend on; 
 
Tablespace created. 
  SQL> alter system set undo_tablespace='undotbs3' scope=spfile sid='MYDB3'; 
 
System altered. 
 
SQL> alter system set instance_number=3 scope=spfile sid='MYDB3'; 
 
System altered. 
 
SQL> alter system set cluster_database_instances=3 scope=spfile sid='*'; 
 
System altered.    


Update Oracle Cluster Registry (OCR)
The OCR must be updated to account for the new instance – "MYDB3" – being added to the "MYDB" cluster database. Add the "MYDB3" instance to the "MYDB" database and verify:
#srvctl config database -d  MYDB  
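If you added the instance manually (instead of through DBCA), the new instance also has to be registered in the OCR with srvctl before the verification above will show it; a minimal sketch, assuming instance MYDB3 on node sa-gamma:
$ srvctl add instance -d MYDB -i MYDB3 -n sa-gamma
$ srvctl start instance -d MYDB -i MYDB3
$ srvctl config database -d MYDB        # MYDB3 should now appear against node sa-gamma
$ srvctl status database -d MYDB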
             
 Reference:-  https://www.youtube.com/watch?v=ZV_P5-qpLhs

Wednesday, November 11, 2015

How to apply an interim patch in a RAC database

We can follow the practice below to apply an interim patch (one-off patch) in a RAC database.

1. Before applying the interim patch on the RAC nodes, first take a backup of the ORACLE_HOME on each database node with the tar command.
As the root/oracle user, take a backup of the Oracle home that we intend to patch:
#tar -cvzf /bkp_location/oracle_12.1.0.2.tar.gz  /u01/app/oracle/product/12.1.0.2/
To apply the patch we need to log in as the oracle user.

2. If the patch is a rolling one, we can apply it either in local mode or in rolling mode.
How to check whether the patch is a rolling one?

[oracle@pdc01]$ $ORACLE_HOME/OPatch/opatch query -is_rolling_patch /nfs1/BINARIES/DB_Patch/p20476175_121020_Linux-x86-64.zip
Oracle Interim Patch Installer version 12.1.0.1.8
Copyright (c) 2015, Oracle Corporation.  All rights reserved.
Oracle Home       : /u01/app/oracle/product/12.1.0.2/db_1
Central Inventory : /u01/app/aoraInventory
   from           : /u01/app/oracle/product/12.1.0.2/db_1/oraInst.loc
OPatch version    : 12.1.0.1.8
OUI version       : 12.1.0.2.0
Log file location : /u01/app/oracle/product/12.1.0.2/db_1/cfgtoollogs/opatch/opatch2015-11-12_11-32-12AM_1.log
--------------------------------------------------------------------------------
 Patch is a rolling patch: true
 Patch is a FMW rolling patch: false
OPatch succeeded.
[oracle@pdc01]$
Or you can test it with the following method.

3. Unzip the patch and go to the patch directory:

[oracle@pdc01]$unzip /nfs1/BINARIES/DB_Patch/p20476175_121020_Linux-x86-64.zip
[oracle@pdc01 20476175]$ /u01/app/oracle/product/12.1.0.2/db_1/OPatch/opatch query -all | grep -i "Need to shutdown Oracle instances"
 Need to shutdown Oracle instances: true
[oracle@pdc01 20476175]$
[oracle@pdc01 20476175]$ /u01/app/oracle/product/12.1.0.2/db_1/OPatch/opatch query -all | grep -i "Patch is a rolling patch"
 Patch is a rolling patch: true
[oracle@pdc01 20476175]$

4. Shut down all the services running from the Oracle Database home being patched - please see the patch README for more information.
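For example, the instance and any other resources running from the database home on the node being patched can be stopped with srvctl; a minimal sketch, assuming a database named MYDB with instance MYDB1 on node pdc01 (hypothetical names):
[oracle@pdc01]$ srvctl stop instance -d MYDB -i MYDB1
[oracle@pdc01]$ srvctl stop home -o /u01/app/oracle/product/12.1.0.2/db_1 -s /tmp/patch_state_pdc01.txt -n pdc01    # stops what is still running from this home and records it in the state file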

5. Apply the interim patch (one-off patch).
Choose one of the following methods to apply interim patches in a RAC environment.

5.1   -local
              Patch the local node, then update inventory of the
              local node. Do not propagate the patch or inventory
              update to other nodes.

[oracle@pdc01 20476175]$/u01/app/oracle/product/12.1.0.2/db_1/OPatch/opatch apply -local 

Repeat the steps on the remaining nodes.
 
5.2 -all_nodes (default)
            If you don't specify -local along with opatch apply, OPatch will patch node 1 first. Once it finishes with node 1, OPatch will list the remaining nodes to patch. At this point we can select the remaining nodes and OPatch will continue patching them.

[oracle@pdc01 20476175]$/u01/app/oracle/product/12.1.0.2/db_1/OPatch/opatch apply
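After the apply completes, it is worth confirming that the patch is recorded in the inventory on each node, and then restarting what was stopped in step 4; a minimal sketch (the state file is only relevant if 'srvctl stop home' was used earlier):
[oracle@pdc01 20476175]$ /u01/app/oracle/product/12.1.0.2/db_1/OPatch/opatch lsinventory | grep 20476175
[oracle@pdc01 20476175]$ srvctl start home -o /u01/app/oracle/product/12.1.0.2/db_1 -s /tmp/patch_state_pdc01.txt -n pdc01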


Sunday, May 17, 2015

Convert a single-instance database to a two-node RAC database

Here I am going to explain how to convert a single-instance database to a two-node RAC database. Keep in mind that even if your source database is RAC, there is no direct method to clone it as a RAC database: first we restore it as a single-instance database, then we add the newly configured instance to the cluster.

1. First take a full database backup of the single-instance database:

BACKUP AS COMPRESSED BACKUPSET FULL TAG 'DFWEBFULL' DATABASE FORMAT '/nfs_1/DB_ADHOC_BACKUPS/DFWEBFULL/Backup_%d_%T_%s'
INCLUDE CURRENT CONTROLFILE PLUS ARCHIVELOG format '/nfs_1/DB_ADHOC_BACKUPS/DFWEBFULL/ARCHIVE_%d_%T_%s';

Note:- Change the permissions on the backup directory so that the oracle software owner can read those files, otherwise you will receive errors.
$cd /nfs_1/DB_ADHOC_BACKUPS/DFWEBFULL
$chmod 777 *
Here I took my RMAN backups to an NFS mount point that is accessible from the auxiliary database server, so there is no need to copy the entire backup set.

2. Take an init parameter file from the cluster (if your cluster already has some instances configured) where you want to configure your RAC instances, otherwise make a new one. Keep local_listener and remote_listener as they are and change the other parameters accordingly. Here I am going to configure RAC database TTWEB with two instances, TTWEB1 and TTWEB2. Here is my init parameter file, named initTTWEB1.ora.
Note:- Comment out *.cluster_database=TRUE
$vi initTTWEB1.ora
TTWEB2.__db_cache_size=956301312
TTWEB1.__db_cache_size=1610612736
TTWEB2.__java_pool_size=67108864
TTWEB1.__java_pool_size=67108864
TTWEB2.__large_pool_size=83886080
TTWEB1.__large_pool_size=83886080
TTWEB2.__oracle_base='/u01/app/oracle'#ORACLE_BASE set from environment
TTWEB1.__oracle_base='/u01/app/oracle'#ORACLE_BASE set from environment
TTWEB2.__pga_aggregate_target=1493172224
TTWEB1.__pga_aggregate_target=1493172224
TTWEB2.__sga_target=2801795072
TTWEB1.__sga_target=2801795072
TTWEB2.__shared_io_pool_size=0
TTWEB1.__shared_io_pool_size=0
TTWEB2.__shared_pool_size=1660944384
TTWEB1.__shared_pool_size=1006632960
TTWEB2.__streams_pool_size=0
TTWEB1.__streams_pool_size=0
*.audit_file_dest='/u01/app/oracle/admin/TTWEB/adump'
*.audit_trail='db'
#*.cluster_database=TRUE
*.compatible='11.2.0.4'
*.control_files='+ORADATA/TTWEB/control1.ctl','+ORAFRA/TTWEB/control2.ctl'
*.db_block_size=8192
*.db_create_file_dest='+ORADATA'
*.db_create_online_log_dest_1='+ORADATA'
*.db_create_online_log_dest_2='+ORAFRA'
*.db_domain=''
*.db_name='TTWEB'
*.db_recovery_file_dest='+ORAFRA'
*.db_recovery_file_dest_size=107374182400
*.diagnostic_dest='/u01/app/oracle'
*.dispatchers='(PROTOCOL=TCP) (SERVICE=TTWEBXDB)'
TTWEB1.instance_number=1
TTWEB2.instance_number=2
TTWEB1.local_listener='(ADDRESS=(PROTOCOL=TCP)(HOST=10.81.155.146)(PORT=1521))'
TTWEB2.local_listener='(ADDRESS=(PROTOCOL=TCP)(HOST=10.81.155.157)(PORT=1521))'
*.memory_target=4G
*.nls_language='AMERICAN'
*.open_cursors=300
*.pga_aggregate_target=536870912
*.processes=500
*.remote_listener='ptgracscan:1521'
*.remote_login_passwordfile='exclusive'
*.sga_target=1610612736
TTWEB1.thread=1
TTWEB2.thread=2
TTWEB1.undo_tablespace='UNDOTBS1'
TTWEB2.undo_tablespace='UNDOTBS2'
Copy initTTWEB1.ora into $ORACLE_HOME/dbs/ ( in node1)

3. Make the adump directory on both nodes:
mkdir -p /u01/app/oracle/admin/TTWEB/adump (on both nodes)

4. Edit /etc/oratab (on both nodes):
in node1
TTWEB1:/u01/app/oracle/product/11.2.0.4/db_1:N
TTWEB:/u01/app/oracle/product/11.2.0.4/db_1:N
in node2
TTWEB2:/u01/app/oracle/product/11.2.0.4/db_1:N
TTWEB:/u01/app/oracle/product/11.2.0.4/db_1:N
If you don't add 'TTWEB' to the /etc/oratab file, it will be added automatically by oraagent once
your instances are configured through the srvctl command.

5. Set oraenv on node1 and start the RMAN duplication:
[oracle@dbnode1]:[] $ . oraenv
ORACLE_SID = [oracle] ? TTWEB1
The Oracle base remains unchanged with value /u01/app/oracle
[oracle@dbnode1]:[TTWEB1] $sqlplus / as sysdba
SQL>startup nomount;
[oracle@dbnode1]:[TTWEB1] $rman auxiliary /
DUPLICATE DATABASE TO TTWEB BACKUP LOCATION '/nfs_1/DB_ADHOC_BACKUPS/DFWEBFULL';
Once the duplication finishes, do the rest of the work as given below.

6. Create the spfile in one of the disk groups:
SQL> create spfile='+ORADATA' from pfile;
File created.
SQL>
Check the location of the spfile that was created:
[oracle@dbnode1]:[TTWEB1] $ su - grid
Password:
[grid@dbnode1 ~]$. oraenv
ORACLE_SID = [grid] ? +ASM1
The Oracle base has been set to /u01/app/oracle
[grid@dbnode1 ~]$ asmcmd
ASMCMD> cd ORADATA
ASMCMD> ls
ASMCMD> cd TTWEB/
ASMCMD> ls
CONTROLFILE/
DATAFILE/
ONLINELOG/
PARAMETERFILE/
TEMPFILE/
control1.ctl
spfileTTWEB.ora
ASMCMD> cd PARAMETERFILE/
ASMCMD> ls
spfile.1465.871230913
ASMCMD> ls -lt
Type Redund Striped Time Sys Name
PARAMETERFILE UNPROT COARSE FEB 09 16:00:00 Y spfile.1465.871230913
ASMCMD> pwd

7. Make an entry in initTTWEB1.ora as:
spfile='+ORADATA/TTWEB/PARAMETERFILE/spfile.1465.871230913'
--- note down the spfile name from the above output
SQL> shut immediate;
Database closed.
Database dismounted.
ORACLE instance shut down.
SQL> startup;
ORACLE instance started.
Total System Global Area 4275781632 bytes
Fixed Size 2260088 bytes
Variable Size 2315256712 bytes
Database Buffers 1946157056 bytes
Redo Buffers 12107776 bytes
Database mounted.
Database opened.
SQL> show parameter spfile
NAME TYPE VALUE
------------------------------------ ----------- ------------------------------
spfile string +ORADATA/ttweb/parameterfile/s
pfile.1465.871230913
SQL>

8. Add an undo tablespace for the second instance:
SQL> create undo tablespace UNDOTBS2 datafile '+ORADATA' size 100M;
Tablespace created.
SQL>

9. Add redo log groups for the second instance; for that, first verify how the redo logs are configured for the first instance.
SQL> set lines 300
col member for a80
SELECT a.group#,b.thread#, a.member, b.bytes/1024/1024 FROM v$logfile a, v$log b WHERE a.group# = b.group#;
GROUP# THREAD# MEMBER B.BYTES/1024/1024
---------- ---------- -------------------------------------------------------------------------------- -----------------
3 1 +ORADATA/ttweb/onlinelog/group_3.1460.871230663 50
3 1 +ORAFRA/ttweb/onlinelog/group_3.1775.871230663 50
2 1 +ORADATA/ttweb/onlinelog/group_2.1459.871230661 50
2 1 +ORAFRA/ttweb/onlinelog/group_2.9378.871230661 50
1 1 +ORADATA/ttweb/onlinelog/group_1.1458.871230661 50
1 1 +ORAFRA/ttweb/onlinelog/group_1.464.871230661 50
6 rows selected.
SQL>
Here we have three redo log groups with two members each, so we will configure redo groups 4, 5 and 6 for the second instance.

10. Add redo log groups for the second instance,
alter database add logfile thread 2 group 4 ('+ORADATA','+ORAFRA') size 50m reuse;
alter database add logfile thread 2 group 5 ('+ORADATA','+ORAFRA') size 50m reuse;
alter database add logfile thread 2 group 6 ('+ORADATA','+ORAFRA') size 50m reuse;
SQL> SELECT a.group#,b.thread#, a.member, b.bytes/1024/1024 FROM v$logfile a, v$log b WHERE a.group# = b.group#;
GROUP# THREAD# MEMBER B.BYTES/1024/1024
---------- ---------- -------------------------------------------------------------------------------- -----------------
3 1 +ORADATA/ttweb/onlinelog/group_3.1460.871230663 50
3 1 +ORAFRA/ttweb/onlinelog/group_3.1775.871230663 50
2 1 +ORADATA/ttweb/onlinelog/group_2.1459.871230661 50
2 1 +ORAFRA/ttweb/onlinelog/group_2.9378.871230661 50
1 1 +ORADATA/ttweb/onlinelog/group_1.1458.871230661 50
1 1 +ORAFRA/ttweb/onlinelog/group_1.464.871230661 50
4 2 +ORADATA/ttweb/onlinelog/group_4.1468.871234269 50
4 2 +ORAFRA/ttweb/onlinelog/group_4.1825.871234269 50
5 2 +ORADATA/ttweb/onlinelog/group_5.1469.871234297 50
5 2 +ORAFRA/ttweb/onlinelog/group_5.7489.871234297 50
6 2 +ORADATA/ttweb/onlinelog/group_6.1470.871234303 50
GROUP# THREAD# MEMBER B.BYTES/1024/1024
---------- ---------- -------------------------------------------------------------------------------- -----------------
6 2 +ORAFRA/ttweb/onlinelog/group_6.8822.871234303 50
12 rows selected.
Now we have a second thread of online redo logs, so we can enable thread 2 in order to start instance 2.
SQL> alter database enable public thread 2;
Database altered.
SQL>

11. Set cluster_database to true before using the srvctl utility.
cluster_database is a static parameter, so the database has to be bounced.
SQL> show parameter cluster
NAME TYPE VALUE
------------------------------------ ----------- ------------------------------
cluster_database boolean FALSE
cluster_database_instances integer 1
cluster_interconnects string
SQL>
SQL> alter system set cluster_database=true scope=spfile sid='*';
System altered.
SQL> shut immediate;
Database closed.
Database dismounted.
ORACLE instance shut down.
SQL> startup;
ORACLE instance started.
Total System Global Area 4275781632 bytes
Fixed Size 2260088 bytes
Variable Size 2399142792 bytes
Database Buffers 1862270976 bytes
Redo Buffers 12107776 bytes
Database mounted.
Database opened.
SQL> show parameter cluster
NAME TYPE VALUE
------------------------------------ ----------- ------------------------------
cluster_database boolean TRUE
cluster_database_instances integer 2
cluster_interconnects string
SQL>

12. Copy initTTWEB1.ora into the second node's $ORACLE_HOME/dbs directory:
[oracle@dbnode1]:[TTWEB1] $ scp initTTWEB1.ora oracle@dbnode2:/u01/app/oracle/product/11.2.0.4/db_1/dbs
On node2, go to /u01/app/oracle/product/11.2.0.4/db_1/dbs and rename the init file:
$mv initTTWEB1.ora initTTWEB2.ora

13. Register the database and its instances with the CRS framework using the srvctl command:
[oracle@dbnode1]:[TTWEB1] $ cat /etc/oratab | grep TTWEB
TTWEB:/u01/app/oracle/product/11.2.0.4/db_1:N
TTWEB1:/u01/app/oracle/product/11.2.0.4/db_1:N
[oracle@dbnode1]:[TTWEB1] $ srvctl add database -d TTWEB -o /u01/app/oracle/product/11.2.0.4/db_1
[oracle@dbnode1]:[TTWEB1] $ srvctl add instance -d TTWEB -i TTWEB1 -n dbnode1
[oracle@dbnode1]:[TTWEB1] $ srvctl add instance -d TTWEB -i TTWEB2 -n dbnode2
Now that the instances are registered with CRS, use srvctl to stop and start the database.
[oracle@dbnode1]:[TTWEB1] $ srvctl status database -d TTWEB
Instance TTWEB1 is not running on node dbnode1
Instance TTWEB2 is not running on node dbnode2
The first time, srvctl won't detect the status of the instances even though they are up and running, so stop and start the database again through the srvctl command.
[oracle@dbnode1]:[TTWEB1] $ srvctl stop database -d TTWEB
[oracle@dbnode1]:[TTWEB1] $ srvctl start database -d TTWEB
[oracle@dbnode1]:[TTWEB1] $ srvctl status database -d TTWEB
Instance TTWEB1 is running on node dbnode1
Instance TTWEB2 is running on node dbnode2
[oracle@dbnode1]:[TTWEB1] $
Now check the pmon process on node2; you can see the new instance TTWEB2 running there:
[oracle@dbnode2]:[] $ ps -ef | grep pmon | grep TT
oracle 15312 1 0 17:55 ? 00:00:00 ora_pmon_TTWEB2
[oracle@dbnode2]:[] $
You can also see that TTWEB got automatically added to /etc/oratab on node2:
[oracle@dbnode2]:[] $ cat /etc/oratab | grep TT
TTWEB:/u01/app/oracle/product/11.2.0.4/db_1:N # line added by Agent
[oracle@dbnode2]:[] $
You have to add an entry for TTWEB2 if it is not already there; once the environment is set, check the status of the instances from node2.
[oracle@dbnode2]:[TTWEB2] $ srvctl status database -d TTWEB2
PRCD-1120 : The resource for database TTWEB2 could not be found.
PRCR-1001 : Resource ora.ttweb2.db does not exist
[oracle@dbnode2]:[TTWEB2] $
[oracle@dbnode2]:[TTWEB2] $ srvctl status database -d TTWEB
Instance TTWEB1 is running on node dbnode1
Instance TTWEB2 is running on node dbnode2
[oracle@dbnode2]:[TTWEB2] $

14. Finally, create a password file for each instance:
[oracle@dbnode1]:[TTWEB1] $ orapwd file=$ORACLE_HOME/dbs/orapwTTWEB1 password=xxxxxxx
[oracle@dbnode2]:[TTWEB2] $ orapwd file=$ORACLE_HOME/dbs/orapwTTWEB2 password=xxxxxxx

Check the logs:-
It is strongly advised to check all the logs related to the cluster, database and instances when you perform installations, system changes, patch updates, etc. Make sure to check the log files for unexpected issues, if any.
CRS_HOME/log/hostname/crsd/ – The log files for the CRS daemon
CRS_HOME/log/hostname/cssd/ – The log files for the CSS daemon
CRS_HOME/log/hostname/evmd/ – The log files for the EVM daemon
CRS_HOME/log/hostname/client/ – The log files for the Oracle Cluster Registry (OCR)
CRS_HOME/log/hostname/racg/ – The log files for the Oracle RAC high availability component
CRS_HOME/log/hostname/alerthostname.log – The alert log for Clusterware issues
Please note that CRS_HOME is the directory in which the Oracle Clusterware software was installed and hostname is the name of the node.
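On 11gR2, for example, the Clusterware alert log and the CRSD log can be checked with something like the following (a sketch; substitute your own Grid home and host name, here dbnode1):
# tail -100 $GRID_HOME/log/dbnode1/alertdbnode1.log    # Clusterware alert log for node dbnode1
# tail -100 $GRID_HOME/log/dbnode1/crsd/crsd.log       # CRS daemon log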

---------------------------------------------------------------------------------------------------------------------------------------------------------

Thursday, March 13, 2014

Concept of disk group mirroring in ASM

Mirroring or redundancy protects data integrity by storing copies of data on multiple disks. To achieve mirroring, Oracle introduced the concept of the disk group in Automatic Storage Management.
Oracle mirrors at the extent level, so you have a primary extent and a mirrored extent. When a disk fails, ASM rebuilds the failed disk using mirrored extents from the other disks within the group; this may have a slight impact on performance while the rebuild takes place.
A disk group is a collection of failure groups, and a failure group is a collection of disks, so the disk is the smallest unit in this architecture. The disk group type determines the mirroring level with which Oracle creates files in the disk group. In general, ASM supports three redundancy (mirroring) options. When you create a disk group, you specify an ASM disk group type based on one of the following three redundancy levels. The default mirroring level indicates the mirroring level with which each file is created unless a different level is designated.

Mirroring Options for ASM Disk Group Types

Disk Group Type        Supported Mirroring Levels            Default Mirroring Level
External redundancy    Unprotected (none)                    Unprotected
Normal redundancy      Two-way, Three-way, Unprotected       Two-way
High redundancy        Three-way                             Three-way


External redundancy - doesn't have failure groups and is effectively a no-mirroring strategy; it is used when mirroring is provided by the disk subsystem, such as RAID. Any write error causes a forced dismount of the disk group.
Normal redundancy - provides two-way mirroring of all extents in a disk group, which requires at least two failure groups. The loss of one ASM disk is tolerated.
High redundancy - provides three-way mirroring of all extents in a disk group, which requires at least three failure groups. The loss of two ASM disks in different failure groups is tolerated.
 
The redundancy level controls how many disk failures are tolerated without dismounting the disk group or losing data. There are always failure groups, even if they are not explicitly created. If you do not specify a failure group for a disk, then Oracle automatically creates a new failure group containing just that disk. A normal redundancy disk group must contain at least two failure groups, and a high redundancy disk group must contain at least three failure groups. However, Oracle recommends using several failure groups.
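You can check the redundancy type and failure-group layout of your existing disk groups from the ASM instance; a minimal sketch:
SQL> select name, type, total_mb, free_mb from v$asm_diskgroup;
SQL> select group_number, failgroup, name, path from v$asm_disk order by group_number, failgroup;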

Below is an example of a disk group.
Let's assume we have two disk controllers at the hardware level: controller 1 has four disks attached (diska*), and controller 2 has four disks attached (diskb*). If a controller fails, all disks under it become unavailable, so we need to plan our disk group to take care of this as well.

Controller 1:
/devices/diska1
/devices/diska2
/devices/diska3
/devices/diska4

Controller 2:
/devices/diskb1
/devices/diskb2
/devices/diskb3
/devices/diskb4

The following creates a disk group named dgroup1 with normal redundancy, consisting of two failure groups, controller1 and controller2, with four disks in each failure group.

$SQLPLUS /NOLOG
SQL> CONNECT / AS SYSASM
Connected to an idle instance.
SQL> STARTUP NOMOUNT

SQL>CREATE DISKGROUP dgroup1 NORMAL REDUNDANCY
FAILGROUP controller1 DISK
'/devices/diska1' NAME diska1,
'/devices/diska2' NAME diska2,
'/devices/diska3' NAME diska3,
'/devices/diska4' NAME diska4

FAILGROUP controller2 DISK
'/devices/diskb1' NAME diskb1,
'/devices/diskb2' NAME diskb2,
'/devices/diskb3' NAME diskb3,
'/devices/diskb4' NAME diskb4

ATTRIBUTE 'au_size'='4M',
 'compatible.asm' = '11.1', 
 'compatible.rdbms' = '11.1';

In the above example, we made a disk group dgroup1 with normal redundancy (two-way mirroring), so we need two failure groups, controller1 and controller2.

When Automatic Storage Management allocates an extent for a normal redundancy file, ASM allocates a primary copy and a secondary copy. ASM chooses the disk on which to store the secondary copy from a different failure group than the one holding the primary copy. Failure groups are used to place mirrored copies of data so that each copy is on a disk in a different failure group. The simultaneous failure of all disks in a failure group does not result in data loss.

Failure Scenarios:
1. Single disk failure: Suppose a disk fails in controller2, say controller2.diskb4. The data will remain intact: this disk holds either primary or secondary copies of extents, and since we have two copies, the failed disk's extents also exist in the other failure group. So in this scenario your data is 100% safe.

2. Multiple disk failure: This situation can be divided into two cases. Multiple disk failures within the same failure group: in this case you will not lose data, because the other failure group holds a copy of all the data. Multiple disk failures across different failure groups: here is the risk - suppose you lose controller1.diska1 and controller2.diskb4 and, unfortunately, extents on diska1 have their mirror copies on diskb4; with both disks unavailable, you will end up losing data.

3. Disk controller failure: Suppose you lose one of your disk controllers, say controller 1; that means all disks in one failure group are gone. Even in this case data will not be lost, because each extent in failure group controller1 has a copy in failure group controller2.

You define the failure groups for a disk group when you create an ASM disk group. After a disk group is created, you cannot alter the redundancy level of the disk group. To change the redundancy level of a disk group, create another disk group with the appropriate redundancy and then move the files to the new disk group. 
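Moving a datafile into a differently configured disk group is typically done with RMAN image copies; a minimal sketch, assuming a new disk group +DGROUP2 and datafile 4 (both hypothetical values):
RMAN> backup as copy datafile 4 format '+DGROUP2';
SQL>  alter database datafile 4 offline;
RMAN> switch datafile 4 to copy;
RMAN> recover datafile 4;
SQL>  alter database datafile 4 online;
The same approach can be repeated for each file of the database you want to relocate.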

 Note:- Disk groups with external redundancy do not use failure groups.

Tuesday, November 12, 2013

How to change the redo log size in RAC

There is no ALTER ... command to resize the redo logs in Oracle. So if you want to resize your redo logs, you will need to create new groups with the new size and then drop the old ones.

Let's say you have this situation in your two-node RAC (say rac1 and rac2).
From the rac1 node do the following:

SQL> select GROUP#,THREAD#,MEMBERS,BYTES/1024/1024 MB_Size,status from v$log;
    GROUP#    THREAD#    MEMBERS    MB_SIZE STATUS
---------- ---------- ---------- ---------- ----------------
         1          1          2         50 CURRENT
         2          1          2         50 INACTIVE
         3          2          2         50 INACTIVE
         4          2          2         50 CURRENT

and you want to resize all your groups; let's say you want to set 100M instead of 50M.

Action plan:
1. Add new REDO groups with a new size.

SQL> ALTER DATABASE ADD LOGFILE THREAD 1 GROUP 5 ( '+DATA','+FLASH') SIZE 100M;
SQL> ALTER DATABASE ADD LOGFILE THREAD 1 GROUP 6 ( '+DATA','+FLASH') SIZE 100M;
SQL> ALTER DATABASE ADD LOGFILE THREAD 2 GROUP 7 ( '+DATA','+FLASH') SIZE 100M;
SQL> ALTER DATABASE ADD LOGFILE THREAD 2 GROUP 8 ( '+DATA','+FLASH') SIZE 100M;

The commands above will create four new groups with two members each. Change the disk groups (+DATA and +FLASH) according to your environment.

2. Now wait until groups 1/2/3/4 become INACTIVE so that you are able to drop them. You can speed up this process by executing:

SQL> alter system switch logfile;
SQL> alter system checkpoint;
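Before dropping, you can confirm that the old groups report INACTIVE with a quick check:
SQL> select group#, thread#, status from v$log;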

3. Drop the old groups:

SQL> ALTER DATABASE DROP LOGFILE GROUP 1;

In a busy database it is possible to see something like this during the drop operation:
ORA-01623: log 1 is current log for instance RPTDB (thread 1) - cannot drop

In that case you should execute the following again:
SQL> alter system switch logfile;
or
SQL> alter system checkpoint;
or
just keep retrying the drop.

Note:- I tried to drop a redo log group but it stayed CURRENT or ACTIVE. I ran alter system switch logfile and alter system checkpoint repeatedly, but it did not become INACTIVE. I overcame this issue by executing alter system switch logfile on the other node (rac2).
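If the database is in archivelog mode, another option is to force a log switch on every enabled thread from a single instance, which avoids logging in to each node (a sketch):
SQL> alter system archive log current;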
Once all your old groups are dropped, your output will look like this:

SQL> select GROUP#,THREAD#,MEMBERS,BYTES/1024/1024 MB_Size from v$log;

    GROUP#    THREAD#    MEMBERS    MB_SIZE
---------- ---------- ---------- ----------
         5          1          2        100
         6          1          2        100
         7          2          2        100
         8          2          2        100

Source:-  http://eugene-dba.blogspot.in/2012/10/change-redo-log-size-in-rac.html


Tuesday, October 29, 2013

How to properly shut down an Oracle RAC database and its clusterware components in 11gR2

The first step in shutting down any database, whether RAC or single instance, is to stop all the applications that are accessing the database. The DBA should also stop Oracle components such as Enterprise Manager/Grid Control and the database listeners, which access the database for monitoring and database connections.

1. Stop Enterprise Manager/Grid Control
[oracle@rac1 ~]$ emctl status dbconsole
Environment variable ORACLE_UNQNAME not defined. Please set ORACLE_UNQNAME to database unique name.
[oracle@rac1 ~]$
[oracle@rac1 ~]$ export ORACLE_UNQNAME=myrac
[oracle@rac1 ~]$  emctl status dbconsole
Oracle Enterprise Manager 11g Database Control Release 11.2.0.1.0
Copyright (c) 1996, 2009 Oracle Corporation.  All rights reserved.
https://rac1.mydomain:1158/em/console/aboutApplication
Oracle Enterprise Manager 11g is running.
------------------------------------------------------------------
Logs are generated in directory /app/dbHome/rac1_myrac/sysman/log
[oracle@rac1 ~]$ emctl stop dbconsole
Oracle Enterprise Manager 11g Database Control Release 11.2.0.1.0
Copyright (c) 1996, 2009 Oracle Corporation.  All rights reserved.
https://rac1.mydomain:1158/em/console/aboutApplication
Stopping Oracle Enterprise Manager 11g Database Control ... ...  Stopped.
[oracle@rac1 ~]$ emctl status dbconsole
Oracle Enterprise Manager 11g Database Control Release 11.2.0.1.0
Copyright (c) 1996, 2009 Oracle Corporation.  All rights reserved.
https://rac1.mydomain:1158/em/console/aboutApplication
Oracle Enterprise Manager 11g is not running.
[oracle@rac1 ~]$

2. Stop the node listeners
[root@rac1 ~]# srvctl status listener
Listener LISTENER is enabled
Listener LISTENER is running on node(s): rac1,rac2
To check the status on a specific node, use the following command:
[root@rac1 ~]# srvctl status listener -n rac1  
Listener LISTENER is enabled on node(s): rac1
Listener LISTENER is running on node(s): rac1
[root@rac1 ~]# ps -ef | grep tns
root        13     2  0 Oct28 ?        00:00:00 [netns]
grid     14547     1  0 09:32 ?        00:00:00 /app/gridHome/bin/tnslsnr LISTENER -inherit
oracle   14640 14218  0 09:34 pts/2    00:00:00 grep tns
grid     32208     1  0 Oct28 ?        00:00:01 /app/gridHome/bin/tnslsnr LISTENER_SCAN1 -inherit
[root@rac1 ~]# srvctl stop listener  --> this command will stop the node listeners on all
nodes; if you want to do it on a specific node, use the following command:
srvctl stop listener -n node1
[root@rac1 ~]# srvctl status listener
Listener LISTENER is enabled
Listener LISTENER is not running
[root@rac1 ~]# ps -ef | grep tns
root        13     2  0 Oct28 ?        00:00:00 [netns]
oracle   14704 14218  0 09:34 pts/2    00:00:00 grep tns
grid     32208     1  0 Oct28 ?        00:00:01 /app/gridHome/bin/tnslsnr LISTENER_SCAN1 -inherit
[root@rac1 ~]#
Note:- Now you can see that only the SCAN listener is running on this node.

3. Stop the database on all nodes
[root@rac1 ~]#  srvctl status database -d MYRAC  
Instance myrac1 is running on node rac1
Instance myrac2 is running on node rac2
[root@rac1 ~]# srvctl stop database -d MYRAC
The above command will shut down all the instances within the cluster. To shut down a specific instance, use the following syntax:
srvctl stop instance -d db_name -i instance_name
[root@rac1 ~]# srvctl status database -d MYRAC
Instance myrac1 is not running on node rac1
Instance myrac2 is not running on node rac2
[root@rac1 ~]# ps -ef  | grep pmon
oracle   15161 14218  0 09:53 pts/2    00:00:00 grep pmon
grid     31716     1  0 Oct28 ?        00:00:07 asm_pmon_+ASM1
[root@rac1 ~]#

4. Stop the ASM instances and clusterware components
Now we have to shut down the ASM instances and the clusterware components. As of 11gR2, we don't have to shut down the ASM instances and the node apps separately. Up to 11gR1 we had to shut down these processes separately using the following commands:
Use the command below to shut down the ASM instance on a node
srvctl stop asm -n node
Use the command below to shut down the node apps on a RAC node
srvctl stop nodeapps -n node
Use the command below to shut down Oracle Clusterware on a RAC node
crsctl stop crs
Please note that the above command stops the Oracle High Availability Services (OHAS) and the Clusterware stack in a single command.

From 11gR2, you can do this in two steps.
First, stop the clusterware stack on all nodes in the cluster:
[root@rac1 ~]# crsctl stop cluster -all
CRS-2673: Attempting to stop 'ora.crsd' on 'rac1'
CRS-2673: Attempting to stop 'ora.crsd' on 'rac2'
CRS-2790: Starting shutdown of Cluster Ready Services-managed resources on 'rac2'
CRS-2673: Attempting to stop 'ora.LISTENER.lsnr' on 'rac2'
CRS-2673: Attempting to stop 'ora.LISTENER_SCAN3.lsnr' on 'rac2'
CRS-2673: Attempting to stop 'ora.LISTENER_SCAN2.lsnr' on 'rac2'
CRS-2673: Attempting to stop 'ora.OCR.dg' on 'rac2'
CRS-2673: Attempting to stop 'ora.DATA.dg' on 'rac2'
CRS-2673: Attempting to stop 'ora.FRA.dg' on 'rac2'
CRS-2790: Starting shutdown of Cluster Ready Services-managed resources on 'rac1'
CRS-2673: Attempting to stop 'ora.LISTENER.lsnr' on 'rac1'
CRS-2673: Attempting to stop 'ora.LISTENER_SCAN1.lsnr' on 'rac1'
CRS-2673: Attempting to stop 'ora.OCR.dg' on 'rac1'
CRS-2673: Attempting to stop 'ora.DATA.dg' on 'rac1'
CRS-2673: Attempting to stop 'ora.FRA.dg' on 'rac1'
CRS-2677: Stop of 'ora.LISTENER.lsnr' on 'rac1' succeeded
CRS-2673: Attempting to stop 'ora.rac1.vip' on 'rac1'
CRS-2677: Stop of 'ora.rac1.vip' on 'rac1' succeeded
CRS-2677: Stop of 'ora.LISTENER_SCAN1.lsnr' on 'rac1' succeeded
CRS-2673: Attempting to stop 'ora.scan1.vip' on 'rac1'
CRS-2677: Stop of 'ora.scan1.vip' on 'rac1' succeeded
CRS-2677: Stop of 'ora.LISTENER.lsnr' on 'rac2' succeeded
CRS-2673: Attempting to stop 'ora.rac2.vip' on 'rac2'
CRS-2677: Stop of 'ora.rac2.vip' on 'rac2' succeeded
CRS-2677: Stop of 'ora.LISTENER_SCAN2.lsnr' on 'rac2' succeeded
CRS-2677: Stop of 'ora.LISTENER_SCAN3.lsnr' on 'rac2' succeeded
CRS-2673: Attempting to stop 'ora.scan3.vip' on 'rac2'
CRS-2677: Stop of 'ora.scan3.vip' on 'rac2' succeeded
CRS-2677: Stop of 'ora.OCR.dg' on 'rac1' succeeded
CRS-2677: Stop of 'ora.FRA.dg' on 'rac1' succeeded
CRS-2677: Stop of 'ora.DATA.dg' on 'rac1' succeeded
CRS-2673: Attempting to stop 'ora.asm' on 'rac1'
CRS-2677: Stop of 'ora.asm' on 'rac1' succeeded
CRS-2673: Attempting to stop 'ora.eons' on 'rac1'
CRS-2673: Attempting to stop 'ora.ons' on 'rac1'
CRS-2677: Stop of 'ora.ons' on 'rac1' succeeded
CRS-2673: Attempting to stop 'ora.net1.network' on 'rac1'
CRS-2677: Stop of 'ora.net1.network' on 'rac1' succeeded
CRS-2677: Stop of 'ora.eons' on 'rac1' succeeded
CRS-2792: Shutdown of Cluster Ready Services-managed resources on 'rac1' has completed
CRS-2677: Stop of 'ora.crsd' on 'rac1' succeeded
CRS-2673: Attempting to stop 'ora.cssdmonitor' on 'rac1'
CRS-2673: Attempting to stop 'ora.ctssd' on 'rac1'
CRS-2673: Attempting to stop 'ora.evmd' on 'rac1'
CRS-2673: Attempting to stop 'ora.asm' on 'rac1'
CRS-2677: Stop of 'ora.OCR.dg' on 'rac2' succeeded
CRS-2677: Stop of 'ora.cssdmonitor' on 'rac1' succeeded
CRS-2677: Stop of 'ora.evmd' on 'rac1' succeeded
CRS-2677: Stop of 'ora.DATA.dg' on 'rac2' succeeded
CRS-2677: Stop of 'ora.FRA.dg' on 'rac2' succeeded
CRS-2673: Attempting to stop 'ora.asm' on 'rac2'
CRS-2677: Stop of 'ora.asm' on 'rac2' succeeded
CRS-2673: Attempting to stop 'ora.eons' on 'rac2'
CRS-2673: Attempting to stop 'ora.net1.network' on 'rac2'
CRS-2677: Stop of 'ora.net1.network' on 'rac2' succeeded
CRS-2677: Stop of 'ora.ctssd' on 'rac1' succeeded
CRS-2677: Stop of 'ora.asm' on 'rac1' succeeded
CRS-2673: Attempting to stop 'ora.cssd' on 'rac1'
CRS-2677: Stop of 'ora.cssd' on 'rac1' succeeded
CRS-2673: Attempting to stop 'ora.diskmon' on 'rac1'
CRS-2677: Stop of 'ora.diskmon' on 'rac1' succeeded
CRS-2675: Stop of 'ora.eons' on 'rac2' failed
CRS-2679: Attempting to clean 'ora.eons' on 'rac2'
CRS-2681: Clean of 'ora.eons' on 'rac2' succeeded
CRS-2792: Shutdown of Cluster Ready Services-managed resources on 'rac2' has completed
CRS-2677: Stop of 'ora.crsd' on 'rac2' succeeded
CRS-2673: Attempting to stop 'ora.cssdmonitor' on 'rac2'
CRS-2673: Attempting to stop 'ora.ctssd' on 'rac2'
CRS-2673: Attempting to stop 'ora.evmd' on 'rac2'
CRS-2673: Attempting to stop 'ora.asm' on 'rac2'
CRS-2677: Stop of 'ora.cssdmonitor' on 'rac2' succeeded
CRS-2677: Stop of 'ora.evmd' on 'rac2' succeeded
CRS-2677: Stop of 'ora.ctssd' on 'rac2' succeeded
CRS-2677: Stop of 'ora.asm' on 'rac2' succeeded
CRS-2673: Attempting to stop 'ora.cssd' on 'rac2'
CRS-2677: Stop of 'ora.cssd' on 'rac2' succeeded
CRS-2673: Attempting to stop 'ora.diskmon' on 'rac2'
CRS-2677: Stop of 'ora.diskmon' on 'rac2' succeeded
[root@rac1 ~]#
 Note:- If you want to stop the Clusterware stack on the local node only, use the following command:

#crsctl stop cluster

[root@rac1 ~]# crsctl check crs
CRS-4638: Oracle High Availability Services is online
CRS-4535: Cannot communicate with Cluster Ready Services
CRS-4530: Communications failure contacting Cluster Synchronization Services daemon
CRS-4534: Cannot communicate with Event Manager
[root@rac1 ~]#
Here you can see that only Oracle High Availability Services is online. Stop the Oracle High Availability Services daemon on each node in the cluster:
[root@rac1 ~]# crsctl stop has
CRS-2791: Starting shutdown of Oracle High Availability Services-managed resources on 'rac1'
CRS-2673: Attempting to stop 'ora.mdnsd' on 'rac1'
CRS-2673: Attempting to stop 'ora.gpnpd' on 'rac1'
CRS-2677: Stop of 'ora.mdnsd' on 'rac1' succeeded
CRS-2677: Stop of 'ora.gpnpd' on 'rac1' succeeded
CRS-2673: Attempting to stop 'ora.gipcd' on 'rac1'
CRS-2677: Stop of 'ora.gipcd' on 'rac1' succeeded
CRS-2793: Shutdown of Oracle High Availability Services-managed resources on 'rac1' has completed
CRS-4133: Oracle High Availability Services has been stopped.
[root@rac1 ~]#
You have to execute the same command 'crsctl stop has' on all nodes in the cluster. Here I executed the same command on my node2:
[root@rac2 ~]#  crsctl stop has
CRS-2791: Starting shutdown of Oracle High Availability Services-managed resources on 'rac2'
CRS-2673: Attempting to stop 'ora.gpnpd' on 'rac2'
CRS-2673: Attempting to stop 'ora.mdnsd' on 'rac2'
CRS-2677: Stop of 'ora.gpnpd' on 'rac2' succeeded
CRS-2673: Attempting to stop 'ora.gipcd' on 'rac2'
CRS-2677: Stop of 'ora.mdnsd' on 'rac2' succeeded
CRS-2677: Stop of 'ora.gipcd' on 'rac2' succeeded
CRS-2793: Shutdown of Oracle High Availability Services-managed resources on 'rac2' has completed
CRS-4133: Oracle High Availability Services has been stopped.

[root@rac2 ~]#
[root@rac1 ~]# crsctl check cluster -all
CRS-4639: Could not contact Oracle High Availability Services
CRS-4000: Command Check failed, or completed with errors.
[root@rac1 ~]#
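As a final check that nothing is left running on a node, the clusterware and database background processes can be inspected (a quick sketch):
[root@rac1 ~]# ps -ef | grep -E 'pmon|tns|d.bin' | grep -v grep    # should return nothing once everything is down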
Finally, we have cleanly shut down our two-node Oracle RAC. Hope these steps help somebody :)