OraPL: November 2011

Tuesday, 22 November 2011

Instance crash recovery, Why,When,How in Oracle 10g

Whenever, Accidently power failure causes server shutdown, Shutdown abort and any other reason that cause database to go down in inconsistent way requires instance recovery. Before we go deep into this we need to understand working of SMON and SCN.

What is this SMON:- SMON is one of the MUST HAVE back ground process. And this is the process that recovers database from instance crash. SMON checks the SCN in all the data files headers and compare this with SCN stored in Control file. If these two matches means database is in consistent state and can be opened.

What is System Change Number (SCN):- This is the number (always increasing) incremented whenever a Commit has happened or AQ, SMON,PMON has completed their work. To get current SCN we need to use DBMS_FLASHBACK.Get_System_Change_Number. SCN is also used in Flashback Query

Eg Select * from scott.emp as of scn 1232097

SCN is checked whenever Oracle starts up the database. SMON compares the SCN of Checkpoint_Change#(v$database.Chechpoint_Change# with v$datafile.checkpoint_change# and v$datafile_header.checkpoint_change#) and if all the SCN are same then database is in consistent state and can be opened.

SCN plays a vital role for providing consistent reads. Basically, it works as follows: The query reads a db block. Beside other information block also contains the information regarding when last time block was changed.. If this SCN is greater than the SCN that was in place when (our) query began, it means that the block was changed after we have started our query. So we have to find an older version of the block. If this block is found in the rollback segments, we use it for our query.

The v$ views use change# to report SCN values. Hence, it can be argued that SCN means System Change Number, not System Commit Number.

When exporting, an SCN can be specified with the exp_flashback_scn export parameter.

More on SCN

There are 3 SCNs basically in control file

1. Checkpoint SCN

2. Stop SCN

3. Thread checkpoint SCN

Checkpoint SCN is the datafile checkpoint SCN when checkpoint happens for datafile. This checkpoint SCN is recorded in datafile header as well.

Stop SCN is the SCN which gets recoreded in control file when datafile is taken in begin backup mode or when datafile is taken offline. This is the checkpoint at a point when datafile header is freezed.

Thread Checkpoint SCN is the one related to online redo log files. This SCN gets generated when ever transaction get recoreded in online redo log file. When we shut down database with normal or immediate option, all these SCN are synchronized and made equal.

/

Eg.

SQL> select banner from v$version;

BANNER

----------------------------------------------------------------

Oracle Database 10g Enterprise Edition Release 10.2.0.1.0 - Prod

PL/SQL Release 10.2.0.1.0 - Production

CORE 10.2.0.1.0 Production

TNS for 32-bit Windows: Version 10.2.0.1.0 - Production

NLSRTL Version 10.2.0.1.0 – Production

Now! Check the Database checkpoint change number using V$DATABASE. With this number (CHECKPOINT_CHAGNE#) , all the number (CHECKPOINT_CHANGE#) of datafile header should match.

SQL> select name,checkpoint_change# from v$database;

NAME CHECKPOINT_CHANGE#

--------- ------------------

SEED 1151635

SQL> select name,checkpoint_change# from v$datafile;

Name Checkpoint_Change# Last_Change#

------------------- -------------------------------- --------------------------

D:\ORACLE\PRODUCT\10.2.0\ORADATA\SEED\SYSTEM01.DBF 1151635 1151635

D:\ORACLE\PRODUCT\10.2.0\ORADATA\SEED\UNDOTBS01.DBF 1151635 1151635

D:\ORACLE\PRODUCT\10.2.0\ORADATA\SEED\SYSAUX01.DBF 1151635 1151635

D:\ORACLE\PRODUCT\10.2.0\ORADATA\SEED\USERS01.DBF 1151635 1151635

D:\ORACLE\PRODUCT\10.2.0\ORADATA\SEED\EXAMPLE01.DBF 1151635 1151635

D:\ORACLE\PRODUCT\10.2.0\ORADATA\SEED\RMAN_TBS01.DBF 1151635 1151635

Now, The CHECKPOINT_CHANGE# of Database and Datafile Header are same and more importantly Start SCN (Checkpoint_change#) and Stop SCN (Last_Change#) are same means START and STOP SCN are same and this confirms that our database is consistent state and does not requires any kind of recovery.

So when you use shutdown immediate or Shutdown normal, Database goes down with consistent state. However, this is not the case with “Shutdown Abort” statement.

To understand fully, We need to understand actual working of “Shutdown” process, SCN ,Checkpoint_Change#, Last_Change#.

Shutdown Process:

Shutdown Immediate/Shutdown Normal: This is the most clean method to take database down. Using this statement, DB goes down in consistent state. Following are the stages:

Shutdown Immediate:

SQL> select open_mode from v$database;

OPEN_MODE

----------

READ WRITE

SQL> shutdown immediate;

Database closed.

Database dismounted.

ORACLE instance shut down.

Internally, First Oracle closes all the Background (SMON, PMON, CHKPT, DBWR, CJQ0,MMNL,MMON) processes . Secondly if Archival is enabled then it stops. Thirdly, All uncommitted transaction are rolled back, further logons stops. Checkpoint happens, All the blocks ( Changed, Committed) are flushed to Hard disk. If any trigger (System Trigger) is written on BEFORE SHUTDOWN system event than it fires.

Instances withers away, All Locks, Enaques (TX) are released. “ALTER DATABASE CLOSE NORMAL” and “ALTER DATABASE DISMOUNT” statements are executed. After all these happenings our database is now SHUT.

In this scenario, Before closing down Oracle does CHECKPOINT that makes database state consistent. Means no Online REDO logs will be required to open database. Once you open the database Checkpoint_Change# of Database (Controlfile),Datafile(s) and Datafile_Headers will be same.

Checkpoint change number of controlfile.

SQL> Select Checkpoint_Change#,name from v$database;

CHECKPOINT_CHANGE# NAME

---------------------------- -------------

1202269 SEED

Checkpoint change number of Datafile(s).

SQL>Select Checkpoint_Change#,name,last_change# from v$datafile;

CHECKPOINT_CHAGNE# NAME Last_Change#

--------------------- --------------------------------- -------------------------------

1202269 D:\ORACLE\PRODUCT\10.2.0\ORADATA\SEED\SYSTEM01.DBF 1151635

1202269 D:\ORACLE\PRODUCT\10.2.0\ORADATA\SEED\UNDOTBS01.DBF 1151635

1202269 D:\ORACLE\PRODUCT\10.2.0\ORADATA\SEED\SYSAUX01.DBF 1151635

1202269 D:\ORACLE\PRODUCT\10.2.0\ORADATA\SEED\USERS01.DBF 1151635

1202269 D:\ORACLE\PRODUCT\10.2.0\ORADATA\SEED\EXAMPLE01.DBF 1151635

1202269 D:\ORACLE\PRODUCT\10.2.0\ORADATA\SEED\RMAN_TBS01.DBF1151635

Checkpoint Change number in Header of Datafile(s).

SQL>Select dh.checkpoint_change#,df.name from v$datafile_header dh,v$datafile df

where df.file#=dh.file#

CHECKPOINT_CHAGNE# NAME CHECKPOINT_CHAGNE#

--------------------- --------------------------------- ----------------------------------------

1202269 D:\ORACLE\PRODUCT\10.2.0\ORADATA\SEED\SYSTEM01.DBF

1202269 D:\ORACLE\PRODUCT\10.2.0\ORADATA\SEED\UNDOTBS01.DBF

1202269 D:\ORACLE\PRODUCT\10.2.0\ORADATA\SEED\SYSAUX01.DBF

1202269 D:\ORACLE\PRODUCT\10.2.0\ORADATA\SEED\USERS01.DBF

1202269 D:\ORACLE\PRODUCT\10.2.0\ORADATA\SEED\EXAMPLE01.DBF

1202269 D:\ORACLE\PRODUCT\10.2.0\ORADATA\SEED\RMAN_TBS01.DBF

In this process, Every data file has the same CHECKPOINT_CHANGE# number as DATABASE has. So database does not need any kind of recovery (INSTANCE) recovery.

Now! See how SHUTDOWN ABORT just like a crash and why it needs recovery.

SQL> shutdown abort;

ORACLE instance shut down.

In this scenario, Database has not done anything to make things consistent instead it is just crashed.

However, unlike Media recovery where we need not to apply Archived REDO logs all the changes stored in Online REDO Log files will be re-applied and instance will recover itself.

In Other words START SCN != STOP SCN so recovery is required.

SQL> startup mount;

ORACLE instance started.

Total System Global Area 612368384 bytes

Fixed Size 1250428 bytes

Variable Size 230689668 bytes

Database Buffers 373293056 bytes

Redo Buffers 7135232 bytes

Database mounted.

SQL> Select name,Checkpoint_Change# from v$database;

NAME CHECKPOINT_CHANGE#

--------- ------------------

SEED 1260508

SQL>Select Checkpoint_Change#,name,last_change# from v$datafile;

CHECKPOINT_CHAGNE# NAME Last_Change#

--------------------- --------------------------------- -------------------------------

1260508 D:\ORACLE\PRODUCT\10.2.0\ORADATA\SEED\SYSTEM01.DBF Null

1260508 D:\ORACLE\PRODUCT\10.2.0\ORADATA\SEED\UNDOTBS01.DBF Null

1260508 D:\ORACLE\PRODUCT\10.2.0\ORADATA\SEED\SYSAUX01.DBF Null

1260508 D:\ORACLE\PRODUCT\10.2.0\ORADATA\SEED\USERS01.DBF Null

1260508 D:\ORACLE\PRODUCT\10.2.0\ORADATA\SEED\EXAMPLE01.DBF Null

1260508 D:\ORACLE\PRODUCT\10.2.0\ORADATA\SEED\RMAN_TBS01.DBF Null

Next time DBA starts up the database, it with out asking any recovery will automatically apply contents of Online REDO log files (ROLLFORD) and will open the database.

SQL> startup

ORACLE instance started.

Total System Global Area 612368384 bytes

Fixed Size 1250428 bytes

Variable Size 234883972 bytes

Database Buffers 369098752 bytes

Redo Buffers 7135232 bytes

Database mounted.

Database opened.

I have tried to put the instance recovery process in clear and precise words, but still if anything is missing then any correction, suggestion are most welcome.

Tuesday, 15 November 2011

System datafile recovery

System datafile lost:-- In this scenario,Hardware failure (Media failure) causes loss of System01.dbf file. And when DBA tries to start the database gets following error:

SQL> startup

ORACLE instance started.

Total System Global Area 612368384 bytes

Fixed Size 1250428 bytes

Variable Size 176163716 bytes

Database Buffers 427819008 bytes

Redo Buffers 7135232 bytes

Database mounted.

ORA-01157: cannot identify/lock data file 1 - see DBWR trace file

ORA-01110: data file 1: 'D:\ORACLE\PRODUCT\10.2.0\ORADATA\SEED\SYSTEM01.DBF'

Now! To see any message/note regarding this we can check DBWR trace file located at

D:\oracle\product\10.2.0\admin\seed\bdump location. The naming convention of trace file is like

SID_DBWR_<srno>.trc. This trace file contains following message:
Dump file d:\oracle\product\10.2.0\admin\seed\bdump\seed_dbw0_1100.trc

Tue Nov 15 15:43:10 2011

ORACLE V10.2.0.1.0 - Production vsnsta=0

vsnsql=14 vsnxtr=3

Oracle Database 10g Enterprise Edition Release 10.2.0.1.0 - Production

With the Partitioning, OLAP and Data Mining options

Windows Server 2003 Version V5.2 Service Pack 2

CPU : 2 - type 586, 1 Physical Cores

Process Affinity : 0x00000000

Memory (Avail/Total): Ph:552M/2037M, Ph+PgF:1134M/3935M, VA:1302M/2047M

Instance name: seed

Redo thread mounted by this instance: 1

Oracle process number: 5

Windows thread id: 1100, image: ORACLE.EXE (DBW0)

*** SERVICE NAME:() 2011-11-15 15:43:10.613

*** SESSION ID:(167.1) 2011-11-15 15:43:10.613

ORA-01157: cannot identify/lock data file 1 - see DBWR trace file

ORA-01110: data file 1: 'D:\ORACLE\PRODUCT\10.2.0\ORADATA\SEED\SYSTEM01.DBF'

ORA-27041: unable to open file

OSD-04002: unable to open file

O/S-Error: (OS 2) The system cannot find the file specified.

Recovery step in this scenario are:

1:- Take DBF offline;

2:- Restore DBF from Backup;

3:- Recover datafile;

4:- Take DBF online.

Before we proceed with Recovery, We check the existence of backup of Datafile 1(System01.dbf).

C:\ rman target / catalog rman_user/rmRecovery Manager: Release 10.2.0.1.0 - Production on Tue Nov 15 16:56:22 2011

Copyright (c) 1982, 2005, Oracle. All rights reserved.

connected to target database: SEED (DBID=848131167, not open)

connected to recovery catalog databasean_user@seed_82

RMAN> list backup of datafile 1;

List of Backup Sets

===================

BS Key Type LV Size Device Type Elapsed Time Completion Time

------- ---- -- ---------- ----------- ------------ ---------------

1248 Full 134.63M DISK 00:00:41 15-NOV-11

BP Key: 1249 Status: AVAILABLE Compressed: YES Tag: FULL_DB_BKUP

Piece Name: D:\ORACLE\RMAN_BKUP\SEED\SEED_%R_01MRNTAO.BKP

List of Datafiles in backup set 1248

File LV Type Ckp SCN Ckp Time Name

---- -- ---- ---------- --------- ----

1 Full 1150785 15-NOV-11 D:\ORACLE\PRODUCT\10.2.0\ORADATA\SEED\SYSTEM01.DBF

Now! We can see that this particular backupset can be used for Recovery of System data file.

Step1:- Take System01.dbf (1) offline

RMAN> Sql "Alter Database Datafile 1 Offline";

sql statement: Alter Database Datafile 1 Offline

Step 2:- Restore Datafile 1 from backup

RMAN> Restore datafile 1;

Starting restore at 15-NOV-11

allocated channel: ORA_DISK_1

channel ORA_DISK_1: sid=155 devtype=DISK

channel ORA_DISK_1: starting datafile backupset restore

channel ORA_DISK_1: specifying datafile(s) to restore from backup set

restoring datafile 00001 to D:\ORACLE\PRODUCT\10.2.0\ORADATA\SEED\SYSTEM01.DBF

channel ORA_DISK_1: reading from backup piece D:\ORACLE\RMAN_BKUP\SEED\SEED_%R_01MRNTAO.BKP

channel ORA_DISK_1: restored backup piece 1

piece handle=D:\ORACLE\RMAN_BKUP\SEED\SEED_%R_01MRNTAO.BKP tag=FULL_DB_BKUP

channel ORA_DISK_1: restore complete, elapsed time: 00:00:56

Finished restore at 15-NOV-11

Step 3:- Recover the datafile

RMAN> Recover datafile 1;

Starting recover at 15-NOV-11

using channel ORA_DISK_1

starting media recovery

media recovery complete, elapsed time: 00:00:02

Finished recover at 15-NOV-11

Step 4:- Take datafile online

RMAN> Sql "Alter database datafile 1 online";

sql statement: Alter database datafile 1 online

Note:- The timing of this recovery varies and depends upon the DB size,Number of Archive REDO log to apply.

Any comment/Suggestion will be welcome.

Subscribe to: Comments (Atom)