Recovery Information and Scenarios

Recovery Guidelines
Performing Recovery (Different Recovery Commands)
Archive Log Information
Control File Information
Online Redo Log Information
System Tablespace Information
Temp Tablespace Information
Application Tablespace Information
    Loss of Index Datafile
    Loss of Data Datafile
Read-Only Tablespace
Archive Log Files

Recovery Scenarios
Media Failure, DB not in ARCH Log Mode, Restore ALL Files
Recover a Particular Datafile when DB is closed (Closed DB Recovery)
Recover a Particular Tablespace when DB is Open (Open DB Recovery with Offline Tablespace)
Recover a Particular Datafile when DB is Open (Open DB Recovery with Offline Datafile)
Recover a Lost Datafile without Valid Backup
Recover Missing Archive Logs or Redo Log Files (Cancel-Based Recovery)
Time-Based Recovery (Example, recover from drop table)
Change-Based Recovery
Recover DataFile in Backup Mode
Loss of Control File
Loss of ALL Control Files
Loss of "Inactive" Online Redo Log Group
Loss of "Active" or "Current" Online Redo Log Group
Loss of UNDO Tablespace


Recovery Guidelines

First of all, you need to understand the concepts of Recovery:
 
Instance Failure
Instance failure is a hardware, software, or system failure that prevents an instance from continuing work. It can be caused by a CPU failure, an operating system failure, a power outage, failure of one of the ORACLE background processes or a failure to access a required database file when the file is not lost or damaged.

Instance Recovery
Instance recovery is automatic. Restarting the database performs the instance recovery. It involves two steps:
1) Rolling forward, which reapplies redo to recover:
          o data that had not yet been written to the database files
          o the contents of rollback segments

2) Rolling back:
          o transactions that were never committed are rolled back, using the rollback segment data recovered in step 1.

Any resources held by transactions in process at the time of the failure will be released. Instance recovery is not necessary if the database is shut down normally.

Media failure
Media failure is a hardware, software, or system failure that prevents reading from or writing to files that are required to operate the database. It is typically caused by the loss of a control file, database file, or redo log file.
To recover up to the last point of failure, the database must be operating in ARCHIVELOG mode. In addition, you must have the latest backup of the database, all online redo log files, the archived logs, and the current control file.
If the DB is operating in ARCHIVELOG mode, there are options for recovery depending on the type of failure.  Broadly there are two types of media recovery:
•   Complete media recovery
•   Incomplete media recovery

Complete Media Recovery:
A complete media recovery recovers all lost changes to the database. Use it when you need to recover the entire database or a single corrupted data file. The dynamic view V$RECOVER_FILE shows which files require recovery. Complete media recovery can be accomplished only if all online and archived redo log files are available. It can be performed on offline data files while the database is open; the database must be opened with the RESETLOGS option if a backup control file or a newly created control file was used for the recovery.
Types of complete media recovery are:
1.  Closed Database Recovery   
2.  Open Database Recovery with Offline Tablespace
3.  Open Database Recovery with Offline Datafile   

Incomplete Media Recovery:
In some situations complete media recovery may not be possible or may not be desired. This can happen because all the files needed for a complete recovery are not available (for example, all online redo log files are lost and you were not duplexing your log groups).  You might want to perform an incomplete recovery if a user drops a table and you want to recover to the point before the table was dropped.
Incomplete media recovery is also called point-in-time recovery. A point-in-time recovery that is not continued to a complete recovery must be terminated with the OPEN RESETLOGS option. The database must be closed during incomplete recovery operations.
There are three types of incomplete media recovery:
1.  Cancel-Based Recovery  
2.  Time-Based Recovery (Example, recover from drop table)
3.  Change-Based Recovery  



Performing Recovery (Different Recovery Commands)
You can use different commands to recover your database:

1) RECOVER DATABASE
This command can only be used with the current control file. The database must be mounted but not OPEN. The control file is compared with the datafile headers, and the datafiles are brought up to date by applying archived redo logs until the datafile headers match the control file entries. Online redo logs are then applied to make the datafiles current. Once the recovery is complete, open the database with: ALTER DATABASE OPEN;

2) RECOVER DATAFILE ‘filename’
This command is used when the database is up and cannot be brought down. It can also be used when the database is in the mounted state, but the tablespace containing the datafiles to be recovered must be taken offline.
Issue RECOVER DATAFILE 'filename'. You will be prompted for log files, and the changes will be applied only to these files.
Once the media recovery is complete, the tablespace can be brought online. This command allows 'multi-tasking' recovery: different datafiles can be recovered in parallel using different sessions or terminals, which is very useful when there are several datafiles to recover.
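As a sketch (the datafile path here is hypothetical), an open-database datafile recovery session might look like:

```sql
-- Take the damaged datafile offline; the rest of the database stays open.
ALTER DATABASE DATAFILE '/u02/oradata/PROD/users01.dbf' OFFLINE;

-- Restore the datafile from the last backup at the OS level, then:
RECOVER DATAFILE '/u02/oradata/PROD/users01.dbf';
-- Supply the archived logs at the prompts, or type AUTO.

-- Bring the datafile back online once media recovery completes.
ALTER DATABASE DATAFILE '/u02/oradata/PROD/users01.dbf' ONLINE;
```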

3) RECOVER TABLESPACE ‘tsname’
The tablespace must be offline and the database must be in the OPEN state. This command recovers a single tablespace to the current state. It cannot be used on the SYSTEM tablespace or on a tablespace containing rollback segments with a status of "in use". If keeping the database OPEN is not a requirement, you can recover with the standard RECOVER DATABASE command instead.

4) RECOVER DATABASE UNTIL CANCEL
Manual recovery after media failure enables you to control how many redo log files are applied to the database. This can be used to undo an inadvertent change by stopping recovery before the change took place. This option is needed when recovering with a control file backup (an old copy) because the current control file is not available.
The database must be MOUNTED but not OPEN. Once you issue the RECOVER DATABASE UNTIL CANCEL command, you will be prompted for redo log files, beginning with the earliest one recorded in the header of each database file. The recovery process will continue to prompt for redo log files until CANCEL is typed at a prompt. Recovery can be cancelled at any redo log.
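A sketch of a cancel-based recovery session:

```sql
-- Database must be mounted but not open.
STARTUP MOUNT

RECOVER DATABASE UNTIL CANCEL
-- Apply the prompted archived logs up to the missing one, then type
-- CANCEL at the next prompt to stop recovery.

-- Incomplete recovery must be terminated with RESETLOGS.
ALTER DATABASE OPEN RESETLOGS;
```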

5) RECOVER DATABASE UNTIL TIME <date&time>
It is the same as RECOVER DATABASE UNTIL CANCEL except that recovery is stopped at a specified point in time within a log file.
The format of <date&time> is YYYY-MM-DD:hh:mm:ss
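For example, to recover to just before an accidental drop (the timestamp here is hypothetical):

```sql
STARTUP MOUNT

RECOVER DATABASE UNTIL TIME '2004-06-15:14:25:00'

-- Incomplete recovery ends with RESETLOGS.
ALTER DATABASE OPEN RESETLOGS;
```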

6) RECOVER DATABASE  ..... USING BACKUP CONTROLFILE
Can be used for recovery with an old copy (backup) of control file. Everything else is similar to other RECOVER DATABASE commands.
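A sketch, assuming the backup control file has already been restored to every location named in CONTROL_FILES:

```sql
STARTUP MOUNT

-- The backup control file does not know the current log sequence, so
-- this is typically combined with UNTIL CANCEL.
RECOVER DATABASE UNTIL CANCEL USING BACKUP CONTROLFILE

-- Recovery with a backup control file must end with RESETLOGS.
ALTER DATABASE OPEN RESETLOGS;
```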

Options for Opening the Database
As a safety measure, back up the datafiles, online logs, and control file before starting the recovery. If space is a constraint, then back up at least the online logs and control files.
Open the database with: ALTER DATABASE OPEN [NO]RESETLOGS

RESETLOGS
Before using the RESETLOGS option, take a cold backup of the database. Once RESETLOGS is used, the old redo log files cannot be used at all.
RESETLOGS clears all the online redo logs and modifies all the online data files to indicate that no recovery is needed. After resetting the redo logs, none of the existing log files or data file backups can be used. The log sequence number in the control file is reset, which is very important for recovery purposes: recovery will be applied only to log files whose sequence number is greater than the log sequence number in the control file. Be very cautious when using the RESETLOGS option. One more important factor to remember is that all datafiles must be online, otherwise they will become unusable once the database is up.

NORESETLOGS
The NORESETLOGS option applies the online redo logs and does not clear the redo log files during startup, leaving the online logs intact. It is only used in the scenario where a MANUAL RECOVERY was started and CANCELled, and then RECOVER DATABASE is started.
 

Provide Archive Log Information
Manually
If the archive log destination is not defined in the parameter file, Oracle has to be informed of the location of the archived logs:
ALTER SYSTEM ARCHIVE LOG START TO '<location>';
To recover, you also have to specify the archive log location:
RECOVER FROM '<location>' DATABASE;

Automatically
Before recovering, turn autorecovery on:
SET AUTORECOVERY ON;

Then type your recovery option, example:
recover datafile ‘<datafile path/name or file no.>’;

Enter AUTO when prompted for a redo log file:
AUTO
Or
recover automatic datafile ‘<datailepath/name or file no.>’

Some Useful Views
V$RECOVER_FILE  -  which files need recovery
V$RECOVERY_LOG  -  archived logs needed for recovery
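For example, to list the files needing recovery by name (joining V$RECOVER_FILE to V$DATAFILE), and the logs required:

```sql
-- Datafiles that need recovery, with the reason and starting change#.
SELECT d.name, r.error, r.change#
  FROM v$recover_file r, v$datafile d
 WHERE r.file# = d.file#;

-- Archived logs needed to carry out that recovery.
SELECT * FROM v$recovery_log;
```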


 
Control File Information
Control files are key components to recovery. A control file that reflects the physical structure and current status of the database must always be available. The control file should be backed up at the completion of every structural change. For example, after adding a datafile or log file and after the completion of a hot backup to the database, the control file should be backed up.
Because a control file is an unreadable binary file, Oracle provides a method to create a readable SQL script. Such a script is generated by:
alter database backup controlfile to trace as '/path/Today_Date';
Multiple backup copies of control files from different times should be kept to enable point-in-time recovery. The trace files should have a naming standard which includes the timestamp.
At least three control files should be used. Each should be on different controllers and disks.
In the event of losing a control file and its mirrored copy due to user error or corruption, the DBA can easily restart the instance with the remaining valid control files. If all control files are lost, the DBA needs to startup nomount the instance, recreate the control file by running a valid create control file script, and open the instance.
Failure to write to any of the control files will result in the database aborting or crashing.
If you lose a single control file and have at least one good control file, we recommend that you:
          o Shutdown if the instance is not down yet.
          o Check background trace files or alert log file to determine which control file is corrupted.
          o Copy the good control file over the bad one or modify the init.ora to reflect only good control files.
          o Startup
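A sketch of the single-control-file case (paths are hypothetical; the copy can equally be done at the OS prompt):

```sql
SHUTDOWN ABORT

-- Copy a surviving good control file over the damaged one...
HOST cp /u01/oradata/PROD/control02.ctl /u01/oradata/PROD/control01.ctl
-- ...or instead remove the bad copy from CONTROL_FILES in the init.ora.

STARTUP
```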
If you lose all your control files, then you should issue the command
          o CREATE CONTROLFILE

The CREATE CONTROLFILE command should already be in a script, which can easily be created while the database is up by issuing ALTER DATABASE BACKUP CONTROLFILE TO TRACE AS '/path/Today_Date';. The script generated with the NORESETLOGS option (the default) is dramatically different from the one generated with RESETLOGS.
The NORESETLOGS option produces a script that does not contain the thread numbers of each group. This can be a serious problem if you modify this script to contain the RESETLOGS option, because the other threads will be ignored.

Risks and disadvantages of the create controlfile command:
          o Lost previous log history.
          o Database specific details must be recreated by looking at the file headers.
          o Data file information is obtained from the data files themselves. If the data files are corrupted or are not from a single backup, then there may be problems.
          o Certain optimizing SCN structures are ignored with a backup controlfile
          o User error possibility is high when there is not a create controlfile script already in place.
          o Check if you are in archivelog mode.

When should you backup your controlfile?
          o After the completion of a hot backup
          o After a database structural change (i.e. add data file, change sizes of files, add log files)
          o After a tablespace status change (i.e. READ-ONLY to READ-WRITE or vice versa)
          o After backing up your control file, you should also create or adjust any create controlfile scripts. Your control file is a key component in recovery. It should always represent the physical configuration and the file status of the time you open the database. In most situations, it is best to recover with the current control file.

What to check for when using the backup controlfile option during recovery?
Make sure that the contents of the controlfile reflect the physical structure of the database you are recovering to.

Alter database open resetlogs is needed when using backup controlfile. A backup is always recommended after resetlogs because the logs are cleared and the sequence numbers are reinitialized.

In the event that media recovery needs to be performed, the current control file should be used if it reflects the physical structure of the database that you are planning to recover to. The current control file preserves crucial timestamp and stop-SCN information which may optimize and ease recovery, while the "backup control file" syntax must ignore some timestamp information found in the control file.


Online redo logs Information
On-line redo logs are crucial for instance and media recovery. They contain the redo to roll forward the database and the undo needed to rollback any uncommitted transactions. The loss of a current on-line redo log or loss of an unarchived on-line redo log will result in an outage.
An unarchived log cannot be dropped unless NOARCHIVELOG mode is enabled; unfortunately, in order to switch to this mode, one needs to shutdown and remount the instance.

To protect from the above outages, Oracle recommends multiplexing of the redo log files over disk mirroring if both techniques cannot be used. Oracle multiplexing (multiple members per group) protects against user error (e.g. an accidental delete) while hardware mirroring does not. Ideally, each member of the on-line redo group should be on different drives and different controllers.
Smaller checkpoint intervals enable crash recovery to complete faster, although they impose the performance overhead of writing dirty buffers and updating file headers. Oracle always performs a checkpoint on a log switch. A typical recommended setting is a checkpoint every 30 minutes.
Furthermore, the customer should monitor the frequency of checkpoints and archiver wait problems. Although the database is up and running, if it is hung due to checkpoint or archiver wait problems, it is essentially not available. If a checkpoint is initiated and has not completed before the log needs to be recycled, the database will temporarily freeze all user activity until the checkpoint is finished.
Async IO should be available and enabled by default; with Async IO, this problem does not usually appear. If Async IO is not available, multiple database writers (DBWR) and a checkpoint process (CKPT) can reduce checkpoint wait errors, but this is usually only a temporary fix. If the problem happens frequently, it may be because the size and/or number of log files is small relative to the rate of changes being made in the database. A similar situation arises when the log writer waits on the archiver to finish archiving a redo log. Again, the solution is to add more log file groups or increase their size to give the archiver process more time to do its job. One can also increase log_archive_buffer_size to increase the size of a single read from the online redo logs by the archiver (ARCH). Multiple archivers are a common solution to archiver waits. Checkpoints should always be monitored by querying V$SYSSTAT to determine whether activity predictions are accurate or changing.
Oracle recommends that log file status be monitored very closely, especially for the STALE or INVALID status. INVALID log file errors appear in the alert.log as IO errors and would be detected by alert.log parsing. The STALE status indicates that the state of any pending write on that log is unknown or that the log is not complete (for example, a log that was being written to prior to a shutdown abort). This potential problem does not appear in the alert.log and can only be monitored through V$LOGFILE. Frequent STALE or INVALID statuses on a running database may be indicative of hardware problems.
Log files have the highest IO activity and they should be placed on their own set of disks if possible and definitely separated from the archive files.


Standards:
What do you do when you lose or discover a stale or invalid redo log member of a group? What if you lose the entire group that has been archived? What if you lose a group that has not been archived but is not the current or next group? What do you do if you lose the current online redo log?


Online Redo Logs

Preventive Steps

Detection Mechanisms

Online redo logs are needed for recovery.

  • Oracle multiplexing of log groups
  • Never place online redo logs on the same disks as archive logs and application data files
  • Make sure log switch rate is matched by DBWR activity and ARCH activity
  • Make sure log group size is acceptable to availability requirements
  • Minimum of 3 log groups.

Implement multiple archivers.

  • Check V$LOGFILE for INVALID or STALE statuses
  • Parse alert.log and background trace file for redo log errors
  • Always check V$LOG and V$LOGFILE to determine which log is current and which logs have been archived.
  • Monitor log switch rate and checkpoint rates.
  • Monitor if the archiver keeps up with the log writer.

Types of Outage

Steps

Early Detection

Error Detection

Loss of one and not all of the online redo log members of a particular group

- Drop log file member

- Add log file member

No detection is possible until a log switch is performed either exiting or entering the file.

The alert log records the following error stack when switching out of the redo log group: ORA-00316, ORA-00312, ORA-00321, ORA-00312. In addition, the LGWR process will generate the same error stack. Error ORA-00312 reports the affected file; its full form is:

ORA-00312: online log 1 thread 1: 'file name'

The view V$LOGFILE should be monitored for statuses of 'STALE' or 'INVALID'.
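A sketch of replacing a lost member (the group number and file name are hypothetical):

```sql
-- A blank status in V$LOGFILE is normal; look for STALE or INVALID.
SELECT group#, status, member FROM v$logfile;

-- Drop the lost member and add a fresh one in its place.
ALTER DATABASE DROP LOGFILE MEMBER '/u03/oradata/PROD/redo01b.log';
ALTER DATABASE ADD LOGFILE MEMBER '/u03/oradata/PROD/redo01b.log'
  TO GROUP 1;
```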

Loss of inactive archived redo log group

Notes:

1. All recovery times are fairly constant except for crash recovery.

2. A new redo log group can be added afterwards.

- Shutdown abort

- Startup mount

- Drop the problem redo log

- Alter database open (crash recovery)

No detection is possible until the LGWR process writes to the redo log.

The LGWR process will generate the OPIRIP error stack: the general ORA-00447 followed by more specific errors. In this test case, they are ORA-00313, ORA-00312 and ORA-07366. The failure of LGWR will cause other background processes to fail with some general error messages, and finally the alert log will report a background process failure (which may not be the LGWR process).

Loss of an inactive redo log group that has not been archived

Notes:
1. All recovery times are fairly constant except for crash recovery.

2. A new redo log group can be added afterwards.

3. The unarchived redo log cannot be dropped without setting NOARCHIVELOG mode (ORA-00350).

4. ARCHIVELOG mode cannot be set after dropping the problem redo log group, since instance recovery is required (ORA-00265).

- Shutdown abort

- Startup mount

- Alter database noarchivelog

- Drop the problem redo log

- Alter database open (crash recovery)

- Shutdown normal

- Startup mount

- Alter database archivelog

- Alter database open

No detection is possible until the ARCH process archives the redo log. The ARCH process will generate an archival stoppage message - ARCH: Archival stopped, error occurred. Will continue retrying - followed by informative error messages (ORA-00255, ORA-00312) reporting the problem online redo log, and more specific error message(s) giving the cause (in the test case, ORA-00286). The same set of error messages, together with the archival stoppage message, will also appear in the alert log file.
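The step sequence above might look like this in SQL*Plus (the group number is hypothetical):

```sql
SHUTDOWN ABORT
STARTUP MOUNT

-- An unarchived log cannot be dropped in ARCHIVELOG mode (ORA-00350).
ALTER DATABASE NOARCHIVELOG;
ALTER DATABASE DROP LOGFILE GROUP 2;

ALTER DATABASE OPEN;      -- performs crash recovery

-- Re-enable archiving; this requires a clean shutdown first (ORA-00265).
SHUTDOWN NORMAL
STARTUP MOUNT
ALTER DATABASE ARCHIVELOG;
ALTER DATABASE OPEN;
```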

Other Online Redo Log Outages

Detection

Steps

Loss of the current online redo log group.

The LGWR process will generate the OPIRIP error stack: the general ORA-00447 followed by more specific errors. In this test case, they are ORA-00313, ORA-00312 and ORA-07366. The failure of LGWR will cause other background processes to fail with some general error messages, and finally the alert log will report a background process failure (which may not be the LGWR process).

V$LOG and V$LOGFILE will indicate if this is the current log. If so, we must switch to CRF.

1. Restore and commence incomplete recovery.
2. Commence Disaster Recovery Plan

Silent Redo Log Corruption

Error (ORA 600[3020]) during application of archive log on the standby database site.

1. Rebuild standby database if there is one present.
2. Investigate Primary for any problems. Check alert.logs.

Internal Redo Corruption

ORA 600 [3020] implies that this change in the redo log is corrupted or inconsistent with the changes in the data block.

All ORA 600 errors during application or writing of redo logs may be evidence of corruption in the online redo log.

1. If it does not affect primary, then refresh or rebuild standby database.
2. If primary is down, commence disaster recovery plan



System Tablespace Information
The system tablespace is the most crucial of all tablespaces because it contains the data dictionary tables and objects. Loss of the system data file will result in database recovery or switching to a disaster recovery site.
The system tablespace should be configured and sized properly. By convention, no user objects or temporary segments are created in SYSTEM. This should be enforced using privileges and roles, with periodic monitoring to catch any deviations. Basically, the RESOURCE role and the UNLIMITED TABLESPACE privilege should not be granted to any user. A typical size of 300 MB - 500 MB may be required if PL/SQL and/or stored procedures are used. Free space and fragmentation should be monitored.
Loss of a system data file:

System Tablespace

Preventive Steps

Detection Mechanisms

System tablespace contains the data dictionary tables.

  • No non-data dictionary objects should be found in SYSTEM
  • Monitor Space usage
  • Hardware mirroring with redundant controllers
  • No one should have privileges to write to the system tablespace
  • No auditing.

Spare disks in case of disk failures.

  • Monitor threshold on space usage within system
  • Monitor for non-system objects found in system tablespace
  • Parse alert.log and background trace files for ORA 600 errors with your monitoring tool.
  • Monitor disk failures to check if it affects system data files.

System Tablespace Outages

Comments / Detection Mechanisms

Steps

Lack of space or system tablespace fragmentation.

This should be monitored very closely, and the monitoring tool should ensure that space is sufficient in all tablespaces. If this occurs, however, the customer needs to add a datafile.

   Add another data file

Loss of system tablespace.

Monitor tool should track disk failures and correlate with the corresponding data file.

1. restore and recover.
2. commence disaster recovery plan.

Corruption of data dictionary object.

ORA-1578 on a data dictionary object, and ORA 600 may be an indication of data dictionary corruption. These errors should be parsed from the alert.log by the monitoring tool. 

1. restore and recover.
2. commence disaster recovery plan.




Temp Tablespace Information
Although the loss of a temporary tablespace will not result in losing access to data, the application, in particular operations that require sort space or temp space, may fail.
At least two temporary tablespaces should be available, with pre-created ALTER USER scripts established, to prevent system tablespace fragmentation or application failure in case of loss of a temporary tablespace.

Multiple temporary tablespaces will be beneficial in distributing sort allocations for different users and improve the performance of concurrent sorts. Furthermore, if several index rebuilds have to occur due to corruption or maintenance, multiple temp tablespaces will allow the parallelism to restore and create the indexes quickly. For index rebuilds, TEMP space of up to twice the size of indexed data may need to be allocated. As this TEMP space may not be required during normal operation, it is useful to keep spare disk space.

Following the initial index building phase, temporary segments usage is limited to sort activity. For performance reasons, the sort_area_size setting should be high enough to keep disk sorts to a minimum assuming there is enough real memory available. All complex queries which may potentially require disk sorts will then write to the temp tablespace.

 

Temporary Tablespace

Preventive Steps

Detection Mechanisms

Temporary segments are used for sorting and temporary scratch work when the allotted space in memory is insufficient.

  • Temporary tablespaces should be hardware mirrored with redundant controllers.
  • Alter user scripts should be available to switch users to an alternative temporary tablespace.
  • Enough space should be available if more is needed for the temp.
  • Test and measure the amount of temp space needed as your application changes.

Extra disks should be available.

  • Monitor disks and controllers for failure.

Temporary Tablespace Outages or Temporary tablespace related problems.

Detection Mechanisms/ Comments

Realistic / Tested TTR

Steps

Cannot allocate extents or tablespace fragmentation.

These errors do not appear in the alert.log; thus, they must be trapped within the application. ORA-1547 or ORA-1562 during a sort operation indicates a temp tablespace configuration problem.

The customer should configure the TEMP tablespace to avoid problems like this. Fixing the problem once it is detected involves adding another data file or creating another tablespace.

1. Add another data file.


Losing temporary tablespace data file.

 

 

1. Run alter user scripts to switch temporary tablespace usage for users.

2. Create another temp tablespace if possible.

Types of Outage

Steps

Early Detection

Error Detection

Loss of temporary tablespace data file

- Create another temporary tablespace.

- Alter user to use to new temporary tablespace.

- Drop current temporary tablespace.
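A sketch of the three steps (tablespace name, path, size, and username are all hypothetical):

```sql
-- Create a replacement temporary tablespace.
CREATE TABLESPACE temp2
  DATAFILE '/u04/oradata/PROD/temp2_01.dbf' SIZE 500M
  TEMPORARY;

-- Repoint each affected user (this is what the pre-created
-- ALTER USER scripts would contain).
ALTER USER scott TEMPORARY TABLESPACE temp2;

-- Drop the damaged temporary tablespace.
DROP TABLESPACE temp INCLUDING CONTENTS;
```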

Monitor tool needs to detect disk failure and correspond OS files to Oracle files.

Monitor tool needs to detect disk failures and correspond OS files to Oracle files.


 

Application Tablespace Information
All application data files should be hardware mirrored on different controllers and disks for additional resilience. More importantly, the application should have integrity checks and error checking mechanisms to prevent errors and to automatically recover in case of errors. Security policies and proper prevention techniques will be essential to avoid dropping, deleting or modifying critical data. We also recommend partitioning of data and implementing business rules and database integrity constraints to detect for application data inconsistencies.

Even with built in prevention, user errors, application bugs, hardware failures, etc still occur which will result in some type of recovery. There are several possible solutions to resolve these problems:

Case 1: Loss of index data file
Tablespace recovery and object recovery are the two available options. Tablespace recovery entails restoring the affected tablespace from the hot backup and rolling forward. Object recovery consists of recreating the tablespace and recreating the indices. Pre-created CREATE INDEX scripts (implemented in parallel or with degrees of parallelism) should be kept ready and run concurrently, touching different temporary tablespaces, to expedite recovery. In addition, large temporary tablespaces should be made available to allow for successful index creation.
Although both recovery approaches allow for full availability of the database and the tables, this is still a serious outage due to the degraded performance.

 

Loss of Index Data file

Steps

Early Detection

Error Detection

Recreate Index Tablespace

- Drop Index tablespace

- Create another index tablespace

- Create all indices in the tablespace in parallel.

Monitor tool should detect disk failure.

If the failure is in an index tablespace, automated steps are needed to recreate the indices.

Monitor tool should detect disk failures.

Restore from Backup and Recover

- Offline tablespace.

- Restore from hot backup.

- Set autorecovery on.

- Bring up recovery processes.

- Alter database recover automatic tablespace.

- Alter tablespace online.
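The steps above can be sketched as follows (the tablespace name is hypothetical):

```sql
ALTER TABLESPACE indx OFFLINE IMMEDIATE;

-- Restore the tablespace's datafiles from the hot backup at the OS
-- level, then recover without being prompted for each log:
SET AUTORECOVERY ON
RECOVER TABLESPACE indx;

ALTER TABLESPACE indx ONLINE;
```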

Monitor tool should detect disk failure.

If the failure is in an index tablespace, automated steps are needed to restore and recover.

Monitor tool should detect disk failures.

Switch to disaster recovery plan.

Switch to standby database or replicated database. 

Monitor tool needs to detect disk failure and correspond OS files to Oracle files.

Monitor tool needs to detect disk failures and correspond OS files to Oracle files.


 

Case 2: Loss of application table datafiles

Hopefully, the customer implements some type of object-level backup and recovery strategy to reduce MTTR. If not, they must either restore from backup and recover, or implement their disaster recovery plan.
Object-level recovery only works when there are mechanisms within the application to resynchronize from a determined point in time. When the subset has been restored, it should be able to resynchronize with the rest of the database via the application. The database should be partitioned in such a way that recovery can occur on subsets of the database while other functionality is not impeded.

 

Loss of Data File

Steps

Early Detection

Error Detection

Recreate Table Tablespace

Only applicable if exports or unloads of the tables are taken.

Only applicable if there is an object level recovery plan.

(snapshots, unloader, replicated database, etc)

Monitor tool should detect disk failure.

If the failure is in a data tablespace, automated steps are needed to restore and recover.

IO errors or ORA-1100 range errors.

Restore from Backup and Recover

- Offline tablespace.

- Restore from hot backup

- Set autorecovery on.

- Bring up recovery processes.

- Alter database recover automatic tablespace.

- Alter tablespace online.

Monitor tool should detect disk failure.

If the failure is in a data tablespace, automated steps are needed to restore and recover.

Monitor tool should detect disk failures.

Commence disaster recovery plan

Only applicable if there is a standby database or a replicated database. This can also be accomplished by breaking three-way mirrors to maintain hot backups on site.

Monitor tool needs to detect disk failure and correspond OS files to Oracle files.

Monitor tool needs to detect disk failures and correspond OS files to Oracle files.

Types of Outage

Steps

Early Detection

Error Detection

Cannot allocate extents or tablespace fragmentation

Add datafile 

Monitor Space Usage and fragmentation of tablespaces.

ORA-1547 and other such errors need to be trapped within the application.

Loss of Index due to user error or corruption.

Drop index.

Create index.

Monitor alert.log for ORA-1578 errors that affect index data.

Periodic index validation with the analyze command.

ORA-1578 while using or validating an index.

Single Table Loss or Corruption of Single Table.

Object level recovery plan.

Restore and recover.

Monitor alert.log for ORA-1578 errors that affect table data.

Table can be analyze to check for block corruptions.

ORA-1578 while accessing or analyzing table data.

Reorganize Table - Scheduled Outage.

Drop / Create

Unload/

Direct Load

Use of Parallelism would be helpful.

Monitor and alarm when objects exceed more than 20 extents or some threshold.

ORA-1556 max extent error should be trapped within the application. 

Reorganize Index - Scheduled Outage

Drop/ Create

Monitor and alarm when objects exceed more than 20 extents or some threshold.

ORA-1556 max extent error should be trapped within the application.
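The ANALYZE-based checks and extent monitoring mentioned above can be run periodically. A sketch, with illustrative owner and object names:

```sql
-- Validate the structure of an index (illustrative names).
ANALYZE INDEX app_owner.my_index VALIDATE STRUCTURE;

-- Check a table and its indexes for block corruption.
ANALYZE TABLE app_owner.my_table VALIDATE STRUCTURE CASCADE;

-- Alarm when objects exceed an extent threshold (20 used here).
SELECT owner, segment_name, segment_type, extents
FROM   dba_segments
WHERE  extents > 20;
```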



Read-Only tablespaces
Read-only tablespaces contain static information; the data is accessible only for viewing, not for modification. A read-only tablespace NEEDS to be backed up as soon as it becomes READ-ONLY: the recovery session will get an error if a file is READ-ONLY and a backup controlfile is being used for recovery. All read-only tablespace files must be taken offline prior to incomplete recovery, which is what makes backing up a READ-ONLY tablespace right after the change so important. The moral of the story is that you need to restore the READ-ONLY tablespace from your previous backup and offline the read-only tablespace files prior to recovery with a backup controlfile. After opening the database, you can then bring the READ-ONLY data files online.
If you use the current control file, recovery will complete successfully even if the tablespace changes from read-write to read-only through the redo logs. Read-only tablespace files are still online files, and recovery will still apply changes to them. Changing a file from read-write to read-only generates some redo, and during recovery that redo is applied; thus the flags are updated.
Read-only tablespaces also reduce the amount of work during checkpoints: only read-write files are checkpointed and updated, while read-only data files are not. Recovery ignores read-only files when possible, streamlining recovery even further.
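As a reminder of the rule above, a tablespace should be backed up immediately after it is made read-only. A sketch, with an illustrative tablespace name:

```sql
-- Make the tablespace read-only (illustrative name).
ALTER TABLESPACE history_data READ ONLY;

-- Confirm the status before taking the backup.
SELECT tablespace_name, status
FROM   dba_tablespaces
WHERE  tablespace_name = 'HISTORY_DATA';

-- Now take an OS-level or RMAN backup of the tablespace's datafiles.
```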

Read-Only Data

Placement: Index segments should only hold indexes.
  Preventive Steps: 1. Partition data according to functionality. 2. Hardware mirroring and redundant controllers.
  Detection Mechanisms: 1. Monitor disks for IO errors.

Placement: Data segments should hold both tables and clusters.
  Preventive Steps: 1. Partition data according to functionality. 2. Hardware mirroring and redundant controllers.
  Detection Mechanisms: 1. Monitor disks for IO errors.

Read-Only Data

Outage: Accidental drop of read-only data.
  Steps: 1. Restore from backup. 2. Commence Disaster Recovery Plan.

Outage: Loss of a read-only data file.
  Steps: 1. Restore from backup. 2. Commence Disaster Recovery Plan.


Archive Log Files
Archive log files provide the means to roll forward from a previous backup of the database. Since archive log files are applied in order, it is extremely important that all the archive log files be available in order to roll forward to the desired point in time. For that reason, Oracle recommends that archive files also be mirrored to protect against disk failure. For extra protection, copies of the archive files can be kept onsite and offsite in case of some large disaster. One of the most common errors found on customer sites is a poorly organized archiving procedure, which may be backing up incomplete archive files or not detecting corrupted tapes. A scripted methodology and a well-tested tape management tool may be required to safeguard against corrupted archive files. Prior to backing up or copying an archive file, the procedure should check the V$LOG view to confirm that the corresponding online redo log has been completely archived.
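The V$LOG check described above can be a simple query; a log group whose ARCHIVED column shows 'NO' has not been completely archived and its archive file must not be backed up yet:

```sql
-- Verify archiving status of each online redo log group before
-- copying its archive file to tape.
SELECT group#, sequence#, archived, status
FROM   v$log;
```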
Another common problem is that the archive destination becomes full or inadequate to support the number of archive files being created. First, adequate disk space should be provided to allow for sufficient time for the log files to be backed up to tape and to allow at least a day’s worth of archive logs to be onsite or all archive logs starting from the previous database backup (cold or hot) to remain onsite. Second, multiple archival destinations should be created with automated switchover of destination when a certain threshold for available free space is reached. Finally, the archive procedure should dynamically monitor the archive destinations for lack of free space.
In the event of incomplete or complete database recovery, caution must be exercised to prevent confusion between the current set of archive files and the archive files from the previous version of the database. For example, after successful incomplete recovery, the archive log sequence numbers will reset to one. Therefore, new archive files will be created with sequence number starting from one and up which should not be confused with the older archive logs with the same number. Another example is restoring from an old cold backup and starting up without any roll forward media recovery. The archive logs generated from that point will begin with the number that was stored in the control file. Again, it is necessary to distinguish between the current database’s logs and the previous versions of those logs.

What if you lose an archive log or set of logs?

Archive Log Files

Outage: Archive log files are required for media recovery and for standby databases.

Preventive Steps:
  • Scripted procedure to archive to the archive destination and to archive to tape.
  • Tape management tool with error checking mechanisms should be utilized.
  • Hardware mirroring of archive files.
  • Two sets of backups.
  • Error checking for successful completion of archiving from database to tape.

Detection Mechanisms:
  • Space usage of archive destination needs to be monitored.

Archive Log Files and Archivelog

Outage: Archiver stuck due to lack of free space in the archive destination.
  Detection Mechanisms: Monitor free space in the archive destination and alarm when close to a reasonable threshold (70%).
  Steps: 1. Free up space in the archive destination. 2. Switch to a different archive destination.

Outage: Loss of archive file(s) due to media failure or user error.
  Detection Mechanisms: Monitor tool should check for disk failures. Alerts should be available when media failure affects archive logs. When this occurs, automatic procedures should switch to another archive destination.
  Steps: 1. Back up the primary. 2. Refresh the standby database if one exists.

Outage: Archiver cannot write to the archive destination due to media failure.
  Detection Mechanisms: Monitor tool should check for disk failures. Alerts should be available when media failure affects archive logs. When this occurs, automatic procedures should switch to another archive destination.
  Steps: 1. Switch archive destination.

Outage: Archive logs are not successfully backed up.
  Detection Mechanisms: Checksums must be in place during the backup. Investigate the problem and retry.
  Steps: If the backup is not successful because of a bad archive log file, back up the primary and recreate the standby site.

Outage: Archiver hangs because someone deactivated archivelog mode or disabled archiving.
  Detection Mechanisms: This should be checked before every open command. The ARCHIVE LOG LIST command will display the current archiving status.
  Steps:
  shutdown
  startup mount;
  alter database archivelog;
  alter database open;
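The free-space monitoring described above can be sketched as a small shell check. The 70% threshold and the default destination are illustrative assumptions, not site values:

```shell
#!/bin/sh
# Sketch of an archive-destination free-space check.

check_usage() {
    # $1 = percent of space used, $2 = alarm threshold
    if [ "$1" -ge "$2" ]; then
        echo "WARN: archive destination at ${1}% used (threshold ${2}%)"
    else
        echo "OK: archive destination at ${1}% used"
    fi
}

# Column 5 of `df -P` is the portable "Use%" figure for the filesystem.
ARCH_DEST="${ARCH_DEST:-/}"
usage=$(df -P "$ARCH_DEST" | awk 'NR==2 { gsub("%", "", $5); print $5 }')
check_usage "$usage" 70
```

In practice, the WARN branch would page an operator or trigger the automated switch to a secondary archive destination described in the table.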




Recovery Scenarios


Media Failure, DB not in ARCH Log Mode, Restore ALL Files
Failure: loss of disk, data file, or corruption
Recovery : Restore all Oracle files:
- Data files
- Control files
- Redo log files
- Password file (optional)
- Parameter file (optional)

Advantage
- Easy to perform with low risk of error
- Recovery time is the time it takes to restore all files.

Disadvantage
- Data is lost and must be reapplied manually
- The entire database is restored to the point of the last whole closed backup

Recovery Process
 shutdown abort
Restore all files using operating system
 startup

If restoring to a different location, you need to rename ALL the files (control files, redo log files, and database files) and modify your init.ora file to show the new path for the files.
startup mount
select name from v$datafile;
select name from v$controlfile;
select member from v$logfile;
alter database rename file '/server1/oradata/user_01.dbf' to '/server2/oradata/user_01.dbf';
...
...
alter database open;



Recover a Particular Datafile when DB is Closed (Closed DB Recovery)
If your system suffers a media failure and the database goes down, you need to perform a closed database recovery. You will get error messages ORA-1157 and ORA-1110, and possibly an operating system error. Follow these steps to execute a closed database recovery:

a.  Make sure the database is shutdown.
b.  Correct the media problem if possible.
c.  Restore the most recent backup of only those data files that were damaged by the media failure.
There is no need to restore any undamaged data files or any online redo log files. If the hardware problem has been repaired, and damaged data files can be restored to their original locations, do so. If the hardware problem still exists, restore the data files to another location. (This location will need to be recorded in the control file later in this procedure.)
d.  Start SQL*Plus and connect to Oracle as SYS.
e.  Start the instance and mount it, but do not open the database.
startup mount
f.  Make sure the tablespaces that contain the damaged data files are offline.
select d.name, t.name, d.status from v$datafile d, v$tablespace t where d.ts# = t.ts#;
alter database datafile '/server1/oradata/user_01.dbf' offline;
alter tablespace MY_DATA offline;


If you restored the lost files to alternate locations, the new location of these files must be recorded in the control file. 
select name from v$datafile;
select name from v$controlfile;
select member from v$logfile;
alter database rename file '/server1/oradata/user_01.dbf' to '/server2/oradata/user_01.dbf';
...
...


g.  Query the V$DATAFILE view and make sure all the data files you want to recover are online.  If a data file is offline, issue the ALTER DATABASE command with the DATAFILE ONLINE option.
select name, status from V$DATAFILE;
ALTER DATABASE DATAFILE 'my_data.dbf' ONLINE;

h.  To start closed database recovery, use the RECOVER command. Oracle will now start the roll forward by applying the archived redo log files and the online redo log file.  If AUTORECOVERY is set to ON, the applying of the log files is automatic.  If it is not set to ON, you will be prompted for each log file.
recover database;

i. When recovery is complete, open the database
ALTER DATABASE OPEN;


Recover a Particular Tablespace when DB is Open (Open DB Recovery with Offline Tablespace)
If your system suffers a media failure but the database stays up, you can perform an open database recovery and recover only the tablespaces that are damaged.  Users can continue to access the online tablespaces and data files that were not damaged by the failure. Oracle automatically takes damaged data files offline. Follow these steps to execute an open database recovery with offline tablespaces:

a.  The database should be started and open.
b.  Take all tablespaces containing damaged data files offline. You can query the V$DATAFILE view to see which data files are offline.
select name, status from V$DATAFILE;
ALTER TABLESPACE tablespace_name OFFLINE immediate;

c.
  Correct the problem that caused the media failure. (If the problem cannot be corrected in a reasonable amount of time, your other option is to restore the damaged files to another location.)
(If the media problem can be corrected, restore the most recent backup files of only the data files damaged by the media failure.  Remember that the database is open.  Do NOT try to restore undamaged data files, log files, or control files.)
d. If you restored the lost files to alternate locations, the new location of these files must be recorded in the control file. 
select name from v$datafile;
select name from v$controlfile;
select member from v$logfile;
alter database rename file '/server1/oradata/user_01.dbf' to '/server2/oradata/user_01.dbf';

e.  Use the RECOVER TABLESPACE command to start offline tablespace recovery of all damaged data files in one or more offline tablespaces. Oracle will now start the roll forward by applying the archived redo log files and the online redo log file.  If AUTORECOVERY is set to ON, the applying of the log files is automatic.  If it is not set to ON, you will be prompted for each log file.
RECOVER TABLESPACE tablespace_name ;
f.  The damaged tablespaces of the open database are now recovered up to the point of failure. You can bring the offline tablespaces online:
 ALTER TABLESPACE tablespace_name ONLINE;


Recover a Particular Datafile when DB is Open (Open DB Recovery with Offline Datafile)
If your system suffers a media failure but the database stays up, you can perform an open database recovery and recover only the data files that are damaged.  Users can continue to access the online tablespaces and data files that were not damaged by the failure. Oracle automatically takes damaged data files offline. Follow these steps to execute an open database recovery with offline data files:

a.  The database should be started and open.
b.  Take the damaged data files offline. You can query the V$DATAFILE view to see which data files are offline.
select name, status from V$DATAFILE;
ALTER DATABASE DATAFILE '/path/file_name' OFFLINE;

c.
  Correct the problem that caused the media failure. (If the problem cannot be corrected in a reasonable amount of time, your other option is to restore the damaged files to another location.)
(If the media problem can be corrected, restore the most recent backup files of only the data files damaged by the media failure.  Remember that the database is open.  Do NOT try to restore undamaged data files, log files, or control files.)
d. If you restored the lost files to alternate locations, the new location of these files must be recorded in the control file. 
select name from v$datafile;
select name from v$controlfile;
select member from v$logfile;
alter database rename file '/server1/oradata/user_01.dbf' to '/server2/oradata/user_01.dbf';

e.  Use the RECOVER DATAFILE command to start offline datafile recovery of all damaged data files. Oracle will now start the roll forward by applying the archived redo log files and the online redo log file.  If AUTORECOVERY is set to ON, the applying of the log files is automatic.  If it is not set to ON, you will be prompted for each log file.
RECOVER DATAFILE '/path/file_name';
f.  The damaged datafiles of the open database are now recovered up to the point of failure. You can bring the offline datafiles online:
ALTER DATABASE DATAFILE '/path/file_name' ONLINE;


Recover a Lost Datafile without Valid Backup
This scenario shows how to recover a lost datafile when you do not have a valid backup of your database, provided the database is running in archivelog mode and all necessary archive logs are available.
If you do not have a valid backup and your database is running in noarchivelog mode, the datafile is lost forever.
BUT, if you have all the archive log files generated since the datafile was created, you can still recover it without a backup:

I will create a new tablespace for this test and name it DIEGO_TEST:
create tablespace DIEGO_TEST datafile '/u01/oradata/DIEGO/DIEGO_TEST.dbf' size 100m;

Now create a table MY_TEST in the DIEGO_TEST tablespace and fill it with rows from ALL_OBJECTS (the where 1=0 clause first creates the table empty):
create table MY_TEST tablespace DIEGO_TEST
as select rownum id, a.* from all_objects a where 1=0;

insert into MY_TEST select rownum, a.* from all_objects a;
commit;
 
select tablespace_name, segment_name
from dba_segments where segment_name = 'MY_TEST';
TABLESPACE_NAME                SEGMENT_NAME
--------------- ---------------------------------------
DIEGO_TEST                        MY_TEST

You can see in the alert.log file that several log switches happened during the insert process, you can also force it:
alter system switch logfile;

To simulate disaster I will delete DIEGO_TEST datafile using OS command and flush buffer cache several times.
!rm /u01/oradata/DIEGO/DIEGO_TEST.dbf
alter system flush buffer_cache;
alter system flush buffer_cache;
alter system flush buffer_cache;

Now my query throws an error because datafile 8 is missing.
select count(*) from MY_TEST;
ERROR at line 1:
ORA-01116: error in opening database file 8
ORA-01110: data file 8: '/u01/oradata/DIEGO/DIEGO_TEST.dbf'
ORA-27041: unable to open file
Linux Error: 2: No such file or directory
Additional information: 3

How do you recover this datafile when you don't have a valid backup to restore it?
The trick is to create new datafile based on the file DIEGO_TEST and fill it with data from archive logs using the following commands:
alter database datafile 8 offline drop;
alter database create datafile '/u01/oradata/DIEGO/DIEGO_TEST.dbf' as '/u01/oradata/DIEGO/DIEGO_TEST.dbf';
recover datafile 8;

At this point Oracle will start asking for Archive Log Files. Once all of them are applied just put that datafile online again:
alter database datafile 8 online;
select * from MY_TEST;



Recover Missing Archive Logs or Redo Log Files (Cancel-Based Recovery)
This situation is most likely to occur when archive log files or redo log files needed for recovery are lost or damaged and cannot be restored.  In this situation, you would apply all logs until you reach the missing files and then cancel the recovery. Follow these steps to execute a cancel-based recovery:

a. If the database is still open, shut down the database using the SHUTDOWN command with the ABORT option.
b. Make a full backup of the database, including all datafiles, a control file, and the parameter files in case an error is made during the recovery.
c. Correct the problem that caused the media failure.  (If the problem cannot be corrected, the datafiles must be restored to an alternate location.  If this is the case, the ALTER TABLESPACE tablespace_name RENAME DATAFILE command must be used to change the location of the datafile in the control file.)
d. If the current control files do not match the physical structure of the database at the time you want to recover to, restore a backup of the control file that matches the database's physical file structure at the point in time you want to recover to.  Replace all current control files of the database with the one you want to use for recovery.  If you do not have a backup copy of the control file, you can create a new one.
e. Restore backups of all datafiles.  Make sure the backups were taken before the point in time you are going to recover to.  Any datafiles added after the point in time you are recovering to should not be restored.  They will not be used in the recovery and will have to be recreated after recovery is complete.  Any data in the datafiles created after the point of recovery will be lost.
Note: Make sure read-only tablespaces are offline before you start recovery so recovery does not try to update the datafile headers.

f.
Start the instance and mount the database:
startup mount

g. 
If you restored files to an alternative location, change the location now in the control file by using the ALTER DATABASE RENAME FILE command.
ALTER DATABASE RENAME FILE '/server1/oradata/user_01.dbf' TO '/server2/oradata/user_01.dbf';

h. 
Use the RECOVER DATABASE UNTIL CANCEL command to begin cancel-based recovery. If a backup of the control file is being used, make sure to specify the USING BACKUP CONTROLFILE parameter.
Oracle will now start the roll forward by applying the archived redo log files and the online redo log file. Oracle will prompt you for each log file. If you used a backup control file, you must enter the names of the online redo log files.

i.  Continue applying redo log files until the most recent undamaged log file has been applied. Enter "CANCEL" instead of the log file name to cancel the recovery. Oracle will respond with a recovery successful message.
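These cancel-based recovery commands might look like this in a SQL*Plus session (the USING BACKUP CONTROLFILE clause applies only when a restored control file is in use):

```sql
-- Begin cancel-based recovery with a backup control file.
RECOVER DATABASE UNTIL CANCEL USING BACKUP CONTROLFILE;

-- Apply archived logs as prompted; when the last good log has been
-- applied, answer the log file prompt with:
CANCEL
```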

j.  Use the ALTER DATABASE OPEN command with the RESETLOGS or NORESETLOGS option. 
You should use the RESETLOGS option if you used a backup of the control file in recovery, or the recovery was incomplete. 
Use the NORESETLOGS option if the recovery was complete.  If you are using a standby database and must reset the logs, the standby database will have to be re-created.

You can check the ALERT file to see if your incomplete recovery was actually a complete recovery. 
If the recovery was a complete recovery, the message in the ALERT file is as follows:
RESETLOGS after complete recovery through change scn

If the recovery was incomplete, the following message is recorded:
RESETLOGS after incomplete recovery UNTIL CHANGE scn

Alter database open;
or
Alter database open resetlogs ;


k.  After opening the database using the RESETLOGS option, perform a normal shutdown and a full database backup. If you do not do this, any changes made after the recovery and before the next full backup are unrecoverable. If you did not reset the logs, the database is still recoverable.



Time-Based Recovery (example: recover from a dropped table)
Time-based recovery recovers the database up to the time specified. The date must be a character literal in the format 'YYYY-MM-DD:HH24:MI:SS'.

a. If the database is still open, shut down the database using the SHUTDOWN command with the ABORT option.
b. Make a full backup of the database, including all datafiles, a control file, and the parameter files in case an error is made during the recovery.
c. Restore backup control files (if necessary) and backup data files and bring them online.
d. Media recovery will then be performed on the restored backup using the RECOVER DATABASE statement with the UNTIL TIME option, as detailed in the following steps.
Note: Files in read-only tablespaces should be offline if you are using a control file backup. Otherwise, the recovery will try to update the headers of the read-only files.  
e. Start the instance and mount the database:
startup mount
f.  If you restored files to an alternative location, change the location now in the control file by using the ALTER DATABASE RENAME FILE command.
ALTER DATABASE RENAME FILE '/server1/oradata/user_01.dbf' to '/server2/oradata/user_01.dbf';

g.  Obtain the names of all data files requiring recovery by:
- Checking the list of data files that normally accompanies the control file being used.
- Querying the V$DATAFILE view.

h.  Make sure that all data files of the database are online. All data files of the database must be online unless an offline tablespace was taken offline normally. For example:
ALTER DATABASE DATAFILE 'users1' ONLINE;

i.  Issue the RECOVER DATABASE UNTIL TIME statement to begin time-based recovery. The time is always specified using the following format, delimited by single quotation marks: 'YYYY-MM-DD:HH24:MI:SS'.
The following statement recovers the database up to a specified time using a control file backup:
RECOVER DATABASE UNTIL TIME '2007-12-31:12:47:30' USING BACKUP CONTROLFILE;
or
RECOVER DATABASE UNTIL TIME '2007-12-31:12:47:30';

j.  Apply the necessary redo log files to reconstruct the restored data files. Unless the application of files is automated, Oracle supplies the name it expects to find from LOG_ARCHIVE_DEST and requests you to stop or proceed with applying the log file. If the control file is a backup, you must supply names of online logs.

k.  Apply redo log files until the last required redo log file has been applied to the restored data files. Oracle automatically terminates the recovery when it reaches the correct time, and returns a message indicating whether recovery is successful.
l. Open the DB
Alter database open;
or
Alter database open resetlogs ;
 

Change-Based Recovery
This recovers the database to a transaction-consistent state immediately prior to the system change number (SCN) specified:

a. If the database is still open, shut down the database using the SHUTDOWN command with the ABORT option.
shutdown abort;
b. Back up the database as a precaution and correct any media failures.
c. Restore backup data files.
d. Start the instance and mount the database:
startup mount
e.  If one or more damaged data files were restored to an alternative location, change the location now in the control file by using:
ALTER TABLESPACE tablespace_name RENAME DATAFILE '/server1/oradata/user_01.dbf' to '/server2/oradata/user_01.dbf';

f.
To determine the SCN needed for recovery, you can query the V$LOG_HISTORY view.
Select * from v$log_history where rownum < 3;
THREAD#   SEQUENCE#   TIME                   LOW_CHANGE#  HIGH_CHANGE#   ARCHIVE_NAME
     1        47213   06/26/98 12:47:55      116098950    116098954      /u01/arch/log_47213.arc
     1        47212   06/26/98 12:47:33      116098947    116098949      /u01/arch/log_47212.arc
The LOW_CHANGE# column represents the SCN at the beginning of the log file and the HIGH_CHANGE# represents the SCN at the end of the log file.

g.  Begin change-based recovery, specifying the SCN for recovery termination. The SCN is specified as a decimal number without quotation marks. For example, to recover until SCN 116098954 issue:
RECOVER DATABASE UNTIL CHANGE 116098954;

h.  Oracle begins the roll forward phase of media recovery by applying the necessary redo log files (archived and online) to reconstruct the restored data files. Unless the application of files is automated, Oracle supplies the name it expects to find from LOG_ARCHIVE_DEST and requests you to stop or proceed with applying the log file. If the control file is a backup file, you must supply names of online logs.

i.  Oracle applies the redo log files until the last required redo log file has been applied to the restored data files. Oracle automatically terminates the recovery when it reaches the correct system change number, and returns a message indicating whether recovery is successful.


Recover DataFile in Backup Mode
In this scenario, the database crashed during the backup process, so you need to perform the following steps:
a. Start up the database (if it is already started, shut it down and start it up again). It will ask for media recovery:
    ORA-01113 : file 3 needs media recovery
    ORA-01110 : data file 3 : '/clover/test.dbf'
As a result, you can only mount the database.
b. Query v$backup to determine those datafiles that are in the ACTIVE state:
select FILE#, STATUS from V$Backup;
c. Take the datafiles out of hot backup mode by using:
Alter database datafile 3 end backup;
or
Alter database datafile '/clover/test.dbf' end backup;
Because the datafile header was frozen, the database files were not synchronized.
d. Open the DB
Alter database open;
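The V$BACKUP check in step b can be extended to show the names of the files still in backup mode; a sketch:

```sql
-- List datafiles whose headers are still frozen in hot backup mode.
SELECT d.file#, d.name, b.status
FROM   v$backup b, v$datafile d
WHERE  b.file# = d.file#
AND    b.status = 'ACTIVE';
```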


Loss of Control File
Control file problems are usually not detected while the database is up and running. If the control file is lost or damaged in such a way that Oracle can't recognize it, a subsequent database startup will result in ORA-205 (error in identifying control file "%s"), along with an operating system level error.
When only one control file is lost, the following steps can be used to recover:
a. Create a pfile from the spfile: create pfile from spfile;
b. Shut down the instance if it's still running.
c. Find the cause of the loss of the control file. Is it due to a hardware problem (disk or controller)?
d. If the hardware isn't a problem, copy a good copy of an existing control file to the location of the lost control file and skip to step g.
e. If the hardware is the problem, copy a good copy of an existing control file to a reliable location.
f. Edit the init.ora, updating the CONTROL_FILES parameter to reflect the new location of the control file, then recreate the spfile: create spfile from pfile;
g. Start up the database.


Loss of ALL Control Files
The following steps can be used to recover:
1.Shut down the database.
2.Take a full database backup, including all the datafiles and redo log files.
3.Start up the database in the NOMOUNT state.
4.Re-create the control file using CREATE CONTROLFILE. Alternatively, you can back up the control file to a trace file and execute the file.
5.Perform media recovery on the database.
6.Open the database.
7.Do a shutdown normal of the database.
8.Take a cold backup of the database.
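To have a CREATE CONTROLFILE script ready before disaster strikes, the control file can be dumped to trace while the database is healthy (the trace file location is version-dependent):

```sql
-- Run periodically on a healthy database; the generated trace file
-- contains a ready-to-edit CREATE CONTROLFILE statement.
ALTER DATABASE BACKUP CONTROLFILE TO TRACE;
```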


Loss of "Inactive" Online Redo Log Group

Loss of "Active" or "Current" Online Redo Log Group
Loss of UNDO Tablespace
If you get an ORA-00600 or the following error message on your Oracle database:
...
ORA-00600: internal error code, arguments: [3020], [356515961], [1], [4606], [2], [16], [], []
ORA-10567: Redo is inconsistent with data block (file# 85, block# 121)
ORA-10564: tablespace UNDO
...
Please note that the following rescue scenario applies when the rollback segments are corrupt but the UNDO datafile still exists. If your UNDO datafile is corrupt or lost, go to the section Rescue Tablespace.

1.  Shutdown the Database if it is still running
    sqlplus "/ as sysdba"
    create pfile from spfile;
    shutdown abort;


2.  Verify the Rollback Segments in the UNDO Tablespace
    The Rollback Segments in the UNDO Tablespace are named _SYSSMU1$, _SYSSMU2$, _SYSSMU3$, and so on. One Rollback Segment in the SYSTEM Tablespace is called SYSTEM; we use exactly this Rollback Segment to rescue the UNDO Tablespace.
    cd <DIRECTORY-OF-SYSTEM-TABLESPACE>
    strings <SYSTEM-TABLESPACE-FILE>.dbf | grep _SYSSMU | cut -d $ -f 1 | sort -u
    You will probably see the following Rollback Segments:
    _SYSSMU1$,
    _SYSSMU2$,
    _SYSSMU3$,
    _SYSSMU4$,
    _SYSSMU5$,
    _SYSSMU6$,
    _SYSSMU7$,
    _SYSSMU8$,
    _SYSSMU9$,
    _SYSSMU10$
    _SYSSMU11$
    _SYSSMU12$
    _SYSSMU13$
    _SYSSMU14$
    _SYSSMU15$
    _SYSSMU16$
    _SYSSMU17$
    _SYSSMU18$
    _SYSSMU19$
    _SYSSMU20$
3.  Uncomment and add the following lines to PFILE
    # undo_management = auto
    # undo_retention = 10800
    # undo_tablespace = undotbs1
    undo_management = manual
    rollback_segments = (system)
    _corrupted_rollback_segments = ('_SYSSMU1$','_SYSSMU2$','_SYSSMU3$','_SYSSMU4$','_SYSSMU5$','_SYSSMU6$','_SYSSMU7$','_SYSSMU8$','_SYSSMU9$','_SYSSMU10$','_SYSSMU11$','_SYSSMU12$','_SYSSMU13$','_SYSSMU14$','_SYSSMU15$','_SYSSMU16$','_SYSSMU17$','_SYSSMU18$','_SYSSMU19$','_SYSSMU20$')

4.  Startup in RESTRICT Mode with edited PFILE, then drop the Rollback Segments and UNDO Tablespace
    sqlplus "/ as sysdba"
    startup restrict pfile='/home/oracle/oracle/initTEST.ora';
    drop rollback segment "_SYSSMU1$";
    drop rollback segment "_SYSSMU2$";
    drop rollback segment "_SYSSMU3$";
    drop rollback segment "_SYSSMU4$";
    drop rollback segment "_SYSSMU5$";
    drop rollback segment "_SYSSMU6$";
    drop rollback segment "_SYSSMU7$";
    drop rollback segment "_SYSSMU8$";
    drop rollback segment "_SYSSMU9$";
    drop rollback segment "_SYSSMU10$";
    drop rollback segment "_SYSSMU11$";
    drop rollback segment "_SYSSMU12$";
    drop rollback segment "_SYSSMU13$";
    drop rollback segment "_SYSSMU14$";
    drop rollback segment "_SYSSMU15$";
    drop rollback segment "_SYSSMU16$";
    drop rollback segment "_SYSSMU17$";
    drop rollback segment "_SYSSMU18$";
    drop rollback segment "_SYSSMU19$";
    drop rollback segment "_SYSSMU20$";

5.  Check for any other corrupted Rollback Segment:
    select segment_name,status,tablespace_name from dba_rollback_segs;
    SEGMENT_NAME                   STATUS           TABLESPACE_NAME
    ------------------------------ ---------------- ------------------------------
    SYSTEM                         ONLINE           SYSTEM

If no segments other than SYSTEM remain, proceed to the next step; otherwise go back to step 2.

6. Remove the OLD UNDO Tablespace and create a new one:
    drop tablespace UNDOTBS1 including contents and datafiles;
    CREATE UNDO TABLESPACE undotbs2 DATAFILE '/home/oracle/undotbs2.dbf' SIZE 500M;

    At this point, no UNDOTBS1 Tablespace exists anymore, and the SYSTEM Rollback Segment is the only active Rollback Segment.

7.  Now go back to your init.ora PFILE, uncomment the lines that were commented out before, and comment out the lines added in step 3:

    undo_management = auto
    undo_retention = 10800
    undo_tablespace = undotbs2
    # undo_management = manual
    # rollback_segments = (system)
    #
    # _corrupted_rollback_segments =
('_SYSSMU1$','_SYSSMU2$','_SYSSMU3$','_SYSSMU4$','_SYSSMU5$','_SYSSMU6$','_SYSSMU7$','_SYSSMU8$','_SYSSMU9$','_SYSSMU10$','_SYSSMU11$','_SYSSMU12$','_SYSSMU13$','_SYSSMU14$','_SYSSMU15$','_SYSSMU16$','_SYSSMU17$','_SYSSMU18$','_SYSSMU19$','_SYSSMU20$')

8.  Finally, we are ready to restart the DB and assign the correct new UNDO tablespace, UNDOTBS2:
    shutdown immediate;
    create spfile from pfile;
    startup;

    alter system set undo_tablespace=undotbs2 scope=both;

    select segment_name,status,tablespace_name from dba_rollback_segs;
    SEGMENT_NAME                   STATUS           TABLESPACE_NAME
    ------------------------------ ---------------- ------------------------------
    SYSTEM                         ONLINE           SYSTEM
    _SYSSMU21$                     ONLINE           UNDOTBS2
    _SYSSMU22$                     ONLINE           UNDOTBS2
    _SYSSMU23$                     ONLINE           UNDOTBS2