4 Oracle Database High Availability Solutions for Unplanned Downtime
Oracle Database offers an integrated suite of high availability solutions that increase availability and eliminate or minimize both planned and unplanned downtime. These solutions help enterprises maintain business continuity 24 hours a day, 7 days a week. Beyond reducing downtime, the Oracle high availability solutions also increase system use on the primary and secondary systems and help improve overall performance, scalability, and manageability.
Table 4-1 describes the various Oracle high availability solutions for unplanned downtime. The table shows how the features discussed in the subsequent sections can be used to address various causes of unplanned downtime. Where several Oracle solutions are listed, the MAA recommended solution is indicated in the Oracle Solution column.
Table 4-2 describes the high availability solutions in each of the MAA service-level tiers for the MAA reference architectures and multitenant architectures.
Table 4-1 Outage Types and Oracle High Availability Solutions for Unplanned Downtime
Outage Scope | Oracle Solution | Benefits |
---|---|---|
Site failures |
Oracle Data Guard (MAA recommended) and Oracle Application Failover solution |
Integrated client and application failover; Fastest and simplest database replication; Supports all data types; Zero data loss by eliminating propagation delay; Oracle Active Data Guard supports read-only services and DML on global temporary tables and sequences to off-load more work from the primary; Database In-Memory support |
Flexible logical replication solution (target is open read/write); Active-active high availability (with conflict resolution); Heterogeneous platform and heterogeneous database support |
||
Recovery Manager, Zero Data Loss Recovery Appliance and Oracle Secure Backup |
Fully managed database recovery and integration with Oracle Secure Backup; Recovery Appliance provides end-to-end data protection for backups and reduces data loss for database restores; Non-real-time recovery |
|
Instance or computer failures |
Oracle Real Application Clusters and Oracle Clusterware (MAA recommended) |
Integrated client and application failover; Automatic recovery of failed nodes and instances; Lowest application brownout with Oracle Real Application Clusters |
Integrated client and application failover; Online database relocation migrates connections and instances to another node; Better database availability than traditional cold failover solutions |
||
Integrated client and application failover; Fastest and simplest database replication; Supports all data types; Zero data loss by eliminating propagation delay; Oracle Active Data Guard supports read-only services and DML on global temporary tables and sequences to off-load more work from the primary; Database In-Memory support |
||
Flexible logical replication solution (target is open read/write); Active-active high availability (with conflict resolution); Heterogeneous platform and heterogeneous database support |
||
Storage failures |
Oracle Automatic Storage Management (MAA recommended) |
Mirroring and online automatic rebalancing places redundant copies of the data in separate failure groups. |
Oracle Data Guard (MAA recommended) |
Integrated client and application failover; Fastest and simplest database replication; Supports all data types; Zero data loss by eliminating propagation delay; Oracle Active Data Guard supports read-only services and DML on global temporary tables and sequences to off-load more work from the primary; Database In-Memory support |
|
Recovery Manager with Fast Recovery Area, and Zero Data Loss Recovery Appliance (MAA recommended) |
Fully managed database recovery and managed disk and tape backups |
|
Flexible logical replication solution (target is open read/write); Active-active high availability (with conflict resolution); Heterogeneous platform and heterogeneous database support |
||
Data corruption |
Corruption Prevention, Detection, and Repair (MAA recommended) Database initialization settings such as |
Different levels of data and redo block corruption prevention and detection at the database level |
Data corruption |
Oracle Data Guard (MAA recommended) Oracle Active Data Guard Automatic Block Repair
|
In a Data Guard configuration with an Oracle Active Data Guard standby, physical block corruptions detected by Oracle at a primary database are automatically repaired using a good copy of the block retrieved from the standby, and vice versa. The repair is transparent to the user and application. Oracle Active Data Guard provides strong database isolation of data corruptions. With MAA recommended initialization settings, Oracle Active Data Guard and Oracle Exadata Database Machine achieve the most comprehensive full-stack corruption protection. If a lost write is detected on the standby database, you can restore the affected file and restart Redo Apply if the lost write is isolated and the hardware problem is corrected.
Note: Lost writes can corrupt the entire database, which may require that you rebuild the affected database after resolving the hardware issue. Database In-Memory support |
Dbverify, Analyze, Data Recovery Advisor and Recovery Manager, Zero Data Loss Recovery Appliance, and ASM Scrub with Fast Recovery Area (MAA recommended) |
These tools allow the administrator to run manual checks to help detect and potentially repair various data corruptions. Dbverify and Analyze conduct physical block and logical intra-block checks. Analyze can also conduct inter-object consistency checks. Data Recovery Advisor automatically detects data corruptions and recommends the best recovery plan. RMAN operations can conduct both physical and inter-block logical checks. RMAN can execute online block-media recovery using flashback logs, backups, or the standby database to help recover from physical block corruptions. Recovery Appliance performs periodic backup validation to help ensure that your backups are valid. Recovery Appliance lets you specify your recovery window requirements and alerts you when those SLAs cannot be met with your existing backups managed by Recovery Appliance. ASM Scrub detects and attempts to repair physical and logical data corruptions with the ASM pair in normal and high redundancy disk groups. |
|
Data corruption |
Oracle Exadata Database Machine and Oracle Automatic Storage Management (MAA recommended) DIX + T10 DIF Extensions (MAA recommended where applicable) |
If Oracle ASM detects a corruption and has a good mirror, Oracle ASM returns the good block and repairs the corruption during a subsequent write I/O. Exadata provides implicit HARD enabled checks to prevent data corruptions caused by bad or misdirected storage I/O. Exadata provides automatic HARD disk scrub and repair, which detects and fixes bad sectors. DIX + T10 DIF Extensions provide end-to-end data integrity for reads and writes through checksum validation from a vendor's host adapter to the storage device. |
Flexible logical replication solution (target is open read/write); Logical replica can be used as a failover target if the partner replica is corrupted; Active-active high availability (with conflict resolution); Heterogeneous platform and heterogeneous database support |
||
Human errors |
Oracle Security Features (MAA recommended) |
Restrict access to prevent human errors |
Oracle Flashback Technology (MAA recommended) |
Fine-grained error investigation of incorrect results; Fine-grained and database-wide rewind and recovery capabilities |
|
Delays or slowdowns |
Oracle Database and Oracle Enterprise Manager; Oracle Data Guard (MAA recommended) and Oracle Application Failover solution |
Oracle Database automatically monitors for instance and database delays or cluster slowdowns and attempts to remove blocking processes or instances to prevent prolonged delays or unnecessary node evictions. Oracle Enterprise Manager or a customized application heartbeat can be configured to detect application or response time slowdowns and react to these SLA breaches. For example, you can configure the Enterprise Manager Beacon to monitor and detect application response times. Then, after a certain threshold expires, Enterprise Manager can call the Data Guard Database In-Memory support |
File system data |
Enables full stack failover that includes non-database files |
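The customized application heartbeat mentioned in the delays row of Table 4-1 can be pictured with a minimal Python sketch. It is illustrative only and does not use any Oracle or Enterprise Manager API: the `HeartbeatMonitor` class, its parameters, and the `on_breach` hook are hypothetical names, and the threshold values are assumptions chosen for the example. The monitor records probe response times and fires the hook only after several consecutive probes exceed the response-time threshold, so a single slow response does not trigger a reaction.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class HeartbeatMonitor:
    """Track probe response times and react to a sustained SLA breach.

    All names here are hypothetical; this is not an Oracle API.
    """
    threshold_seconds: float                   # assumed response-time SLA
    max_consecutive_breaches: int              # slow probes tolerated in a row
    on_breach: Callable[[List[float]], None]   # hypothetical reaction hook
    _recent: List[float] = field(default_factory=list)

    def record(self, response_seconds: float) -> bool:
        """Record one probe result; return True if the reaction hook fired."""
        if response_seconds <= self.threshold_seconds:
            self._recent.clear()               # a healthy probe resets the streak
            return False
        self._recent.append(response_seconds)
        if len(self._recent) >= self.max_consecutive_breaches:
            self.on_breach(list(self._recent))
            self._recent.clear()
            return True
        return False


# Usage: collect breach reports in a list instead of calling a real failover.
fired: List[List[float]] = []
monitor = HeartbeatMonitor(
    threshold_seconds=2.0,
    max_consecutive_breaches=3,
    on_breach=fired.append,
)
results = [monitor.record(t) for t in (0.4, 2.5, 3.1, 0.5, 2.2, 2.8, 4.0)]
```

In a real deployment the hook would be whatever reaction the site has chosen for an SLA breach; requiring consecutive breaches is one simple way to avoid reacting to transient spikes.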
If you are managing many databases in DBaaS, we recommend using the MAA tiers and Oracle Multitenant as described in Oracle MAA Reference Architectures. Table 4-2 identifies various unplanned outages that can impact a database in a multitenant architecture. It also identifies the Oracle HA solution available in each of the HA tiers to address each outage.
Table 4-2 Unplanned Outage Matrix for MAA Reference Architectures and Multitenant Architectures
Event | Solutions by MAA Tier | Recovery Window (RTO) | Data Loss (RPO) |
---|---|---|---|
Instance Failure | BRONZE: Oracle Restart | Minutes if instance can restart | Zero |
Instance Failure | SILVER: Oracle RAC or optionally Oracle RAC One Node | Seconds with Oracle RAC; minutes with Oracle RAC One Node | Zero |
Instance Failure | GOLD: Oracle RAC | Seconds | Zero |
Instance Failure | PLATINUM: Oracle RAC with Application Continuity | Zero Application Outage | Zero |
Permanent Node Failure (but storage available) | BRONZE: Restore and recover | Hours to Days | Zero |
Permanent Node Failure (but storage available) | SILVER: Oracle RAC | Seconds | Zero |
Permanent Node Failure (but storage available) | SILVER: Oracle RAC One Node | Minutes | Zero |
Permanent Node Failure (but storage available) | GOLD: Oracle RAC | Seconds | Zero |
Permanent Node Failure (but storage available) | PLATINUM: Oracle RAC with Application Continuity | Zero Application Outage | Zero |
Storage Failure | ALL: Automatic Storage Management | Zero downtime | Zero |
Data corruptions | BRONZE/SILVER: Basic protection. Some corruptions require restore and recovery of a pluggable database (PDB), an entire multitenant container database (CDB), or a non-container database (non-CDB) | Hours to Days | Since last backup if unrecoverable; zero or near zero with Recovery Appliance |
Data corruptions | GOLD/PLATINUM: Comprehensive corruption protection and Auto Block Repair with Oracle Active Data Guard | Zero with auto block repair; seconds to minutes if corruption is due to lost writes and Data Guard fast-start failover is used | Zero unless corruption is due to lost writes |
Human error | ALL: Logical failures resolved by flashback drop, flashback table, flashback transaction, flashback query, flashback pluggable database, and undo | Dependent on detection time but isolated to PDB and applications using those objects | Dependent on logical failure |
Human error | ALL: Comprehensive logical failures impacting an entire database and PDB that require RMAN point-in-time recovery (PDB) or flashback pluggable database | Dependent on detection time | Dependent on logical failure |
Human error | GOLD/PLATINUM: With Oracle GoldenGate, you can fail over just one PDB | Dependent on detection time but actual failover can take seconds | Dependent on logical failure |
Database unusable; system, site, or storage failures; widespread corruptions or disasters | BRONZE/SILVER: Restore and recover | Hours to Days | Since last backup; zero or near zero with Recovery Appliance |
Database unusable; system, site, or storage failures; widespread corruptions or disasters | GOLD: Fail over to secondary (Oracle Active Data Guard or Oracle GoldenGate) | Seconds | Zero to Near Zero |
Database unusable; system, site, or storage failures; widespread corruptions or disasters | PLATINUM: Active Data Guard Failover with Application Continuity | Zero Application Outage | Zero |
Performance Degradation | ALL: Oracle Enterprise Manager for monitoring and detection, Database Resource Management for Resource Limits and ongoing Performance Tuning | No downtime but degraded service | Zero |
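One way to work with Table 4-2 when choosing a tier is to encode the rows as data and filter them by requirement. The following Python sketch is illustrative only: it transcribes two events from the table, and the `Outcome` type, the `MATRIX` dictionary, the shortened event key `"Site failure"`, and the `tiers_with_rpo` helper are hypothetical names introduced for the example rather than part of any Oracle tool.

```python
from typing import Dict, List, NamedTuple


class Outcome(NamedTuple):
    rto: str  # recovery window, as worded in Table 4-2
    rpo: str  # data loss, as worded in Table 4-2


# Two events transcribed from Table 4-2; keys and layout are hypothetical.
MATRIX: Dict[str, Dict[str, Outcome]] = {
    "Instance Failure": {
        "BRONZE": Outcome("Minutes if instance can restart", "Zero"),
        "SILVER": Outcome("Seconds with Oracle RAC; minutes with Oracle RAC One Node", "Zero"),
        "GOLD": Outcome("Seconds", "Zero"),
        "PLATINUM": Outcome("Zero Application Outage", "Zero"),
    },
    # Shortened label for the "Database unusable, system, site or storage
    # failures..." row of the table.
    "Site failure": {
        "BRONZE": Outcome("Hours to Days", "Since last backup; zero or near zero with Recovery Appliance"),
        "SILVER": Outcome("Hours to Days", "Since last backup; zero or near zero with Recovery Appliance"),
        "GOLD": Outcome("Seconds", "Zero to Near Zero"),
        "PLATINUM": Outcome("Zero Application Outage", "Zero"),
    },
}


def tiers_with_rpo(event: str, rpo: str) -> List[str]:
    """Return the tiers whose listed RPO for the event matches exactly."""
    return [tier for tier, outcome in MATRIX[event].items() if outcome.rpo == rpo]


instance_zero = tiers_with_rpo("Instance Failure", "Zero")
site_zero = tiers_with_rpo("Site failure", "Zero")
```

The exact-match comparison is deliberately strict: a tier listed as "Zero to Near Zero" is not treated as meeting a strict zero data loss requirement.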
See Also:
High Availability Architectures for tables summarizing the attainable recovery times for all of the types of unplanned downtime for each Oracle high availability reference architecture