eTrust Directory - Tips on how to implement a Disaster Recovery Plan, using LDIF Copy (dxdumpdb, dxloaddb) -or- using Database Copy (copydb), to add or recover a Directory Replica

Document ID : KB000055409
Last Modified Date : 14/02/2018

Solution:

Disaster Recovery Overview

A Disaster Recovery Plan puts in place a set of documented procedures that can be tested in advance and then used if and when a disaster occurs. Once tested, these operational procedures can be followed step by step at such times. What defines a disaster? Essentially, any disruption to normal service operation should be handled by following documented operational procedures. The levels of operational service can be thought of as ranging from:

  • Normal Service Operation.
  • Scheduled Service Outage, e.g. for database tuning or software upgrade.
  • Small (minutes) Unscheduled Service Outage, e.g. network outage.
  • Large (hours) Unscheduled Service Outage, e.g. application down.
  • System Down, e.g. hardware failure.
  • Multiple Systems Down, e.g. due to multiple hardware failures.
  • Site Disaster, e.g. Earthquake.
  • Entire Service Outage, e.g. network outage or multiple system outages due to software virus.

A key feature when configuring large systems for fast data recovery is the ability to 'clone' one Ingres system from another, provided the two systems have identical logical storage configurations. This enables the underlying database table files to be copied from one machine to another by a variety of methods.

Best practice is to maintain three replicas for each DSA. This allows 100% uptime even when one DSA needs its data re-synchronized with the other two replicas.

The following tables show the operational principles behind recovering a failed server C from the two remaining servers, A and B. The method is to keep A running and taking the full load, and to stall B only for as long as it takes to capture a database snapshot of B. From the instant that B's snapshot is taken, server A queues updates for C (and for B while B is stopped). B's snapshot is then used to build or rebuild C, and A's queue for C is then flushed to bring C fully in sync with A and B. Note that B can stay online during the snapshot if the snapshot is taken via an Ingres checkpoint.
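The snapshot-and-queue principle described above can be illustrated with a small in-memory simulation. This is only a sketch under stated assumptions: the `Replica` and `Primary` classes, the `recover_c` function, and the sample entries are hypothetical stand-ins invented for illustration, and nothing here calls eTrust Directory or Ingres.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    """Toy stand-in for a DSA: a name, an up/down flag, and its entries."""
    name: str
    up: bool = True
    data: dict = field(default_factory=dict)


class Primary(Replica):
    """Server A: applies each update locally and queues it for any peer that
    is down or still has a backlog (so update order is preserved)."""

    def __init__(self, name):
        super().__init__(name)
        self.peers = {}    # peer name -> Replica
        self.queues = {}   # peer name -> pending (key, value) updates

    def add_peer(self, peer):
        self.peers[peer.name] = peer
        self.queues.setdefault(peer.name, [])

    def update(self, key, value):
        self.data[key] = value
        for name, peer in self.peers.items():
            if peer.up and not self.queues[name]:
                peer.data[key] = value
            else:
                self.queues[name].append((key, value))

    def flush(self, peer):
        """Replay queued updates, in order, to a peer that is back up."""
        for key, value in self.queues[peer.name]:
            peer.data[key] = value
        self.queues[peer.name].clear()


def recover_c():
    a, b = Primary("A"), Replica("B")
    a.add_peer(b)
    a.update("cn=alice", "v1")      # normal operation: A and B in sync

    c = Replica("C", up=False)      # C is configured but empty and down
    a.add_peer(c)                   # A starts a queue for C
    b.up = False                    # stall B for the snapshot
    snapshot = dict(b.data)         # B's state at the snapshot instant

    a.update("cn=bob", "v1")        # A carries the load; queued for B and C
    b.up = True
    a.flush(b)                      # B catches up with A

    a.update("cn=alice", "v2")      # B gets it directly; still queued for C
    c.data = dict(snapshot)         # build C from B's snapshot
    c.up = True
    a.flush(c)                      # replay A's queue for C
    return a, b, c
```

Calling `recover_c()` leaves all three replicas with identical entries and empty queues on A, which is the end state the procedures below aim for. The key design point is that A's queue for C must start no later than the instant of B's snapshot; otherwise an update could fall between the snapshot and the queue and never reach C.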

Disaster Recovery Steps using LDIF Copy (dxdumpdb, dxloaddb) to add or recover a Directory Replica

Step | Time | Current State | A | A's Queues | B | C | Next Task
-----|------|---------------|---|------------|---|---|----------
1 | T-0 | A & B running | Up | | Up | NA | Configure C (dxnewdb). Configure A & B to mesh with C.
2 | T-1 | A & B running; C configured | Up | | Up | Down | Init A. Stop B.
3 | T-2 | A queues to B & C | Up | B(T-1), C(T-1) | Down | Down | Dump B to LDIF (using dxdumpdb).
4 | T-3 | B dumped | Up | B(T-1), C(T-1) | Down | Down | Copy LDIF to C. Start B.
5 | T-4 | A is updating B; C has LDIF of B at T-1 | Up | B'(T-1), C(T-1) | Up | Down | Start loading C (using dxloaddb) **
6 | T-5 | B synchronized with A; C is loading | Up | C(T-1) | Up | Down | Checkpoint C's DB and re-activate journaling (i.e. dxbackupdb +journal C). Fully tune C's database (i.e. dxtunedb -full {dbName}).
7 | T-6 | C is loaded | Up | C(T-1) | Up | Down | Start C.
8 | T-7 | A is updating C | Up | C'(T-1) | Up | Up |
9 | T-8 | C synchronized with A | Up | | Up | Up |

** If "notSearchable" attributes are defined in the database, you can specify these as a parameter to the dxloaddb command. Remember to apply any other special indexing after the database has been loaded.

Disaster Recovery Steps using Database Copy (copydb) to add or recover a Directory Replica

Step | Time | Current State | A | A's Queues | B | C | Next Task
-----|------|---------------|---|------------|---|---|----------
1 | T-0 | A & B running | Up | | Up | NA | Configure C (use dxnewdb if it is a new replica and destroydb and createdb if it is an existing replica). Configure A & B to mesh with C.
2 | T-1 | A & B running; C configured | Up | | Up | Down | Init A. Stop B. Take DB copy of B (using copydb and copy.out SQL script).
3 | T-2 | A queues to B & C | Up | B(T-1), C(T-1) | Down | Down | Copy DB files from B to C. Start B.
4 | T-3 | B copied to C; A is updating B | Up | B'(T-1), C(T-1) | Up | Down | Load C using copied DB files & the SQL copy.in script. Checkpoint C's DB and re-activate journaling (i.e. dxbackupdb +journal C). Start C. Fully tune C's database (i.e. dxtunedb -full {dbName}).
5 | T-4 | B & C are running; A is updating C | Up | C'(T-1) | Up | Up |
6 | T-5 | B & C synchronized with A | Up | | Up | Up |
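
As a quick sanity check, the server states in the two step tables above can be transcribed as data and tested for the property the plan relies on: at least one server is Up at every step. This is a minimal sketch; the names `LDIF_COPY`, `DATABASE_COPY`, `always_serving`, and `down_times` are hypothetical and were chosen for illustration only.

```python
# Each row: (time, state of A, state of B, state of C), transcribed from the tables.
LDIF_COPY = [
    ("T-0", "Up", "Up", "NA"),
    ("T-1", "Up", "Up", "Down"),
    ("T-2", "Up", "Down", "Down"),
    ("T-3", "Up", "Down", "Down"),
    ("T-4", "Up", "Up", "Down"),
    ("T-5", "Up", "Up", "Down"),
    ("T-6", "Up", "Up", "Down"),
    ("T-7", "Up", "Up", "Up"),
    ("T-8", "Up", "Up", "Up"),
]
DATABASE_COPY = [
    ("T-0", "Up", "Up", "NA"),
    ("T-1", "Up", "Up", "Down"),
    ("T-2", "Up", "Down", "Down"),
    ("T-3", "Up", "Up", "Down"),
    ("T-4", "Up", "Up", "Up"),
    ("T-5", "Up", "Up", "Up"),
]

SERVERS = ("A", "B", "C")


def always_serving(plan):
    """True if at least one server is Up at every step of the plan."""
    return all("Up" in row[1:] for row in plan)


def down_times(plan, server):
    """Times at which the given server is marked Down in the plan."""
    col = 1 + SERVERS.index(server)
    return [row[0] for row in plan if row[col] == "Down"]
```

For example, `down_times(LDIF_COPY, "B")` and `down_times(DATABASE_COPY, "B")` list the steps during which B is stopped in each procedure, which makes it easy to compare how long each method leaves A as the only server running.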