Business Continuity & Disaster Recovery

Knowing how you will maintain ‘business as usual’ in the event of data or technology problems is vital. Disrupted operations, reduced or absent customer service, reduced profitability and a compromised reputation are all potential consequences of not having an appropriate strategy in place. An effective business continuity plan can also be a prerequisite for securing suitable business insurance or lower premiums, or for eligibility to tender for new or continued business.

The first stage is to identify your critical business functions (using a process known as ‘Business Impact Analysis’) and the risks associated with their operation, and then to develop appropriate risk reduction measures that ensure continuity of operations in the event of a disruptive incident. Key considerations when developing a business continuity plan should include:

  • Denial of access
  • Loss of key staff
  • Loss of key supplier
  • Loss of key systems

The Business Impact Analysis will identify the ‘Recovery Time Objective’ for each business process: how quickly, and to what level, the process must be restored after a disruption in order to avoid unacceptable consequences.

IT systems must be designed to be resilient, with availability that matches the needs of the business and the Recovery Time Objective. If a function is required to be available 24 x 7, the system supporting that function must reflect that need. For example, a company website where customers can place orders needs to be available 24 x 7 and needs the appropriate resilience. Other systems, however, could tolerate several hours of non-availability before this started to affect the bottom line or legal or financial responsibilities. Some systems will be critical at certain times and not others (for example, a payroll system).
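As a rough illustration of how Recovery Time Objectives can drive priorities, the sketch below records a few business functions with an RTO each, orders them for recovery, and reports which ones would already be in breach after a given outage. The function names and RTO values are hypothetical, chosen only for the example; real figures come from your own Business Impact Analysis.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass
class BusinessFunction:
    name: str
    rto: timedelta  # Recovery Time Objective from the Business Impact Analysis


def breached(functions, outage):
    """Return the names of functions whose RTO is shorter than the outage so far."""
    return [f.name for f in functions if outage > f.rto]


# Hypothetical register: names and RTOs are illustrative assumptions.
functions = [
    BusinessFunction("internal reporting", timedelta(hours=8)),
    BusinessFunction("online ordering website", timedelta(minutes=15)),
    BusinessFunction("payroll", timedelta(hours=24)),
]

# Restore the functions with the shortest RTO first.
recovery_order = sorted(functions, key=lambda f: f.rto)

for f in recovery_order:
    print(f.name, f.rto)
print("In breach after 1 hour:", breached(functions, timedelta(hours=1)))
```

A system such as payroll, which is critical only at certain times, could be given a shorter RTO around pay dates and a longer one otherwise.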

IT systems (servers and telecoms infrastructure) for larger organisations tend to be housed in datacentres. Datacentres are replicated so that, in the event of a catastrophic incident in one datacentre, systems will automatically switch (‘failover’) to the second site (disaster recovery). The datacentres need to be located sufficiently far apart that a single incident cannot affect both. Failover testing should be performed on a regular basis for critical systems and, if practical, for the whole datacentre.

Systems need to be backed up to protect against data loss or corruption. The frequency of backups is determined by the ‘Recovery Point Objective’ – defined as the maximum tolerable period in which data might be lost from an IT service due to a major incident. The Recovery Point Objective gives systems architects a time limit to work to. For instance, if it is set to four hours, then a usable offsite copy must never be more than four hours old, which in practice means offsite backups must be continuously maintained – a daily offsite backup on tape is not sufficient.
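The arithmetic behind that example can be sketched as follows. The sketch assumes that the worst-case data loss is the interval between backups plus the time a backup takes to reach safe offsite storage. The function names and the transfer times used are illustrative assumptions, not measured figures.

```python
from datetime import timedelta


def worst_case_data_loss(backup_interval, offsite_transfer_time):
    """Oldest the newest safe offsite copy can be at the moment of an incident.

    Assumption: a backup only counts once it is safely offsite, so an incident
    just before the next copy arrives loses one interval plus one transfer time.
    """
    return backup_interval + offsite_transfer_time


def meets_rpo(backup_interval, offsite_transfer_time, rpo):
    """True if the backup schedule keeps potential data loss within the RPO."""
    return worst_case_data_loss(backup_interval, offsite_transfer_time) <= rpo


rpo = timedelta(hours=4)

# Daily tape backup, assumed to take 2 hours to reach the offsite store.
print(meets_rpo(timedelta(hours=24), timedelta(hours=2), rpo))

# Replication every 15 minutes, assumed to take 5 minutes to complete.
print(meets_rpo(timedelta(minutes=15), timedelta(minutes=5), rpo))
```

Under these assumptions the daily tape schedule fails a four-hour Recovery Point Objective by a wide margin, while frequent replication comfortably meets it.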

For smaller organisations not using datacentre services, backups should be stored offsite and in a fireproof safe. Retention periods for backups are determined by the needs of the business, or in some cases by legislation.

Checks need to be performed to confirm that backups have run correctly, and sample backups should be restored on a regular basis to ensure the data is readable.
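One possible way to automate part of a restore check is to record a checksum of each file when it is backed up and compare it with the checksum of the restored copy. The sketch below assumes SHA-256 checksums were stored at backup time; the function names are hypothetical, and a matching checksum shows only that the file was restored unchanged, not that the application can actually use it.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks to limit memory use."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def restore_is_intact(restored: Path, expected_sha256: str) -> bool:
    """Compare a restored file against the checksum recorded when it was backed up."""
    return sha256_of(restored) == expected_sha256
```

A scheduled job could restore a sample of files to a scratch location, run `restore_is_intact` on each, and raise an alert on any mismatch or unreadable file.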