IT Disaster Recovery Planning For Dummies Cheat Sheet

From IT Disaster Recovery Planning For Dummies

By Peter H. Gregory, Philip Jan Rothstein

The purpose of an IT disaster recovery plan is to recover the IT systems and infrastructure that support business processes critical to the organization’s survival. Whether you’re hit by a natural disaster or a hack attack, take a cue from the Boy Scouts: Be prepared. Know what goes into an interim plan; determine the business impact of a disaster; discover what you need in a recovery plan; and be sure to test your plan.

10 Elements of an Interim IT Disaster Recovery Plan

If you don’t have a fully detailed IT disaster recovery plan (DRP) right now, implement an interim disaster recovery plan (IDRP) as you develop your long-term safety net. Sequester two or three experts for one day to develop an interim plan that contains:

  • A list of people on the emergency response team

  • Procedures for declaring a disaster

  • Procedures for invoking the DR plan

  • Emergency communications

  • How to carry out basic recovery plans

  • Viable processing center alternatives

  • How to enact preventive measures

  • A documented interim DR plan

  • Wallet-sized emergency contact lists

  • Training methods for emergency response team members

8 Parts of a Business-Impact Analysis in an IT Disaster Recovery Plan

Use a business impact analysis to help determine which processes and systems warrant the expense and effort related to developing your IT disaster recovery plan. A business impact analysis (BIA) is a detailed inventory of the primary processes, systems, assets, people, and suppliers that are associated with an organization’s principal business activities.

The core purpose of a business impact analysis is to identify which processes and systems are the most critical to the survival of an organization.

Follow these steps to complete the business impact analysis:

  1. Establish the project team, scope, and budget; name a project manager.

  2. Get executive support.

  3. Inventory your key business elements:

    • Business processes

    • Information systems/applications

    • Assets

    • Personnel

    • Suppliers

    Develop intake forms that you can use to gather consistent information. Interview key experts throughout the business. Get information from inventories.

  4. Tabulate the results in a spreadsheet or document.

  5. For each business process, determine the Maximum Tolerable Downtime (MTD).

    MTD is the longest time the process can remain disabled before it threatens the organization’s survival.

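  6. For each business process, determine a reasonable Recovery Time Objective (RTO) and Recovery Point Objective (RPO).

    RTO is the target time for restoring the process after a disaster; RPO is the maximum amount of recent data, measured in time, that you can afford to lose.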

  7. Sort the list of business processes by MTD or RTO, shortest first.

    The processes with the shortest MTD or RTO are the most critical business processes. Get agreement on the resulting ranking from senior management.

  8. Perform a risk analysis on each critical process to identify any vulnerabilities that exist, along with steps to mitigate those vulnerabilities.
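Steps 5 through 7 can be sketched in a few lines of Python once the BIA results are tabulated. This is only an illustration: the `BusinessProcess` class, the helper names, and every figure in the sample data are hypothetical. The check that flags an RTO longer than the MTD is an added sanity check, on the assumption that a recovery target beyond the maximum tolerable downtime needs to be revisited.

```python
from dataclasses import dataclass


@dataclass
class BusinessProcess:
    name: str
    mtd_hours: float  # Maximum Tolerable Downtime
    rto_hours: float  # Recovery Time Objective
    rpo_hours: float  # Recovery Point Objective


def rank_by_criticality(processes):
    """Sort so the shortest MTD comes first; ties are broken by shortest RTO."""
    return sorted(processes, key=lambda p: (p.mtd_hours, p.rto_hours))


def rto_exceeds_mtd(processes):
    """Return the names of processes whose RTO is longer than their MTD."""
    return [p.name for p in processes if p.rto_hours > p.mtd_hours]


# Hypothetical BIA results, for illustration only.
processes = [
    BusinessProcess("Payroll", mtd_hours=168, rto_hours=72, rpo_hours=24),
    BusinessProcess("Order entry", mtd_hours=24, rto_hours=8, rpo_hours=1),
    BusinessProcess("Email", mtd_hours=72, rto_hours=96, rpo_hours=4),
]

ranked = rank_by_criticality(processes)
for p in ranked:
    print(f"{p.name}: MTD {p.mtd_hours}h, RTO {p.rto_hours}h, RPO {p.rpo_hours}h")

print("RTO longer than MTD:", rto_exceeds_mtd(processes))
```

A spreadsheet sort does the same job; the point is that the ranking presented to senior management follows mechanically from the MTD and RTO values gathered in the earlier steps.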

6 Parts of an Effective IT Disaster Recovery Plan

As you prepare to develop and document the IT disaster recovery plans for the components that support critical business processes, you should know what exactly goes into a DR plan, how to structure it, and how to manage the contents of the plan.

Disaster recovery plans should contain:

  • Disaster declaration procedures.

  • Emergency contact lists.

  • Emergency leadership and role selection.

  • Damage assessment procedures.

  • System recovery and restart procedures.

  • Procedures for transitioning back to normal operations.

Keep plan documents under version control, use consistent formatting, have subject matter experts review the plan, test the plan’s efficacy, and distribute the documents to all likely disaster recovery team personnel.

After you write the DR plan, publish it in forms that make it available to recovery personnel: Distribute it in multiple forms (including hard copy, CD-ROM, USB drive, and so on) so emergency response personnel can actually access the plan from wherever they are, without having to depend on the same IT systems that they may be expected to recover.
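If the plan is kept as a text document under version control, a small script can flag a draft that is missing one of the six parts listed above. The sketch below is an assumption-laden illustration: the section titles are taken from the list in this section (the last one is shortened), the function name and sample draft are hypothetical, and the check is a plain case-insensitive text search that says nothing about the quality of a section.

```python
# Section titles adapted from the six parts listed above.
REQUIRED_SECTIONS = [
    "Disaster declaration procedures",
    "Emergency contact lists",
    "Emergency leadership and role selection",
    "Damage assessment procedures",
    "System recovery and restart procedures",
    "Transition to normal operations",
]


def missing_sections(plan_text):
    """Return the required section titles that do not appear in the plan text."""
    lowered = plan_text.lower()
    return [title for title in REQUIRED_SECTIONS if title.lower() not in lowered]


# Hypothetical draft that covers only three of the six parts.
draft = "\n".join([
    "1. Disaster Declaration Procedures",
    "2. Emergency Contact Lists",
    "3. System Recovery and Restart Procedures",
])

for title in missing_sections(draft):
    print("Missing:", title)
```

A check like this fits naturally into the review that subject matter experts already perform; it does not replace that review.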

5 Ways to Test IT Disaster Recovery Plans

Testing is a natural part of the lifecycle for many technology development efforts: software, processes, and, yes, disaster recovery planning. Because disasters don’t occur very often, you can seldom tell whether your DR plans will actually work unless you test them. And given the nature of disasters, if your DR plan fails, the organization may not survive the disaster.

Here are the five types of disaster recovery tests:

  • Paper test: Individuals read and annotate recovery plans.

  • Walkthrough test: Groups walk through plans to identify issues and changes.

  • Simulation: Groups go through a simulated disaster to identify whether emergency response plans are adequate.

  • Parallel test: Recovery systems are built/set up and tested to see if they can perform actual business transactions to support key processes. Primary systems still carry the full production workload.

  • Cutover test: Recovery systems are built/set up to assume the full production workload. You disconnect primary systems.

Structure your DR testing in the same way you structure other complicated undertakings, such as software development and associated testing. Just follow these steps:

  1. Determine how frequently you should perform each type of test.

  2. Test individual components.

    Note any discrepancies, and then pass the plan back to the people who wrote each section so they can update it. This process improves the quality and accuracy of the DR plan, which increases the likelihood that the organization will actually survive a disaster if one occurs.

  3. Perform wider tests of combined components.

  4. Test the entire plan.

By performing these four steps, you can identify many errors during individual tests and correct those errors before you do more comprehensive tests. This process saves time by preventing little errors from interrupting comprehensive tests that involve a lot of people.
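The progression from component tests to a test of the entire plan can be sketched as a staged runner that stops at the first stage with failures, so problems are fixed before a wider test is attempted. This is a minimal sketch, not a prescribed tool: the function name, stage names, and checks are all hypothetical, and each check is a placeholder that a real team would replace with an actual test procedure.

```python
from typing import Callable, Dict, List, Tuple

Check = Callable[[], bool]


def run_staged_tests(stages: List[Tuple[str, Dict[str, Check]]]) -> Dict[str, List[str]]:
    """Run stages in order and stop after the first stage that has failures.

    Returns, for each stage that ran, the names of the checks that failed.
    """
    results: Dict[str, List[str]] = {}
    for stage_name, checks in stages:
        failed = [name for name, check in checks.items() if not check()]
        results[stage_name] = failed
        if failed:
            # Send the discrepancies back to the plan authors before going wider.
            break
    return results


# Hypothetical checks; each lambda stands in for a real test procedure.
stages = [
    ("components", {
        "restore database backup": lambda: True,
        "fail over DNS": lambda: False,
    }),
    ("combined components", {
        "application reads restored database": lambda: True,
    }),
    ("entire plan", {
        "full recovery rehearsal": lambda: True,
    }),
]

report = run_staged_tests(stages)
for stage_name, failed in report.items():
    print(stage_name, "failed checks:", failed)
```

In this sample run the wider stages never start, because one component check fails. That mirrors the reasoning above: small errors are caught before they can interrupt a test that involves many people.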