Sunday, August 12, 2007


One step beyond Disaster Recovery

I recently advised a medium-sized commercial bank in the Philippines about a stalled project to create a business continuity solution.

Financial institutions in the Philippines do not face the same data integrity and safety requirements that their counterparts do here in the U.S. Still, management knew that they had to improve their IT capabilities. Their primary data center is located in their head office, and its vulnerability surfaced at every coup attempt.

They learned about me from another client. Click here for that story.

The bank was trying to install an EMC Asynchronous SRDF solution.

I briefly worked for EMC U.S.A. as a systems engineer. I’m familiar with the product line and the subject of disaster recovery & business continuity in general.

Disaster Recovery (DR) aptly describes the process of recovering from a disaster.

DR can be illustrated with the knowledge that all hard drives crash. It’s not a question of “if,” but a question of “when.” When the drives of a “production box” crash, business grinds to a halt unless and until the data can be restored and the server restarted. The process of restoring the data and restarting the server is disaster recovery.
A “production box” is tech-speak for a computer server that’s serving a live network.
Operations can grind to a halt for any number of reasons. Fire, a software crash, human error, network failure, and a power blackout are common culprits.

DR planning begins by defining acceptable limits for two factors. The first is called the Recovery Time Objective (RTO) and the second is the Recovery Point Objective (RPO).
RTO is the maximum amount of time you can afford to spend recovering your lost or damaged data before you are operational again. Can your business tolerate being down for several days or several hours? Whether it’s days or hours, this figure is your RTO.
RPO, on the other hand, is the amount of data accumulated over time that you can tolerate losing. Can your business afford to lose a day’s worth of data? If so, then your data must be backed up on a daily basis. A retail operation, like a supermarket, that logs hundreds or thousands of transactions a day may require several backups made during the course of the day.
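As a rough illustration of how an RPO translates into a backup schedule, here is a minimal Python sketch. The function names and the sample figures are hypothetical, and it assumes evenly spaced backups, so that the worst-case data loss equals the interval between two backups.

```python
import math
from datetime import timedelta


def backups_needed_per_day(rpo: timedelta) -> int:
    """Smallest number of evenly spaced daily backups whose interval
    does not exceed the RPO (assumption: worst-case loss == interval)."""
    if rpo <= timedelta(0):
        raise ValueError("RPO must be a positive duration")
    return math.ceil(timedelta(days=1) / rpo)


def meets_objectives(downtime: timedelta, data_lost: timedelta,
                     rto: timedelta, rpo: timedelta) -> bool:
    """True if an outage stayed within both the RTO and the RPO."""
    return downtime <= rto and data_lost <= rpo


# A business that can afford to lose a day's worth of data
# needs one backup a day.
print(backups_needed_per_day(timedelta(hours=24)))

# A hypothetical supermarket that can only afford to lose
# four hours of transactions needs several backups a day.
print(backups_needed_per_day(timedelta(hours=4)))
```

The second helper, `meets_objectives`, could be used after an incident or a drill to check whether the actual downtime and data loss stayed within the targets.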
“Business Continuity” (BC) extends the scope of preparation, plans, and resources past DR. Those two factors, RTO and RPO, figure into this as well.
BC’s goal is to ensure the business will be able to continue operating through crises and disasters. Accomplishing that requires going beyond the processes and equipment for restoring data and replacing equipment. Indeed, BC refers to making plans and preparing resources that, among other things, will prevent the loss of data. It refers to advance preparation in order to cope with the unexpected.

A good BC plan has:
  1. identified the most likely disaster scenarios and their impact on the business;
  2. determined the “mission-critical,” important, and less-important processes, systems, and services of the company;
  3. established its priorities for supporting the mission-critical components;
  4. developed and implemented the most redundant and fault-tolerant system possible within its budget;
  5. prepared several alternate strategies;
  6. taught the plan to its people and regularly practiced it with them; and
  7. secured the continuing support of senior management.
“Mission-critical” is tech-speak for the most important processes, systems, and services that a business must have in order to fulfill its mission. What is a mission? For a hospital, it could be the 24/7 availability of patient information.

“Redundant” is tech-speak for a backup that can temporarily take the place of a failed primary system.

“Fault-tolerant” is tech-speak for the characteristic of being able to withstand glitches.

Certain industries and companies require uninterrupted IT services. For them, BC is mandatory. The airline industry and financial institutions are examples. The financial sector, in fact, has to follow stringent guidelines for protecting and maintaining the security of its data. These companies must have minimal downtime. How minimal?
A calendar year has 8,760 hours. To give you an idea of the pressure to perform, consider that a 99.9% uptime is “only” equivalent to 8,751 hours.
Imagine the trouble a bank would face if its nine non-operational hours occurred on the 15th. Employees would not receive their pay.
It turns out that even a 99.99% uptime only keeps you operational for 8,759 hours of the year! That’s still almost one hour short of the goal!
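The arithmetic behind these uptime figures can be checked with a short Python sketch. The function name is my own, and it assumes a non-leap year of 8,760 hours.

```python
HOURS_PER_YEAR = 365 * 24  # 8,760 hours in a non-leap calendar year


def downtime_hours(uptime_percent: float) -> float:
    """Hours per year a system may be down at the given uptime percentage."""
    if not 0 <= uptime_percent <= 100:
        raise ValueError("uptime_percent must be between 0 and 100")
    return HOURS_PER_YEAR * (1 - uptime_percent / 100)


for uptime in (99.9, 99.99):
    down = downtime_hours(uptime)
    print(f"{uptime}% uptime: {HOURS_PER_YEAR - down:,.2f} hours up, "
          f"{down:.2f} hours ({down * 60:.0f} minutes) down per year")
```

At 99.9% the allowance works out to roughly 8.76 hours a year, and at 99.99% to a little under 53 minutes.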
When the availability or integrity of data is compromised for any reason, businesses risk losing revenue and market share, experiencing decreased productivity, damaging their reputation, eroding their customers’ loyalty, and, in certain industries, being penalized for failing to comply with mandated regulations.

I enjoy BC planning because it's an activity that can incorporate numerous improvements for little or no additional cost. It's a rare opportunity to deliver a lot of added value beyond the client's initial expectations.

There are several ways to go with DR and BC. You can build them in-house or outsource some or all of their aspects.

I'll cover both, but the next entry will focus on the offerings of two established players in the field of storage, DR, and BC. These are the two I’m familiar with, EMC and NetApp.
