Sunday, July 29, 2007


Business continuity in action!

Four years ago this month, a disaster recovery solution we created proved its worth. It saved our client, an international property consulting firm, from tanking after an attempted coup d’etat in 2003.

The Philippines has been wracked by four or five coup attempts in the last ten years. That averages out to one roughly every two years!

The renegades, 300 heavily armed soldiers and their leaders, barricaded themselves in several buildings in Makati’s business district. The standoff lasted for 19 hours before they surrendered. During the crisis, authorities turned off the power grid that served the contested area. My client’s office was within that grid. After it was over, her staff returned to a ransacked office.

Other companies in her building were not so fortunate. In fact, my client was the only one who was able to restore her data and resume operations as if nothing had happened.

About half of her neighbors—branch offices of large companies as well as individual businesses—did not back up at all. As for the other half, their offsite storage consisted of the IT manager taking the backup media home. I learned that many of those who did back up discovered that their backups were too old or could not be restored.
The latter didn’t surprise me. In smaller shops, many IT administrators diligently back up their data but neglect to test the media’s integrity with regular test restores.

Back in my days as a network engineer, I was installing Citrix on a 25-desktop network. The client had two Windows NT servers and had always relied on Windows NT’s built-in backup utility. I knew that utility’s notorious reputation, so before continuing my work I challenged them to restore the data. Their office manager, who doubled as the IT administrator, pulled out seven tape cartridges. One by one, she tried to restore the contents of each tape. And one by one, she discovered they were empty. If I recall correctly, the MS-DOS directory listing revealed one empty folder on each tape. That’s how we sold a lot of ARCserve software back then!
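The moral is that only a test restore proves a backup, and that check can be partially automated. Here’s a minimal sketch that compares checksums of restored files against the originals (the paths and layout are hypothetical, not from any tool mentioned above):

```python
import hashlib
import pathlib

def checksum(path: pathlib.Path) -> str:
    # SHA-256 digest of the file contents
    return hashlib.sha256(path.read_bytes()).hexdigest()

def verify_restore(original_dir: str, restored_dir: str) -> list[str]:
    """Return the relative paths of files whose restored copy is missing or differs."""
    bad = []
    orig = pathlib.Path(original_dir)
    rest = pathlib.Path(restored_dir)
    for f in orig.rglob("*"):
        if f.is_file():
            r = rest / f.relative_to(orig)
            if not r.exists() or checksum(r) != checksum(f):
                bad.append(str(f.relative_to(orig)))
    return bad
```

An empty list means every restored file matched its original; anything else tells you exactly which tapes (or folders) to worry about before a disaster, not after.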
As for my client: months earlier, we had added storage capacity, and I persuaded them to configure it to do double duty as a disaster recovery system. The hardware was kept in a closet (literally) down the hall. The system consisted of a NetApp NAS (Network Attached Storage) appliance and Symantec’s Backup Exec (System Recovery version).

I was back in the U.S. when this happened, and I was able to talk them through the procedure. They were up and running by the end of the day!


Saturday, July 21, 2007


How to turn it into a competitive advantage.

Business continuity. Disaster recovery. Colocation. Backup service providers. All of these share at least one thing in common and that’s storage.

SAAS. SOA. High availability. Regulatory compliance. Risk management. Virtualization. These topics were steps in a thread that followed this sequence:

We plan to deploy our proprietary applications over the web to serve our user communities (Software As A Service, or SAAS).

In fact, we should start re-orienting our entire information system to become more agile and flexible (Service-Oriented Architecture or SOA).

If we do that, we’ve got to ensure our user communities have 24x7x365 accessibility (high availability).

At the same time, the law requires us to safeguard our clients’ information from unauthorized access (regulatory compliance).

Now, what are the risks that stand in our way? How can we mitigate most or all of those (risk management)?

A primary way to do that is to create shared pools of computing assets (server and storage virtualization).
That was the actual sequence of thought of a client's CIO. It eventually became the company’s blueprint for the nature and sequence of its IT investments. Senior management had given him the mission of ensuring that IT would support the company’s business goals. At the time, the company was a four-year-old start-up that specialized in providing backroom services to hospitals.

I became their consultant based on my experience when I worked for the second largest U.S. provider of the same service. We provided the service through WANs. That was in the late-90s. The CIO above planned to provide it through web-based applications. It’s really remarkable how fast things change in eight years.

WANs are Wide-Area Networks: networks that connect far-flung branches to the mother ship and, occasionally, directly to each other. WANs are ubiquitous. Walgreens, Wal-mart, and Home Depot are examples of companies that use them extensively. You’ll know you’re dealing with a WAN when you can buy an item at one store and return it at another. Those WANs might not necessarily work in real-time, but here’s one example that does: bank ATMs (Automated Teller Machines). “Real-time” means occurring as it happens; Monday night football, for instance, is aired in real-time.
Here’s an overview of these IT objectives. I’ve been on both sides of the fence—as a CIO and as a vendor—so I can present both perspectives.

The IT Goal exists to accomplish the Business Objective.

The IT Deliverable will indicate the attainment of the IT Goal.

The IT Goal can be classified according to its nature. It may be an “O,” which stands for Offensive, or a “D,” which stands for Defensive. An O goal confers a competitive advantage on the organization. A D goal protects the organization’s assets. It turns out that the goals below are all Ds. Later on, I'll explain how these Ds can become Os.

I only presented the first four issues. They all revolve around storage.

If the IT GOAL is Business Continuity, then the
BUSINESS OBJECTIVE must have been:
to keep the business operating through crises and disasters. While it may not be possible to operate every aspect of the business, all mission-critical aspects must continue to operate. Interruptions should be kept to a minimum. The severity of the impact should be limited. Minimize financial losses!

The IT DELIVERABLE, in that case, would have been:
an infrastructure equipped with the appropriate technology and trained people that are always prepared to provide IT services in the event of crises and disasters. Components of this IT deliverable include the IT goals described below, namely, disaster recovery, colocation, and backup service providers. The organization should be trained and periodically rehearsed.

If the IT GOAL is
Disaster Recovery, then the BUSINESS OBJECTIVE must have been:
to get back up and running as quickly as possible in the event of a major disruption.

The IT DELIVERABLE, in that case, would have been:
a combination of elements that will allow operations to resume in the event of a disaster. Some of these elements are the colocation services and backup service providers discussed below.

If the IT GOAL is
Colocated Services, then the BUSINESS OBJECTIVE must have been:
to provide stand-by redundancy and data protection from a geographically distant location.

The IT DELIVERABLE, in that case, would have been:
a fully-functional IT configuration housed in a facility located some distance from the primary data center. "Colocate" was a term coined to describe the service of renting space in a "hardened" facility for the purpose of housing backup equipment.

If live production data is continuously sent to the backup site, it's referred to as a "hot" configuration. If not, it's a "cold" one. Either way, the purpose of a colocated service is to take over in the event of an emergency. A hot site should be able to transition within 24 hours of the start of the incident. A cold one will take a lot longer; its transition time will be measured in days.

If the IT GOAL is
a Remote Backup Service, then the BUSINESS OBJECTIVE must have been:
to safely store data that stays accessible (through the Internet) in the event of a major disruption.

The IT DELIVERABLE, in that case, would have been:
an ongoing subscription to the services of a third-party company. Your data is stored on their equipment and administered by their people. You should pay for the highest level of support with this option. It will take days to restore your data and resume operations.

Think about it. First, you might need to replace your equipment. Second, you would need to prepare it (format the drives, install the software, etc.). Third, you might need to restore the previous IT environment (user accounts, etc.). Fourth, you need to retrieve your data (on CDs, probably). Finally, you can resume operations. This description is overly simplified. I've gone through this for different clients and it's always caused them a huge headache and given us a small windfall.

In one incident involving a metro trucking company, a lightning storm fried most of their desktops and everything in their server room. It took us five calendar days (we worked through a weekend) to receive the correct hardware and software. We bought replacement servers, desktops, and laptops. We had to re-order because of incorrect or incomplete items. We also had to buy an upgraded version of their proprietary software. We dealt with five different vendors, three of which required immediate payment in full. This, in turn, put additional pressure on the owners, who were already scrambling: dealing with insurance, their bank, the landlord, and their staff. They had also had to hire extra help, since they were running the business on paper in the meantime.

It took a day and a half to prepare everything; by then, nearly seven days in, the job required two of us. It took another two days to configure their IT environment. We learned that their proprietary software required agents installed on all of its clients. In other words, their trucking industry software required each and every desktop that used it to be specifically prepared. We were on the phone constantly with the vendor. Our hassles didn't stop there. They had been using an older version of this software, so when we installed the current version, our client's analyst had to adapt the stored data to the newer version. So finally, nine days after the incident occurred, we began restoring the data.

I skimmed over many details. For instance, I didn't mention the hotel room we rented. Or the Internet connection that had to be set up (this happened in the late-90s). We moved back into the office on the 14th calendar day and they felt comfortable enough to let us go after the 16th day.

This experience taught us many lessons. One of them is the importance of selecting the appropriate contingency solution.

Everything begins with a formal process of contingency planning. Senior management should support this. It’s highly advisable for the Chief Operating Officer (COO) and the Chief Information Officer (CIO) to become active members of the planning team. In broad strokes, the team should do a comprehensive assessment of the risks the company faces, their likely impact, and the company’s various plans of action.

Contingency planning is important for another reason. The discussion can take a proactive turn if the technical side enlightens the business side about the potential of a carefully configured business continuity solution.

I’ve seen it happen several times. The light bulb goes on and suddenly the business side gets “it.”

“It” being the proposition that a carefully configured solution will enable new processes. These new processes, in turn, can become competitive advantages. Just like that, the Ds turn into Os.

I'll explain that in a forthcoming article.


Friday, July 20, 2007


Best Practices

If I had to distill the best practices of an effective DR and BC program, they would boil down to a short list of four concepts. Each concept directs you to pursue a course of action that contributes to a successful Disaster Recovery (DR) and Business Continuity (BC) program.

These four concepts are:
  • Work out a realistic vision of your organization’s survival objectives and develop your plan based on it. The key word is “realistic.” The objectives must be realistic because your plans will need management’s support.
  • Review and regularly refine your plans. Be proactive. Your DR and BC plans are meant to be living documents. They’re useless if they sit on the shelf. Review the plan internally as well as through the eyes of experts. Unearth shortcomings. Cover the gaps. America’s leaders don’t think it’s a question of “if” but rather a question of “when” before terrorism strikes at our heartland again.
  • Anticipate and adjust to your environment. Continuously. There are “big” changes like new regulations and new technologies and “small” changes like employee turnover and new phone numbers. Big or small, these changes must find their way into your plans.
  • Practice for the real thing. There’s no substitute for going through the exercise. It’s probably not possible to exercise all or even most of the plan, but that shouldn’t stop you from doing parts at a time. Seek senior-level sponsorship, especially for this next piece: exercise your plan with as little advance notice as possible. Disasters don’t usually announce themselves—they just happen, don’t they? Practice benefits you in an important way: it exposes your plan’s flaws. Most of the time, you’ll find it’s the people who “betray” you; their apparent apathy shows up as a lack of preparation. This reminds me of a fire drill years ago at our office on the umpteenth floor of a high-rise. Nobody took the drills seriously—even the “old-timers”—until building management hired a retired fire chief to conduct the drill. In gruff tones and with piercing eyes, he told us how quickly the flames would spread and why we would probably not burn to death. The smoke, he growled, would kill us first.

There you have it. Four common sense concepts:
  1. Identify and fill in a realistic vision of your organization’s survival objectives. What are the goals and what will it take to achieve them?
  2. Review and regularly refine your plans. Don’t wait for a disruptive event to update your plan. Be proactive.
  3. Anticipate and adjust to your environment. Ensure senior management is involved. BC and DR are not IT concerns. They’re business concerns. IT just happens to be the one tasked with the program.
  4. Practice for the real thing. Flush out your deficiencies. You can bet there’ll be many. Hire a fire chief and then begin a regular disaster awareness and training program.
Good luck!


Tuesday, July 17, 2007


Planning takes four steps

It took a while, but business continuity planning (BCP) has finally become visible on the radar screens of managers and owners of smaller businesses (< $100 million in sales). It’s about time, too. The state of the world today is far more volatile than it was a mere eight years ago. 9/11 did change everything.

Every organization should plan for its continued existence in the event of a major disruption. How will it continue to operate if its operation—and existence—is disrupted by any number of natural or man-made disasters?

The practice of Business Continuity Planning (BCP) has evolved into a recognized field. Job titles that carry or imply this area now exist. Practitioners can join any number of reputable associations that promote this field. Several recognized certifications can now be earned as well.

I had the good fortune of working as a Sales Systems Engineer for the world’s largest enterprise storage vendor just before the dot com crash. I’m referring to EMC, the 800-pound gorilla of the enterprise storage space. At that time, the basic rationale behind EMC’s fabulously expensive SRDF (Symmetrix Remote Data Facility) was real-time replication for disaster recovery (DR). Under the proper guidance, it can be a short leap from DR to BCP. And that is where SRDF is now positioned—as the lynchpin of the data side of business continuity planning.

The mission of a Systems Engineer who works in sales is to support his sales reps by designing storage and DR solutions for customers and prospects alike. To him falls the task of dealing with the technical aspects of any proposal or project. This frequently involves making technical presentations to prospects and serving as the single point of contact for existing customers contemplating system upgrades.

Disaster recovery (DR) is a subset of the BC solution. Many fine definitions of the term abound so rather than reinvent the wheel, I will quote some of the better ones. Disaster recovery is:

  • the process, policies and procedures of restoring operations that are critical to the resumption of business [Wikipedia].
  • the ability of an organization to respond to a disaster or an interruption in services by implementing a disaster recovery plan to stabilize and restore the organization’s critical functions. [Disaster Recovery Journal].

Wikipedia goes on to say that…

  • a disaster recovery plan (DRP) should include plans for coping with the unexpected or sudden loss of communications and/or key personnel, although these are not covered in this article, the focus of which is data protection. Disaster recovery planning is part of a larger process known as business continuity planning (BCP).

Disaster Recovery Journal continues as well…

  • The management approved document that defines the resources, actions, tasks and data required to manage the technology recovery effort. Usually refers to the technology recovery effort. This is a component of the Business Continuity Management Program.

The two definitions share a common thread: both place disaster recovery within the larger scope of business continuity planning.

I will continue this in a subsequent post. For now, let me break down the steps that BCP entails. The process follows these four steps in a logical sequence.


Risk Identification

Identify the risks and hazards that confront your business. These can be natural hazards, e.g., flooding and earthquakes, or man-made risks, e.g., power outages, theft, fire, or an attack against your computer network. Obviously you have to draw the line at some point, since it is impractical to anticipate every risk regardless of its severity. For example, two key project members in an SAP implementation project I participated in died in an accident. That incident delayed a major portion of the entire project until replacement personnel were hired.


Risk Assessment

It is possible to determine, quantitatively and qualitatively, the likelihood, magnitude, and duration of the identified risks. Assessing risks this way allows you to prioritize them. Once risks are prioritized, you can budget your resources more rationally.
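A minimal sketch of that prioritization, scoring each risk by likelihood times impact (the risks and scores below are invented for illustration):

```python
# Hypothetical risks scored 1-5 for likelihood and for business impact
risks = [
    ("power outage",   4, 3),
    ("flooding",       2, 5),
    ("network attack", 3, 4),
    ("theft",          2, 2),
]

# Rank by combined score so the budget goes to the biggest exposures first
ranked = sorted(risks, key=lambda r: r[1] * r[2], reverse=True)
for name, likelihood, impact in ranked:
    print(f"{name}: score {likelihood * impact}")
```

Real assessments weigh more factors (duration, recovery cost, regulatory exposure), but even a simple score like this forces the prioritization conversation.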

Plan Development

You now have the information to create the plans and procedures for preparing your organization to respond to and recover from interruptions. This is a high-level step and as the saying goes, the devil is in the details. This is where senior management, which should have initiated this project to begin with, should return and visibly support the BCP team. The team will need the time to extensively discuss the risks and possible solutions with functional heads. Without that support, the team will find it difficult to get the attention of the functional heads, much less their full-hearted cooperation.


Plan Exercise

In this final step you must exercise the plan. This is the only way to learn what works and what does not. Needless to say, this is another step that senior management must support. Exercising the plan is a continuing activity; in fact, the entire process is performed iteratively. Exercising the BC plans refines those plans and, more importantly, teaches employees how to respond if and when the real event happens.


Saturday, July 14, 2007


A best practice that should be used for selecting a business continuity solution.

IT managers have a plethora of solutions to choose from depending upon technical factors, business objectives, and, of course, available budget.

Providing continuous business operations is daunting. This is one of those tasks that has become more difficult because the number and kind of choices have multiplied. When I was growing up, it wasn’t difficult to brush my teeth. I would just pick up my toothbrush and go. When my kids were growing up, it was harder: they each had at least four toothbrushes to pick from.

“Which color tonight?” “Elmo or Cookie Monster?” Similarly, “IBM or EMC?”

There’s a configuration available for every budget and level of protection. Interoperability, or a product’s capability of working with other manufacturers’ products, is less of a concern today since practically all products—hardware or software—are interoperable. On the one hand, that’s good; on the other, it further complicates the decision tree.

“We’re an IBM shop; should we stay with IBM?” “What about our branch offices? Will a NetApp-Symantec solution suffice?”

When I’m grocery shopping, I don’t need assistance in deciding which coffee to buy. I have my regular brand and if a competing brand is on sale I know enough to decide whether to try the one on sale or not. That’s not the situation here. The “correct” solution is a decision that my organization will have to live with for years to come.

This is where a formal contract management process comes in. Making sense of this buying complexity is what led me to learn and use it. Procurement and contract management is a subject in itself; I'll cover it in a future article.

I treat procurement and contract management as a best practice because this is not a cursory purchase. It’s not wise to simply ask any three or four vendors to make presentations and submit proposals.

Procurement and contract management has three major phases. I call the first the Pre-award Phase (yes, I know it’s imaginative). For me, the buyer, I have to:

1. Plan the procurement.

2. Plan the solicitations.

3. Request the solicitations.


Procurement planning, the first activity, determines what to procure and when. It also includes the people aspect: do I plan to hire, or train existing employees?


Solicitation planning, the second activity, fills in the details. I develop a Statement of Work (SOW) that contains my requirements. This is more difficult than it appears. I have to understand my own requirements, and understanding those demands purposeful and far-ranging analysis. Then I need to communicate those requirements as specific deliverables in the SOW. This purchase is inevitably going to be delivered through a project process, very unlike a server equipment purchase. The solution will be delivered through a project because it requires the vendor to coordinate closely with the client to plan and configure everything before the cutover.


The third activity is requesting solicitations through a Request for Proposal (RFP). This is where my thoroughness in the first two activities—procurement planning and solicitation planning—pays off. I’ll be creating an RFP that clearly communicates my needs. I must provide an accurate, well-defined RFP in order to receive useful bids. By useful, I mean bids that meet most of my requirements. Let’s face it: before I settle on a vendor, I’ll sit down with the finalists to review, clarify, and negotiate the final contract. If I write a poor-quality RFP, I’ll receive poor-quality bids. Then I’ll have more work, and consume more time, when I sit down with the finalists: I’ll need to re-define my requirements and ask the vendors that are still interested to submit bids again.


Wednesday, July 11, 2007


This is not your father's cryptography!

Traditional cryptography systems work on one condition: the sender and the recipient each hold a copy of the same cipher key. The sender sends an email encrypted with his copy of the key; once it arrives, the recipient decrypts it with her copy of that same key.
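The shared-key idea can be sketched with a toy XOR cipher. (This is a stand-in for illustration only; a real system would use a vetted algorithm such as AES, and the key and message below are invented.)

```python
def xor_cipher(data: bytes, key: bytes) -> bytes:
    # XOR each byte with the corresponding key byte;
    # applying the same operation twice restores the original
    return bytes(b ^ k for b, k in zip(data, key))

shared_key = b"supersecretpad!!"    # both parties hold a copy of this key
message    = b"meet at noon    "    # padded to the key length

ciphertext = xor_cipher(message, shared_key)     # sender encrypts
recovered  = xor_cipher(ciphertext, shared_key)  # recipient decrypts with her copy
assert recovered == message
```

The point is symmetry: the exact same key performs both encryption and decryption, which is precisely why both parties must already have it.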

But what if the recipient did not have a copy of the cipher key?


Cryptographers came up with several ingenious solutions. The one most widely used employs a public and a private key. Only the recipient needs to create these keys. It works like this:
  • The recipient sends her public key to the sender.
  • The sender uses the recipient's public key to encrypt the message and then sends the encrypted message to her.
  • The recipient uses her private key to decrypt the message.
Notice the following:
  • The public key is given to anybody who wants to send encrypted mail to the recipient.
  • The private key is the only key that can decrypt any message that was encrypted with its partner public key.
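The public/private mechanics can be sketched with textbook RSA. (Tiny primes, for illustration only; real systems use vetted libraries and far larger keys.)

```python
# Textbook RSA with tiny primes -- never use these sizes for real data.
p, q = 61, 53
n = p * q                  # public modulus
phi = (p - 1) * (q - 1)
e = 17                     # public exponent: the recipient publishes (n, e)
d = pow(e, -1, phi)        # private exponent: the recipient keeps d secret

m = 42                     # message encoded as an integer smaller than n
c = pow(m, e, n)           # sender encrypts with the PUBLIC key
assert pow(c, d, n) == m   # only the PRIVATE key recovers the message
```

Note that anyone can run the encryption line with the published key, but only the holder of `d` can reverse it, which is exactly the asymmetry the bullets above describe.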
This method, called Public-Key Encryption, has one major weakness, and it arises from the public key. A third party could create its own private/public key pair and send its public key to the sender under the guise that it came from the recipient. If the sender encrypts with the bogus public key and the message is intercepted by the third party, the third party can decrypt it using the private key it generated.

This weakness can be avoided in two ways:
  • the public key is personally handed to the sender; or
  • have an independent and trusted third party, called a Certificate Authority (CA), authenticate the recipient's identity and, thus, the authenticity of her public key. Authentication is done through a digital certificate.
The recipient and the CA are the two primary parties that form what is called the Public Key Infrastructure (PKI). Numerous articles explain this, so you may click here to read an explanation of the process.


What about casual users? Is there a simpler way to exchange encrypted emails? There is and it's as simple as sending the cipher key to the recipient in one email and then following it up with a second email that contains the encrypted message.

This is a weak system but it's adequate for casual users.


I had a discussion about cryptography with an information security specialist recently. He described a method that uses private keys and doesn't involve a third party. He had no name for it but it works! Each party has a private key that never leaves the owner's possession.

The method solves the conundrum (raised at the beginning of this article) of sending an encrypted message to a recipient who doesn't have a copy of the cipher key.
First, the sender encrypts his message using his private key. He then sends it.

Second, the recipient receives his encrypted message and proceeds to encrypt it again using her private key. She then sends the double-encrypted message back to him.

Third, the sender removes his own encryption using his private key. The message is now encrypted only with her key. He sends it back to her.

Finally, she decrypts the email using her private key and reads the clear text message.
Take note of the following: each party has a private key, and there is no third party. I can't see any obvious flaws, can you?

It does require one thing and that's cryptographic software that can encrypt and decrypt a message that's already been encrypted without damaging it. I have not had the opportunity to investigate and find one yet. If you know of any, please contact me.
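The procedure described above matches what cryptographers call a three-pass protocol (Shamir's three-pass protocol is the best-known example), and the software requirement is a cipher whose layers can be removed in any order, i.e., a commutative cipher. Modular exponentiation has that property, so here is a sketch under that assumption (all values are toy values, and this is an illustration, not a vetted implementation):

```python
import random
from math import gcd

p = 2**127 - 1  # public prime modulus (a Mersenne prime), known to both parties

def make_key():
    # Pick an encryption exponent e invertible mod p-1, plus its inverse d.
    while True:
        e = random.randrange(3, p - 1)
        if gcd(e, p - 1) == 1:
            return e, pow(e, -1, p - 1)  # (encrypt exponent, decrypt exponent)

m = int.from_bytes(b"meet at noon", "big")  # message as an integer < p
ea, da = make_key()   # sender's private pair: never leaves his possession
eb, db = make_key()   # recipient's private pair: never leaves hers

c1 = pow(m, ea, p)    # 1. sender adds his layer and sends c1
c2 = pow(c1, eb, p)   # 2. recipient adds her layer and sends c2 back
c3 = pow(c2, da, p)   # 3. sender removes his layer and sends c3
out = pow(c3, db, p)  # 4. recipient removes her layer and reads the message

assert out == m  # exponentiation commutes, so the layers peel off in any order
```

Ordinary ciphers fail here because removing a middle layer corrupts the outer one; the commutativity of `pow(..., p)` is the whole trick.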

Friday, July 6, 2007


A key factor in a system's success or failure.

In many ways, the human-machine interface is the most important part of any computer system. Users invariably judge their experience with a system by the ease and convenience of its interface. A good user experience goes a long way toward user acceptance. Acceptance means users will use the system more; the more they use it, the more their trust and confidence in it grows. Soon enough, they rely on it. Contrast that with an unwieldy, poorly designed interface. The back end of this second system might be more powerful than the first, but if most users have a negative experience with its interface, the system will struggle to gain acceptance.

User acceptance determines how useful and, thereby, how successful a computer system is. So important is the interface that users will usually forgive the system if it cannot provide all the information they need; their likely reaction is to offer suggestions to improve it.

Perception becomes reality in this instance. User perception determines whether they will accept the system and use it. Usage is one metric to judge how successful an information system is.

Users perform their tasks through the interface. Take the following example. The user has been instructed to:
  1. Search for data – “locate all credit sales last month”
  2. Classify the data – “classify these sales by store location”
  3. Organize it – “arrange each sale transaction by date, starting with the most recent to the oldest”
  4. Process it – “compute the total credit sale per store last month”
  5. Output it – “print out the results but only show the summary totals”
Accomplishing each task requires a series of keystrokes or mouse clicks. For example, to locate all credit sales last month, the user must be able to instruct the system to limit the search to credit sales that occurred last month. Depending upon the interface, the user might need to click a button that specifies credit sales and also manually enter the dates in two fields that define a date range. Can you see how the interface design can make a significant difference in the user experience?
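The five instructions above translate naturally into code. Here is a sketch in Python over some invented sale records (the data, field layout, and month are hypothetical):

```python
from datetime import date

# Hypothetical sale records: (sale date, store, amount, payment type)
sales = [
    (date(2007, 6, 3),  "Makati",  120.0, "credit"),
    (date(2007, 6, 15), "Ortigas",  80.0, "cash"),
    (date(2007, 6, 20), "Makati",  200.0, "credit"),
    (date(2007, 6, 28), "Ortigas",  50.0, "credit"),
]

# 1. Search: locate all credit sales last month (June 2007 here)
credit = [s for s in sales
          if s[3] == "credit" and date(2007, 6, 1) <= s[0] <= date(2007, 6, 30)]

# 2-3. Classify by store location, most recent transaction first
by_store = {}
for s in sorted(credit, key=lambda s: s[0], reverse=True):
    by_store.setdefault(s[1], []).append(s)

# 4-5. Process and output: print only the summary total per store
for store, rows in by_store.items():
    print(store, sum(r[2] for r in rows))
```

Every one of those steps is something the interface must let the user express, whether through buttons, date fields, or a report wizard; the code just makes the hidden sequence explicit.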

It takes forethought to design a good interface. Even then, especially during the first few versions, designers inevitably have to return and tweak the design. This will happen repeatedly. Interface design, therefore, is an iterative process.

Interface design assumes greater importance in a larger and more complex system. Good examples of a well-designed interface in a complex system are most of the user screens in an SAP R/3 system.

SAP R/3 is currently the most popular enterprise information system in the world. It has a modular design. Each module is an application. Modules might consist of a vertical application specific to an industry—the retail sector, for instance. Or, they may consist of horizontal applications that process a function—financial accounting, for example. All businesses—regardless of industry—require financial accounting.

A major factor behind the software’s acceptance (and, therefore, its success) is its well-designed application windows (called “sessions” in SAP lingo). For such a complex system, I marvel at its windows. Practically all of them are consistently and logically laid out, and most can be used intuitively. I have heard that in some cases, even someone new to operating a computer has been able to understand most SAP windows within several weeks.

The human-machine interface has evolved into a science. IBM, for example, has a Redbook (their term for a compilation of technical information) on the subject. Clicking here will bring up an IBM webpage that discusses the human-machine interface. Although it focuses on web user interfaces, the general principles remain the same. The Redbook frames the questions succinctly:

  1. Is it just ease of use?
  2. What about user efficiency?
  3. Does the User Interface (UI) look like something that would be enjoyable to use?

You can judge the quality of a UI with a handful of factors:

  1. Ease of learning and memorability
  2. Efficiency of use
  3. Error frequency, severity, and recovery
  4. Subjective satisfaction

Now look at an ideal UI. It would:

  1. Be easy to learn
  2. Have a high maximum efficiency of use
  3. Have a low error frequency
  4. Deliver subjective satisfaction to the user
I hope this helped you understand the importance of a software component that is frequently given only cursory attention (although I notice this is changing for the better: developers are paying more attention to designing user-friendly interfaces). The interface can determine whether an information system is perceived as successful or not.


Wednesday, July 4, 2007


A guide to making an appropriate business decision on an important investment.

In an earlier post, I discussed how Data Centers are classified and why that classification is important. Click here to read it. A new tab or window will open.

In this post, I'll discuss criteria for determining tier selection. I hope you find it useful. As always, I welcome your feedback.

Tier-1 and Tier-2 centers are typically built to meet short-term requirements. Cost and speed of implementation override uptime (i.e., availability) and life-cycle requirements.

Tier-3 and Tier-4 centers are strategic investments that emphasize uptime and long-term viability. These centers have a much longer useful lifetime than their end-user equipment. They free the company to make strategic business decisions concerning growth and technology: a transportation company, for example, can expand its operations across the country knowing that every regional office it establishes is backed by its Tier-4 infrastructure.

Tier-1 is appropriate for:
  1. firms where IT only enhances internal operations. The firm can continue to run for an extended period without IT presence
  2. businesses that don’t anticipate a severe financial impact from prolonged downtime
  3. companies that plan to abandon the center when their IT requirements increase.
Tier-2 is appropriate for:
  1. Internet Service Providers (ISPs) that don’t guarantee their clients a high uptime rate in their Service Level Agreement (SLA)
  2. firms whose IT requirements are mostly limited to standard business hours, e.g., Monday to Friday from 8 to 5. The Data Center can schedule its maintenance around these hours.
  3. institutional or educational organizations that won’t suffer meaningful impact from downtime (libraries or schools)
  4. businesses on a tight budget that want to store their data off-site (electronic vaulting). A smart strategy for them is to take their chances with Tier-2 only temporarily. They should plan and budget to switch after a planned and limited duration (months instead of years)
  5. firms that plan to abandon the center when their IT requirements increase.

Tier-3 is appropriate for:
  1. companies that require IT services to support mission-critical processes and can tolerate short (less than 12 to 18 hours) outages (e.g., hospitals)
  2. firms that have high-availability requirements and are willing to accept the financial impact of unexpected downtime
  3. companies that designed their Tier-3 sites to be upgraded to Tier-4.
Tier-4 is justified for:
  1. large companies in highly competitive industries
  2. organizations that require 24/7 uptime due to laws and regulations (e.g., banks and financial services)
  3. internet-based businesses that derive their revenue from e-commerce 24/7
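
The criteria above can be condensed into a rough decision helper. The sketch below is my own simplification of the post's criteria into yes/no questions, not an official Uptime Institute method; the function and parameter names are illustrative.

```python
def recommend_tier(needs_24x7: bool,
                   mission_critical: bool,
                   business_hours_only: bool) -> int:
    """Map the business requirements discussed above to a Data Center tier."""
    if needs_24x7:
        # banks, regulated industries, 24/7 e-commerce
        return 4
    if mission_critical:
        # short outages tolerable, but downtime has real financial impact
        return 3
    if business_hours_only:
        # maintenance can be scheduled outside 8-to-5
        return 2
    # IT only enhances operations; prolonged downtime is survivable
    return 1
```

For example, a firm with mission-critical IT but no round-the-clock requirement would land on Tier-3 under this sketch.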

Reference: The Uptime Institute


Tuesday, July 3, 2007


The general attributes of each Data Center Tier are presented below.

This is Part-2. Click here to read Part-1. Click here to read Part-3. A new tab or window will open for each post.

Data Centers are classified into four tiers. Tier-1 refers to a basic facility and Tier-4, to the most reliable and sophisticated type. This post goes into further detail about each tier.


A Tier-4 Data Center:
  1. takes 15 to 20 months to plan and implement
  2. is the most expensive type and most costly to operate
  3. is housed in a stand-alone building
  4. is staffed "24 x 7 x forever"
  5. intentionally uses only 90% or less of its total load capacity.
  6. has at least two active distribution paths for connectivity, power, and cooling
  7. All paths are physically separated and always active. The failure of any single active path will not impact uptime.
  8. All components are physically separated. The failure of any single subsystem will not impact uptime. All IT equipment is dual-powered and installed so as to be compatible with the site's topology. Any non-compliant end-user equipment is equipped with point-of-use switches.
  9. Preventive maintenance can be safely done without disrupting operations. Maintenance on any and every system or component can be performed using backup components and distribution paths. The failure of key nexus points will not impact uptime.
  10. A Tier-4 site has a fault-tolerant infrastructure. The site location is not susceptible to any single major disruption. This extends the capability of the lower tier through the addition of measures that will prevent disruption even when crucial components unexpectedly fail. Tier-3 only allows the preventive maintenance of crucial components and has no safety provision for the unexpected failure of crucial components.
  11. Dual-power technology requires two completely independent systems that feed power via two paths. Research has determined that 98% of all failures occur between the UPS and the computer load.

A Tier-3 Data Center:
  1. is housed in a stand-alone building
  2. takes 15 to 20 months to plan and implement
  3. is typically staffed for two shifts or more
  4. intentionally uses only 90% or less of its total load capacity
  5. has at least two paths for connectivity, power, and cooling distribution.
  6. All paths are physically separated. However, only one path is active at any time. The unexpected failure of an active path will impact uptime.
  7. All components are physically separated. The failure of any single subsystem will not impact uptime. All IT equipment is dual-powered and installed so as to be compatible with the site's topology. Any non-compliant end-user equipment is equipped with point-of-use switches.
  8. Preventive maintenance can be safely done without disrupting operations. Maintenance on any and every subsystem or component can be performed using backup components and distribution paths.
  9. This has a concurrently maintainable infrastructure. The site location is not susceptible to unexpected minor disruptions. This extends the capability of the lower tier through the creation of a second distribution path for connectivity, power, and cooling.

A Tier-2 Data Center:
  1. may be housed in a wing or floor of an existing building
  2. takes three to six months to plan and implement
  3. is typically staffed for one shift
  4. has only one path for power and cooling; may have a second path for connectivity
  5. has a backup set of only critical power and cooling components, e.g., extra UPS batteries, cooling units, chillers, pumps, and engine generators
  6. The unexpected failure of any component or path will impact uptime.
  7. Operational errors will likely cause a disruption.
  8. The site location is susceptible to all kinds of disruptions. The infrastructure must be shut down to safely perform preventive maintenance.

A Tier-1 Data Center:
  1. may be housed in a room or wing of an existing building
  2. typically takes less than three months to plan and implement
  3. is not staffed
  4. has only one path for connectivity, power, and cooling
  5. The unexpected failure of any component or path will impact uptime.
  6. Operational errors will cause a disruption.
  7. The site location is susceptible to all kinds of disruptions. The facility must be shut down to safely perform preventive maintenance.
Despite its basic infrastructure, a Tier-1 center still provides a better IT environment because:
  1. It offers dedicated space.
  2. Its online UPS system does a better job than a standby UPS at filtering power spikes, compensating for sags, and covering momentary outages.
  3. It has nonstop, dedicated cooling equipment.
  4. It has an engine generator to withstand extended power outages.
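
For quick reference, the attributes above can be condensed into a small data structure. This is my own paraphrase of the lists in this post, not an official Uptime Institute table.

```python
# Key attributes of each tier, condensed (and paraphrased) from the lists above.
TIERS = {
    4: {"housing": "stand-alone building",
        "build_months": "15-20",
        "staffing": "24 x 7 x forever",
        "distribution_paths": "two or more, all active",
        "shutdown_for_maintenance": False},
    3: {"housing": "stand-alone building",
        "build_months": "15-20",
        "staffing": "two or more shifts",
        "distribution_paths": "two or more, one active at a time",
        "shutdown_for_maintenance": False},
    2: {"housing": "wing or floor of an existing building",
        "build_months": "3-6",
        "staffing": "one shift",
        "distribution_paths": "one (plus backup critical components)",
        "shutdown_for_maintenance": True},
    1: {"housing": "room or wing of an existing building",
        "build_months": "under 3",
        "staffing": "none",
        "distribution_paths": "one",
        "shutdown_for_maintenance": True},
}
```

The shutdown-for-maintenance flag is the clearest dividing line: only Tier-3 and Tier-4 can perform preventive maintenance without disrupting operations.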


Sunday, July 1, 2007


Low-risk. High-reward.

You’re an organization that’s lost count of your desktops, servers, and other IT equipment. Sound familiar? Most organizations are in that situation.

A software audit is a low-risk, high-reward activity that keeps you legally compliant. Most software publishers do not litigate. Instead, many use the results of audit reviews as the basis for “true-up” deals. “True-up” refers to the process of buying more licenses.

Two examples illustrate:

  1. You have 150 licenses of software X. The audit reveals that 180 licenses were deployed and that all 180 are being used simultaneously. You will be required to pay for 180 licenses. This is true-up.
  2. You purchased 300 licenses of software Y. You installed it on 300 desktops. The audit reveals that only 250 instances of software Y are being used. Come renewal time, you pay for only 250 licenses. Let’s refer to this as “true-down.”
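
The two examples boil down to a single rule: at renewal, you pay for what is actually in use. A minimal sketch in Python; the function name and return shape are my own, for illustration.

```python
def settle_licenses(purchased: int, in_use: int) -> tuple[int, str]:
    """Return (licenses to pay for, settlement type), per the examples above."""
    if in_use > purchased:
        return in_use, "true-up"    # Example 1: 150 owned, 180 used -> pay for 180
    if in_use < purchased:
        return in_use, "true-down"  # Example 2: 300 owned, 250 used -> pay for 250
    return purchased, "compliant"
```

So `settle_licenses(150, 180)` yields `(180, "true-up")`, and `settle_licenses(300, 250)` yields `(250, "true-down")`.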

The auditing process can collect more useful information. And you should take advantage of that. Since you’re checking every desktop and server (and other IT equipment, but we won’t include that here) anyway, you might as well gather the additional information.

Learn the configuration of each machine.
  • Desktop 56 runs the G/L of the accounting module of SAP Business All-in-One on Windows 2000.
Identify the user of each machine and confirm the appropriateness of that role to the machine.
  • Desktop 56 is assigned to Tom, a cost accountant.
Determine whether each machine is configured adequately for the software installed on it.
  • You discover the accounting module would run faster if more RAM was added to Desktop 56.

While it is true that many users use unlicensed software, a good number of them do so unwittingly. How does this come about?

Confusion arising from vague, complex, and ever-changing licensing rules.
Software publishers frequently change user licenses. About half the time, they do it during the active life of the product. Case in point: Microsoft. It changed significant parts of its Client Access License (CAL) three times during the three years of Microsoft Windows 2000’s marketing life.
Changes in the user IT environment.
In the data center, servers are inevitably upgraded to newer, more powerful models. Software licenses recognize this and permit the software to be installed on the replacement. The process isn’t complete, however, until the same software is removed from the old server it replaced. In many instances, this step is overlooked. Result: one licensed and one unlicensed deployment.
Mergers & Acquisitions.
It may surprise you, but this isn’t a subset of the preceding reason. Why? Many software licenses do not automatically transfer licensee rights to another party unless the transfer is stated explicitly. More often than not, after one company acquires another, the acquirer takes control of the assets of the acquired. In theory, the acquirer has the responsibility of checking this provision. In reality, lawyers on both sides are busy dealing with other, larger issues.
Misunderstanding between IT and Procurement.
This is related to the first reason, namely the confusion that arises from vague, complex, and ever-changing licensing rules. In theory, either IT or Procurement should know how many and what kind of licenses should be acquired. In reality, this often falls between the cracks. Result: purchases of too few, too many, or the wrong kind of licenses. Two examples: (1) a license is deployed on a server that has more CPUs than the license allows, and (2) widespread access is allowed to software that has a limited-user license.
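
The two misdeployments in the examples just mentioned can be caught mechanically during an audit. A hypothetical sketch: the `License` and `Deployment` fields below are illustrative assumptions, not any publisher's actual license terms.

```python
from dataclasses import dataclass

@dataclass
class License:
    max_cpus: int    # per-server CPU cap in the license terms
    max_users: int   # user cap for a limited-user license

@dataclass
class Deployment:
    server_cpus: int
    active_users: int

def violations(lic: License, dep: Deployment) -> list[str]:
    """Flag the two misdeployments described above."""
    problems = []
    if dep.server_cpus > lic.max_cpus:
        problems.append("server has more CPUs than the license allows")
    if dep.active_users > lic.max_users:
        problems.append("more users than the limited-user license permits")
    return problems
```

Running such checks across the inventory collected in the audit turns license terms from fine print into something IT and Procurement can verify.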

Is non-compliance a serious problem?

It is. Most software publishers deal with offenders—especially first-timers—in an understanding and lenient manner. Publishers realize that they can lose customers and antagonize entire user groups if they act with a heavy hand. To be fair, publishers deserve the revenue from unlicensed deployments. Bottom line: I think the relative laxity stems from practical reasons of customer relations as well as the recognition by the software industry of the vague, complex, and ever-changing rules of their products. Result: many publishers will settle for true-up deals.
