Showing posts with label cooling. Show all posts

Sunday, September 2, 2007

DATA CENTERS, PART-3

General remarks about Tier classifications

This photo was taken at one of Google’s data centers.

This is Part-3 about the subject of Tier classification. For Part-4, click here. A new tab or window will open.

To rewind to Part-1, click here. A new tab or window will open.

Would you like to learn the business factors that should be considered in selecting a Tier? Click here to read it. A new tab or window will open.

The tier of a Data Center is determined by the rating of its weakest system. For example, a center with a Tier-4 power configuration that has a Tier-3 cooling subsystem will yield a Tier-3 classification.

The center’s rank is always the lowest of its individual subsystems. It is not the average of the rank of its individual subsystems.
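The weakest-link rule can be sketched in a few lines of Python. The subsystem names and the `site_tier` function are illustrative assumptions, not part of any Uptime Institute tool.

```python
def site_tier(subsystem_tiers: dict[str, int]) -> int:
    """Return the site's tier: the lowest rating among its subsystems."""
    if not subsystem_tiers:
        raise ValueError("at least one subsystem rating is required")
    return min(subsystem_tiers.values())


# Hypothetical ratings: Tier-4 power, Tier-3 cooling.
ratings = {"power": 4, "cooling": 3}
print(site_tier(ratings))  # 3, the lowest rating rather than the average
```

Using `min` rather than a mean is the whole point: one weak subsystem caps the rating of the entire site.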

The MTBF (Mean Time Between Failures) of a center’s subsystems is irrelevant in determining the center’s tier. So is the number of components or systems. Here are two examples. The first example compares two ways a backup set of duplicate chillers can be added to a site. The backup set can be:
  • arranged along the same cooling distribution paths, or
  • arranged along a second, independent path.
The first arrangement fulfills a Tier-2 center’s criteria. The second arrangement fulfills a Tier-3 center’s criteria.

The second example compares how two separate in-line UPS batteries are controlled. The batteries can be controlled by:
  • a common input or output switch, or
  • separate independent switches.
Once again, the first arrangement fulfills a Tier-2 center’s criteria and the second arrangement fulfills a Tier-3 center’s criteria. UPS configurations that share the same input or output switch gear almost always require the server room to shut down for routine maintenance. In addition, a common switch creates another single point-of-failure.

A Tier-2 center has a duplicate set of critical electrical and cooling equipment. A Tier-3 center has a duplicate set of all components and equipment that support IT operations. The connectivity, electrical, and mechanical systems, for example, must be in duplicate.
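Both examples make the same point: what separates the two tiers is how the backup equipment is connected, not how much of it there is. A rough Python sketch of that distinction follows. The `Subsystem` fields and the `redundancy_tier` function are hypothetical names, and the logic covers only the Tier-2 versus Tier-3 distinction from the two examples, not the institute’s full criteria.

```python
from dataclasses import dataclass


@dataclass
class Subsystem:
    name: str
    has_backup_equipment: bool  # e.g. a duplicate chiller set or a second UPS
    independent_path: bool      # backup has its own path or its own switch


def redundancy_tier(s: Subsystem) -> int:
    """Simplified rating based only on the chiller and UPS examples."""
    if not s.has_backup_equipment:
        return 1  # assumption: no backup at all is treated as a basic site
    if s.independent_path:
        return 3  # backup on a second, independent path or switch
    return 2      # backup shares the path or switch with the primary


shared = Subsystem("UPS on a common switch", True, False)
separate = Subsystem("UPS on separate switches", True, True)
print(redundancy_tier(shared), redundancy_tier(separate))
```

Note that the function never counts components; two identical equipment lists can land in different tiers depending on the wiring.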

Sites that will be built from the ground up should be designed for a future higher tier level, or at the very least, designed to anticipate future power requirements. The owner should take advantage of the relatively small cost difference between a Tier-3 and -4 infrastructure before the facility is built.

A Tier-4 Data Center can be summarily described as being fault-tolerant and concurrently maintainable. A Tier-4 Data Center will theoretically never go down regardless of the failure of any of its subsystems.

Personnel operations, the main factor in determining sustainability, play the biggest role in the uptime of a Data Center. For that reason, tier ranking can only be performed objectively through the center’s topology, architecture, and components. Sustainability factors directly or indirectly account for 70% of all downtime. The performance of individual data centers within the same tier largely depends upon sustainability. The correct implementation of sustainability factors decreases the cost and risk of completing maintenance and hastens the recovery of the center from disruptions.

THE IMPORTANCE OF SUSTAINABILITY

Sustainability largely determines the uptime performance of individual data centers within the same tier. Sustainability factors make the difference between an easy maintenance procedure and a difficult one; between an inexpensive one and a costly one; and between a convenient one and an awkward one. A difficult, costly, or awkward maintenance procedure increases the chance that it will be delayed or skipped. And missed maintenance increases the chance of equipment or component failure. This section provides examples of infrastructure characteristics that impact sustainability. These characteristics are details of design, IT architecture, or implementation.
  1. The ability to switch the power source of all mechanical components so they continue running before starting maintenance work on an electrical panel.
  2. The placement of a critical component in a cramped area when it could have been placed elsewhere.
  3. The placement of engine power generators and switching gear inside the facility instead of outside, which eliminates the effects of weather.
  4. The decision to limit the aggregate load on any subsystem to 90% of rated capacity instead of 100%, which improves stability and prolongs equipment life.
  5. Compartmentalization, the physical separation of the primary and secondary paths. Tier-4 sites have compartmentalized subsystems. Personnel can attack a fire in the primary path’s area when that area is physically separated from the secondary path.
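The 90% load limit in item 4 is simple arithmetic, and a short sketch makes the headroom concrete. The function name, the 500 kW rating, and the loads below are made-up values for illustration; only the 90% limit comes from the text.

```python
def within_load_limit(load_kw: float, rated_kw: float, limit: float = 0.90) -> bool:
    """True if the aggregate load is at or below the chosen fraction of rated capacity."""
    if rated_kw <= 0:
        raise ValueError("rated capacity must be positive")
    return load_kw <= rated_kw * limit


rated_kw = 500.0  # hypothetical rated capacity of one subsystem
print(within_load_limit(440.0, rated_kw))  # True: under the 90% ceiling
print(within_load_limit(470.0, rated_kw))  # False: over the 90% ceiling
```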


Saturday, June 30, 2007

DATA CENTERS, PART-1

Understanding Tier Classifications

When mission-critical applications fail, so can their owner. Every measure should be taken to try to prevent this from happening. Protecting the technology that runs the applications is one of the first things that must be done. This task is made much easier when the technology is housed together. Housing it in a secure environment is the function of a data center. Servers, storage devices, networking gear, and the people who keep them running operate out of these facilities.

A data center, to sum it up, is the physical home of the IT capabilities of organizations.

BACKGROUND

The Uptime Institute, an independent association, developed the tier classification of data centers. There are four tiers. Tier-1 refers to a basic facility and Tier-4 to the most reliable and sophisticated type. Institute certification is recognized as the industry standard. Anyone can claim Tier-4 status, but unless the rating came from The Uptime Institute, it should be viewed with skepticism.

The institute grants a data center its tier classification only after a rigorous evaluation of the facility’s design and sustainability. That the institute is a third party and is the body that developed the standards gives its determination a credibility that self-made claims just don’t have. Institute certification provides an objective basis for judging the capabilities of a data center.

In my experience, this is important. Between 2001 and 2003, I helped clients colocate at a large local data center that advertised its Tier-1 classification.
This was the former Exodus data center in Elk Grove Village, Illinois. Exodus went bankrupt in Q3 of 2001. Cable & Wireless USA bought it in Q1 of 2002. C&W, in turn, also went bankrupt and sold it to Savvis in Q1 of 2004. Savvis, to my knowledge, still owns it.

OVERVIEW OF TIERS

The institute’s summary of the high-level characteristics of each tier is presented below.

Tier-1
  1. Has a single path for power and cooling distribution
  2. Has no redundant components
  3. And has a mean uptime availability of 99.671% (equivalent to 29 hours of downtime a year)

Tier-2
  1. Has a single path for power and cooling distribution
  2. Has redundant components
  3. And has a mean uptime availability of 99.749% (equivalent to 22 hours of downtime a year)

Tier-3
  1. Has multiple paths for power and cooling distribution but only one path is active at any given time
  2. Has redundant components that make it concurrently maintainable
  3. And has a mean uptime availability of 99.982% (equivalent to 1.6 hours of downtime a year)

Tier-4
  1. Has multiple paths for power and cooling distribution that are all active at any given time
  2. Has redundant components that make it fault-tolerant and concurrently maintainable
  3. And has a mean uptime availability of 99.991% (equivalent to about 47 minutes of downtime a year)

Note how small the improvement in uptime is from Tier-1 to Tier-4. Meanwhile, the investment required to become Tier-4 is many times greater than that for Tier-1. In short, moving from 99.671% to 99.991% costs a disproportionate amount of money. Is it worth it? That’s the question for many data centers. And I suppose the answer depends upon the customers that’ll use it.
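The downtime figures follow directly from the availability percentages: annual downtime is (100% minus availability) multiplied by the hours in a year. Here is a quick sketch for checking the numbers, assuming a 365-day year of 8,760 hours. The function name is illustrative.

```python
HOURS_PER_YEAR = 365 * 24  # 8,760 hours; leap years are ignored


def annual_downtime_hours(availability_pct: float) -> float:
    """Convert an availability percentage into expected hours of downtime per year."""
    if not 0 <= availability_pct <= 100:
        raise ValueError("availability must be between 0 and 100")
    return (100 - availability_pct) / 100 * HOURS_PER_YEAR


# The four availability figures quoted in the tier overview.
for pct in (99.671, 99.749, 99.982, 99.991):
    hours = annual_downtime_hours(pct)
    print(f"{pct}% -> {hours:.1f} hours ({hours * 60:.0f} minutes) per year")
```

Seen this way, the whole spread from Tier-1 to Tier-4 is roughly 28 hours of downtime a year, which is the gap the extra investment buys.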

DEFINITIONS
  • Concurrent maintainability is the ability to perform all scheduled work without adversely impacting the end-user.
  • Fault-tolerance is the capability to sustain a worst case, unplanned event without adversely impacting uptime. Two major requirements for achieving fault-tolerance are redundant equipment and multiple active paths.
  • A single point-of-failure is a location or piece of equipment whose failure will bring the entire system down. Tier-1 and -2 have many single points-of-failure. Tier-3 has several. And Tier-4 is supposed to have none.
  • Site Infrastructure refers to the data center taken as a whole. A typical data center has at least 20 major mechanical, electrical, fire protection, security, HVAC, and other systems.
  • Sustainability refers to the ease, convenience, and cost of operating the Data Center. A well-designed site will cost less to operate and be easier to maintain. As a group, sustainability factors account for 70% of all infrastructure failures. Human decisions and activities primarily account for sustainability factors. Two-thirds of all failures result from management errors. The remainder arise from errors made by operations staff.
  • Usable capacity refers to the maximum load that the center’s systems can support. This is less than the non-redundant capacity since allowance must be made for aging components, installation errors, and the size of the desired buffer to accommodate surges in demand. Tier-3 and -4 sites are typically the ones that limit their total load to 90% of the aggregate capacity.

RELATED POSTS

This precedes two posts about the general attributes of Tiers. Click here to read Part-2 and here to read Part-3. A new tab or window will open for each post.

This also precedes a post about the business factors that should be considered in selecting a Tier. Click here to read it. A new tab or window will open.

