Showing posts with label tier. Show all posts

Sunday, September 2, 2007

DATA CENTERS, PART-3

General remarks about Tier classifications

This photo was taken at one of Google’s data centers.

This is Part-3 about the subject of Tier classification. For Part-4, click here. A new tab or window will open.

To rewind to Part-1, click here. A new tab or window will open.

Would you like to learn the business factors that should be considered in selecting a Tier? Click here to read it. A new tab or window will open.

The tier of a Data Center is determined by the rating of its weakest system. For example, a center with a Tier-4 power configuration that has a Tier-3 cooling subsystem will yield a Tier-3 classification.

The center’s rank is always the lowest rank among its individual subsystems. It is not the average of their ranks.
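The weakest-subsystem rule can be sketched in a few lines of Python. This is only an illustration: the function name `site_tier` and the subsystem names in the example are hypothetical, not part of any official rating tool.

```python
def site_tier(subsystem_tiers):
    """Return the site tier: the lowest rating among its subsystems.

    subsystem_tiers maps a subsystem name to its tier (1 to 4).
    The names and the function itself are illustrative assumptions.
    """
    if not subsystem_tiers:
        raise ValueError("at least one subsystem rating is required")
    # The weakest subsystem sets the rating; the average plays no part.
    return min(subsystem_tiers.values())


ratings = {"power": 4, "cooling": 3, "connectivity": 4}
print(site_tier(ratings))  # 3
```

Averaging the same ratings would give about 3.7, which is exactly the reading the rule forbids.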

The MTBF (Mean Time Between Failures) of a center’s subsystems is irrelevant in determining the center’s tier. So is the number of components or systems. What matters is how they are arranged, as two examples show. The first example compares two ways a backup set of duplicate chillers can be added to a site. The backup set can be:
  • arranged along the same cooling distribution paths, or
  • arranged along a second, independent path.
The first arrangement fulfills a Tier-2 center’s criteria. The second arrangement fulfills a Tier-3 center’s criteria.

The second example compares how two separate in-line UPS batteries are controlled. The batteries can be controlled by:
  • a common input or output switch, or
  • separate independent switches.
Once again, the first arrangement fulfills a Tier-2 center’s criteria and the second arrangement fulfills a Tier-3 center’s criteria. UPS configurations that share the same input or output switch gear almost always require the server room to shut down for routine maintenance. In addition, a common switch creates another single point-of-failure. A Tier-2 center has a duplicate set of critical electrical and cooling equipment. A Tier-3 center has a duplicate set of all components and equipment that supports IT operations. The connectivity, electrical, and mechanical systems, for example, must be in duplicate.
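The effect of a shared switch can be illustrated with a small Python sketch. It assumes each distribution path is written out as a list of component names (all names here are made up for the example) and treats any component that sits on every path as a single point-of-failure. Real topologies are more involved, so take this as a rough model only.

```python
def single_points_of_failure(paths):
    """Return the components that appear on every listed path.

    paths is a list of paths, each a list of component names.
    A component shared by all paths takes every path down when it fails.
    """
    if not paths:
        return set()
    common = set(paths[0])
    for path in paths[1:]:
        common &= set(path)  # keep only components shared so far
    return common


# Two UPS units behind common input and output switches.
shared = [
    ["switch_in", "ups_a", "switch_out"],
    ["switch_in", "ups_b", "switch_out"],
]
# Two UPS units, each with its own independent switches.
independent = [
    ["switch_a_in", "ups_a", "switch_a_out"],
    ["switch_b_in", "ups_b", "switch_b_out"],
]

print(sorted(single_points_of_failure(shared)))       # ['switch_in', 'switch_out']
print(sorted(single_points_of_failure(independent)))  # []
```

Adding a second UPS does not remove the weak spot in the first layout. Only separating the switches does.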

Sites that will be built from the ground up should be designed for a future higher tier level, or at the very least, designed to anticipate future power requirements. The owner should take advantage of the relatively small cost difference between a Tier-3 and -4 infrastructure before the facility is built.

A Tier-4 Data Center can be summarized as fault-tolerant and concurrently maintainable. In theory, a Tier-4 Data Center will never go down, regardless of the failure of any one of its subsystems.

Personnel operations, the main factor in determining sustainability, play the biggest role in the uptime of a Data Center. For that reason, tier ranking can only be performed objectively through the center’s topology, architecture, and components. Sustainability factors directly or indirectly account for 70% of all downtime. The performance of individual data centers within the same tier largely depends upon sustainability. The correct implementation of sustainability factors decreases the cost and risk of completing maintenance and hastens the recovery of the center from disruptions.

THE IMPORTANCE OF SUSTAINABILITY

Sustainability largely determines the uptime performance of individual data centers within the same tier. Sustainability factors make the difference between an easy maintenance procedure and a difficult one; between an inexpensive one and a costly one; and between a convenient one and an awkward one. A difficult, costly, or awkward maintenance procedure increases the chance that it will be delayed or skipped. And missed maintenance increases the chance of equipment or component failure. This section provides examples of infrastructure characteristics that impact sustainability. These characteristics are details of design, IT architecture, or implementation.
  1. The ability to switch the power source of all mechanical components before maintenance work starts on an electrical panel, so that they continue running.
  2. The placement of a critical component in a cramped area when it could have been placed elsewhere, which makes it harder to service.
  3. The placement of engine generators and switchgear inside the facility instead of outside, which eliminates the effects of weather.
  4. The decision to limit the aggregate load on any subsystem to 90% of rated capacity instead of 100%, which improves stability and prolongs equipment life.
  5. Compartmentalization, which is the physical separation of the primary and secondary paths. Tier-4 sites have compartmentalized subsystems. Because the two paths are physically separated, personnel can fight a fire in the primary path’s area without affecting the secondary path.
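The 90% load limit from the list above is simple to check in code. The sketch below is a minimal Python illustration. The function name, the kilowatt units, and the sample figures are assumptions made for the example.

```python
def within_load_limit(load_kw, rated_capacity_kw, limit=0.90):
    """Return True if the load stays at or below the chosen share of rated capacity.

    limit defaults to 0.90, the 90% ceiling described in the text.
    Units (kW) and the function name are illustrative assumptions.
    """
    if rated_capacity_kw <= 0:
        raise ValueError("rated capacity must be positive")
    return load_kw <= limit * rated_capacity_kw


# A subsystem rated at 500 kW has a 450 kW ceiling under the 90% rule.
print(within_load_limit(430, 500))  # True
print(within_load_limit(470, 500))  # False
```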


Tuesday, July 3, 2007

DATA CENTERS, PART 2

The general attributes of each Data Center Tier are presented below.


This is Part-2. Click here to read Part-1. Click here to read Part-3. A new tab or window will open for each post.

Data Centers are classified into four tiers. Tier-1 refers to a basic facility and Tier-4, to the most reliable and sophisticated type. This post goes into further detail about each tier.

Tier-4

  1. takes 15 to 20 months to plan and implement
  2. is the most expensive type and most costly to operate
  3. is housed in a stand-alone building
  4. is staffed "24 x 7 x forever"
  5. intentionally uses only 90% or less of its total load capacity.
  6. has at least two active distribution paths for connectivity, power, and cooling
  7. All paths are physically separated and always active. The failure of any single active path will not impact uptime.
  8. All components are physically separated. The failure of any single subsystem will not impact uptime.
  9. Preventive maintenance can be safely done without disrupting operations. Maintenance on any and every system or component can be performed using backup components and distribution paths. The failure of key nexus points will not impact uptime.
  10. A Tier-4 site has a fault-tolerant infrastructure. The site location is not susceptible to any single major disruption. This extends the capability of the lower tier through the addition of measures that will prevent disruption even when crucial components unexpectedly fail. Tier-3 only allows the preventive maintenance of crucial components and has no safety provision for the unexpected failure of crucial components.
  11. All IT equipment is dual-powered and installed so as to be compatible with the site's topology. Any non-compliant end-user equipment is equipped with point-of-use switches. Dual-power technology requires two completely independent systems that feed power via two paths. Research has determined that 98% of all failures occur between the UPS and the computer load.
Tier-3
  1. is housed in a stand-alone building
  2. takes 15 to 20 months to plan and implement
  3. is typically staffed for two shifts or more
  4. intentionally uses only 90% or less of its total load capacity
  5. has at least two paths for connectivity, power, and cooling distribution.
  6. All paths are physically separated. However, only one path is active at any time. The unexpected failure of an active path will impact uptime.
  7. All components are physically separated. Any single subsystem can be taken out of service for planned work without impacting uptime. All IT equipment is dual-powered and installed so as to be compatible with the site's topology. Any non-compliant end-user equipment is equipped with point-of-use switches.
  8. Preventive maintenance can be safely done without disrupting operations. Maintenance on any and every subsystem or component can be performed using backup components and distribution paths.
  9. A Tier-3 site has a concurrently maintainable infrastructure. The site location is not susceptible to unexpected minor disruptions. This extends the capability of the lower tier through the creation of a second distribution path for connectivity, power, and cooling.

Tier-2
  1. may be housed in a wing or floor of an existing building
  2. takes three to six months to plan and implement
  3. is typically staffed for one shift
  4. has only one path for power and cooling; may have a second path for connectivity
  5. has a backup set of only critical power and cooling components, e.g., extra UPS batteries, cooling units, chillers, pumps, and engine generators
  6. The unexpected failure of any component or path will impact uptime.
  7. Operational errors will likely cause a disruption.
  8. The site location is susceptible to all kinds of disruptions. The infrastructure must be shut down to safely perform preventive maintenance.
Tier-1
  1. may be housed in a room or wing of an existing building
  2. typically takes less than three months to plan and implement
  3. is not staffed
  4. has only one path for connectivity, power, and cooling
  5. The unexpected failure of any component or path will impact uptime.
  6. Operational errors will cause a disruption.
  7. The site location is susceptible to all kinds of disruptions. The facility must be shut down to safely perform preventive maintenance.
Despite its basic infrastructure, a Tier-1 center still provides a better IT environment than an ordinary office setting because:
  1. It offers dedicated space.
  2. Its online UPS system does a better job than a standby UPS at filtering power spikes, compensating for sags, and covering momentary outages.
  3. It has nonstop, dedicated cooling equipment.
  4. It has an engine generator to withstand extended power outages.



Saturday, June 30, 2007

DATA CENTERS, PART-1

Understanding Tier Classifications

When mission-critical applications fail, so can their owner. Every measure should be taken to try to prevent this from happening. Protecting the technology that runs the applications is one of the first things that must be done. This task is made much easier when the technology is housed together. Housing it in a secure environment is the function of a data center. Servers, storage devices, networking gear, and the people who keep them running operate out of these facilities.

A data center, to sum it up, is the physical home of the IT capabilities of organizations.

BACKGROUND

The Uptime Institute, an independent association, developed the tier classification of data centers. There are four tiers. Tier-1 refers to a basic facility and Tier-4, to the most reliable and sophisticated type. Institute certification is recognized as the industry standard. Anyone can claim Tier-4 status, but unless the rating came from The Uptime Institute, the claim should be viewed with skepticism.

The institute grants a data center its tier classification only after a rigorous evaluation of the facility’s design and sustainability. That the institute is a third party and is the body that developed the standards gives its determination a credibility that self-made claims just don’t have. Institute certification provides an objective basis for judging the capabilities of a data center.

In my experience, this is important. Between 2001 and 2003, I helped clients colocate at a large local data center that advertised its Tier-1 classification.
This was the former Exodus data center in Elk Grove Village, Illinois. Exodus went bankrupt in Q3 of 2001. Cable & Wireless USA bought it in Q1 of 2002. C&W, in turn, also went bankrupt and sold it to Savvis in Q1 of 2004. Savvis, to my knowledge, still owns it.

OVERVIEW OF TIERS

The institute’s summary of the high-level characteristics of each tier is presented below.

Tier-1
  1. Has a single path for power and cooling distribution
  2. Has no redundant components
  3. And has a mean uptime availability of 99.671% (equivalent to 29 hours of downtime a year)

Tier-2
  1. Has a single path for power and cooling distribution
  2. Has redundant components
  3. And has a mean uptime availability of 99.749% (equivalent to 22 hours of downtime a year)
Tier-3
  1. Has multiple paths for power and cooling distribution but only one path is active at any given time
  2. Has redundant components that make it concurrently maintainable
  3. And has a mean uptime availability of 99.982% (equivalent to 1.6 hours of downtime a year)
Tier-4
  1. Has multiple paths for power and cooling distribution that are all active at all times
  2. Has redundant components that make it fault-tolerant and concurrently maintainable
  3. And has a mean uptime availability of 99.995% (equivalent to about 26 minutes of downtime a year)

Note how small the improvement in uptime is from Tier-1 to Tier-4. Meanwhile, the investment required to reach Tier-4 is many times greater than for Tier-1. In short, moving from 99.671% to 99.995% costs a disproportionate amount of money. Is it worth it? That is the question many data centers face, and I suppose the answer depends upon the customers that will use the facility.
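The downtime equivalents quoted for each tier can be reproduced from the availability percentages. The Python sketch below assumes a 365-day year (8,760 hours). The function name is hypothetical.

```python
HOURS_PER_YEAR = 365 * 24  # 8,760 hours; leap years are ignored


def annual_downtime_hours(availability_percent):
    """Convert an availability percentage into expected hours of downtime per year."""
    if not 0 <= availability_percent <= 100:
        raise ValueError("availability must be between 0 and 100")
    return (1 - availability_percent / 100) * HOURS_PER_YEAR


print(round(annual_downtime_hours(99.671), 1))  # 28.8 (the Tier-1 figure, about 29 hours)
print(round(annual_downtime_hours(99.982), 1))  # 1.6 (the Tier-3 figure)
```

Seen this way, the whole climb from Tier-1 to Tier-3 buys back roughly 27 hours a year, which is why the cost question matters.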

DEFINITIONS
  • Concurrent maintainability refers to the capability of being able to perform all scheduled work without adversely impacting the end-user.
  • Fault-tolerance is the capability to sustain a worst case, unplanned event without adversely impacting uptime. Two major requirements for achieving fault-tolerance are redundant equipment and multiple active paths.
  • Single points-of-failure refer to the locations or equipment that will bring the entire system down (downtime) if they fail. Tier-1 and -2 have many single points-of-failure. Tier-3 has several. And Tier-4 is supposed to have none.
  • Site Infrastructure refers to the data center taken as a whole. A typical data center has at least 20 major mechanical, electrical, fire protection, security, HVAC, and other systems.
  • Sustainability refers to the ease, convenience, and cost of operating the Data Center. A well-designed site will cost less to operate and be easier to maintain. As a group, sustainability factors account for 70% of all infrastructure failures. Human decisions and activities primarily account for sustainability factors. Two-thirds of all failures result from management errors. The remainder arise from errors made by operations staff.
  • Useable capacity refers to the maximum load that the center’s systems can support. This is less than the non-redundant capacity since allowance must be made for aging components, installation errors, and the size of the desired buffer to accommodate surges in demand. Tier-3 and -4 sites are typically the ones that limit their total load to 90% of the aggregate capacity.
RELATED POSTS

This precedes two posts about the general attributes of Tiers. Click here to read Part-2 and here to read Part-3. A new tab or window will open for each post.

This also precedes a post about the business factors that should be considered in selecting a Tier. Click here to read it. A new tab or window will open.

