Philosophy to Astronomy: data center


Tuesday, July 3, 2007

DATA CENTERS, PART 2

The general attributes of each Data Center Tier are presented below.

This is Part-2. Click here to read Part-1. Click here to read Part-3. A new tab or window will open for each post.

Data Centers are classified into four tiers. Tier-1 refers to a basic facility and Tier-4, to the most reliable and sophisticated type. This post goes into further detail about each tier.

Tier-4

takes 15 to 20 months to plan and implement
is the most expensive type and most costly to operate
is housed in a stand-alone building
is staffed "24 x 7 x forever"
intentionally uses only 90% or less of its total load capacity
has at least two active distribution paths for connectivity, power, and cooling
All paths are physically separated and always active. The failure of any single active path will not impact uptime.
All components are physically separated. The failure of any single subsystem will not impact uptime. All IT equipment is dual-powered and installed so as to be compatible with the site's topology. Any non-compliant end-user equipment is equipped with point-of-use switches.
Preventive maintenance can be safely done without disrupting operations. Maintenance on any and every system or component can be performed using backup components and distribution paths. The failure of key nexus points will not impact uptime.
A Tier-4 site has a fault-tolerant infrastructure. The site location is not susceptible to any single major disruption. Tier-4 extends the capability of Tier-3 by adding measures that prevent disruption even when crucial components unexpectedly fail. Tier-3 only allows for the preventive maintenance of crucial components and has no safety provision for their unexpected failure.
Dual-power technology requires two completely independent systems that feed power via two paths. Research has determined that 98% of all failures occur between the UPS and the computer load.

Tier-3

is housed in a stand-alone building
takes 15 to 20 months to plan and implement
is typically staffed for two shifts or more
intentionally uses only 90% or less of its total load capacity
has at least two paths for connectivity, power, and cooling distribution
All paths are physically separated. However, only one path is active at any time. The unexpected failure of an active path will impact uptime.
All components are physically separated. The failure of any single subsystem will not impact uptime. All IT equipment is dual-powered and installed so as to be compatible with the site's topology. Any non-compliant end-user equipment is equipped with point-of-use switches.
Preventive maintenance can be safely done without disrupting operations. Maintenance on any and every subsystem or component can be performed using backup components and distribution paths.
A Tier-3 site has a concurrently maintainable infrastructure. The site location is not susceptible to unexpected minor disruptions. Tier-3 extends the capability of Tier-2 through the creation of a second distribution path for connectivity, power, and cooling.

Tier-2

may be housed in a wing or floor of an existing building
takes three to six months to plan and implement
is typically staffed for one shift
has only one path for power and cooling; may have a second path for connectivity
has a backup set of only critical power and cooling components, e.g., extra UPS batteries, cooling units, chillers, pumps, and engine generators
The unexpected failure of any component or path will impact uptime.
Operational errors will likely cause a disruption.
The site location is susceptible to all kinds of disruptions. The infrastructure must be shut down to safely perform preventive maintenance.

Tier-1

may be housed in a room or wing of an existing building
typically takes less than three months to plan and implement
is not staffed
has only one path for connectivity, power, and cooling
The unexpected failure of any component or path will impact uptime.
Operational errors will cause a disruption.
The site location is susceptible to all kinds of disruptions. The facility must be shut down to safely perform preventive maintenance.

Despite its basic infrastructure, a Tier-1 center still provides a better IT environment than an ordinary office setting because:

It offers dedicated space.
Its online UPS system does a better job than a standby UPS at filtering power spikes, compensating for sags, and covering momentary outages.
It has nonstop, dedicated cooling equipment.
It has an engine generator to withstand extended power outages.
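
As a rough illustration, the path attributes listed in this post can be captured in a small Python model. The class name `Tier`, its fields, and the choice of exactly two paths for Tier-3 and Tier-4 (the post says "at least two") are assumptions made for this sketch; they are not part of any tier standard.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tier:
    name: str
    paths: int         # power and cooling distribution paths (assumed minimum)
    active_paths: int  # paths carrying load at any one time

    @property
    def concurrently_maintainable(self) -> bool:
        # A second path lets preventive maintenance proceed without a shutdown.
        return self.paths >= 2

    @property
    def fault_tolerant(self) -> bool:
        # An unexpected path failure is survivable only if another path
        # is already active when the failure happens.
        return self.active_paths >= 2


TIERS = [
    Tier("Tier-1", paths=1, active_paths=1),
    Tier("Tier-2", paths=1, active_paths=1),
    Tier("Tier-3", paths=2, active_paths=1),
    Tier("Tier-4", paths=2, active_paths=2),
]

for tier in TIERS:
    print(tier.name, tier.concurrently_maintainable, tier.fault_tolerant)
```

The model deliberately ignores staffing, build time, and component redundancy; it only shows why a second path allows maintenance without a shutdown while surviving an unexpected failure takes a second active path.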


Sunday, July 1, 2007

IT ASSET AUDITING

Low-risk. High-reward.

You’re an organization that’s lost count of your desktops, servers, and other IT equipment. Sound familiar? Most organizations are in that situation.

An IT asset audit is a low-risk, high-reward activity. It keeps you legally compliant. Most software publishers do not litigate. Instead, many use the results of audit reviews as a basis for “true-up” deals. “True-up” refers to the process of buying more licenses.

Two examples illustrate:

You have 150 licenses of software X. The audit reveals that 180 licenses were deployed and that all 180 are being used simultaneously. You will be required to pay for 180 licenses. This is “true-up.”
You purchased 300 licenses of software Y. You installed it on 300 desktops. The audit reveals that only 250 instances of software Y are being used. Come renewal time, you pay for only 250 licenses. Let’s refer to this as “true-down.”
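
The arithmetic behind the two examples can be sketched in a few lines of Python. The function name `license_adjustment` and its return labels are made up for illustration; real license terms may count usage differently.

```python
def license_adjustment(licensed: int, in_use: int) -> tuple[str, int]:
    """Compare the licenses owned with the instances actually in use."""
    gap = in_use - licensed
    if gap > 0:
        return ("true-up", gap)     # licenses still to be bought
    if gap < 0:
        return ("true-down", -gap)  # licenses that can be dropped at renewal
    return ("compliant", 0)


# The two examples above: software X and software Y.
print(license_adjustment(licensed=150, in_use=180))
print(license_adjustment(licensed=300, in_use=250))
```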

The auditing process can collect more useful information. And you should take advantage of that. Since you’re checking every desktop and server (and other IT equipment, but we won’t include that here) anyway, you might as well gather the additional information.

Learn the configuration of each machine.

Desktop 56 runs the G/L of the accounting module of SAP Business All-in-One on Windows 2000.

Identify the user of each machine and confirm the appropriateness of that role to the machine.

Desktop 56 is assigned to Tom, a cost accountant.

Determine whether the correct software is installed on a particular machine.

You discover the accounting module would run faster if more RAM were added to Desktop 56.
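
One possible way to record this extra information is a simple per-machine record. The sketch below is only an assumption about what such a record might hold; the class `AuditRecord` and its field names are hypothetical, and the values come from the Desktop 56 example.

```python
from dataclasses import dataclass, field


@dataclass
class AuditRecord:
    machine: str
    operating_system: str
    software: list[str]
    user: str
    role: str
    notes: list[str] = field(default_factory=list)  # findings made during the audit


record = AuditRecord(
    machine="Desktop 56",
    operating_system="Windows 2000",
    software=["SAP Business All-in-One (accounting module, G/L)"],
    user="Tom",
    role="cost accountant",
)
record.notes.append("Accounting module would run faster with more RAM.")
print(record)
```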

While it is true that many users use unlicensed software, a good number of them do so unwittingly. How does this come about?

Confusion that arises from vague, complex, and ever-changing licensing rules.

Software publishers frequently change user licenses. About half the time, they do it during the active life of the product. Case in point: Microsoft. It changed significant parts of its Client Access License (CAL) three times during the three years of Microsoft Windows 2000’s marketing life.

Changes in the user IT environment.

In the data center, servers are inevitably upgraded to newer, more powerful models. Software licenses recognize this and permit software to be installed in the replacement. The process isn’t complete, however, until the same software is removed from the old server that was replaced. In many instances, this part is overlooked. Result: one licensed and one unlicensed deployment.

Mergers & Acquisitions.

It may surprise you but this isn’t a subset of the preceding reason. Why? Many software licenses do not automatically transfer licensee rights to another party unless it’s stated explicitly. More often than not, after one company acquires another, the acquirer takes control over the assets of the acquired. In theory, the acquirer has the responsibility of checking this provision. In reality, lawyers on both sides are busy dealing with other larger issues.

Misunderstanding between IT and Procurement.

This is related to the first reason, namely the confusion that arises from vague, complex, and ever-changing licensing rules. In theory, either IT or Procurement should know how many and what kind of licenses should be acquired. In reality, this often falls between the cracks. Result: under- or over-purchases of appropriate or inappropriate licenses. Two examples: (1) a license is deployed on a server that has more CPUs than the license allows, and (2) widespread access is allowed for software that has a limited-user license.
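
The two examples just given lend themselves to a simple automated check. The following is a minimal sketch, assuming a license can be reduced to a CPU limit and a user limit; the classes, field names, and sample numbers are all hypothetical, and real license terms are usually more involved.

```python
from dataclasses import dataclass


@dataclass
class LicenseTerms:
    product: str
    max_cpus: int
    max_users: int


@dataclass
class Deployment:
    product: str
    server_cpus: int
    users_with_access: int


def find_violations(terms: LicenseTerms, deployment: Deployment) -> list[str]:
    """Return a description of each way the deployment exceeds the license."""
    problems = []
    if deployment.server_cpus > terms.max_cpus:
        problems.append(
            f"{deployment.product}: server has {deployment.server_cpus} CPUs, "
            f"license allows {terms.max_cpus}"
        )
    if deployment.users_with_access > terms.max_users:
        problems.append(
            f"{deployment.product}: {deployment.users_with_access} users have access, "
            f"license allows {terms.max_users}"
        )
    return problems


terms = LicenseTerms("Software X", max_cpus=2, max_users=25)
deployment = Deployment("Software X", server_cpus=4, users_with_access=200)
for problem in find_violations(terms, deployment):
    print(problem)
```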

Is non-compliance a serious problem?

It is. Even so, most software publishers deal with offenders, especially first-timers, in an understanding and lenient manner. Publishers realize that they can lose customers and antagonize entire user groups if they act with a heavy hand. To be fair, publishers deserve the revenue from unlicensed deployments. Bottom line: I think the relative laxity stems from practical reasons of customer relations as well as the software industry’s recognition that its licensing rules are vague, complex, and ever-changing. Result: many publishers will settle for true-up deals.


Saturday, June 30, 2007

DATA CENTERS, PART-1

Understanding Tier Classifications

When mission-critical applications fail, so can their owner. Every measure should be taken to prevent this from happening. Protecting the technology that runs the applications is one of the first things that must be done. This task is made much easier when the technology is housed together. Housing it in a secure environment is the function of a data center. Servers, storage devices, networking gear, and the people who keep them running operate out of these facilities.

A data center, to sum it up, is the physical home of the IT capabilities of organizations.

BACKGROUND

The Uptime Institute, an independent association, developed the tier classification of data centers. There are four tiers. Tier-1 refers to a basic facility and Tier-4, to the most reliable and sophisticated type. Institute certification is recognized as the industry standard. Anyone can claim Tier-4 status, but unless the classification came from The Uptime Institute, the claim should be viewed with skepticism.

The institute grants a data center its tier classification only after a rigorous evaluation of the facility’s design and sustainability. That the institute is a third party and the body that developed the standards gives its determination a credibility that self-proclaimed status just doesn’t have. Institute certification provides an objective basis for judging the capabilities of a data center.

In my experience, this is important. Between 2001 and 2003, I helped clients colocate at a large local data center that advertised its Tier-1 classification.

This was the former Exodus data center in Elk Grove Village, Illinois. Exodus went bankrupt in Q3 of 2001. Cable & Wireless USA bought it in Q1 of 2002. C&W, in turn, also went bankrupt and sold it to Savvis in Q1 of 2004. Savvis, to my knowledge, still owns it.

OVERVIEW OF TIERS

The institute’s summary of the high-level characteristics of each tier is presented below.

Tier-1

Has a single path for power and cooling distribution
Has no redundant components
And has a mean uptime availability of 99.671% (equivalent to 29 hours of downtime a year)

Tier-2

Has a single path for power and cooling distribution
Has redundant components
And has a mean uptime availability of 99.749% (equivalent to 22 hours of downtime a year)

Tier-3

Has multiple paths for power and cooling distribution but only one path is active at any given time
Has redundant components that make it concurrently maintainable
And has a mean uptime availability of 99.982% (equivalent to 1.6 hours of downtime a year)

Tier-4

Has multiple paths for power and cooling distribution that are all active at all times
Has redundant components that make it fault-tolerant and concurrently maintainable
And has a mean uptime availability of 99.991% (equivalent to about 47 minutes of downtime a year)

Note how small the improvement in uptime is from Tier-1 to Tier-4. Meanwhile, the investment required to become Tier-4 is many times greater than that required for Tier-1. In short, moving from 99.671% to 99.991% costs a disproportionate amount of money. Is it worth it? That’s the question many data centers face. I suppose the answer depends upon the customers that will use the facility.
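
For readers who want to check the downtime equivalents themselves, here is a minimal Python sketch that converts an availability percentage into hours of downtime a year. It assumes a 365-day year (8,760 hours); the function name is made up for illustration, and the percentages are the ones quoted in the tier summaries.

```python
HOURS_PER_YEAR = 365 * 24  # 8,760 hours; leap years are ignored


def annual_downtime_hours(availability_percent: float) -> float:
    """Hours per year that a site with the given availability is down."""
    return (1 - availability_percent / 100) * HOURS_PER_YEAR


quoted = [("Tier-1", 99.671), ("Tier-2", 99.749), ("Tier-3", 99.982), ("Tier-4", 99.991)]
for tier, percent in quoted:
    hours = annual_downtime_hours(percent)
    print(f"{tier}: {percent}% -> {hours:.1f} hours ({hours * 60:.0f} minutes) a year")
```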

DEFINITIONS

Concurrent maintainability refers to the ability to perform all scheduled work without adversely impacting the end-user.
Fault-tolerance is the capability to sustain a worst case, unplanned event without adversely impacting uptime. Two major requirements for achieving fault-tolerance are redundant equipment and multiple active paths.
A single point-of-failure is a location or piece of equipment whose failure will bring the entire system down (downtime). Tier-1 and -2 have many single points-of-failure. Tier-3 has several. And Tier-4 is supposed to have none.
Site Infrastructure refers to the data center taken as a whole. A typical data center has at least 20 major mechanical, electrical, fire protection, security, HVAC, and other systems.
Sustainability refers to the ease, convenience, and cost of operating the Data Center. A well-designed site will cost less to operate and be easier to maintain. As a group, sustainability factors account for 70% of all infrastructure failures. Human decisions and activities primarily account for sustainability factors. Two-thirds of these failures result from management errors. The remainder arises from errors made by operations staff.
Useable capacity refers to the maximum load that the center’s systems can support. This is less than the non-redundant capacity since allowance must be made for aging components, installation errors, and the size of the desired buffer to accommodate surges in demand. Tier-3 and -4 sites are typically the ones that limit their total load to 90% of the aggregate capacity.
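
The 90% limit mentioned for useable capacity is simple to apply in practice. The sketch below assumes capacity and load are both measured in kilowatts; the function names and the 1,000 kW figure are hypothetical and chosen only to illustrate the arithmetic.

```python
def max_planned_load(capacity_kw: float, load_limit: float = 0.90) -> float:
    """Largest load a site should plan for, given a self-imposed limit."""
    return capacity_kw * load_limit


def within_limit(load_kw: float, capacity_kw: float, load_limit: float = 0.90) -> bool:
    """True if the current load respects the self-imposed limit."""
    return load_kw <= max_planned_load(capacity_kw, load_limit)


capacity_kw = 1000.0  # hypothetical aggregate capacity
print(max_planned_load(capacity_kw))
print(within_limit(850.0, capacity_kw))
print(within_limit(950.0, capacity_kw))
```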

RELATED POSTS

This precedes two posts about the general attributes of Tiers. Click here to read Part-2 and here to read Part-3. A new tab or window will open for each post.

This also precedes a post about the business factors that should be considered in selecting a Tier. Click here to read it. A new tab or window will open.


Monday, May 14, 2007

PROJECT MANAGEMENT: REALISTICALLY CONSIDER THE BUDGET

Realistically consider the budget!

A project has five major elements:

The budget
The schedule
The people
The resources
The rules

Project management has many aspects but all of them fit under these categories.

Why is budget listed first?

Well, isn’t cost usually the first thing the project sponsor brings up?

In fact, isn’t cost frequently brought up in the same conversation that sparked the project idea?

Doesn't it make sense therefore to immediately consider it?

If the sponsor wants to build a new $1 million data center and is suggesting that it can be done for $600,000, then maybe the project idea should be squelched right there and then.

Let's say that you were able to persuade the sponsor to increase the budget to $1 million. Is everything fine? No. It's easy to overlook a related aspect, namely, the scope. Specifically, you want to ensure that the budget is appropriate for the scope of the project. It is time, therefore, to define the scope.

Doesn't this approach run counter to the "normal" process of defining the scope before estimating the budget? On the other hand, doesn't the example given happen more frequently in real life? Reality often does not follow the textbook model. In this case, it certainly doesn't. The budget often precedes the scope although conventional project management thinking says that it should be the other way around.

Let's say that for any number of reasons, many of which were beyond your control, the new data center was finally finished at a total cost of $1.2 million. Now the question is whether the sponsor will consider the project successful.

What do you think?

The sponsor will probably not consider it a successful outcome unless they personally approved every change or activity that added to the extra $200,000. An over-budget situation can be avoided by doing two things: first, ensure that the budget is appropriate for the project scope, and second, implement a strong change control process over the project cycle.
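
A change control process can be as simple as a running log of changes and who approved them. The sketch below is one hypothetical way to track it; the class, the function, and the two sample changes (which add up to the $200,000 in the example) are invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class ChangeRequest:
    description: str
    cost: int
    approved_by_sponsor: bool


def budget_status(original_budget: int, changes: list[ChangeRequest]) -> dict[str, int]:
    """Split the cost growth into sponsor-approved and unapproved amounts."""
    approved = sum(c.cost for c in changes if c.approved_by_sponsor)
    unapproved = sum(c.cost for c in changes if not c.approved_by_sponsor)
    return {
        "approved_budget": original_budget + approved,
        "projected_cost": original_budget + approved + unapproved,
        "unapproved_overrun": unapproved,
    }


# Hypothetical changes to the $1 million data center project.
changes = [
    ChangeRequest("Additional cooling unit", 120_000, approved_by_sponsor=True),
    ChangeRequest("Generator upgrade", 80_000, approved_by_sponsor=False),
]
print(budget_status(1_000_000, changes))
```

Any amount that shows up as an unapproved overrun is the part of the bill the sponsor never agreed to, which is where a "successful" verdict is most at risk.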

Where does the project scope fit in? As its own entity, it doesn’t. However, the components that make up the project scope do. These components are the budget, the schedule, the people, and the resources. As you can see, the project scope fits once it is decomposed into these four elements.

It’s human nature to try to get more for your money. It’s suicide to accept a project that has an unrealistically low budget relative to its goal. Let’s keep that in mind.



Alex Pronove
