Applying Business Intelligence to the Data Center

Building on the idea I wrote about last week (Business Intelligence and the Data Center), I wanted to do some brainstorming about the potential impact of applying BI to the data center.

Why Should Anyone Care?

There are two reasons that this topic is vitally important.

First, I think that IT in general is headed for a crisis. (See The Coming Crisis of IT Management.) The cloud is increasing both the complexity of the infrastructure IT manages and the speed of change. While it is true that the cloud makes some things simpler and easier and that SaaS applications reduce complexity, the footprint of IT is steadily growing.

As the footprint grows, more and more assets that support the enterprise are out of IT’s control. Unless IT improves its game, the pace of change will slow down just when it needs to become faster. It is easy to imagine that companies that master the new world of IT will be able to overtake those who don’t.

Second, the assets in most data centers are not managed carefully. Storage, which we will look at later in this article, is a prime example. How often do data centers simply add large amounts of storage rather than take on the challenge of modeling the need for storage and closely tracking usage?

Instead of improving storage management and modeling, data centers collectively load up on billions of dollars' worth of excess storage. Now imagine what will happen when cloud provisioning becomes popular. Will it be carefully managed or not? Do we even know how to manage it?

Applying BI to the Data Center Does Not Mean…

Let’s take a closer look at ideas that will not work. When I talk of BI for the data center, I don’t mean using BI as it is used in other parts of the company to track the management processes. There is nothing wrong with using the balanced scorecard or any other enterprise performance management system to track what you are attempting to do.

I’m all for business-focused metrics that tell you how you are doing in dollars and cents and in ways that indicate the business impact of IT. But a finely tuned, cascading set of KPIs tracking elegantly aligned objectives won’t help you figure out if you have too much storage.

Another pertinent question concerns ITIL. Would implementing ITIL the right way solve the problem? Almost, but not quite.

ITIL, when done properly, creates a model of the services provided by IT and links them to the business activity supported. ITIL is often used only halfway, just to help sort out the complexity of what IT is offering and to track which services are being delivered.

The full model then tracks what IT assets are used to support each of the services and also what business activity is supported by the services. With this full model, you can then determine the IT cost to support business activity with great granularity.

You can determine the IT cost for creating an individual unit of product or service. This is of great use when it comes time to retire systems, but it doesn’t solve the problems mentioned earlier.
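To make that concrete, here is a rough sketch of the kind of cost roll-up a full ITIL-style model makes possible. Every figure, asset name, and allocation share below is invented for illustration; the point is only that once assets are linked to services and services to business activity, cost per unit falls out of simple arithmetic.

```python
# Sketch of the cost roll-up a full ITIL-style model enables: assets map to
# services, services map to business activity, and cost per unit falls out.
# All names and figures are hypothetical.
asset_costs = {"db-cluster": 120_000, "web-farm": 80_000, "san-01": 200_000}

# Fraction of each asset's annual cost attributed to each IT service.
service_assets = {
    "order-processing": {"db-cluster": 0.6, "web-farm": 0.5, "san-01": 0.3},
    "reporting":        {"db-cluster": 0.4, "web-farm": 0.5, "san-01": 0.7},
}

def service_cost(service: str) -> float:
    return sum(asset_costs[a] * share for a, share in service_assets[service].items())

orders_shipped = 400_000  # business activity supported by the service
cost = service_cost("order-processing")
print(f"order-processing costs {cost:,.0f}/yr -> {cost / orders_shipped:.2f} per order")
```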

Applying BI to the Data Center Means Better Models and More Metrics

I attended a presentation this morning from two senior engineers at Etsy, which runs a state-of-the-art continuous deployment process. They have 65 engineers and deploy code 20 times a week or more.

Last year, they deployed the web site more than 2,400 times. How they did this will be the subject of more blogging and some really cool problem statements. But for this article there is one key point the Etsy engineers mentioned: They track more than 60,000 metrics about their website. This is not a typo: 60,000.

What this allows them to do is see the impact of deploying new code in small increments. There are huge screens at Etsy that graph all sorts of data related to website performance.

When a deployment happens, a vertical line is drawn on the graphs so that the impact of the change can be tracked for good or for ill. The biggest impact of this approach is that it reduces the fear of change.
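To make the technique concrete, here is a minimal sketch of the deployment-marker idea, assuming you already collect a metric as a time series and record the time of each deploy. The data, metric name, and deploy times below are made up; this is a sketch of the graphing trick, not Etsy's actual tooling.

```python
# Sketch: overlay deployment markers on a metric graph so the effect of each
# change is visible. The data here is invented for illustration.
import random
import matplotlib.pyplot as plt

minutes = list(range(240))                                  # four hours of samples
response_ms = [120 + random.gauss(0, 8) for _ in minutes]   # hypothetical metric
deploys = [45, 130, 200]                                    # minutes at which code shipped

fig, ax = plt.subplots()
ax.plot(minutes, response_ms, label="median response time (ms)")
for t in deploys:
    ax.axvline(t, color="red", linestyle="--", alpha=0.7)   # vertical deploy marker
ax.set_xlabel("minutes")
ax.set_ylabel("ms")
ax.legend()
plt.show()
```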

This is an example of what I mean by BI for the data center, although you could argue that it is also a form of Operational Intelligence for the data center. To truly apply BI to a data center, you need a model of the data center, a detailed model that shows what is happening.

Then you need a model of the activity that is taking place in the data center and how that activity is using resources. This model doesn’t have to be perfect.

A crude model in many situations is good enough. This all sounds well and good at a high level. A few examples will help us understand what I mean in practice.
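Before turning to those examples, here is a deliberately crude sketch of what such a model might look like in code. The classes and fields are hypothetical, not any vendor's schema; the point is that even a simple structure lets you ask which resources a given activity is using.

```python
# A deliberately crude model of a data center: servers, where they sit, what
# runs on them, and how busy they are. Names and fields are hypothetical.
from dataclasses import dataclass, field

@dataclass
class Server:
    name: str
    rack: str
    apps: list[str] = field(default_factory=list)  # business activity it supports
    avg_utilization: float = 0.0                    # 0.0 - 1.0 over the last month

@dataclass
class DataCenterModel:
    servers: list[Server] = field(default_factory=list)

    def servers_for_activity(self, app: str) -> list[Server]:
        """Which physical resources does a given activity use?"""
        return [s for s in self.servers if app in s.apps]

model = DataCenterModel([
    Server("web-01", rack="A1", apps=["storefront"], avg_utilization=0.35),
    Server("db-02", rack="A2", apps=["storefront", "reporting"], avg_utilization=0.55),
])
print([s.name for s in model.servers_for_activity("storefront")])  # ['web-01', 'db-02']
```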

Finding Your Orphans, Then Killing Them

It is not just storage that is purchased in abundance to make up for the lack of knowledge about what is truly needed. Research has shown that around 15 percent of the servers in a typical data center are orphans, meaning they are sitting around doing nothing but eating space in a rack and consuming power and cooling resources.

Orphans mean that data centers are larger than they have to be and cost more than they should to operate. In addition, servers tend to be dramatically underutilized; utilization often peaks at only 50 percent of capacity.

To apply BI to this problem, you must implement the kind of modeling that nLyte Software has created. nLyte's products create a detailed model of the data center, from the physical space and the cooling layout all the way down to the servers in each rack and the applications running on each server.

This type of model allows orphans to be cleaned out quickly, power bills to be reduced, and space for expansion to be found. The model of the data center then allows a wide variety of automation to kick in to manage servers and bring them in and out of states of readiness.

Cooling can be optimized to reduce energy usage. The detailed model removes much of the mystery about what is happening in the data center and can be a godsend for larger data center consolidation efforts and mergers.
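This is not nLyte's product or API, but a rough sketch of the kind of query such a detailed model makes possible: once servers, their racks, their applications, and their utilization live in one model, orphan candidates fall out of a simple filter. The thresholds below are arbitrary assumptions.

```python
# Sketch: flag orphan candidates -- servers with no known applications and
# near-zero utilization. Thresholds and data are invented for illustration.
from dataclasses import dataclass

@dataclass
class Server:
    name: str
    rack: str
    apps: list[str]
    avg_cpu: float    # average utilization over the last 90 days
    peak_cpu: float   # highest utilization seen in that window

def orphan_candidates(servers, avg_max=0.02, peak_max=0.05):
    return [
        s for s in servers
        if not s.apps and s.avg_cpu < avg_max and s.peak_cpu < peak_max
    ]

fleet = [
    Server("web-01", "A1", ["storefront"], avg_cpu=0.35, peak_cpu=0.50),
    Server("old-42", "B3", [], avg_cpu=0.01, peak_cpu=0.02),
]
print([s.name for s in orphan_candidates(fleet)])  # ['old-42']
```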

Learning Not to Waste Power

Another task that is crucial to reducing waste is understanding when servers can be powered down. Data center managers are adopting two approaches with great results. The first approach is to turn off what you are not using.

1E’s products, for example, allow both desktops and servers to be turned off when not in use. The larger the number of machines that are on when they don’t need to be, the more power is wasted.

Of course, this task also requires a model of the machines being managed. You don't want to turn off computers doing useful work or those that may need to be available at a moment's notice. 1E and other such products enable sophisticated policies to be set so that nobody is surprised to find a machine is off when it is needed.
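Here is a hedged sketch of what such a policy check might look like. It is not 1E's product or API; the tags, thresholds, and schedule are assumptions made for illustration.

```python
# Sketch of a power-down policy: a machine is a shutdown candidate only if it
# is idle, not tagged always-on, and we are outside business hours.
from datetime import datetime, time

ALWAYS_ON_TAGS = {"database", "monitoring", "on-call"}

def can_power_down(machine: dict, now: datetime) -> bool:
    if ALWAYS_ON_TAGS & set(machine["tags"]):
        return False                               # policy: never touch these
    if machine["cpu_last_hour"] > 0.05:
        return False                               # still doing useful work
    business_hours = time(7, 0) <= now.time() <= time(19, 0)
    return not business_hours                      # only act overnight

desktop = {"tags": ["desktop"], "cpu_last_hour": 0.01}
print(can_power_down(desktop, datetime(2012, 3, 1, 23, 30)))  # True
```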

The second approach relies on hardware support. Intel has been interested in saving power for some time and has started to add features to chips that allow them to go in and out of low-power states virtually instantaneously. The software interface to these chips is called the Intel Data Center Manager. A new crop of products is being created to take advantage of these capabilities.

For example, Power Assure offers a product that monitors application activity and then allows servers to be turned off and on rapidly to reduce power usage as servers become more or less busy.
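The arithmetic behind this kind of demand-based power management can be sketched in a few lines. The capacity and headroom figures are invented, and this is not Power Assure's product; it only illustrates the calculation such tools automate.

```python
# Sketch: keep just enough servers awake for the current load, with some
# headroom for spikes, and park the rest in a low-power state.
import math

REQUESTS_PER_SERVER = 500   # assumed safe capacity of one server
HEADROOM = 1.3              # keep 30 percent spare for spikes
POOL_SIZE = 40              # total servers available

def servers_needed(requests_per_sec: float) -> int:
    return max(1, math.ceil(requests_per_sec * HEADROOM / REQUESTS_PER_SERVER))

for load in (200, 4_000, 12_000):
    active = min(POOL_SIZE, servers_needed(load))
    print(f"{load:>6} req/s -> keep {active} of {POOL_SIZE} servers active")
```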

Understanding Storage

Storage usually represents the biggest investment for any data center. In the modern day, that means Storage Area Networks (SANs).

For no good reason, SANs are the least monitored and modeled aspect of the data center. Companies buy too much storage because they do not know how much they will need. Companies buy the fastest storage because they do not know whether cheaper and slower would do.

As virtualization makes this picture more complex, it is vital to know not only the storage used by each physical machine but also the storage used by each virtual machine. A proper model of a SAN looks at all of the traffic flowing to and from the network and can monitor and report on a wide variety of metrics to identify warning signs or problems.

Virtual Instruments offers monitoring and modeling technology that creates just such a model. With the right information, most data centers find out exactly how overprovisioned they are.

Often, it is possible to start purchasing cheaper storage if applications do not need the highest speed storage to work properly.
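As a sketch of the arithmetic involved: once you have per-volume capacity, usage, and I/O figures, seeing how overprovisioned you are and which volumes could move to a cheaper, slower tier takes only a few lines. The data and threshold below are invented for illustration.

```python
# Sketch: summarize overprovisioning and flag volumes that could live on a
# cheaper, slower storage tier. All figures are made up.
luns = [
    # name,          capacity_tb, used_tb, avg_iops
    ("orders-db",         10.0,     7.5,     9000),
    ("archive-2009",      20.0,     4.0,       50),
    ("build-scratch",     15.0,     2.0,      300),
]

total_capacity = sum(cap for _, cap, _, _ in luns)
total_used = sum(used for _, _, used, _ in luns)
print(f"Provisioned: {total_capacity} TB, used: {total_used} TB "
      f"({total_used / total_capacity:.0%})")

SLOW_TIER_MAX_IOPS = 500  # arbitrary cutoff for this sketch
for name, cap, used, iops in luns:
    if iops < SLOW_TIER_MAX_IOPS:
        print(f"{name}: candidate for a cheaper tier ({iops} IOPS, {used}/{cap} TB used)")
```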

Now That’s What We Call BI for the Data Center

What is applying BI to the data center all about? It is about doing what IT recommends that the rest of the company do with software: creating models to track activity, gathering data in the context of those models, and then using the models to help understand what is going on and plan for the future.

BI in the data center needs a new set of models. Vendors have taken note and are starting to supply them to meet the common needs.

One implication of this, which we will save for a later article, is how to fill the gaps between what these products provide and the complete story of a data center. Enterprise software applications from vendors will never provide a complete picture of any business.

Data Center Infrastructure Management technology won't either. In my next article, we will look into how to fill the gaps based on some of the approaches that Etsy has taken. But even when you fill the gaps, BI in the data center is a bit different: the systems must be monitored both historically and in real time.

In this sense, BI in the data center is really much more like Operational Intelligence than like BI. My instinct is that once an IT department really gets its BI or Operational Intelligence act together, there will be many lessons learned that can be used to improve the BI and Operational Intelligence systems for the rest of the company. (See the problem statement on Operational Intelligence for CITOs for a definition of Operational Intelligence.)