Companies have been virtualizing their servers at a furious rate for the past 10 years or more. In the process, they’re coming across challenges in how best to manage these highly virtualized environments. To learn more about what those challenges are and how to address them, we talked with Steve Francis, founder and Chief Product Officer for LogicMonitor. His company offers a hosted management service that covers traditional IT shops as well as virtualized infrastructure and cloud environments, so he’s well versed in the differences between managing each type of environment.

NTT Com: What are some of the challenges inherent in monitoring virtual environments as opposed to physical?

Francis: The main issue is virtual environments make it so much easier to create new servers, to bring them up and down. It’s very hard for a monitoring system to keep track of that. If you’re bringing up a server for a week, even if you have a process to put that into monitoring, once someone says, “Ah it’s only going to be there for a week,” that process is going to fail.  So the real challenge is rate of change. It means everything has to be automated and provisioned around it.

NTT Com: What issues can arise if you’re relying on multiple tools to monitor different components of a virtual environment?

Francis: There are two main issues. One is that you can have no indication of any performance problems in a tool that’s looking at, say, just the guest operating system performance or the performance of an application on that guest operating system. Disk contention is really the prime example. Say you have one virtual machine running on a hypervisor, and that hypervisor is sharing its disk subsystem on a back end storage array. Then there’s another virtual machine running on a completely different hypervisor system that happens to be sharing the same back end storage array. One of those virtual machines can completely saturate the disk performance capacity of the back end storage. If you’re using a point monitoring system for the hypervisor, you’re not going to see that. You’re going to see a slowdown and have no indication at all as to what it is.

The other big issue with point monitoring systems is you may have 10 different monitoring systems, which we sometimes see in customer environments, with 10 different groups of people working on them. They aren’t going to exchange information very efficiently. And you have to set up 10 different on-call schedules and 10 different rotations and 10 different escalation systems. It’s just a less efficient way to work, as opposed to having everything visible in one portal. Then maybe the guy responsible for the database sees it’s getting slow and there’s a latency issue due to excessive IOPS on the back end storage where his server is. He may not know how to fix that but he knows who to call. So he doesn’t keep spinning his wheels on an issue that he has no visibility into.
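To make the disk contention example concrete, here is a minimal sketch of the kind of cross-hypervisor correlation a single portal makes possible. It is an illustration only, not LogicMonitor's implementation: the `VmSample` record, the `find_noisy_neighbors` function, the host and array names, the capacity figures, and the 90 percent saturation threshold are all hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class VmSample:
    vm: str          # virtual machine name (hypothetical)
    hypervisor: str  # host the VM runs on
    array: str       # back end storage array behind the VM's disks
    iops: float      # I/O operations per second issued by this VM


def find_noisy_neighbors(samples, array_iops_capacity, saturation=0.9):
    """Return {array: (total_iops, heaviest_vm, affected_vms)} for saturated arrays.

    Samples are grouped by storage array rather than by hypervisor, so load
    from VMs on different hypervisors is added together.
    """
    by_array = defaultdict(list)
    for sample in samples:
        by_array[sample.array].append(sample)

    report = {}
    for array, group in by_array.items():
        total = sum(s.iops for s in group)
        # Assumed rule of thumb: an array is "saturated" at 90% of capacity.
        if total < saturation * array_iops_capacity[array]:
            continue
        heaviest = max(group, key=lambda s: s.iops)
        affected = sorted(s.vm for s in group if s.vm != heaviest.vm)
        report[array] = (total, heaviest.vm, affected)
    return report


samples = [
    VmSample("db-01", "hv-a", "array-1", 1200.0),
    VmSample("batch-07", "hv-b", "array-1", 8500.0),
    VmSample("web-03", "hv-c", "array-2", 300.0),
]
capacity = {"array-1": 10000.0, "array-2": 10000.0}
report = find_noisy_neighbors(samples, capacity)
print(report)
```

In this made-up data, `db-01` and `batch-07` sit on different hypervisors, so a tool scoped to one hypervisor would see only a slow `db-01`. Grouping by storage array is what ties the slowdown to the other machine.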

NTT Com: What are some of the key benefits to a hosted monitoring solution as opposed to doing it in-house, especially when it comes to virtual environments?

Francis: One big one is that the monitoring is outside any of your data centers, which is important because if the data center goes offline and your monitoring is in it, you’re not going to get notifications about it. The other side of that is what happens after a data center goes offline. When it comes back up, if the monitoring system is external to that data center, it means you’ll immediately have visibility into what has recovered and what hasn’t without having to rebuild the monitoring and wait for it to recover and start to discover the status of things. And data center in this context doesn’t necessarily mean a physical data center; it could also mean a [third party] cloud infrastructure. Last year when Amazon had to reboot all of its EC2 instances, we heard of companies that had their monitoring inside Amazon, so they had 6 hours before they even knew all their servers were down because their monitoring was on one of the servers that got rebooted and didn’t come back up. That’s why monitoring should be outside your data center.

In the virtualized space, one aspect of it is companies are testing the waters with cloud solutions but they’re certainly not throwing all of their infrastructure in there. The most common deployment we see is a hybrid cloud where they’ve got the bulk of their processing in their data center, but also have some in a cloud space. That kind of deployment is really only suited to a SaaS-based monitoring system because you need something that can monitor within your premises and within the cloud. Purely cloud-based monitoring isn’t going to do it because it can’t get the stuff that’s in your data center, and a purely premises-based one won’t get the stuff that’s in the cloud.

And then with pure virtualization, you need automatic discovery of new machines and also automatic classification as to how to alert and monitor on those machines.  If somebody is building up a new QA server, you don’t want alerts for that to be sent to your production Web server team at 2 in the morning.  So you need a system that has that intelligence.
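As a rough illustration of automatic classification, the sketch below routes alerts based on hostname patterns. Everything in it is an assumption made for the example: the naming convention (`qa-*`, `web-prod-*`), the team names, the business-hours window, and the `classify` and `page_target` helpers are hypothetical, and a real system would likely classify on richer properties than a hostname.

```python
import fnmatch

# Hypothetical rules, checked in order: hostname pattern -> alerting policy.
RULES = [
    ("qa-*", {"group": "qa", "notify": "qa-team", "page_after_hours": False}),
    ("web-prod-*", {"group": "production-web", "notify": "web-oncall",
                    "page_after_hours": True}),
]

# Machines that match no rule go to a triage queue and never page at night.
DEFAULT_POLICY = {"group": "unclassified", "notify": "ops-triage",
                  "page_after_hours": False}


def classify(hostname):
    """Return the alerting policy for a newly discovered machine."""
    for pattern, policy in RULES:
        if fnmatch.fnmatchcase(hostname, pattern):
            return policy
    return DEFAULT_POLICY


def page_target(hostname, hour):
    """Return who to page for an alert at the given hour (0-23), or None."""
    policy = classify(hostname)
    business_hours = 9 <= hour < 18  # assumed working day
    if business_hours or policy["page_after_hours"]:
        return policy["notify"]
    return None


print(page_target("qa-build-12", 2))   # new QA server, 2 in the morning
print(page_target("web-prod-01", 2))   # production Web server, same hour
```

Under these assumed rules, an alert from the new QA server at 2 in the morning pages no one, while the same alert from a production Web server reaches the Web on-call rotation.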

NTT Com: Virtualization is obviously one of the underpinnings of cloud and, as you mentioned, you folks also monitor cloud environments. How does cloud monitoring differ from monitoring a virtual environment?  

Francis: We basically treat cloud as a separate data center. Really the only fundamental difference is the rate of change is likely to be even higher with cloud. The way you’re going to discover, monitor and manage things in a cloud is probably different than in your virtual environment.  In your virtual environment, when you’re producing new servers, you control the network so you can do network scans and find the servers that are being introduced. That’s not going to happen in the cloud. They’re going to be randomly spread all over your cloud provider’s addresses. The main challenge there is you need a way to discover new machines and instances either by integrating into the API of your cloud provider or by having a system that allows instances as they’re created to register for monitoring.
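The two discovery approaches mentioned here can be sketched together. This is a minimal illustration under stated assumptions, not any vendor's API: `MonitoringRegistry`, its methods, and `fake_list_instances` are hypothetical, and the latter stands in for whatever inventory call a real cloud provider exposes.

```python
class MonitoringRegistry:
    """Tracks which cloud instances are under monitoring (hypothetical)."""

    def __init__(self):
        self._monitored = {}  # instance id -> address

    def register(self, instance_id, address):
        """Self-registration: a new instance calls this as it is created."""
        self._monitored[instance_id] = address

    def reconcile(self, list_instances):
        """API integration: sync against the provider's own inventory.

        list_instances is any callable returning {instance_id: address}.
        Returns (added, removed) as sorted lists of instance ids.
        """
        current = list_instances()
        added = sorted(set(current) - set(self._monitored))
        removed = sorted(set(self._monitored) - set(current))
        self._monitored = dict(current)
        return added, removed

    def monitored(self):
        return dict(self._monitored)


def fake_list_instances():
    # Stand-in for a provider inventory call; addresses are scattered,
    # so a network scan of one subnet would not find them.
    return {"i-aaa": "10.0.3.17", "i-ccc": "10.0.9.4"}


registry = MonitoringRegistry()
registry.register("i-aaa", "10.0.3.17")  # registered itself at creation
registry.register("i-bbb", "10.0.7.2")   # registered, later terminated
added, removed = registry.reconcile(fake_list_instances)
print(added, removed)
```

Combining the two covers the high rate of change on both sides: self-registration picks up new instances as they are created, and periodic reconciliation catches instances that never registered and drops ones that no longer exist.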
