One of the foundational steps in moving to DevOps is application monitoring, which is not always easy. There are many different possible performance problems and limitations, and the only way to track them down is by collecting information in bulk. Ask any developer who has maintained an application without the proper monitoring tools, and you'll probably hear a story or two about some issue that took months to find or is still plaguing the application. Developers, server admins, and network engineers all need information to diagnose a problem.
In this article, I walk through an example of how application performance monitoring can help quickly discover, isolate, and solve problems that can negatively impact the user experience.
No Application Monitoring
Imagine trying to count the number of people at Millennium Park at exactly 10 am on a Saturday, including people in cars driving by, people walking around taking pictures, and people sitting and having a mid-morning snack or feeding the pigeons. At exactly 10 am, you start counting, but unfortunately, you quickly lose track - did that person just come out of the subway, or were they always there? How long was that light red - did some cars just leave?
Your information is out of date almost as soon as you start counting, which is like trying to debug an application without any monitoring. Even assuming you're lucky enough to catch the issue right as it happens, by the time anyone starts trying to analyze the situation, the information is already out of date.
Basic Application Monitoring
Now imagine that you have cameras set up so you can take pictures of the park at 10 am. Suddenly, you have a much more accurate point-in-time representation of how many people are there. This is equivalent to an application with some basic monitoring. It's possible to identify and solve many more performance issues this way - with enough information, it may be clear that a database query took too long. At the very least, you can rule out problems and bottlenecks in whichever metrics you are tracking.
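As a rough illustration of the "snapshot" idea, the sketch below times individual operations so that a slow database query stands out. It is a minimal example, not a real monitoring tool: the names `timings`, `timed`, and `slow_operations` are hypothetical, and the `time.sleep` call merely stands in for a real query.

```python
import time
from contextlib import contextmanager

# Hypothetical in-memory store: operation name -> list of durations in seconds.
timings = {}

@contextmanager
def timed(name):
    """Record how long the wrapped block takes, even if it raises."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings.setdefault(name, []).append(time.perf_counter() - start)

def slow_operations(threshold_seconds):
    """Return names of operations whose slowest run exceeded the threshold."""
    return sorted(
        name for name, runs in timings.items() if max(runs) > threshold_seconds
    )

# Stand-in for a real database call; the sleep simulates a slow query.
with timed("db_query"):
    time.sleep(0.05)

# Stand-in for a fast operation.
with timed("render_template"):
    pass
```

Calling `slow_operations(0.03)` after this runs would flag only `db_query`, which is the kind of point-in-time evidence that basic monitoring gives you.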
Advanced Application Monitoring
Now let’s say you're in charge of tourism for Millennium Park and need to know when too many people are going to be there so you can send out personnel to direct traffic. You need to be able to predict when there is a risk of overcrowding and have a text sent to your phone so you can decide whether to call in some help.
You still have cameras set up so you can count the number of people, but instead of only taking pictures on-demand, you take a picture every 15 minutes. Your software calculates the number of people in the photos and determines if the number is going up or down. If it's getting close to a critical threshold, you are automatically alerted without having to check on the situation yourself.
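The logic described above, periodic samples plus a trend and a threshold, can be sketched in a few lines. This is only one simple way to do it, assuming a straight-line trend across the sampled window; the function name `check_crowd`, the 80% warning ratio, and the two-interval look-ahead are all illustrative choices, not part of any particular monitoring product.

```python
def check_crowd(samples, critical, warn_ratio=0.8, interval_minutes=15):
    """Classify the latest reading and estimate time to the critical threshold.

    samples: counts taken at a fixed interval, oldest first.
    Returns (status, minutes_until_critical), where the second value is
    None when the count is flat or falling.
    """
    if not samples:
        raise ValueError("need at least one sample")
    latest = samples[-1]
    if latest >= critical:
        return "critical", 0

    # Average change per interval across the window (simple linear trend).
    slope = (latest - samples[0]) / (len(samples) - 1) if len(samples) > 1 else 0

    eta = None
    if slope > 0:
        eta = (critical - latest) / slope * interval_minutes

    # Warn when already near the threshold, or when the trend would
    # reach it within the next two sampling intervals.
    near_threshold = latest >= warn_ratio * critical
    arriving_soon = eta is not None and eta <= 2 * interval_minutes
    if near_threshold or arriving_soon:
        return "warning", eta
    return "ok", eta
```

For example, `check_crowd([150, 250, 350, 450], critical=500)` returns `("warning", 7.5)`: the count is rising by 100 per photo and would hit the limit in about half an interval. A falling series such as `[400, 300, 200]` returns `("ok", None)`. In a real system, the warning branch is where the text message would be sent.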
This example is the equivalent of a comprehensive application performance management suite that is calibrated for your application so that you'll know if performance bottlenecks are happening in real time, and possibly even predict issues before they affect users.
Proactively Avoid DevOps Bottlenecks
In a complex network environment, much like our hypothetical scenario, elements can change quickly and unpredictably. A parade nearby could cause an influx of jovial pedestrians. Similarly, your site could be featured on a popular blog and visitor numbers could triple, which is great, but it means that you need to increase the resources available to keep your site working smoothly.
In DevOps, bottlenecks can come from many directions. A network carrier could change its network speeds or configuration and cause performance issues. Or perhaps your application is making too many calls to the database or requesting more information than it needs. It could be that the database requests are optimized, but the network latency between the application and the database, or between the application and the user, is too high. Or it may turn out that your application server's memory and CPU are constantly maxed out, and you need to scale up your servers. Issues could even be due to the end user's machine - a complex single-page application (SPA) could fail on an older browser or a machine that is low on resources.
With so many factors that can affect application performance, it's advantageous to define key metrics and trends and monitor the full picture in real time so you can pinpoint the issue and respond accordingly. Architecting a robust application monitoring solution will not only help developers - it will give some peace of mind to anyone who is responsible for improving software reliability and speed to market.