Application performance management (APM) isn't just for system administrators. It can and should be an integral part of the software development cycle. Some problems show up only after an application has been running for a long time, and APM is one of the best ways to make developers aware of them. Not every bug immediately produces incorrect results. Some make the application slow down over time. Some make it crash after a long period or when an unusual situation occurs. Code that works well with test data may not scale as its database grows. Normal testing won't always catch these issues.
DevOps and APM
The big trend in software development is toward DevOps. This paradigm brings development and operations more tightly together and promotes shorter development cycles. Under older approaches, developers hand their code off to the QA department, which tests it and reports any bugs back. When the code is sufficiently bug-free, it's released. The developers move on to the next urgent task and don't think about that application again for a long time.
Bug reports accumulate when things obviously go wrong. Urgent ones may force a new release, but more often nothing happens for a long time. Eventually the developers are assigned to create a new version, and they have to work through long lists of reports.
Bug reports by themselves aren't always useful. Developers will have questions about the exact circumstances of the problem, and the people who submitted them aren't likely to remember months later. DevOps replaces this loose connection with ongoing communication. Automated tools are an important part of it, and they're available to all members of the team.
With access to APM tools, developers can see how their code performs in the production environment. They can find bottlenecks and subtle problems before the software crashes or becomes hopelessly slow. They can use the same tools in the development and staging environments, letting them catch many performance issues before the code is released.
Tracking the history of performance measurements helps to identify the source of any degradation. Did throughput drop noticeably right after a new release? It's likely some change in the code is responsible. Has it been gradually declining over time? Perhaps the code needs to be reworked to handle the changing workload.
APM can narrow performance changes to specific operations. If SQL or Ajax calls are taking longer to complete, it can let developers know where the problem is. Or perhaps nothing is actually getting worse, but some operations are taking an unreasonably long time to complete. That could point to a query that needs optimization.
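At its core, this kind of per-operation visibility comes from timing each instrumented call and aggregating the results. Here is a minimal sketch of the idea in Python; the decorator, the operation name, and the sleep standing in for a real SQL query are all illustrative, not part of any actual APM product's API:

```python
import time
from collections import defaultdict

# Accumulated durations per operation name: a toy stand-in for
# what an APM agent collects and reports.
timings = defaultdict(list)

def timed(name):
    """Decorator that records how long each call to a function takes."""
    def wrap(fn):
        def inner(*args, **kwargs):
            start = time.perf_counter()
            try:
                return fn(*args, **kwargs)
            finally:
                timings[name].append(time.perf_counter() - start)
        return inner
    return wrap

@timed("fetch_orders_sql")
def fetch_orders():
    time.sleep(0.01)  # placeholder for a real database query

fetch_orders()
avg = sum(timings["fetch_orders_sql"]) / len(timings["fetch_orders_sql"])
```

Comparing these averages across releases is what turns raw timings into the "did the new version make this query slower?" answer described above.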
Shortening resolution times
Bugs that are hard to pin down precisely are the orphans of the traditional development cycle. Reports of the form "takes a long time" or "occasionally crashes" are likely to be passed over in favor of more clearly defined problems. Once a problem gets deferred for a release, it becomes easy to ignore forever.
The DevOps approach is to fail fast and fix fast. APM tools catch problems before human observers can, and they give detailed descriptions. The reports get back to the developers before the list of bug reports becomes huge.
With a short development cycle, a rewritten function will go into production as soon as the code passes all tests. Developers can compare "before" and "after" metrics to see whether the change improved things. If it didn't, they can try a different approach before they've forgotten what the issue is.
When performance issues are reported only after they become obvious to users, the result is a period of frustration. If the code degrades into frequent crashes, developers have to work under pressure to deliver a fix quickly. Rushed changes may introduce new problems that won't get fixed for months.
Types of performance problems
Several kinds of problems impact performance while being hard to detect. APM can catch these problems more quickly than human observers and give better information about their cause. These are a few of the most common.
Memory leaks
Operations allocate temporary memory and release it when they're done. If they fail to release it, memory consumption grows over time. A function may release memory correctly except under unusual circumstances, such as handling an error, so the drain is very slow but never stops. At some point, perhaps after the code has been running for days, there's no free memory left. A function tries to allocate a little more, fails, and the application crashes. With no memory left, a graceful exit isn't even possible.
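The error-path pattern is worth seeing concretely. In this invented sketch, tracking state is cleaned up only on the success path, so every failed request leaks one entry; the function and variable names are purely illustrative:

```python
# State kept per in-flight request; entries should be removed when
# the request finishes, whatever the outcome.
open_requests = {}

def handle_request(req_id, payload):
    open_requests[req_id] = payload   # "allocate" tracking state
    if not payload:                   # error path
        return None                   # bug: cleanup is skipped here
    result = len(payload)
    del open_requests[req_id]         # released only on success
    return result

for i in range(1000):
    handle_request(i, "")             # every call hits the error path
# open_requests now holds 1000 stale entries that will never be freed
```

Wrapping the cleanup in a try/finally block (or a context manager) makes it run on every path, which is the usual fix for this class of leak.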
APM tracks memory usage and can observe a slow memory leak in progress. How much help it can give in identifying the source depends on the tool and the situation, but at a minimum it should be able to identify that something is draining free memory. It may be able to report what operations contribute to the leak.
Poorly chosen algorithms
An algorithm may be very fast with test data but scale poorly as the amount of data it works with grows. This is especially common with sorting operations. The application may work well in the production environment at first. As its database grows over time, the weaknesses in its algorithms may start to become obvious.
Since the effect is gradual, people may not notice that the code is slowing down. They'll complain that it's slow but forget it was ever any other way. APM historical data can show changes over a long period of time and identify the operations that are getting longer turnaround times. Developers can pinpoint the parts of their code that need more efficient algorithms.
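As a toy illustration of the scaling problem (both function names are invented for this sketch), the two functions below answer the same question, but their costs grow very differently with data size:

```python
def has_duplicates_quadratic(items):
    """O(n^2): each element is compared against the rest of the list."""
    for i, value in enumerate(items):
        if value in items[i + 1:]:   # linear scan inside a linear loop
            return True
    return False

def has_duplicates_linear(items):
    """O(n): the same check via a set, which has constant-time lookups."""
    return len(set(items)) != len(items)

# Both agree on small inputs, but the quadratic version's run time
# grows roughly 100x each time the data grows 10x.
assert has_duplicates_quadratic([3, 1, 4, 1]) == has_duplicates_linear([3, 1, 4, 1])
```

On a few hundred test records the difference is invisible; on a few million production records, APM's trend data is often what reveals the quadratic version.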
Redundant computation
Inefficiencies can also arise when an application calls time-consuming functions more often than necessary. If an operation on a set of data is computationally expensive, it shouldn't be repeated unless something that affects the result has changed. When APM reports that a part of the code is slow, developers can check whether it's doing the same calculations more than once, or doing calculations it doesn't need to do at all.
Caching the results can reduce the number of expensive calls. Sometimes even caching isn't necessary; refactoring can consolidate an operation that's done in several places so it runs in just one.
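In Python, the standard library's functools.lru_cache is one common way to cache such results. A minimal sketch, with an invented function and a counter added just to show the effect:

```python
from functools import lru_cache

calls = 0  # counts how many times the expensive body actually runs

@lru_cache(maxsize=None)
def expensive_summary(key):
    """Placeholder for a computationally expensive operation."""
    global calls
    calls += 1
    return sum(ord(c) for c in key)

for _ in range(5):
    expensive_summary("quarterly-report")  # body runs only on the first call
# calls == 1
```

The cache is only safe when the result depends solely on the arguments; if the underlying data can change, the cache needs an invalidation strategy as well.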
Third-party libraries
Sometimes the problem isn't in code the developers wrote themselves. Most applications today make heavy use of third-party libraries, which can have all the same kinds of problems as in-house code, but whose internal operations are a mystery. Some of their operations may be inefficient. Developers typically install new versions as they become available; this is good practice from the standpoint of bug fixing, but it can introduce surprises.
APM tools can report how much time is spent in third-party code. The results could turn up serious performance problems. Replacing the most time-consuming calls with custom code could provide major improvements. If the whole library is inefficient, it might be time to look at alternative libraries.
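Even without a full APM product, a profiler can attribute time to library frames. A rough sketch using Python's built-in cProfile, with the standard json module standing in for a third-party dependency (the handler and its payload are invented for the example):

```python
import cProfile
import io
import json
import pstats

def handler():
    # json stands in here for any third-party library whose
    # internals we can't see directly.
    payload = json.dumps({"n": list(range(1000))})
    return json.loads(payload)

profiler = cProfile.Profile()
profiler.enable()
handler()
profiler.disable()

out = io.StringIO()
stats = pstats.Stats(profiler, stream=out)
stats.sort_stats("cumulative").print_stats("json")  # show only frames matching "json"
report = out.getvalue()
```

Sorting by cumulative time and filtering on the library's name shows how much of a request is spent inside the dependency, which is the evidence needed before deciding to replace a call or the whole library.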
Operations teams think of APM first as a way to make sure software has enough hardware to run at a satisfactory speed and that configuration issues like memory allocation aren't holding it back. These are important, but often the right solution is to fix the code rather than throw resources at it. Incorporating the tools into a DevOps environment makes them available to coders as well as system managers, letting them identify inefficiencies and subtle bugs.
In an old-style development environment, where all developers have to go on is user input and bug reports, they lack good information on application performance issues. Details about the nature of the problems are inadequate, and the time between discovering a problem and working on a fix is too long. The use of APM tools in a DevOps environment lets developers learn about problems faster and in more detail. Performance keeps up with changing needs, and subtle problems get fixed before they turn into regular crashes.