When it comes to critical application monitoring, perhaps no applications are more important today than web-based apps.
Virtually all major application vendors have now moved to a web-based model. The main reason is probably obvious: almost everyone has a device with a browser at almost every moment.
Beyond the standard desktop or laptop, smartphones and tablets are with us virtually 24x7 (not to mention newer devices such as smart watches and other browser-capable “wearable” devices). Because of this proliferation of devices, users are now more comfortable with a web-based application and its familiar look and feel than with individual “thick clients”, each of which has a different user experience and workflow.
For those of us who specialize in network and IT monitoring, this new reality means that we must understand and focus on effective web application monitoring. This article is therefore meant to be a quick “get started” guide to the important aspects of web application monitoring. Its goal is to help you get familiar with the key concepts and terminology, and to offer a simple checklist that should be valuable to any IT professional tasked with web application monitoring.
Before we dive into the checklist, I want to make it clear that for this article, I am focusing on website application performance monitoring and availability monitoring. There are certainly many other aspects that are related such as security (from malware, hijacking, DDoS etc.), architecture, scalability, user experience (do my users “like” to use my app?) and so forth that are interesting and important to think about – but they are not the focus of this article. Here, we are primarily concerned about monitoring the application from a network/server centric viewpoint with an emphasis on availability and performance.
1. Testing location
The first step when creating a test plan is to determine where the testing should be done from.
For example, are all of your users accessing the application from the same location (e.g. the corporate LAN)? If not (and this is probably the case), then where are your users located? Would you anticipate users from a specific geographical region (a good example would be educational apps used by a K-12 school system, or an application used by a local or state government)? Or are you supporting global users?
Understanding the geography of your users means that you can set up your testing to effectively mimic your user locations.
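One way to turn this into a concrete test plan is to rank regions by user count and test from the largest ones until most users are covered. The sketch below assumes you already have per-region user counts (for example from access logs). The function name, the region labels, and the 85% coverage target are illustrative assumptions, not recommendations.

```python
def pick_probe_regions(users_by_region, coverage=0.85):
    """Return the largest regions that together cover `coverage` of all users."""
    total = sum(users_by_region.values())
    if total == 0:
        return []
    chosen, covered = [], 0
    # Walk regions from most to fewest users until the target share is reached.
    ranked = sorted(users_by_region.items(), key=lambda kv: kv[1], reverse=True)
    for region, count in ranked:
        if covered / total >= coverage:
            break
        chosen.append(region)
        covered += count
    return chosen


# Hypothetical user counts per region.
example_counts = {"us-east": 600, "eu-west": 300, "ap-south": 80, "sa-east": 20}
probe_regions = pick_probe_regions(example_counts)
```

With these example counts, the two largest regions already cover 90% of users, so the remaining regions could be tested less often or not at all.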
2. System involvement
By this I mean that it’s important to understand every system that may be involved in the availability and delivery of the application to the end user.
It can be tempting to think of just the web server(s) as the systems to monitor. However, in many cases there can be secondary servers that play a role in keeping the application running.
For example, does the main web app rely on data from any outside sources or databases? If so, then you need to bring those ancillary servers into the monitoring scope as well.
How about storage? Larger networks tend to use NAS or SAN to provide storage density and scalability. If your front-end servers rely on these systems, then it is important to monitor them as well. If the app servers are virtualized, then you need to be able to monitor both the individual VM and the underlying hypervisor and physical resources the VM relies on.
Lastly, there are many network infrastructure pieces that can play a large role in the performance and availability of the application: switches, routers, firewalls, load balancers and so on. These should not be ignored when putting together your monitoring plan.
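A simple way to keep track of everything that belongs in scope is to write the dependencies down as a map and walk it. The sketch below is only an illustration: the system names and the `DEPENDS_ON` map are hypothetical, and a real inventory would come from your own documentation or CMDB.

```python
# Hypothetical dependency map: each system lists what it directly relies on.
DEPENDS_ON = {
    "web-app": ["web-vm", "orders-db", "load-balancer"],
    "web-vm": ["hypervisor-1"],
    "orders-db": ["san-array"],
    "hypervisor-1": ["san-array", "core-switch"],
    "load-balancer": ["core-switch", "edge-firewall"],
}


def monitoring_scope(app, depends_on):
    """Return every system the app relies on, directly or indirectly."""
    scope, stack = set(), [app]
    while stack:
        node = stack.pop()
        if node in scope:
            continue  # already visited; also protects against cycles
        scope.add(node)
        stack.extend(depends_on.get(node, ()))
    return scope


scope = monitoring_scope("web-app", DEPENDS_ON)
```

Here `scope` contains the web app itself plus the VM, hypervisor, database, storage array, and network devices behind it, which is the full set that would need at least basic monitoring.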
3. Know your real KPIs
When setting up any monitoring plan, knowing when to stop can be as important as knowing where to start.
I have seen many organizations that, with the best of intentions, try to monitor every possible piece of information they can. The idea, of course, is that they don’t want to miss anything. The problem occurs when some measurement suddenly begins to alarm (always in the middle of the night). What should they do? The application is still available and seems to be running “OK”, but there is an alarm. Unfortunately, most people then ignore the alarm, which can lead to a culture of ignoring alarms (it’s the monitoring system that cried wolf!). This in turn leads to missing the really important alarms when they do happen.
The question is this: unless you know that a measurement is critically related to the health of the application, how do you know what to do with it? The best monitoring plans begin with understanding which KPIs are really important, i.e. identifying the ones that you would want to be woken up in the middle of the night for, and then getting those set up first.
If you want to expand further beyond that, then at least you know you have an effective baseline from which to expand.
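One way to make the "would I want to be woken up for this?" decision explicit is to assign each KPI a delivery tier and let anything unclassified default to the quietest one. The following is a minimal sketch. The tier names, the KPI names, and the mapping itself are assumptions made for illustration, and the right assignments depend on your application.

```python
# Hypothetical alert tiers, from most to least urgent.
PAGE, TICKET, LOG_ONLY = "page", "ticket", "log_only"

# Assumed mapping for illustration only.
KPI_TIERS = {
    "url_availability": PAGE,   # the app is unreachable: wake someone up
    "port_check": PAGE,
    "ssl_days_left": TICKET,    # needs action, but during working hours
    "download_speed": TICKET,
}


def route_alert(kpi_name, tiers=KPI_TIERS):
    """Return how an alarm on `kpi_name` should be delivered.

    Measurements nobody has classified are only logged, so they cannot
    train people to ignore pages.
    """
    return tiers.get(kpi_name, LOG_ONLY)
```

For example, `route_alert("url_availability")` returns `"page"`, while an unclassified measurement such as `route_alert("cpu_fan_rpm")` returns `"log_only"`.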
Here is the top 10 list of KPIs that I usually start with when implementing a web application performance monitoring service.
This list is specific to the actual availability and performance of the application itself (and not all of the important but ancillary components as described above).
1. ICMP Availability - Always start with a ping. If you can’t ping it, you have a problem.
2. Port checks - This is a TCP connection to the port in use (typically 80 and 443, but it could be other special ports as well).
3. Service checks - This means looking at the server for any services that need to be running. Typically these are IIS (Windows) or Apache Tomcat (Linux), but there could be others as well.
4. URL availability - Just because you can ping it doesn’t mean that it’s available. The URL(s) that constitute a usable application must also be available. Usually, this means having a test set up for a particular URL and looking for a response that is not a 404 (or similar) error.
5. DNS Resolution - It’s always good to check that the DNS settings for your site are resolving properly. Such a simple and basic thing can render everything else useless if it is configured incorrectly.
6. Page Loading Time - Basically, how long does it take for a user to download your full page?
Users are becoming more and more impatient when it comes to page download time.
The latest surveys suggest that if it takes more than a few seconds for the page to load, users believe there is a problem with the application and then they just leave.
7. Download speed - This is a measure of how fast the data is being transferred from your web server to the user. Of course, there are many factors involved which may impact this speed – including the bandwidth of the user’s connection.
A single metric for this KPI does not suit all cases.
However, if you can run multiple tests from a few different types of connections, then you can perform some long-term trend analysis on those results. Once you know what is “normal”, you can alert on any drastic deviations from that baseline.
8. SSL Certificates - If you provide any type of secure transactions on your application, then you must have a working certificate chain on your servers.
Any expired or otherwise incorrectly set up certificates will be noticed by the user’s web browser, and the user will get a warning that there is a problem with the website. Google’s Chrome and Apple’s Safari browsers are notoriously strict about what they will accept as a correctly formed certificate.
9. Log monitoring - Most applications will provide some type of error log functionality. You should have a monitoring system that can access those logs and process/parse them to look for specific error codes or messages.
10. NTP checks - Many applications are ultra-sensitive to small variations in the time stamp on the server. All servers should be synchronized to an NTP server. That NTP server can be onsite or you can use one of the publicly available NTP servers. In either case, you should consistently monitor the availability and response of the NTP servers you use.
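Several of the checks in this list can be sketched with nothing but the Python standard library. The following is a minimal illustration, not a replacement for a monitoring product. The function names, the timeouts, and the three-standard-deviation threshold are assumptions of this sketch. Note also that `url_status` times the download of a single document only, not a full page with all of its assets.

```python
import socket
import ssl
import statistics
import time
import urllib.error
import urllib.request


def port_open(host, port, timeout=3.0):
    """Port check: True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def dns_resolves(hostname):
    """DNS check: the set of addresses the name resolves to (empty on failure)."""
    try:
        infos = socket.getaddrinfo(hostname, None)
    except socket.gaierror:
        return set()
    return {info[4][0] for info in infos}


def url_status(url, timeout=5.0):
    """URL check: (HTTP status, seconds to fetch the body).

    Connection-level failures (urllib.error.URLError) are left to the caller.
    """
    start = time.monotonic()
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            resp.read()
            status = resp.status
    except urllib.error.HTTPError as err:
        status = err.code  # e.g. 404 or 500
    return status, time.monotonic() - start


def days_until(not_after, now=None):
    """Days between `now` and a certificate 'notAfter' string."""
    expires = ssl.cert_time_to_seconds(not_after)
    now = time.time() if now is None else now
    return (expires - now) / 86400


def cert_days_left(host, port=443, timeout=5.0):
    """Certificate check: days until the server certificate expires.

    The default context also verifies the chain and hostname, so a broken
    chain raises ssl.SSLError instead of returning a number.
    """
    ctx = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with ctx.wrap_socket(sock, server_hostname=host) as tls:
            return days_until(tls.getpeercert()["notAfter"])


def is_anomalous(history, latest, sigmas=3.0):
    """Trend check: True if `latest` is far outside the historical baseline."""
    if len(history) < 2:
        return False  # not enough data to know what "normal" is
    mean = statistics.mean(history)
    spread = statistics.stdev(history)
    if spread == 0:
        return latest != mean
    return abs(latest - mean) > sigmas * spread
```

A scheduler could call these at a fixed interval against your own hosts, for instance `port_open("www.example.com", 443)` or `cert_days_left("www.example.com")`, where `www.example.com` is a placeholder. It would then feed the timing from `url_status` into `is_anomalous` along with earlier samples taken from the same test location.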
In the end, there are quite a few possible metrics that can be used to determine the true performance of a web-based application. In order to properly deploy a monitoring system, you need to know a lot about how the application is supposed to perform and what components are involved in making sure it is available.
I hope the checklist provided above offers some insight into how all of these things fit together.