It’s one of the most dreaded calls an IT staff member can get: the one where a user complains about the quality of their VoIP call or video conference. The terms used to describe the problem are reminiscent of a person who brings their car in for service because of a strange sound: “I hear a crackle”, or “it sounds like the other person is in a tunnel”, or “I could only hear every other word, and then the call dropped”. None of these are good, and unfortunately, they are all very hard to diagnose.
As IT professionals, we are used to solving problems. We are comfortable in a binary world: something works or it doesn’t, and when it doesn’t, we fix the issue so that it does. When a server or application is unavailable, we can usually diagnose and fix the issue, and then it works again. But with VoIP and video, the situation is not so cut and dried. It’s rare that the phone doesn’t work at all. It usually “works”, i.e., the phone can make and receive calls, but the problems are often more nuanced: the user is unhappy with the “experience” of the connection. It’s the difference between having a bad meal and the restaurant being closed.
In the world of VoIP, this situation has even been described mathematically (leave it to engineers to come up with a formula). It is called a Mean Opinion Score (MOS), and it provides a data point that represents how a user “feels” about the quality of a call. The rating system looks like this:
- 5: Excellent
- 4: Good
- 3: Fair
- 2: Poor
- 1: Bad
Today, the MOS score is accepted as the main standard by which the quality of VoIP calls is measured. Several conditional factors go into what makes an “OK” MOS score, including (among other things) the CODEC used in the call. As a rule of thumb, any MOS score below ~3.7 is considered a problem worth investigating, and anything consistently below 2.0 is a real issue. (Many organizations use a threshold other than 3.7, but it is usually pretty close to this.) The main factors that go into generating this score come from three KPIs:
- Packet loss
- Jitter
- Latency / Delay
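As a rough illustration of how these KPIs can roll up into a single score, the sketch below uses a widely circulated simplification of the ITU-T G.107 E-model: it derives an R-factor from latency, jitter, and loss, then converts R to MOS with the G.107 mapping. The coefficients in the R-factor step (double weight on jitter, a 10 ms allowance, 2.5 points per percent of loss) are a heuristic and an assumption here, not a CODEC-aware calculation, and the names `estimate_mos` and `classify` are made up for this example. The 3.7 and 2.0 thresholds simply mirror the rule of thumb above.

```python
def estimate_mos(latency_ms: float, jitter_ms: float, loss_percent: float) -> float:
    """Rough MOS estimate from one-way latency, jitter, and packet loss."""
    # Heuristic (assumption): weight jitter twice as heavily as latency and
    # add a fixed 10 ms allowance for codec and processing delay.
    effective_latency = latency_ms + 2.0 * jitter_ms + 10.0

    # Heuristic R-factor: gentle penalty below 160 ms, steeper above it.
    if effective_latency < 160.0:
        r = 93.2 - effective_latency / 40.0
    else:
        r = 93.2 - (effective_latency - 120.0) / 10.0

    # Heuristic: deduct 2.5 R points per percent of packet loss.
    r -= 2.5 * loss_percent

    # R-to-MOS mapping from ITU-T G.107.
    if r <= 0.0:
        return 1.0
    if r >= 100.0:
        return 4.5
    mos = 1.0 + 0.035 * r + 7e-6 * r * (r - 60.0) * (100.0 - r)
    # The polynomial dips slightly below 1 for very small R, so clamp it.
    return max(1.0, mos)


def classify(mos: float, investigate_below: float = 3.7,
             serious_below: float = 2.0) -> str:
    """Apply the rule-of-thumb thresholds to a MOS value."""
    if mos < serious_below:
        return "serious"
    if mos < investigate_below:
        return "investigate"
    return "ok"


if __name__ == "__main__":
    for sample in [(20, 2, 0), (300, 50, 5), (500, 100, 20)]:
        mos = estimate_mos(*sample)
        print(sample, round(mos, 2), classify(mos))
```

Because the inputs are only three numbers, the same function can be fed from any of the data sources described below, which makes it easy to compare what the phones report with what a test call or probe sees.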
So, to bring some rigor to monitoring VoIP quality on a network (and get to the issues before the users get to you), network staff need to monitor the MOS score for VoIP calls. In the real world there are at least three separate ways of doing this:
1) The “ACTUAL” MOS score from live calls based on reports from the VoIP endpoints.
Some VoIP phones will actually measure the critical KPIs (Loss, Jitter, and Latency) and send reports of the call quality to a Call Manager or other server. Most commonly this information is transmitted using the RTP Control Protocol (RTCP) and may also include RTCP XR (for eXtended Report) data, which can provide additional information like Signal to Noise Ratio and Echo level. Support for RTCP / RTCP XR is highly dependent on the phone system being used, and in particular on the handset. Even if your VoIP system does support RTCP / RTCP XR, you will still need a method of capturing the data and reporting / alarming on it.
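To give a feel for what “capturing” that data involves, here is a minimal sketch that pulls the loss and jitter fields out of a basic RTCP receiver report, following the packet layout in RFC 3550. It assumes the bytes passed in start with a single receiver report (real endpoints usually send compound RTCP packets, and RTCP XR blocks are not handled), and it assumes an 8000 Hz RTP clock, as used by narrowband codecs such as G.711, to convert jitter into milliseconds. The `ReportBlock` and `parse_receiver_report` names are invented for the example.

```python
import struct
from dataclasses import dataclass
from typing import List

RTCP_RECEIVER_REPORT = 201  # packet type for a receiver report in RFC 3550


@dataclass
class ReportBlock:
    source_ssrc: int       # stream this block describes
    fraction_lost: float   # 0.0 to 1.0, since the previous report
    cumulative_lost: int   # packets lost since reception began
    highest_seq: int       # extended highest sequence number received
    jitter_ms: float       # interarrival jitter, converted to milliseconds


def parse_receiver_report(packet: bytes, clock_rate_hz: int = 8000) -> List[ReportBlock]:
    """Extract the report blocks from one RTCP receiver report."""
    if len(packet) < 8:
        raise ValueError("too short for an RTCP header")
    first, packet_type = struct.unpack_from("!BB", packet, 0)
    if first >> 6 != 2 or packet_type != RTCP_RECEIVER_REPORT:
        raise ValueError("not an RTCP version 2 receiver report")
    count = first & 0x1F  # number of 24-byte report blocks that follow
    if len(packet) < 8 + 24 * count:
        raise ValueError("truncated report blocks")

    blocks = []
    for i in range(count):
        # Each block: SSRC, fraction lost (8 bits) + cumulative lost (24 bits),
        # extended highest sequence number, interarrival jitter.
        ssrc, lost_word, highest_seq, jitter = struct.unpack_from(
            "!IIII", packet, 8 + 24 * i)
        cumulative = lost_word & 0xFFFFFF
        if cumulative & 0x800000:  # the 24-bit field is signed
            cumulative -= 1 << 24
        blocks.append(ReportBlock(
            source_ssrc=ssrc,
            fraction_lost=(lost_word >> 24) / 256.0,
            cumulative_lost=cumulative,
            highest_seq=highest_seq,
            # Jitter is reported in RTP timestamp units.
            jitter_ms=1000.0 * jitter / clock_rate_hz,
        ))
    return blocks
```

In practice a collector would receive these packets from the Call Manager or a mirrored port and push the resulting values into whatever reporting and alarming system is already in place.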
2) The “PREDICTED” MOS score based on network quality metrics from a synthetic test call.
Instead of waiting for the phones to tell you there is a problem, many network managers implement a testing system that makes periodic synthetic calls on the network and then gathers the KPIs from those calls. Generally, this type of testing takes place completely outside of the VoIP phone system and uses vendor software to replicate an endpoint. The software is installed at critical ends of a test path, and the endpoints then “call” each other (by sending an RTP stream from one endpoint to another). These systems should be able to mimic the live VoIP system exactly in terms of the CODEC used, QoS tagging, and so on, so that the test frames pass through the network in exactly the same way that a “real” VoIP call would. These systems can then “predict” the quality of experience that the network is providing at that moment.
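One of the KPIs a synthetic test gathers is jitter. The sketch below shows the running interarrival jitter estimate defined in RFC 3550 (section 6.4.1), computed from the send and receive times of a test stream. It is not any particular vendor's implementation; the function name and the use of millisecond timestamps are assumptions for illustration. The two clocks do not need to be synchronized for this calculation, since a constant offset cancels out when consecutive transit times are compared.

```python
from typing import Sequence


def interarrival_jitter(send_times_ms: Sequence[float],
                        recv_times_ms: Sequence[float]) -> float:
    """Running jitter estimate in the style of RFC 3550, in milliseconds."""
    jitter = 0.0
    previous_transit = None
    for sent, received in zip(send_times_ms, recv_times_ms):
        transit = received - sent
        if previous_transit is not None:
            # Change in transit time between consecutive packets.
            delta = abs(transit - previous_transit)
            # Smooth with a gain of 1/16, as specified in RFC 3550.
            jitter += (delta - jitter) / 16.0
        previous_transit = transit
    return jitter


if __name__ == "__main__":
    # Packets sent every 20 ms; the third one arrives 8 ms late.
    sent = [0, 20, 40, 60]
    received = [50, 70, 98, 110]
    print(interarrival_jitter(sent, received))
```

A test agent would typically report this value alongside loss and delay for each test call, so that a MOS estimate can be derived from the three together.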
3) The “ACTUAL” MOS score based on a passive analysis of the live packets on the network.
This is where a passive “probe” product is put into the network and “sniffs” the VoIP calls. It can then inspect that traffic and create a MOS score or other metrics that are useful for determining the quality of service users are currently experiencing. This method removes any need for support from the VoIP system and does not require the network to carry additional test data. It does have some drawbacks, though: it can be expensive, and it may have trouble accurately reading encrypted VoIP traffic.
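As a small example of the kind of analysis a probe performs, the sketch below reads the sequence number from captured RTP packets and estimates packet loss from the gaps, allowing for the 16-bit counter wrapping around. It is a simplified take on the expected-versus-received bookkeeping described in RFC 3550, not how any specific probe works, and both function names are hypothetical. It also assumes the packets have already been identified as belonging to one RTP stream.

```python
import struct
from typing import Sequence


def rtp_sequence(packet: bytes) -> int:
    """Return the 16-bit sequence number from an RTP packet header."""
    if len(packet) < 12 or packet[0] >> 6 != 2:
        raise ValueError("not an RTP version 2 packet")
    # The sequence number occupies bytes 2 and 3 of the fixed header.
    return struct.unpack_from("!H", packet, 2)[0]


def loss_percent(sequence_numbers: Sequence[int]) -> float:
    """Estimate packet loss for one stream from observed sequence numbers."""
    if not sequence_numbers:
        return 0.0
    base = sequence_numbers[0]
    extended_max = base  # highest sequence seen, extended past 65535
    for seq in sequence_numbers[1:]:
        # Forward distance modulo 2**16; small values mean a newer packet,
        # large values mean a late or reordered one, which is ignored here.
        delta = (seq - extended_max) & 0xFFFF
        if 0 < delta < 0x8000:
            extended_max += delta
    expected = extended_max - base + 1
    # Duplicates can push the received count above expected, so clamp at 0.
    lost = max(expected - len(sequence_numbers), 0)
    return 100.0 * lost / expected


if __name__ == "__main__":
    # One packet (sequence 1) is missing across a wraparound.
    print(loss_percent([65534, 65535, 0, 2]))
```

Under SRTP the RTP header, including the sequence number, is normally left unencrypted, so a header-only check like this one may still be possible on some encrypted calls even when payload inspection is not.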
Which is best? Well, they all have their place, and in a perfect world IT staff would have access to both live data and test data in order to troubleshoot an issue. In an even more perfect world, they would be able to correlate that data in real time with other information that could affect performance, like router / switch performance data and network bandwidth usage (especially on WAN circuits).
In the end, VoIP performance monitoring comes down to having access to all of the critical KPIs needed to diagnose issues and (hopefully) stop users from making that dreaded service call.