The world’s most accurately measured office WiFi network

Network performance science took a small step forward today, with the first “commodity” high-fidelity service quality measurement data.

I would like to share some excitement from my professional world. I am sitting in the City of London branch of the Institute of Directors. In the last hour I have made the first “commodity” measurement of network quality based on ∆Q metrics, with the support of Predictable Network Solutions Ltd.

[Image: Table 1 - PNSol]

So what does this mean, and why is it of interest and importance?

When we measure a network, we would like to have metrics that closely reflect the quality of experience (QoE) that real users get. Those QoE-centric metrics can then be used to answer important questions about the network service:

  • Will it be any good for running my key cloud and unified comms applications?
  • If not, why not?
  • What can I do about it, so that I get the experience I need?

The challenge is that packet networks are systems that involve very rapid changes, all of which affect the delivered QoE. For example, each packet sent over a WiFi link might encounter a slightly different radio environment, as people move around an office. Packets encounter queues along the path with constantly varying size. They can also take very different routes, which will have varying length and speed.

The science and art of observing these most rapid changes, and extracting the relevant information from them, is relatively new. Indeed, the ideal metrics are based on a new branch of mathematics not yet in the textbooks, called ∆Q. Up until now, ∆Q-based measurements have only been available as a high-end boutique consulting offer, bought by large network operators, major equipment vendors, and exotic government projects.

For instance, ∆Q metrics were used to optimise getting streaming video out of a particle supercollider… at 40 million frames per second! This is not the kind of problem that your typical Netflix user has, but the techniques and tools for these extreme applications can equally be used to manage more mundane problems and environments.

[Image: NBSH wifi]

I now have that same measurement system running on my laptop, sending me results on the performance of my Internet connection every ten minutes. This used to cost tens of thousands of pounds, took weeks to set up on a hand-crafted basis, and required people at PhD level and above. It is now (in theory) technically available to any numpty (with an ordinary maths degree and a serious motivation to learn new tricks).

What this measurement system is doing is firing out a slow and steady stream of test packets, of varying sizes and spacing. These hit one of a number of packet “reflectors” running in Amazon Web Services (AWS), in this case hosted in both London and Dublin, and then return to my laptop.
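The probe itself is PNSol’s, so the details are theirs; purely as an illustration, a minimal sketch of the sending side might look like the Python below. The reflector address, packet sizes and spacings are all assumptions of mine, not the real system’s parameters.

    import random
    import socket
    import struct
    import time

    # Hypothetical reflector address; the real system uses PNSol reflectors in AWS.
    REFLECTOR = ("reflector.example.com", 5000)

    # A slow, steady stream of test packets with varying sizes and spacings,
    # so the probe samples the network without materially loading it.
    PACKET_SIZES = [64, 256, 512, 1024, 1400]   # bytes of UDP payload (assumed)
    MEAN_GAP_S = 0.1                            # average spacing between packets (assumed)

    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)

    def send_probe(seq: int) -> None:
        size = random.choice(PACKET_SIZES)
        # Header: sequence number plus local send timestamp (nanoseconds),
        # padded out to the chosen packet size.
        header = struct.pack("!Qq", seq, time.time_ns())
        sock.sendto(header + bytes(size - len(header)), REFLECTOR)

    seq = 0
    while True:
        send_probe(seq)
        seq += 1
        # Randomised spacing avoids synchronising with periodic network behaviour.
        time.sleep(random.expovariate(1.0 / MEAN_GAP_S))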

The analysis system captures the accumulation of delay (and loss) of each packet in each direction. It does this using timed observations of each test packet made both at my laptop and in AWS. The cloud-based analysis software ingests a complete data set over a five-minute experiment run before processing it.
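The real analysis pipeline is PNSol’s, but the basic bookkeeping of matching the two sets of timed observations can be sketched as follows. The CSV file names and log format are hypothetical, and the clock offset between the two vantage points (which the real system has to deal with) is glossed over entirely.

    import csv

    def load_timestamps(path: str) -> dict[int, int]:
        """Read a hypothetical headerless log of 'seq,timestamp_ns' rows into a dict."""
        stamps = {}
        with open(path, newline="") as f:
            for seq, ts in csv.reader(f):
                stamps[int(seq)] = int(ts)
        return stamps

    sent = load_timestamps("laptop_sent.csv")       # when each probe left the laptop
    arrived = load_timestamps("reflector_rx.csv")   # when the reflector observed it

    # One-way delay accumulated in the upstream direction, per packet.
    # Packets never seen by the reflector were lost on the way up.
    upstream_delay_ns = {}
    upstream_lost = []
    for seq, t_sent in sent.items():
        if seq in arrived:
            upstream_delay_ns[seq] = arrived[seq] - t_sent
        else:
            upstream_lost.append(seq)

    loss_rate = len(upstream_lost) / len(sent)
    print(f"upstream loss: {loss_rate:.2%}, delay samples: {len(upstream_delay_ns)}")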

Once it has a full “experiment run”, it then “pulls apart” that network quality into its basic components: delay due to geography (G), the size of packets (S), and the variable load on the network (V). These in turn can be analysed to establish the causes of bad QoE: network architecture and routing (G); link speed (S); and resource scheduling (V).
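As a rough illustration of the idea (my own simplification, not PNSol’s actual algorithm): if you fit a line to the minimum delay observed at each packet size, the intercept approximates G, the slope captures the size-dependent S term, and whatever is left over for each individual packet is V.

    from collections import defaultdict
    import statistics

    def decompose_gsv(samples: list[tuple[int, float]]):
        """samples: (packet_size_bytes, one_way_delay_s) pairs for one direction.

        Returns (G, slope, residuals), where delay ≈ G + slope*size + V.
        A simplified illustration of the ∆Q decomposition, not PNSol's method.
        """
        # The minimum delay per packet size strips out most of the variable (V)
        # part, leaving the structural G + S component.
        best = defaultdict(lambda: float("inf"))
        for size, delay in samples:
            best[size] = min(best[size], delay)

        sizes = list(best)
        mins = [best[s] for s in sizes]

        # Least-squares line through the per-size minima: delay_min ≈ G + slope*size.
        # statistics.linear_regression requires Python 3.10+.
        slope, intercept = statistics.linear_regression(sizes, mins)
        G = intercept   # geography: fixed path delay independent of size
        # S is the size-dependent term: S(size) = slope * size (slope ~ 1/link capacity).

        # V is whatever remains for each individual packet.
        residuals = [(size, delay - (G + slope * size)) for size, delay in samples]
        return G, slope, residuals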

From this collected timing observation data, the system creates network “X-ray crystallography” or “functional MRI scan” charts. (You can view an example PDF here.) These capture the network’s instantaneous performance, which is what drives the end user QoE. These “packet flow pictures” can be rendered as a number of different “views” on the underlying data, depending on what we are interested in knowing.
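For instance, one such view (the one shown in the next chart) plots the variable component V against packet size. A hypothetical rendering using matplotlib, reusing the residuals from the decomposition sketch above, might look like this:

    import matplotlib.pyplot as plt

    def plot_v_vs_size(residuals: list[tuple[int, float]], direction: str) -> None:
        """One 'view' of the data: variable delay (V) against packet size."""
        sizes = [size for size, v in residuals]
        v_ms = [v * 1000 for size, v in residuals]   # delays assumed to be in seconds
        plt.scatter(sizes, v_ms, s=4, alpha=0.4)
        plt.xlabel("packet size (bytes)")
        plt.ylabel("variable delay V (ms)")
        plt.title(f"{direction} delay due to load (V) vs packet size")
        plt.savefig(f"v_vs_size_{direction}.png", dpi=150)
        plt.close()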

[Image: Table 2 - PNSol]

Above we can see the downstream variable delay due to load (V), plotted against packet size. Most packets experience very little delay due to load, but there are outliers, whose structure seems independent of packet size. (Outliers are possibly rare at small packet sizes, but that could just be coincidence; we would need to examine more measurements).
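One way to test whether that apparent scarcity of outliers at small sizes is real or coincidence would be to tally the outlier rate per packet-size bucket across many experiment runs. A hypothetical sketch (the 5 ms threshold is an arbitrary choice of mine, not a figure from the measurement system):

    from collections import Counter

    def outlier_rate_by_size(residuals: list[tuple[int, float]],
                             threshold_s: float = 0.005) -> dict[int, float]:
        """Fraction of packets per size whose V exceeds the chosen threshold."""
        total = Counter()
        outliers = Counter()
        for size, v in residuals:
            total[size] += 1
            if v > threshold_s:
                outliers[size] += 1
        return {size: outliers[size] / total[size] for size in sorted(total)}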

[Image: Table 3 - PNSol]

In contrast, we see here the upstream delay, which has far more variability. This is expected: it is a normal byproduct of how wireless systems coordinate distributed endpoint access to a shared spectrum resource.

[Image: packet performance multimeter]

For me, this is a landmark step forward, as it means we have (in principle) a “commodity” means of high-fidelity measurement of network service quality. It is usable by people who are not the originators of the measurement system itself, and who don’t need to be involved in its operation. They can just collect and use the results.

Of course, this is still a long way from being a polished product. It is being done as a prelude to the launch of Just Right Networks Ltd. But at least we all now have a hope of owning a “packet performance multimeter”, akin to the multimeter that electrical engineers take for granted. We haven’t even agreed standard units for digital experience quality yet, so it’s still early days.

Telco and cloud provider R&D labs have an urgent need for high-fidelity metrics, because these can be used to calibrate existing measurement systems and establish their error bounds. Regulators are also searching for better metrics and measurements that enable them to protect end users from poor quality services. So there are immediate applications for this technology.

Whilst we are still years away from fully instrumented and automated broadband service quality management, today is a small step towards that desirable destination.


For the latest fresh thinking on telecommunications, please sign up for the free Geddes newsletter.