One of the basic assumptions of the broadband business is that you need bandwidth – and lots of it. We work in an industry obsessed with quantity. That is because we use quantitative capacity for two separate functions. The obvious and essential use of capacity is to make it possible to move sufficient data around quickly enough. Additionally, we use capacity to create slack (via ‘over-provisioning’) so that packet scheduling problems don’t hurt us so hard. In other words, a lot of the transmission quantity we buy is not because we intrinsically need the capacity, but rather to get better delivered quality. This is because we are collectively not very good at scheduling demand to use the available supply.
Users aren’t stupid, and have a strong sense that quality matters as well as quantity. After all, if all you want is high average bandwidth (and nothing else) then your monthly Internet download hard drive is in the post. Don’t forget to post back your URL requests for next month! Such a service offers plenty of megabits per second, so you’ve nothing to complain about. Don’t laugh: Netflix exists at that very cost and performance margin between postal and online delivery.
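To put a rough number on the hard-drive-in-the-post service, here is a back-of-envelope sketch. The drive size (1 TB) and the postal transit time (two days) are assumptions picked purely for illustration, and the function name is made up for this example.

```python
def effective_bandwidth_mbps(capacity_bytes: float, transit_seconds: float) -> float:
    """Average throughput, in Mbit/s, of moving capacity_bytes in transit_seconds."""
    return capacity_bytes * 8 / transit_seconds / 1e6

# Assumed figures, not taken from any real service.
drive_bytes = 1e12            # a 1 TB hard drive
transit = 2 * 24 * 3600       # two days in the post, in seconds

print(f"Average bandwidth: {effective_bandwidth_mbps(drive_bytes, transit):.1f} Mbit/s")
print(f"Wait before the first byte arrives: {transit / 3600:.0f} hours")
```

Under those assumptions the average bandwidth is respectable, while the delay on every single request is measured in days. The quantity is there; the quality is not.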
Quality matters because it directly affects the user experience, and the more quality-demanding the application the more critical it becomes. Hence you see adverts like this one (in The Straits Times last weekend here in Singapore).
Users want to buy a broadband service that is going to meet their communications needs. That implies bounding the loss and delay characteristics for every flow over the network. Quality is by definition all about meeting those bounds. As you can see, consumer ISP adverts are visibly making claims about better quality. So to have suitably informed purchasers, we must somehow measure the loss and delay properties of the network.
That raises some interesting questions. How can we measure it in a meaningful way? Is there much difference in quality between service providers? How much difference can quality make?
Let me show you.
Here are the results of some experiments my dear friends at Predictable Network Solutions Ltd did in the UK. They illustrate the large differences in quality that exist, and the impact that makes on the user. They have a unique multi-point probing methodology that captures the full distribution of loss and delay for each access line.
They measured the service to two different premises in the same street, each being served off the same pole. Both premises are served by the same retail ISP. The ISP is an ‘unbundled local loop’ ADSL service provider, which rents the passive raw copper loop from BT Openreach. We can reasonably assume the properties of the physical copper are the same, as they were laid at the same time, and are terminating within feet of each other in some distribution rack at the local exchange.
The only difference between the two customers is that they have their active service supplied by different wholesalers. Thus any variability in performance is entirely down to how each wholesaler has constructed its active packet data service on top of the raw copper.
The data below is the one-way downstream delay, presented as a set of packet transit times, ordered by packet size. (If you want to read more about why this is a splendidly good way of viewing the data, read this presentation.) We’ve ignored charting the packet loss for simplicity of illustration.
The first customer’s retail ISP buys its wholesale path from BT Wholesale. This is what the delay structure looks like:
Yum! BT Wholesale gets a super-clean bill of health. (You may wish to clap now, in appreciation of their engineering prowess.) There is a lovely straight line, where delay (as expected) grows with packet length. There is little variation in delay caused by contention within the network. This high-quality ISP service is technically capable of running three concurrent VoIP calls of PSTN toll quality.
Now, let’s take a look at the performance of a commercial rival. This wholesaler is not BT. We’re going to be kind and not name and shame them, since these issues are endemic.
Yuk! You will immediately notice that there are lots of outliers of delay. This phenomenon of varying delay (and loss) is called ‘non-stationarity’. (Any of you sitting there screaming ‘jitter!’ lose five cred points, because jitter conflates variability in delay with both differing packet lengths and gaps due to loss.) All applications assume some level of stationarity for their control protocols to work. Real-time applications are the most sensitive to non-stationarity.
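For readers who like to see the idea in code, here is a minimal sketch of why viewing delay against packet size helps. It is not the Predictable Network Solutions methodology: the data is synthetic, the line parameters (10 ms fixed delay, 0.01 ms per byte) and the queueing model are assumptions, and an ordinary least-squares fit stands in for whatever a real tool would use. The point is only that subtracting the size-dependent straight line leaves the variable delay caused by contention, whereas a naive ‘jitter’ figure (difference between consecutive delays) reports variation even on a perfectly clean line, simply because packet sizes differ.

```python
import random
import statistics

def fit_line(sizes, delays):
    """Ordinary least-squares fit of delay = intercept + slope * size."""
    mean_s = statistics.fmean(sizes)
    mean_d = statistics.fmean(delays)
    sxx = sum((s - mean_s) ** 2 for s in sizes)
    sxy = sum((s - mean_s) * (d - mean_d) for s, d in zip(sizes, delays))
    slope = sxy / sxx
    return mean_d - slope * mean_s, slope

def variable_delay(sizes, delays):
    """Delay left over once the size-dependent straight line is removed."""
    intercept, slope = fit_line(sizes, delays)
    return [d - (intercept + slope * s) for s, d in zip(sizes, delays)]

def naive_jitter(delays):
    """Mean absolute difference between consecutive packet delays."""
    return statistics.fmean(abs(b - a) for a, b in zip(delays, delays[1:]))

rng = random.Random(1)  # fixed seed so the sketch is repeatable
sizes = [rng.randint(64, 1500) for _ in range(500)]  # packet sizes in bytes

# Hypothetical clean line: 10 ms fixed delay plus 0.01 ms per byte.
clean = [10.0 + 0.01 * s for s in sizes]
# Same line, but roughly one packet in ten waits an extra 5 to 40 ms in a queue.
contended = [d + (rng.uniform(5.0, 40.0) if rng.random() < 0.1 else 0.0)
             for d in clean]

for name, delays in (("clean", clean), ("contended", contended)):
    spread = statistics.pstdev(variable_delay(sizes, delays))
    print(f"{name}: naive jitter {naive_jitter(delays):.2f} ms, "
          f"variable delay spread {spread:.2f} ms")
```

On the clean line the variable delay spread is essentially zero, yet the naive jitter is several milliseconds. One caveat on the design: contention only ever adds delay, so a least-squares fit is pulled upwards by the outliers, and a fit to the lower edge of the data would be a better estimate of the underlying line.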
Because of this non-stationarity, this second service is only capable of carrying a single PSTN-quality VoIP call. In other words, if what you wanted to do was to run a small office with two voice lines and a fax machine, you need three physically separate lines to get the equivalent service to the BT Wholesale offer.
The second LLU connection has (virtually) the same bandwidth, but potentially only one third of the value.
Why do we care about this non-stationarity? Surely we can do something about it with smarter end devices? Sadly, no, we can’t. Clever adaptive codecs use data from the past to control behaviour in the future. Non-stationarity de-couples the past and future states of the network. As a result, such codecs can oscillate and data flows collapse, because they (can and must) guess wrongly.
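The guessing problem can be sketched in a few lines. What follows is a toy playout-deadline predictor, not any real codec: it keeps exponentially weighted averages of past delay and past deviation, and sets the next deadline from them. The smoothing factor, the safety margin, and both synthetic delay traces are assumptions made for illustration.

```python
import random

def late_fraction(delays, alpha=0.9, margin=4.0):
    """Fraction of packets that miss a deadline predicted from past delays only."""
    mean, dev, late = delays[0], 0.0, 0
    for d in delays[1:]:
        deadline = mean + margin * dev       # guess made before the packet arrives
        if d > deadline:
            late += 1                        # too late to play out: as good as lost
        dev = alpha * dev + (1 - alpha) * abs(d - mean)
        mean = alpha * mean + (1 - alpha) * d
    return late / (len(delays) - 1)

def steady_trace(n, rng):
    """Stationary delay: 20 ms with a little noise."""
    return [rng.gauss(20.0, 1.0) for _ in range(n)]

def shifting_trace(n, rng):
    """Non-stationary delay: the base flips between 20 ms and 50 ms at random times."""
    delays, base = [], 20.0
    while len(delays) < n:
        run = rng.randint(20, 60)            # packets until the next regime change
        delays.extend(rng.gauss(base, 1.0) for _ in range(run))
        base = 50.0 if base == 20.0 else 20.0
    return delays[:n]

rng = random.Random(7)  # fixed seed so the sketch is repeatable
steady = steady_trace(2000, rng)
shifting = shifting_trace(2000, rng)
print(f"late packets, stationary path:     {late_fraction(steady):.1%}")
print(f"late packets, non-stationary path: {late_fraction(shifting):.1%}")
```

Each time the delay regime shifts upwards, the history the predictor has learned from is wrong, so packets miss their deadline until it catches up. Widening the margin hides this only by adding delay to every packet, which is exactly what a real-time application cannot afford.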
There is – in principle as well as in practice – no computational quick fix to compensate for this effect. You can try sending more packets to conceal loss (at a cost to all rival flows), but there is no such thing as ‘packet delay concealment’, since that means making time run backwards.
Since measurement is de facto regulation, consumers need the right measurements in order to make informed choices over supplier. At the moment regulators are busy pushing the whole broadband market supply chain (both retail and wholesale) towards delivering on their promises of advertised peak speeds. Unfortunately, this can have precisely the wrong impact on stationarity, and thus user value.
A simple example of this is that DSL lines are being driven closer to their absolute limits, which means they re-train line speed more often, causing brief interruptions to service. Increasing peak bandwidth also causes increased instantaneous load and contention effects, reduces isolation between users and data flows, and drives yet more non-stationarity. Indeed, it is my colleagues’ learned opinion that the current drive to get more ‘peak bandwidth’ out of ADSL is coming at the expense of system stability and service reliability. This is particularly an issue for businesses with distributed home-worker staff who depend on continuity of service.
These non-stationarity effects are not at present being measured properly, either by operators or regulators. They certainly are not expressed in a form that users can understand, such as the number of concurrent voice calls the service could support. ‘Bandwidth’ is the wrong measure, since it is an increasingly weak proxy for the application outcomes that the users value. It simply fails to embody the quality-related aspects of the service. Bandwidth is not created equal, but the differences are opaque to buyers. Until this problem is remedied, the broadband business will follow technical and regulatory policies that are at best misguided, and at worst positively harmful.