The telecoms industry has consistently failed to apply well-known management theory, believing it is somehow a “special flower” with nothing to learn.
As my slightly perverse evening entertainment, I’ve been watching some tremendous YouTube videos of air traffic control recordings during emergencies. If you want to feel the true life drama, this one is a good starter.
Aviation is an extremely safe activity because it has an antifragile management system: every crash is investigated by an independent bureau. The pilot and ATC unions, airlines, and plane manufacturers can’t hide their failures. As a result, aviation learns from each accident, thus making all future flights safer. There is a standard taxonomy of things that go wrong.
In our virtualised telecoms transport world, fatalistic “experience crashes” are so common that we’ve even come up with a tragicomic excuse for the failure: “best effort” delivery (aka “shit always happens”). Each “crash” gets a one-off “hand crafted” investigation that doesn’t contribute to the wider knowledge pool. There are many good and competent people trying to solve problems in a bad system, and the result is a lot of unhappy customers and unprofitable products.
It doesn’t need to be this way: we just have to apply ordinary management theory to solve our “packet accident” embarrassment. So why hasn’t this been done already? Well, there is an implicit belief that telecoms is some kind of “special flower”. People are so absorbed with network mechanisms that they fail to think about management systems. Hence they don’t learn from established bodies of knowledge.
If you read various online groups, you’ll see lots of clever technicians showing off their knowledge of the peculiarities of Internet Protocol. What you won’t find is much discussion of general management theory, or the science and engineering needed to enact it in our context. We’re too special, you see. And to a limited degree, telecoms is unique, being the industry that moves at “light speed”.
For example:
- Our control systems are “ballistic” (move like bullets), not “elastic” (like a production line). We can’t send messages to the other end of our “experience factory” faster than the “product” itself moves.
- We’re in a computational context, which means we can create unbounded demand at no cost. We have “customers” who are algorithmically duelling for resources (a bit like high-frequency trading in finance).
- Packet networks have two degrees of freedom across three variables: load, loss and delay. Pick any two, and the third is set for you. The temptation is to treat loss as a fault and squeeze everything into one degree of freedom, so as to simplify things.
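To make the load, loss and delay trade concrete, here is a minimal sketch. It assumes a textbook M/M/1/K queue (random arrivals, random service times, room for at most K packets), which is a simplification chosen for illustration and not a model of any real network. The function name `mm1k_loss_and_delay` is hypothetical. Hold the load fixed, and the only way to buy less loss is to pay with more delay.

```python
def mm1k_loss_and_delay(load, capacity, service_rate=1.0):
    """Loss probability and mean delay of an M/M/1/K queue (illustrative model).

    load: offered load (arrival rate / service rate), must be > 0.
    capacity: K, the most packets the system holds, including the one in service.
    """
    if load <= 0 or capacity < 1:
        raise ValueError("load must be > 0 and capacity >= 1")
    # Steady-state occupancy probabilities are proportional to load**n.
    weights = [load ** n for n in range(capacity + 1)]
    total = sum(weights)
    probs = [w / total for w in weights]
    loss = probs[-1]  # an arrival is dropped when the system is full
    mean_in_system = sum(n * p for n, p in enumerate(probs))
    accepted_rate = load * service_rate * (1.0 - loss)
    delay = mean_in_system / accepted_rate  # Little's law, accepted packets only
    return loss, delay


if __name__ == "__main__":
    # Same load every time; only the buffer size changes.
    for capacity in (1, 5, 20, 50):
        loss, delay = mm1k_loss_and_delay(load=0.9, capacity=capacity)
        print(f"K={capacity:>2}  loss={loss:.4f}  mean delay={delay:.2f} service times")
```

Running it at a fixed load of 0.9 shows loss falling and delay rising as the buffer grows: two dials you can set, and a third that is set for you.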
But exceptionalism is true of every industry: we all have unique features. Yet management degrees exist, and managers move between industries, which suggests our commonalities dominate our differences. Ultimately, telecoms is nothing special. We just continuously manufacture distributed computing experiences, much like any other service-centric industry.
How could telecoms learn from other industries and their management systems, in addition to aviation safety? Well, here’s a list to get started with:
- Supply Chain Management: We’re missing the whole resource planning function from our orchestration, and we don’t have meaningful “delivery contracts” at the management and administrative boundaries.
- Total Quality Management: We can’t measure quality in an experience-centric way, so we can’t even tell when there’s a “defective vehicle” in our equivalent of the Toyota Production System, and stop the production line to fix it.
- Just In Time: We haven’t defined what “in time” latency means (i.e. what acceptable quality outcome we desire), so we mostly over-deliver (bloating costs), and sometimes under-deliver (creating churn).
- Six Sigma: OMG! We’re hardly “one sigma”! How much variability do we have on the same broadband product in terms of “speed”? This isn’t innate: predictable engineered outcomes are perfectly possible.
- Theory of Constraints: Every system has a bottleneck. Sometimes we’ve tried to rate limit data coming off backbones to the mean line speed of the customer’s access product. But there’s no quality in averages: you have to get the arrival pattern at the bottleneck right too.
- Lean: We use “work conserving” queues that maximise “work in progress”, when we should be using non-conserving ones to minimise it. Our industry then wonders why it has “bufferbloat” and bad QoE.
- Kanban: We can’t properly visualise our “work” because we’ve not separated out static from dynamic effects in our metrics. You can only manage what you can measure!
- Vanguard method: Networks are full of “failure load” (e.g. retransmits), but we don’t work to identify and eliminate it. A nontrivial amount of network traffic has negative value.
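The Theory of Constraints point, that there’s no quality in averages, can be shown with a toy example. This is a minimal sketch under loud assumptions: a single FIFO bottleneck that serves exactly one packet per tick, and two made-up arrival patterns with the same mean rate. The function name `mean_queueing_delay` and both traffic patterns are hypothetical.

```python
def mean_queueing_delay(arrivals_per_tick):
    """Mean wait (in ticks) at a FIFO bottleneck that serves one packet per tick.

    arrivals_per_tick: list of packet counts arriving at the start of each tick.
    """
    backlog = 0        # packets still queued from earlier ticks
    total_wait = 0
    total_packets = 0
    for arriving in arrivals_per_tick:
        # Each packet waits one tick for every packet already ahead of it.
        for position in range(arriving):
            total_wait += backlog + position
        total_packets += arriving
        # One packet is served this tick, if any are present.
        backlog = max(0, backlog + arriving - 1)
    return total_wait / total_packets if total_packets else 0.0


# Both patterns deliver 100 packets over 200 ticks: a mean load of 0.5.
smooth = [1, 0] * 100                # one packet every other tick
bursty = ([10] + [0] * 19) * 10      # ten packets at once, then silence

if __name__ == "__main__":
    print("smooth:", mean_queueing_delay(smooth))
    print("bursty:", mean_queueing_delay(bursty))
```

The smooth pattern never queues, while the bursty one waits 4.5 ticks per packet on average, even though a rate limiter looking only at the mean would call them identical.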
That’s just a quick list off the top of my head, and I’m hardly a management guru. We urgently need to get the MBAs and network scientists working together, so our “packet traffic control” can let our information “fly safely”. This gets even more urgent as we move to 5G and SDN, which make everything far more dynamic. Suddenly you have Concordes trying to land among the turboprops!
Until telecoms industry leaders “learn how to learn”, we’ll be left amusing ourselves with dramatic recordings of the endless network signalling mishaps and user application tragedies.