In this final part of my interview with Pete Cladingbowl we examine what the future might look like once the telecoms industry finally emerges from its ‘lean’ revolution, and masters the management of flow.
The lean and antifragile data centre
Part 4 – A new intelligent network emerges
If we don’t adopt new “lean” and total quality management (TQM) techniques for the Internet, what will happen?
We are already seeing lots of problems with the Internet being used to access cloud applications. Users can’t tell which broadband or cloud service is right for their needs. When things aren’t right, it is very hard to isolate faults. Service providers can’t determine operational tolerance levels: they don’t know if they are under-delivering quality, or over-delivering.
There are several underlying causes. From a technology perspective, we have problems of measuring and managing flow at short timescales. These “high-frequency trading” issues particularly affect high-value transactions and applications.
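To make the timescale point concrete, here is a toy sketch (all numbers invented): a link can look half-idle on one-second averages while millisecond-scale bursts saturate it, and those bursts are exactly where transaction delay is lost.

```python
# Sketch: the same traffic, measured at two timescales.
# A link looks under-used on coarse averages while short bursts
# saturate it; per-millisecond measurement reveals the congestion.

import random

LINK_CAPACITY = 100        # units of work the link can serve per millisecond
MILLISECONDS = 1000        # observe one second of traffic

random.seed(1)
# Bursty demand: mostly quiet, with occasional bursts well over capacity.
demand = [300 if random.random() < 0.1 else 10 for _ in range(MILLISECONDS)]

one_second_average = sum(demand) / MILLISECONDS
saturated_ms = sum(1 for d in demand if d > LINK_CAPACITY)

print(f"average load: {one_second_average:.0f} of {LINK_CAPACITY} per ms")
print(f"milliseconds saturated: {saturated_ms} of {MILLISECONDS}")
# Typical output: average load well under capacity, yet around a hundred
# milliseconds in which queues (and application delay) must build up.
```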
We are also riding on the back of networks that connect clouds together. The way that the Internet is designed, with a single address space, means these networks cannot function well enough. This is especially true as video consumes ever more resources. We also have a lot of computing power in the wrong place: if you use Facebook in Mongolia, it is served from Dublin.
From a commercial viewpoint, service providers don’t have enough skin in the game, so they don’t feel the pain when their customers’ applications aren’t secure or don’t perform. They lack the necessary understanding of the relationship between the technology and customer value.
Because of this, telcos use contract terms to hide their engineering failure. Breakage of the service level agreement is treated as a cost of sale.
What do we need to do to improve the situation?
When applications move to the cloud, we need to share whatever resources we have in the network and data centre. At the moment we can’t multiplex them effectively because we are so poor at scheduling. To improve the situation, we have little choice but to schedule resources better. The core idea of “lean” is to manage flow to meet customer needs in a sustainable manner.
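As a rough sketch of what “scheduling to meet customer needs” could mean (the flows and numbers here are invented): grant each flow its declared need first, then share any surplus, rather than letting all flows contend freely.

```python
# Sketch: need-based scheduling of a shared resource.
# Each flow declares what it needs; the scheduler satisfies declared
# needs first and shares any surplus, instead of free-for-all contention.

CAPACITY = 100  # total capacity of the shared link/resource (invented units)

# Hypothetical flows: (name, required share)
flows = [("voice", 10), ("video", 40), ("backup", 30)]

def schedule(flows, capacity):
    """Grant each flow its stated need, then split the surplus evenly.

    If total need exceeds capacity, grants degrade proportionally.
    """
    needed = sum(need for _, need in flows)
    grants = {name: min(need, capacity * need / needed) for name, need in flows}
    surplus = capacity - sum(grants.values())
    for name, _ in flows:
        grants[name] += surplus / len(flows)
    return grants

print(schedule(flows, CAPACITY))
# {'voice': 16.66..., 'video': 46.66..., 'backup': 36.66...}
```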
We simply can’t keep throwing more and more capacity at scheduling problems, or building a CDN or overlay network for every application. Our current path results in unsustainable economics for the cloud. Data centres are already overtaking aviation as one of our biggest energy consumers. The universe we live in doesn’t scale the way people assume and hope.
We leave the customer experience to chance, and then get bitten by frequent “black cygnets” and beaten up by occasional “black swan” events. The result of our engineering failure is a whole industry sector dedicated to cleaning up the mess.
For example, WAN optimisation is a $20bn industry that should not exist: its vendors do the traffic scheduling and shaping that operators themselves failed to do. The cost of enterprise application performance monitoring runs into billions, much of which should not be needed.
Twenty years ago we saw the “rise of the stupid network”, and that now needs to go into reverse. We need a new kind of network with AI inside driving resource scheduling, focused on delivering user outcomes and making the best use of scarce resources.
If we do adopt new approaches, what might they look like?
What needs to be different is clear: we need a demand-led approach that applies the quality management principles established in other industries. We need to think about flows, resources, supply/demand balancing, bottlenecks and trade-offs.
We also have much to learn from the military, who have Information Exchange Requirements (IERs). There need to be tighter flow “contracts” between the supply and demand sides of both telecoms and data centre services. You might think of this as the “Intercloud”, where IXs move from providing peering connectivity to providing managed performance along supply chains.
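A flow “contract” of this kind might be expressed as a small, checkable specification. Here is a minimal sketch, loosely inspired by the IER idea above; all field names and thresholds are illustrative, not any standard’s.

```python
# Sketch: a flow "contract" between demand and supply sides,
# loosely modelled on an Information Exchange Requirement (IER).
# All field names and thresholds are invented for illustration.

from dataclasses import dataclass

@dataclass
class FlowContract:
    name: str
    max_loss: float           # tolerable packet loss ratio
    max_delay_ms: float       # tolerable one-way delay, milliseconds
    min_throughput_kbps: int  # minimum sustained throughput

    def is_met(self, loss: float, delay_ms: float, throughput_kbps: int) -> bool:
        """Check a measured sample of the flow against the contract."""
        return (loss <= self.max_loss
                and delay_ms <= self.max_delay_ms
                and throughput_kbps >= self.min_throughput_kbps)

voice = FlowContract("voice", max_loss=0.01, max_delay_ms=150, min_throughput_kbps=64)
print(voice.is_met(loss=0.002, delay_ms=90, throughput_kbps=80))   # True: within bounds
print(voice.is_met(loss=0.002, delay_ms=400, throughput_kbps=80))  # False: delay breached
```

The point of making the contract explicit is that both sides can then measure against it, rather than arguing over vague service level language after the fact.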
For service providers to move forward they must define the user experience they aspire to support. This means being fit for purpose: doing the failure modes analysis, managing and mitigating the failures, and having the right service level agreements to restore service. They need to better understand what is genuinely under their control versus what belongs to third parties, and whether those third parties can be trusted.
As enterprises and consumers move to the cloud there has to be a more robust due diligence process for application deployment. Buyers need new and better cloud comparison services to help them select the right computing and communications offers, and to configure them to deliver the desired outcome.
The “smart network” (or data centre) of the future will also reflect broader trends towards machine learning. We have clever machines playing chess, or getting close to passing the Turing Test to emulate human conversation. What tends to grab the headlines are these examples of application-level machine learning. Your phone might know if an email is worth vibrating for, based on context such as your location and calendar.
However, AI is a whole host of things, often mundane. Rather than “supercomputer AI” pretending to be a person, we will have different types of AI for different circumstances. It’s not like Skynet, even if that is a worry; it’s more like the “fuzzy logic” in your washing machine, or a smart fridge that knows usage patterns, and reminds you to buy milk on the way home.
The network of the future will also look at patterns, and those will drive different choices. There will be network-driven UX intelligence – say “smart hunt groups” in the context of voice, or detecting suspicious behaviour around your home. A lot of the application of machine learning will be for ordinary things like packet processing. What autocorrect does for your typing, the network will do for class-of-service assignments for data flows.
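As a toy illustration of that “autocorrect for flows” idea (the features, class names and rules are all invented): a classifier maps observed flow behaviour to a class-of-service label. A real system would learn these boundaries; a hand-written rule stands in for the learned model here.

```python
# Toy sketch: assigning a class of service from observed flow behaviour,
# the way autocorrect guesses intent from keystrokes. A real system would
# learn these boundaries; a hand-written stand-in rule is used instead.

def assign_class(avg_packet_size: int, packets_per_sec: float, burstiness: float) -> str:
    """Map simple flow features to an (invented) class-of-service label."""
    if avg_packet_size < 300 and packets_per_sec > 20 and burstiness < 0.5:
        return "realtime"      # small, steady packets: voice-like
    if burstiness > 2.0:
        return "bulk"          # large sporadic bursts: backup/transfer-like
    return "interactive"       # everything else: browsing-like default

print(assign_class(avg_packet_size=160, packets_per_sec=50, burstiness=0.2))  # realtime
print(assign_class(avg_packet_size=1400, packets_per_sec=5, burstiness=3.0))  # bulk
```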
What are the potential benefits of a new “lean” approach to networks and data centres?
There is a tremendous opportunity for organisations to collaborate and coordinate better. We can radically improve value flow internally, as well as between customers and suppliers. The result is greatly reduced waste throughout the whole economy.
The future of digital supply chains will reflect the way we have built more sustainable physical supply chains. Principles of “just in time” and “right first time” were enablers of new “pull” systems; these have yet to be applied in telecoms and data centre design.
These concepts came out of the experience of the 1950s. There was huge surplus production capacity in the US left over from wartime, and industry shifted to making consumer goods. Back then, enterprises had pre-stocked product to sell, so they created ads to stimulate demand for the goods they had (over-)produced.
Telecoms today is very much a “push” industry, with similar supply-side economics. The core belief is that selling more capacity is the route to progress and growth. What should be driving growth is a balance of flow between supply and demand. When we balance flows, the supply chain becomes more sustainable.
Ultimately the limit we face is energy flow management, and a sustainable resource model for a finite world. Ideas like “net neutrality” work against optimal flow, demanding unbounded generation of input resources. The constraints of the world force us to face up to issues of sharing and scheduling resources better.
If we solve these flow problems, we can begin to re-imagine what kind of society we would like our children and grandchildren to inhabit. There is a real possibility of using technology to achieve fuller employment and a better life. By reducing the impact of networks and data centres on the planet and its resources, more people can specialise in whatever brings joy to other people – be that growing bonsai trees, or surfing at the beach.
You can also read:
Part 1
Part 2: The need for antifragile engineering
Part 3: A new Internet architecture & politics
For the latest fresh thinking on telecommunications, please sign up for the free Geddes newsletter.