I have been working with several clients in the SDN/NFV space, all of whom are trying to make sense of the transformation to the ‘software telco’. The challenge they all face is that our future distributed computing needs differ from those of the past at a scale that amounts to a qualitative change. That in turn dictates that a whole new skill set must be learnt.
As a result, we are in the midst of an industry transition similar to that from mainframes to minicomputers, or from PCs to smartphones. As with those previous disruptive transitions, it will open up great opportunities for those who adjust quickly to the new reality.
That new reality is ‘ultracomputing’. A brief introduction is offered below.
Networks are large-scale distributed parallel supercomputers
In supercomputing, a large number of interconnected nodes are engaged in computation and communication, simultaneously working on many interdependent problems. The system must remain stable at all loads, produce outputs within bounded timeframes, and be resilient to component or process failure. The same requirements are now being placed on telecoms networks.
However, in telecoms networks the relative costs of the component computation and communications technologies continually vary. Furthermore, the interconnection between these functions can no longer be assumed to be carried over dedicated circuits, as all traffic is now over a common statistically shared transmission medium. The cost structure and performance of the transmission can vary from one territory to the next, as well as dynamically over time.
As a result, the optimal location of each function in the distributed architecture can also change. Performance is specific to each network configuration, rather than being a generic property of the protocols in use.
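To make this concrete, here is a minimal sketch, in Python, of how the least-cost location for a single function shifts as transmission prices move. The candidate sites, cost figures, and the simple linear cost model are all invented for illustration; they are not any operator's real data.

```python
# Illustrative only: the least-cost placement of one network function
# shifts as the relative price of transmission changes.

def total_cost(compute_cost, transit_gb, transit_price):
    """Monthly cost of hosting the function at one site."""
    return compute_cost + transit_gb * transit_price

# Candidate placements: (site, monthly compute cost, GB hauled per month)
sites = [
    ("central-dc", 100.0, 500.0),  # cheap compute, long haul
    ("metro-pop",  160.0, 150.0),  # mid-priced compute, shorter haul
    ("cell-edge",  250.0,  10.0),  # costly compute, almost no haul
]

# As the per-GB transit price varies (by territory, or over time),
# the least-cost site moves outward from the core towards the edge.
for transit_price in (0.05, 0.20, 0.80):
    best = min(sites, key=lambda s: total_cost(s[1], s[2], transit_price))
    print(f"transit at {transit_price:.2f}/GB -> place at {best[0]}")
```

Even this toy model has no single static answer; the real problem adds many interacting functions, shared transmission with variable performance, and QoE constraints on top of raw cost.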
Ultracomputing thus demands a new discipline: the performance engineering of complete large-scale dynamic distributed architectures. Critically, this is distinct from the engineering of any of the sub-components.
Finding the optimal trade-offs
The optimal location of any function in an ultracomputing environment depends upon both the desired customer experience and the total cost of ownership. The customer experience is governed by the quality of experience (QoE) performance hazards present; the total cost of ownership depends on the cost of mitigating or addressing those hazards, and on the level of financial predictability that results. The ultracomputer has to enable the appropriate resource trade-offs using a distributed resource allocation model.
This plays out differently for each part of the mobile ecosystem:
- For mobile network operators (MNOs): where to place caches, radio controllers, or internet breakout (a cache placement trade-off is sketched after this list).
- For content distributors: where to place delivery systems, when and whether to use multicast, and where to place transcoders (anywhere from a central site down to every set-top box).
- For cloud service providers: where to place the application functionality – how much is local, and how much is remote, given that functional splitting increases implementation complexity.
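As a hypothetical instance of the MNO case above, the sketch below trades a cache's deployment cost against the expected cost of a QoE hazard (here, the chance that a video session stalls). Every figure, and the hazard probabilities, are invented for illustration.

```python
# Invented example: deeper caches cost more to deploy, but cut the
# probability (and hence the expected cost) of sessions stalling.

# (placement, monthly deployment cost, probability a session stalls)
placements = [
    ("national-core", 10_000, 0.060),
    ("regional-pop",  25_000, 0.020),
    ("cell-site",     90_000, 0.004),
]

SESSIONS_PER_MONTH = 2_000_000
COST_PER_STALL = 0.50  # support/churn cost attributed to each stalled session

def tco(deploy_cost, stall_prob):
    """Total cost of ownership: deployment plus expected hazard cost."""
    return deploy_cost + SESSIONS_PER_MONTH * stall_prob * COST_PER_STALL

for name, deploy, p in placements:
    print(f"{name:13s} TCO = {tco(deploy, p):9,.0f}")
# With these figures the regional PoP wins: neither the cheapest
# deployment nor the best QoE, but the best overall trade-off.
```

The point is not the numbers, which are made up, but that the optimum sits at neither extreme, and moves whenever the costs or the hazard model change.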
In the ultracomputing world, the design space is large and irregular, and involves interactions between sub-systems from many vendors. The current virtualisation trend has magnified the question of how best to allocate resources: once a function can be located in many places, the total number of combinations becomes too high to test and validate empirically before deployment.
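The scale of that combinatorial problem is easy to show. If each of F functions can be placed at any of P candidate sites, there are P^F configurations; even with modest, hypothetical figures the space defeats exhaustive testing:

```python
# Back-of-envelope: placement combinations grow as sites ** functions.
functions, sites = 20, 10  # hypothetical but modest figures
print(f"{sites ** functions:,}")  # 100,000,000,000,000,000,000 configurations
```

Exhaustive pre-deployment testing is plainly infeasible at that scale.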
Ultracomputing demands the ability to model and manage these trade-offs, which occur at all timescales from design to configuration to operation.
The ultracomputing skill set
In ultracomputing you need to be able to perform the following design and engineering activities:
- Reason ex-ante about complete systems, and the interaction of all their sub-components.
- Understand and model the predictable region of operation and the failure modes under load, so as not to cause localised or widespread failure.
- Understand how finite communication and computation resources are constrained by both capacity and schedulability, and model the complex range of interactions against these two constraints (illustrated in the sketch after this list).
- Know whether demand can be scheduled to get the supply resources it wants in the timescales it requires.
- Know how to construct your service offerings to give the maximum flexibility to schedule supply.
- Manage both the resources consumed by external user processes and the internal communication and coordination resources, which are all multiplexed together.
- Allocate resources for all the above using a coherent distributed resource management system.
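The ‘predictable region of operation’ point above can be illustrated with the most basic of queueing models. In an M/M/1 queue the mean time in system is 1/(mu - lambda), so delay stays benign over most of the load range and then grows without bound near saturation. The service rate below is an arbitrary illustrative figure:

```python
# Why raw capacity is not enough: mean delay in an M/M/1 queue is
# W = 1 / (mu - lam), which explodes as utilisation rho = lam / mu -> 1.

MU = 1000.0  # service rate in packets/sec (illustrative)

for rho in (0.50, 0.80, 0.90, 0.99, 0.999):
    lam = rho * MU                # offered load
    w_ms = 1000.0 / (MU - lam)    # mean time in system, in milliseconds
    print(f"load {rho:6.1%} -> mean delay {w_ms:8.2f} ms")
```

Real networks are far richer than M/M/1, but the qualitative shape, a benign region followed by an abrupt loss of predictability, is exactly what this skill set must characterise and engineer against.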
Regrettably, the telecoms industry has yet to grasp these issues, either conceptually or practically. Yet the application of known mathematical and performance engineering techniques can resolve the technical problems: how to decompose the system, understand the trade-offs being made, optimise for specific cost or user experience outcomes, and operate these complex systems with a high degree of predictability.
If you are grappling with these issues, do get in touch. I and my network performance science colleagues can help.
To keep up to date with the latest fresh thinking on telecommunication, please sign up for the Geddes newsletter