Voice has joined the hypermedia revolution, becoming a rich data type. VoiceBase has taken pole position as the “Google of semantic contextual voice”.
VoiceBase operations room, monitoring throughput and latency of voice to actionable insight processing
Over the 14 years of doing blogs and newsletters I have rarely written about my consulting clients. It feels a bit like crass promotion of their commercial interests. However, they often do rather interesting things, and some of their stories are worth telling. So for a change I am going to tell you about one of them!
I have for some time been on the advisory board of VoiceBase, who are headquartered in San Francisco. They can reasonably make the claim to be the “Google of [hyper]voice”. My relationship with them arises from my work at the Hypervoice Consortium. We have long advocated “voice is data”, since there is an inevitable revolution coming as computers join us in conversation. (For a general introduction to enterprise voice analytics, see this No Jitter article.)
Whilst I was in the pleasantly sunny Bay Area a few weeks ago, I had the good fortune and pleasure to catch up with Jay Blazensky, who is co-founder and also runs their business development. His previous big “exit” was with RingCentral, so there’s voice industry pedigree here. It is no surprise that all the major cloud communications platforms are presently jockeying to build out their voice capability, so the timing of our meeting was most apposite, as those companies place their bets.
As background to this market, let’s just capture some basic concepts. The idea of hypervoice is that voice is not just a string of audio samples, which maybe can be turned into semi-intelligible text on a good day with a warm tailwind. My view is that voice is a rich, sensual, semantic and (most of all) contextual data type. Hypervoice is about those “hyperlinks” from the spoken word to its context.
Whilst “voice is data” is our hypervoice axiom, its ontology and syntax can come in many forms and formats, just as hypertext always has. In this case, VoiceBase specialises in long-form audio, like phone calls, webinars, lectures, and conferences. This contrasts with the short-form mumblings we all make to Siri and Alexa when we wish to be amused by how stupid computers still are.
There has long been an industry dedicated to extracting meaning from raw audio, with much of the funding coming from three-letter agencies, or bodies so secret they have no letters at all and are just a wink. Since first steps are fateful, the behaviour of legacy commercial players in this space has often been congruent with their spooky customer origins: unfriendly, with high prices, and inflexible offers (e.g. with fixed vocabularies).
This is now changing fast. Prices are dropping quickly, and the basic building blocks of speech analysis are being commoditised for general usage. This means that value has to move “up the semantic stack” to find bigger profit margins. Since information is (by definition) the unexpected, things like custom dictionaries being built “on-the-fly” now matter.
This is important because the “unrecognised” word in a general-purpose dictionary is the very important thing that the customer is trying to tell you about in their specific context. What we want is not low-value “speech to text” but “talking to meaning” and high-value “meaning to intent”.
And this is where a company like VoiceBase comes in. What they do is understand how what we say is linked to what we do in context. This is the core essence of hypervoice that my colleagues and I have been advocating for years: we need to understand how our stuttering utterances will lead to specific (business process) outcomes.
To achieve this goal, there is a process of classic machine learning training. VoiceBase builds context-aware predictive analytics systems based on business processes in an automated way. Will this customer actually buy, and is it worth spending longer on the call to close the deal? Will they make an appointment for a salesperson to visit? How at risk are they of churn, and what kind of retention offer should be made that’s appropriate to their needs?
This in turn requires analysis of thousands of factors to discover which are relevant to the business process outcome. Many of these belong to the general contextual environment, such as demographics, geography, and timing. The operational analytical process is then done on a “streaming” basis in near real-time, to gauge sentiment and make a predicted outcome.
The impact of this technology is at two levels. The more obvious one is to create “lean” customer conversations with call centres, with a “war on waste and waiting”. Knowing what the likely outcome is as soon as possible eliminates pointless activity, lowers cycle times, and raises the quality of outcomes for all parties.
The less obvious one is that predictive insights can be used to drive new business processes, and a redesign of the whole interaction and business model. For instance, if you are a credit collections agency, you can figure out who to call up and pester for payment, and who to totally ignore. It is not just a localised optimisation of each interaction, but also a global transformation of the flow of value through the business system.
Performing this analytical feat takes major infrastructure to turn the flow of audio into user behaviour predictions. VoiceBase have spent years optimising for fast turnaround time, much like how a Google query has to get back to you quickly. These are distributed real-time applications being delivered across whole continents. You cannot fake infrastructure, and it takes a major engineering effort to get the performance and availability right.
Furthermore, this is a market enjoying classic J-curve explosive growth of 20-30% a month, which adds its own dimension of infrastructure challenges. Yet the total penetration of voice analytics is still tiny: of an annual global voice market of 500 billion minutes, under 500 million minutes are being indexed. It is a fairly safe bet to predict that this market is going to be enormous.
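To put those numbers in perspective, here is a back-of-the-envelope sketch in Python. The minute counts and the monthly growth range are the ones quoted above; the assumption that growth compounds steadily for a full twelve months is mine, purely for illustration, and the function names are made up for this example.

```python
# Figures quoted in the text; everything else here is illustrative.
TOTAL_MINUTES = 500e9    # annual global voice market, in minutes
INDEXED_MINUTES = 500e6  # upper bound on minutes indexed per year


def penetration(indexed: float, total: float) -> float:
    """Fraction of all voice minutes that are being indexed."""
    return indexed / total


def annual_multiple(monthly_rate: float, months: int = 12) -> float:
    """Growth multiple if a monthly rate compounds steadily (an assumption)."""
    return (1 + monthly_rate) ** months


print(f"penetration: under {penetration(INDEXED_MINUTES, TOTAL_MINUTES):.1%}")
for rate in (0.20, 0.30):
    print(f"{rate:.0%} a month -> about {annual_multiple(rate):.1f}x a year")
```

In other words, under 0.1% of voice minutes are indexed today, and monthly growth of 20-30% would, if sustained for a year, mean volumes roughly nine to twenty-three times larger. No market compounds like that forever, but it shows how much headroom remains.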
I see VoiceBase as part of a natural swing away from symbolic to sensual data. In our Hypervoice Consortium research we characterised this as the end of Information Technology as a paradigm, and the rise of a Human Technology one. Voice has more in common with Wellness as a Service than it does with instant messaging; hypervoice is a subset of hypersense.
It may be hard for traditional communications platform players like Twilio to make this jump, since their origins are in symbolic SMS; they see voice as a “noisy message” with fiddly transport needs. To me, VoiceBase takes more of a business process outcome view, Amazon-like in its focus on using its information infrastructure to “move the needle” on business processes via timely choices.
Let me hypothesise for a minute on where this might take us, beyond the functionality we see today.
The enterprise analytics market has barely begun to explore the possibility of this new sensual paradigm. Just as the “cells” of spreadsheets define the intersections of “product” rows and “quarterly” columns of sales data, the “cells” of predictive analytics will define the intersections of verticals (retail, travel, utilities, etc) with customer lifecycle processes (product design, sales, support, etc).
That means a more appropriate lens through which to view companies like VoiceBase is the arrival of VisiCalc or Excel. Rather than simple scalar data in cells, business analysts and product designers can work with probabilistic data. They can use this insight to make informed long-range strategic plans, and near-term tactical decisions to execute them.
We’re fundamentally upping our “lean enterprise” game to a whole new level by including the human feedback missing in the symbolic IT world. Armed with new conversational analytics from machine learning, we can begin to implement higher-order business model change. So you can build new feedback loops, with new products based on insight from sales and support processes.
An example of this higher-order change in telecoms could be new prepaid products designed to cater to different credit management behaviours. But we could also use insights on credit behaviour from other verticals, or redesign sales processes to deliver better collections outcomes.
This is a breaking of the silos between business processes and vertical industries. The ethos of VoiceBase (and why I allow my good name to be associated with them) is that they are a data insight company. They are not a nerdy voice component technology company, nor are they a surveillance and identity harvesting company (you know who you are!). Voice is just the means; better customer services and user experiences are the ends.
That said, there is some geeky cleverness under the hood. VoiceBase have cracked a “neural network composability” problem. It’s a bit like being able to teach a machine to ride a bike and to eat sushi, without one skill compromising the other. (This is reminiscent of the composability of the ∆Q calculus in the network performance science work I am involved in.)
It’s a game-changer.
The immature nature of this market also hints at many ways in which it might grow and improve. For instance, at the moment the sentiment analysis in the VoiceBase system comes from looking at the spoken words following speech-to-text conversion. This can be strengthened by direct and native sentiment analysis of the voice itself. And who knows what other sensor data might be relevant to the mix: does it make a difference to the business process if someone is walking or driving?
Voice is also intimate biosensed data, which provokes ethical issues over how we are storing it, and who has power over it. We all face the “Roomba problem”, where an unthinking “I agree” has given some corporation control over our sensitive data (in this case, selling maps of your home to those surveillance and identity harvesting companies). There is a duty incumbent on companies like VoiceBase to help their customers to ensure ethical use of their technology.
That all said, I am rather pleased to see this market growing, and this company finding success. From a personal perspective, there’s a guilty gloating element of “I told you so!”: I and several colleagues predicted voice would be the “next big thing” when it was in the doldrums 5+ years ago. I am also pleased for Jay and his team, since they can look their peers and families in the eye and honestly say they have not only done well, but also done good.
The mainstream hypervoice market has only just begun, just like hypertext in the early 1990s. May VoiceBase be a positive example for the many more players to follow.