In the not-too-distant future, intelligent machines may join us in all of our voice conversations. The early steps towards re-imagining the phone call include technologies like call recording and speech recognition. The idea of hypervoice captures the maturing of this trend of computer-enriched voice communications.
I was recently given a demonstration of a new hypervoice application, Verbol voice, developed by Family Systems Ltd. What caught my eye is their novel approach to the problem of multi-party voice interaction.
The core of the interface is a timeline of ‘utterances’. These are continuous chunks of speech recorded from a single speaker. The timeline is coloured to help visualise the speakers and the length of their contributions. What is unusual, indeed unique, is how participants in the conversation relate to this timeline.
In a ‘normal’ phone call, all participants are locked in synchronous interaction in the present moment, i.e. at the leading edge of the activity stream. We have seen applications like voice messaging and push-to-talk complement this with asynchronous or semi-synchronous voice communications. Talko, which was recently acquired by Microsoft, refined the transition between these modes so that it feels smooth.
Verbol effectively creates a whole new mode and category of interaction, one that doesn’t yet have a name. The participants don’t all have to be at the ‘front’ of the timeline, and you can see where others are in the timeline.
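To make the idea concrete, here is a minimal sketch of how such a timeline might be modelled. It is my own illustration, not Verbol’s actual design: the names `Utterance`, `Timeline`, `cursors` and `backlog` are all hypothetical. The point is simply that each participant carries their own position in a shared list of utterances, rather than everyone being pinned to the newest one.

```python
from dataclasses import dataclass, field


@dataclass
class Utterance:
    """A continuous chunk of speech from a single speaker (hypothetical model)."""
    speaker: str
    start: float      # seconds from the start of the conversation
    duration: float   # length of the chunk in seconds


@dataclass
class Timeline:
    """Shared list of utterances plus one playback cursor per participant."""
    utterances: list = field(default_factory=list)
    cursors: dict = field(default_factory=dict)  # participant -> index of next utterance to hear

    def add(self, speaker, start, duration):
        self.utterances.append(Utterance(speaker, start, duration))

    def join(self, participant, at_index=0):
        # A participant may join anywhere, not only at the 'front'.
        self.cursors[participant] = at_index

    def listen(self, participant):
        """Return the next unheard utterance and advance that participant's cursor."""
        i = self.cursors[participant]
        if i >= len(self.utterances):
            return None  # already at the front of the timeline
        self.cursors[participant] = i + 1
        return self.utterances[i]

    def backlog(self, participant):
        """Number of utterances this participant has not yet heard."""
        return len(self.utterances) - self.cursors[participant]

    def at_front(self, participant):
        return self.backlog(participant) == 0
```

Because the cursors live in one place, every participant can be shown where the others are, which is the part that a conventional phone call has no equivalent for.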
Interaction is modal, in that there are ‘conference’ and ‘chat’ modes. The former plays back utterances, feeling somewhat like a ‘normal’ phone call. However, by using the controls, or merely talking over someone, there is an automatic switch to ‘chat’ mode. This allows ‘side conversations’ to happen on parallel ‘tracks’. Indeed, it is possible to (re)join in conversations and reply (via voice), a bit like threaded email.
The replay of the utterances is also different. There is a ‘speed catch up’ mode, which plays back faster and strips out ‘dead space’. In the developers’ experience, 130% playback speed typically leaves comprehension undiminished. For skimming, you can replay at up to 180% speed. The voice tone is adjusted to compensate for the speed increase.
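The arithmetic behind catching up is worth a quick sketch. The two functions below are my own illustration, with hypothetical names, and they assume the conversation keeps advancing at normal speed while you listen. They also ignore the stripping of dead space, which would shorten the real figures further.

```python
def playback_seconds(speech_seconds, speed):
    """Time needed to hear a fixed amount of recorded speech at a given speed.

    `speed` is a multiplier, so 130% playback is 1.3.
    """
    if speed <= 0:
        raise ValueError("speed must be positive")
    return speech_seconds / speed


def catch_up_seconds(lag_seconds, speed):
    """Time needed to reach the front of a timeline that is still moving.

    Assumes the live conversation advances at 1x, so the listener closes
    the gap at a rate of (speed - 1) seconds per second.
    """
    if speed <= 1:
        raise ValueError("speed must exceed 1.0 to catch up with a live timeline")
    return lag_seconds / (speed - 1.0)
```

Under these assumptions, three minutes of recorded speech takes about 100 seconds to skim at 180%, while someone one minute behind a live conversation needs roughly 200 seconds at 130% to reach the front.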
Finally, the system can manage each speaker’s voice data separately, and you can create your own managed instance of the system for security. This makes a welcome change from your average NSA-friendly unencrypted phone call.
The overall effect of these changes takes some getting used to. It’s a bit like hearing jazz for the first time after only having known classical music. Verbol voice offers a new paradigm for voice interaction. It makes sense once you use it and grasp the difference. The development team use it themselves to coordinate their efforts.
The visual presentation of Verbol voice studio may not (yet) have the gloss of other innovative applications. Yet the underlying ideas and functionality are fascinating to me, not least because the degree to which voice can be re-imagined is broader than generally supposed.
Verbol voice surfaces the assumptions of the phone call by breaking them, and for that alone it deserves wide attention.
If you would like a demo you can get in touch with Bryan Reynolds, founder of Family Systems Ltd, at br@verbol.com.