When I walked into SpinVox’s plush Buckinghamshire offices this morning, flanked by The Register’s Andrew Orlowski, and Ben Smith and Dan Lane from The Really Mobile Project, the tension in the building was obvious. There were nervously exchanged glances and bad jokes from senior staff. A smartly-dressed James Whatley eyed me reproachfully. But the guys managed to hold it together for long enough to usher us into a conference room and ply us with pastries.
We were not asked to sign an NDA, but we were asked not to record anything that happened in the room. Ironic, really – and the reason that Ewan MacLeod from Mobile Industry Review declined the invitation.
CIO Rob Wheatley took us through a technical explanation that, while honest about the existence of human agents in the process, didn’t give away as many secrets as he made out (between the four of us, there wasn’t much we didn’t already know), before leaping to what we all came for: the demo.
The big technical question surrounding SpinVox – the one they refuse to answer (as they did again today) – is what proportion of the messages they process are seen by a human being. It’s the one sticking point that has fascinated journalists and customers alike. But SpinVox are staying quiet: all they’ll say is the proportion varies from country to country and from carrier to carrier.
So what happened in the demo, and what can we infer from it about those proportions?
The demo was performed in a standalone test environment, which had only four processing cores – as opposed to the main system’s 800 or so – and was not connected to the wider network. I saw no evidence that the demo was “set up” or “prioritised”, and I have no reason to think it was (you’ll see why in a minute).
We began with a short, simple message, read by Rob Wheatley himself and called in from his own BlackBerry. The system spat out a perfect text version in a few seconds. Next Wheatley left something a little more complex. A few sentences this time. Again, a perfect and speedy result. But then, both messages were straightforward and they were left in a loud, clear voice at a leisurely pace in a quiet room. You’d have been worried if the system hadn’t got them right.
It was then my turn to try. I left a message, at a brisk speed, that included my full name, the word “TechCrunch” and an invitation for the “recipient” to call me back. I believe that the message was a reasonable and realistic approximation of a real-world message, albeit with a few strange words in it. The SpinVox system failed to convert the whole message – OK, so most humans can’t spell Yiannopoulos – and passed it to a human “agent” (who was sitting in the room with us).
Here’s where it got ugly. From observing the “tenzing” process in action, it was clear to us that the system had failed to pick up a single word in the message correctly. The agent in the room had to listen to and manually type the entire message, from beginning to end. SpinVox has previously claimed that agents do not get to hear entire voicemail messages; only enough to give context and enable transcription. That’s not what I saw this morning.
SpinVox’s people were quick to point out that British English is actually SpinVox’s worst-performing language. According to them, the system is much better at US English and Spanish.
But if all we have to go on is today’s demo (given SpinVox’s refusal to give any indication of how many transcriptions involve human agents), then it’s hard to escape this implication: that the vast majority of messages left in real-world conditions (like beside roads and in cafes) and containing more than “Hi Jim, can you call me back? Cheers, Bye” are processed to some extent by a human being.
The aim of the day had been to show us how the technology works. First of all, it didn’t, beyond transcribing a simple message in a quiet room. But secondly, and more importantly, that’s not actually what people want to know about any more: since SpinVox refuses to go on the record about the level of human involvement, the media will be left to keep speculating about that number, and no doubt investigating it as well.
The sorts of question we did want to ask included:
In fact, that last question was asked by Orlowski during a brief but fiery cameo appearance by CEO Christina Domecq, during which she revealed the new cash injection (again, I wish we’d not been prevented from recording video). Domecq obviously didn’t appreciate the question, and retorted angrily that SpinVox is spending “much, much less” than that per month. She added that the company expects to break even this month and become cash-flow positive very shortly after that.
What else did we learn today? For one, that SpinVox is taking security very seriously these days. “As we’ve matured as a business,” said CIO Rob Wheatley, “our relationships have matured. Our QC houses [the third-party processing centres contracted to process SpinVox’s messages] are very professional environments in areas where it’s seen as a very good job to have. Our agents are very proud of what they do.”
Proud, perhaps, but apparently not proud enough to be trusted with the Web: we were told that agents’ computers have no web browser and no USB ports (in fact they have no software installed besides the tenzing application and an anti-virus package); agents cannot take cameras or phones into the office; they have to wear ID and uniform at all times; and there are background checks on all recruits.
OK, that’s reassuring. After all, they’re listening to our voicemails all the time, so you’d hope they couldn’t just email their friends the contents or post them online.
But it’s clear: although SpinVox denies it’s in trouble and says it is poised to break even, it’s still burning masses of cash, hence the latest injection. And it must surely be clear now that the vast majority of messages are seen by human eyes.
So what does that suggest? It suggests that after five years of operation, after processing 130 million voicemails, SpinVox can only handle relatively simple messages spoken in quiet rooms.
And they have not reduced their call centre operation in the last five years as the system got “smarter”. If anything, they’ve scaled it up to deal with the contracts they’ve signed with carriers.
If they were a normal call centre business, a cash business, then that would be fine. But this is a company that effectively claims that at some point, as their voice recognition gets better, the human element will be substantially reduced and the VCs will be rewarded with a business which scales massively. That was not what was suggested by today’s demonstration, which ultimately calls into question the entire SpinVox model.