I'm off to SpinVox HQ! What should I ask them?

On Tuesday next week, along with a few other journalists and bloggers, I’ll be visiting SpinVox HQ in Marlow, Buckinghamshire. SpinVox will be demonstrating their technology to us and they will be taking questions about the recent scandal surrounding the company’s speech-to-text technology.

Here’s the invitation I received from them today:

How Does SpinVox Convert Voice to Text?

I know you are interested in finding out more about how the SpinVox Voice Message Conversion System (VMCS) converts millions of voice messages to text, so I am pleased to invite you to our Marlow Headquarters for a technical briefing where you will see the VMCS in action and get to try it for yourself.

You’ll get the chance to speak a number of your own messages to see how they go through the automation system. You’ll see how, as the messages get increasingly complex, the system might refer them to a Quality Control (QC) agent for checking or completion to ensure that they meet our quality standards.

During the demo you can get hands-on, acting as a QC agent, so that you can see both sides of the process. You will get first-hand experience of how the application works and how quickly the messages go through.

This is the first time that SpinVox has offered demonstrations of its VMCS and we hope you’ll take the time to come and see it. We’ll be running the introduction to VMCS and the demos on Tuesday 04 August, from 10.00-11.00am and from 11.30-12.30pm. In addition to the demonstration, there will, of course, be an opportunity to ask questions.

So I’ve got two questions for TechCrunch readers:

(1) What would you like me to ask the SpinVox bosses? Any burning questions?

(2) What should I feed in to the machine to be converted?

Leave your questions and suggestions in the comments below.

Update: I just spoke to Rory Cellan-Jones on the telephone and it seems his invitation was lost in the mail. But Rory’s not too sore about it: he’ll be in Mallorca anyway.

  • http://www.armintalic.com/ Armin Talic

    1. Are you going to let me leave?

    2. To the person transcoding this voicemail, I love you. Hang in there.

  • Rodolfo

    Show me the money (as in, how come you claimed you raised 200m and then you didn’t?)

  • anonymous

    >“You’ll see how, as the messages get increasingly complex, the system might refer them to a Quality Control (QC) agent”

    CRITICALLY IMPORTANT… For a credible test, I think that most sample voice messages should be LONG (e.g. 2 minutes), but NOT COMPLEX.

    ** SHORT simple messages are easy to fake by a human (Mechanical Turk), and they are probably easiest to convert automatically.
    ** COMPLEX messages of any length would be expected to require diversion to a human operator.
    ** But I’d guess that long and non-complex messages are the real test of automation.

    For a long message with simple common words, I’d think that a world-class court stenographer, or even the best human agent chosen from the thousands SpinVox has trained and performance-measured around the world, shouldn’t be able to match the TAT (turnaround time) benchmarks of a fully automated system.

    Here are my assumptions: Logically, we might assume that it is no harder, and perhaps easier, for a system to convert a long message compared to a short message. Furthermore, for an automated system that is designed to convert large volumes of messages, it is logical to assume that a long message shouldn’t take much longer for an automated system to convert than a short message.

    But those are my assumptions. If necessary, SpinVox should explain why LONG, NON-COMPLEX messages are not automatically converted almost instantly in languages, such as English, for which the automated “learning” system should be relatively mature.

    I think the TAT is the key measure to verify claims of percentage of messages converted without human assistance, but it would also be interesting to note the quality of conversions by fully-automated versus human-assisted processes.

    And it would be interesting to see, for messages that do get diverted to humans, if and how SpinVox protects privacy and security when content is sent to foreign call centers (by dissecting messages into small message snippets? or masking personal information spoken into the message? or whatever other technical means it might use).

    And while you wait for the conversion, just sing the lyrics of “John Henry”

    Don’t forget to bring your stopwatch!

  • http://www.iknowthe.net Emma Kane

    1) How’s the security in the Egyptian call centers?
    2) I see a little silhouetto of a man
    Scaramouche scaramouche will you do the fandango
    Thunderbolt and lightning very very frightening me
    Galileo galileo
    Galileo galileo
    Galileo figaromagnifico
    But Im just a poor boy and nobody loves me
    Hes just a poor boy from a poor family
    Spare him his life from this monstrosity
    Easy come easy go will you let me go
    Bismillah no we will not let you go let him go
    Bismillah we will not let you go let him go
    Bismillah we will not let you go let me go
    Will not let you go let me go
    Will not let you go let me go
    No no no no no no no
    Mama mia mama mia mama mia let me go
    Beelzebub has a devil put aside for me for me for me

  • jacob

    when they gonna let me tap that

  • http://vocalnews.info ppxppx

    You could also ask why nobody knows anything about the fathers of their ASR…
    If it’s so good, why don’t they claim paternity? Why does no university or software company know anything about those who programmed the engine?
    And ask about the TENZING application, with the foot pedal…

  • Connor Sweetman

    I would start by asking them to unplug all the routers & network connections and do this demonstration in a bare bones room or even the hallway. Ask them to start the demo of “The Brain” to work in a standalone fashion. Don’t let them tell you that they need “The Brain” to be fully operational due to client demands. They must have more than one!

    Then have the reporters speak into a microphone and have the converted text displayed on the screen. Sure this would be like one of the original “Naturally Speaking” products by Dragon, but it would be definitive proof that they do have a Computer System capable of converting human voice to text. Also make sure that they don’t ask you to be silent while others are speaking because when we leave a voice message we generally cannot control the “world around us”.

    Background Noise is one of the things that must be addressed to support their claims.

    Next Step:
    Have them hook up “The Brain” to ONE incoming and ONE outgoing line that connects to a cellular carrier that will receive and send your voice mail out as text.

    Spend some time this weekend going around with a Voice Recorder and ask people to say things into the recorder. Make sure that the people have different accents and inflections. Let them read passages from a popular novel if they can’t think of anything to say. During the demonstration, play what you have recorded into your cell phone as a voice message and see the TAT and the accuracy. Remember, bring an accurate stop watch or have a wireless laptop connected to an atomic clock site for accuracy.

    And MOST IMPORTANT, make sure that they aren’t just connecting you to Ireland. Remember it has been written on blogs that there is a VIP Queue that goes directly to Ireland to *impress* potential clients and then they switch your conversion to *other countries* where English is not their first language.

    Seriously, the first step HAS to be, DOES THE BRAIN REALLY WORK? So if they don’t agree to a demonstration that starts with a Closed Standalone System, WALK OUT!

    Once “The Brain” has proved that it does in fact do speech to text conversions and you are really excited about the percentage of success it has, then ask to use it with the Servers connected and use more difficult spoken passages, again with different inflections and accents. And make sure you try to NOT let them send your messages to Ireland for conversions. Bring a Cisco CCIE with you and have them do a trace analysis on the packet headers and trace the IP addresses of the packets. Ask that the CCIE of your choosing be allowed to check all their routers for priority queues and access lists. Beware of hidden microphones on site where somebody could be transcribing before you are even finished with your message.


  • Jeff

    Sign up with a carrier that is using spinvox before heading there. Then when it is time to demo, call that number to leave a message, not the one they give you in the demo room. They have a system that can route messages to a higher priority queue. That’s a fact Jack.

  • http://speechanalytics.blogspot.com Ofer

    While they try to position themselves as something automatic, it is manual.
    Note that for an automatic system, there is no need to raise $100M.
    See http://speechanalytics.blogspot.com/2008/04/nuance-jumps-on-voicemail-to-text-wagon.html for more details.

    The first question is how many centers they are using around the world and how many people are employed there. From the number of employees you can understand how many messages are being transcribed manually.

    For the meeting, I suggest asking them what differentiates them from Jott and Google Voice. I also suggest providing the system with data in a different language and with a heavy accent. The problem with automatic systems is that even if they are OK 99% of the time, you will still have enough customers unhappy with lousy transcriptions that, for quality purposes, you must review all the messages and thus have privacy issues.

  • http://www.facebook.com/people/Tom_Allason/510990 Tom Allason

    ask them why it is that conversion quality has been decreasing. the quality of conversions was *much* higher 2+ years ago.

    IMHO this is either a function of offshoring their call centres and transcribers with English as second language… its either that or the “brain” is doing more of the conversions. either way it isn’t good.

    i would be interested to know how they measure accuracy of these conversions. i for 1 have been getting frustrated but still not complained. eventually i will just cancel my subscription. i will certainly do so if google voice’s service is equivalent. (speaking of which- how are they going to justify charging consumers if Google do it for free?)

    perhaps worth asking them what proportion of messages customers call in to listen to. if their conversion quality is very good that should be a very small number.

  • http://speechanalytics.blogspot.com Ofer

    Also – ask them about their patent application:

    It specifically refers to the human transcribers.


    Method of providing voicemails to a wireless information device


    Voicemail is received at a voicemail server and converted to an audio file format; it is then sent or streamed over a wide area network to a voice to text transcription system comprising a network of computers. One of the networked computers plays back the voice message to an operator and the operator intelligently transcribes the actual message from the original voice message by entering the corresponding text message (actually a succinct version of the original voice message, not a verbose word-for-word conversion) into the computer to generate a transcribed text message. The transcribed text message is then sent to the wireless information device from the computer. Because human operators are used instead of machine transcription, voicemails are converted accurately, intelligently, appropriately and succinctly into text messages (SMS/MMS).

  • Tor Ellingsen


    What if you want the message translated, for instance from English to Spanish? That would be cool and similar to one feature in Google Wave.



  • Yoda

    1. Any demo you see will be carefully rigged to show you what they want you to see, whilst appearing to be live.

    2. Everything they tell you they will have spent the last week carefully crafting.

    3. Every number and figure they give you will be based on a context that only they are aware of and in most cases don’t want you to be.

    4. Every person you meet will be hand picked by the senior management team, and they will all be briefed on what they can/cannot say.

    5. More than likely they will make you sign an NDA, and they will want to vet whatever you publish.

    • http://uk.techcrunch.com/author/milo-yiannopoulos/ Milo Yiannopoulos

      I can assure you that SpinVox will *not* have copy approval over anything I write.

  • Greg

    Say the following:

    “This is a message for David Hare – that’s H, A, R, E”

    Expect the surname to come back perfectly spelled, but without the word ‘that’s’ or the letters.

    In other words, a simple ‘Turing Test’ is to tempt a human into processing the semantics of metadata in your message. A computer is more likely to transcribe the syntax literally.

    Once managed to get a smiley :-) inserted in the message by asking nicely.

    Also ask them why they are so ashamed of the human-based process in the first place.

    • Anon

      The transcribers are drilled from day one never to interpret the message and to put down whatever is heard – this won’t work

      • Aardvark

        Actually Anon, Greg’s strategy DOES work. The transcribers often slip up because they think like humans not machines (which is why it is so painfully obvious that “the Brain” doesn’t exist and SpinVox is a mechanical Turk exercise). My favourite example: I left my wife some random test message asking her to do something (e.g. “Can you turn off the oven on your way out”). The transcription came back as “He wants you to turn off the oven on your way out”. No ASR machine would be capable of doing that…”the Brain” is vapourware…

      • Greg

        The object of the exercise is to see whether there are humans involved. By saying “it won’t work since the humans are well trained”, you kind of collapse the purpose ;-)

        Anyway, I think there’d be second-order effects to detect.

        If it’s all humans, I’ll expect 100% ‘HARE’. A machine will doubtless flip to ‘HAIR’ a few times – since it’s a far more common spelling of the same phonetics.

        All good honest fun. More interesting is the mystique they felt they needed to conjure up.

      • Aardvark

        Sorry Greg – but it’s not my fault if we already know the answer to the exercise in advance. The tests have already been done :)
        As a follow up, you seem to be assuming that humans are better spellers than machines…. I think it would be very surprising if we got 100% HARE from human transcribers :)

  • exspinvox

    ask to go upstairs to see the ‘back office’ especially the language managers and how they measure the message quality – i can guarantee you any message ‘conversion’ that you see demo’d is either transcribed by someone in the back office or in ireland (still humans by the way)

    if somebody could get hold of a tenzing manual to wave at them during the meeting that would be legendary stuff. be interesting to see how they react to that

  • anon

    As a control test, whatever messages you use internally, ask some people externally to send exactly the same messages at around the same time via the normal service and see to what extent the results vary. Don’t send to or from your phone as your number may be routed to be treated specially

  • anon

    How about the number of users they claim? Seems to me they make it up day by day.

    30m – 100m: are those all “active” users? Are they paying? Even at a few cents per message their revenues should be higher than reported.

  • anonymous

    Please ask SpinVox to clarify its definition of “user” and “customer”. It has chosen to use these terms in its public communications when providing evidence of its business success and evidence of its technology (automation) success.

    >> SpinVox Blog: “over 30 million live users and will service over 100 million by the end of 2009” http://blog.spinvox.com/category/spinvox/
    >> Christina Domecq: “where we used to need 5,000 agents for the first 1m customers, we now need fewer than 100 agents per 1m” http://business.timesonline.co.uk/tol/business/industry_sectors/technology/article6735993.ece

    These claims are meaningless and possibly deceptive if nobody outside of SpinVox knows how it defines these terms. On Tuesday, they should be asked to explain clearly, consistently and unambiguously how they define these terms, and any other terms that they use in fact claims.

    Question 1. Who is defined as a USER? Does SpinVox count one message as having 1 user (the message creator OR the mailbox customer) or double-count it as 2 users (the message creator AND the mailbox customer)? Logically, from an ASR technology perspective, isn’t a USER legitimately only defined as the person whose voice was recorded? But from a business model perspective, isn’t it only revenue-generating CUSTOMERS that matter? Which is it?

    Question 2. What is the meaning of LIVE user? That’s a strange word to choose; why not use the more familiar and plain-meaning term “active” user? I can understand how “live” can apply to message recipients who have the service activated on their mailbox, but what does it mean to be a LIVE user, if by user you sometimes mean any person who ever spoke a message converted by SpinVox? (Years after the message was recorded, how do you know if they are still live or dead?)
    SpinVox seems to be very loose with the verb tense when talking about whether the 30 million number refers to CURRENT active users or the CUMULATIVE sum of users since the launch of its VMCS. For example, on page 5 of its new White Paper, it claims that it “has” over 30 million users, but on the previous page, it claims “over 30 million users activated from launch.” Which is it?

    Question 3: If you are counting the message creators as “users”, are these UNIQUE users? Or does a person get counted as a new person every time that they record a message converted by SpinVox? It is a bit spooky to think that SpinVox claims there might be dozens of users still LIVE in my SpinVox mailbox, including that awful robocall from Sarah Palin from October, and that potty-mouthed obscene caller who left me a message three years ago.

    We need a clear and consistent definition of “user” so that we are sure that SpinVox is not comparing APPLES to ORANGES when for instance it claims “we used to need 5,000 agents for the first 1m customers, we now need fewer than 100 agents per 1m”.

    APPLES scenario (5000 agents per 1m active users?): The TimesOnline article cited above reports that SpinVox has some 15,000 UK users that get the service directly from SpinVox, not through a carrier. One can guess that these users are pretty highly motivated to sign up in this way and that they might be relatively high users/communicators. Therefore, when Domecq claims an old agent:user ratio of 5000:1m, is she referring to this relatively small pool of UK mailbox owners?

    ORANGES scenario (<100 agents per 1m potential customers): Compare this group of “users” to the way that SpinVox defines customers in the carrier context: “19th August 2007 – Alltel: 12m Customers Get SpinVox” http://www.spinvox.com/announcements.html?wp_month=8&wp_year=2007&wp_day=false&wp_start=0 In this case, the company defines “customer” as anyone who could be offered the service by their carrier. If customer is defined as the entire customer base of a carrier, and only a small minority end up actually using the service on a permanent basis, then it is not surprising that SpinVox would only need “fewer than 100 agents per 1m” even if the service involved only low levels of automation.

  • anonymous

    I posted a long message about an hour ago about how SpinVox defines users. The post was apparently deleted by a moderator. I think the post asks some legitimate and important questions.

    SpinVox claims “over 30 million live users” and “we now need fewer than 100 agents per 1m”. These claims are hard to evaluate, if we don’t know the definition of users.

    Are these unique users? Are they “active” users, and by what definition, or is it a cumulative number since the launch of the VMCS database a few years ago? Does the 30 million number count only persons recording a message (the relevant number for discussing SpinVox ASR technology), or only the mailbox owners who are the customers of the mobile operator (the relevant number for discussing the SpinVox business model), or both?

    Milo, if you understand the answer to these questions, I’d appreciate your writing about it. If you don’t know the answer, I’d appreciate your asking SpinVox on Tuesday.

    Thank you.

  • http://www.facebook.com/people/Tim_ODonoghue/722987326 Tim O'Donoghue

    Re “(2) What should I feed in to the machine to be converted?”, my suggestion would be to collect some real-world samples. I know there was a suggestion in an earlier comment to “self collect” some (perhaps contrived) examples, but why not get some samples by asking your network of contacts to send you some real-world examples from their Google Voice, HulloMail, YAC, etc accounts?

    Of course, people may not be happy sending you their voicemails for reasons of privacy so this approach to sample collection might not work ;-)

  • Gill Helfer

    Still think they’ll let you in?

  • Paul

    Ask them not to read the comments here so that they won’t be pre-warned of any devilish activity that you may undertake.
