Editor’s note: Tareq Ismail is the UX lead at Maluuba, a personal assistant app for Android and Windows Phone that was a Battlefield participant at TechCrunch Disrupt SF 2012. Follow him on Twitter @tareqismail.
The release of Facebook’s Graph Search has raised much discussion among technology pundits and investors. One of the biggest questions surrounding the highly anticipated feature is its availability on mobile.
After all, Facebook CEO Mark Zuckerberg has said on a number of occasions that Facebook is a mobile company. “On mobile we are going to make a lot more money than on desktop,” he said at TechCrunch Disrupt SF 2012, adding “a lot more people have phones than computers, and mobile users are more likely to be daily active users.” Facebook understands mobile’s importance, so why wouldn’t it offer Graph Search for Android and iPhone from the start?
It’s simple: Graph Search for mobile would need to incorporate speech, which is a different beast altogether.
Many of the examples given during the Graph Search keynote contained long sentences, which are not easy to type on a mobile device. Think of the example “My college friends who like roller blading that live in Palo Alto.” Search engines like Google get around this on mobile by offering autofill suggestions, but those suggestions are drawn from billions of queries. Facebook’s search, by contrast, is built on hundreds of individual values like “fencing” or “college friends” that are specific to each user rather than the crowd, so autofill suggestions will often not be useful, or worse, will require a lot of tapping and swiping to drill down to the full request.
What’s more, Graph Search queries are designed to be written out naturally in full-form sentences with verbs, pronouns, etc., something that keyword search engines like Google do not need. If you’re looking for sushi places to eat on Google, it’s a five-character search for the keyword “sushi.” With Graph Search, Facebook wants to show you sushi results refined by a group of your friends, so the same search would require writing out “sushi restaurants my friends have been to” or “sushi restaurants my friends like.” That’s a lot more typing.
It’s clear that on mobile, Graph Search would need to be powered by speech to make it most effective. No one will want to type out such long sentences. Not to mention, with services like Google Now and Siri, people will come to expect control through speech.
Supporting speech is an altogether different problem from what they’ve solved so far, and they’ll have to do a lot more work before Graph Search is available on any major mobile platform. Here are four reasons why.
Speech Recognition Doesn’t Come Cheap
If time is money, then speech recognition is very expensive. It’s well known that speech recognition requires considerable investment to develop, and no one knows this better than Apple and Google.
Apple chose not to build their own speech recognition, instead licensing Nuance’s technology for Siri. Nuance has spent over 20 years perfecting their speech recognition; it’s not an easy task, and they’ve had to acquire a number of companies along the way.
Google, on the other hand, chose to develop their own speech recognition and needed a clever system to collect data to catch up to Nuance. That system, GOOG-411, was a phone number people could call from landlines and feature phones to ask for local results. Once Google had the voice data they needed, they shut down the service and used the data to build their recognition system. It has taken a company like Google, which has mastered search, over three years to get their speech recognition where it is now.
Even if it takes Facebook half as long to devise a similarly clever solution, they’ll need to start soon for speech-powered search to be released any time in the next year.
Names Are Facebook’s Strength And Speech Recognition’s Weakness
Names have been one of Facebook’s strengths from the start. The algorithms that return the most relevant person when you search for a friend played a key role in the company’s early success. People are accustomed to saying “add me on Facebook” without needing to specify a username or handle, an advantage that makes Facebook’s entry into speech that much harder.
Names are speech recognition’s biggest challenge. Speech recognition relies on a dictionary, a list of expected words paired with sample voice data given to the system. That’s why most engines do really well when recognizing common English words but have such a hard time with out-of-the-norm names and varying pronunciations. Facebook has hundreds of thousands of names to deal with, and names are a key part of their experience, so they’ll need to master the domain for speech to be useful to their users. Now, one could argue that having access to all these names may give Facebook the edge in solving this problem, but they’ll need to work on a solution for some time for it to become anywhere near acceptable.
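To see why a fixed dictionary struggles with names, here is a deliberately simplified sketch. Real recognizers use statistical acoustic and language models rather than exact lookup, and the lexicon and phoneme spellings below are invented for illustration, but the core limitation holds: a recognizer can only emit words it has pronunciations for.

```python
# Toy illustration only: a recognizer with a fixed pronunciation lexicon
# can never output a word (or name) that isn't in the lexicon.
LEXICON = {
    "sushi":   ["s", "uw", "sh", "iy"],
    "friends": ["f", "r", "eh", "n", "d", "z"],
    "add":     ["ae", "d"],
}

def recognize(phonemes):
    """Return the lexicon word whose pronunciation matches, else None."""
    for word, pron in LEXICON.items():
        if pron == phonemes:
            return word
    return None  # out-of-vocabulary, e.g. an uncommon surname

# A common word is found...
print(recognize(["s", "uw", "sh", "iy"]))   # "sushi"
# ...but a name absent from the lexicon fails no matter how
# clearly it was spoken (hypothetical phonemes for "Tareq").
print(recognize(["t", "aa", "r", "eh", "k"]))  # None
```

This is why engines tuned on common English vocabulary degrade so sharply on the long tail of personal names: the problem isn’t the audio, it’s coverage.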
Supporting Natural Language Isn’t Easy
The final piece of the puzzle may be the most difficult: supporting natural language is really, really hard. Working at natural language processing company Maluuba, I can attest to just how hard a problem this is to solve. Natural language processing is the ability to understand and extract meaning from naturally formed sentences.
This also includes pairing sentences that have the same meaning but are said differently. For example, with Graph Search, I can type “friends that like sushi” and it shows a list of my friends who have identified sushi as an interest, but if I type “friends that like eating sushi” it looks for the interest “eating sushi” — which none of my friends have listed — and it returns zero results. In reality, both sentences mean the same thing but are worded differently. Understanding natural language involves understanding the real intent behind a request, not just its literal wording.
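The sushi example above can be made concrete with a hypothetical sketch. The function name, the query template, and the hand-written list of “light” verbs are all my own assumptions, not anything Facebook has described; the point is only that even collapsing one trivial paraphrase requires an explicit rule, and real language has endlessly many such variations.

```python
# Hypothetical sketch: normalize "friends that like ..." queries so that
# paraphrases map to the same interest. Hand-written rules like this do not
# scale; real systems need large data sets and machine learning.
LIGHT_VERBS = {"eating", "doing", "playing", "going"}  # assumed list

def extract_interest(query):
    """Pull the interest phrase out of a 'friends that like ...' query."""
    words = query.lower().replace("friends that like", "").split()
    # Drop light verbs that carry no extra meaning: "eating sushi" -> "sushi"
    words = [w for w in words if w not in LIGHT_VERBS]
    return " ".join(words)

print(extract_interest("friends that like sushi"))         # "sushi"
print(extract_interest("friends that like eating sushi"))  # "sushi"
```

One rule fixes one paraphrase; “friends who enjoy a good sushi dinner” would still slip through, which is exactly the author’s point.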
On a desktop browser, Facebook may be able to get users to learn specific sentence templates for their searches, especially with the help of autofill suggestions. But for speech it’s nearly impossible. People phrase requests differently almost every time; even the same person will word the same request differently when speaking. Ask 10 of your friends how they would search for nearby sushi restaurants. I have no doubt most, if not all, responses will differ from one another.
Now, they could patch the sushi example I gave earlier, but that fix may cause false positives elsewhere in the system. Understanding natural language requires large data sets and complex machine learning to get right, something that Facebook’s Graph Search team may be investigating but will not be able to master any time soon. It’s just not a simple problem to solve. That’s why Apple jumped into a bidding war to buy Siri, which at its core is a natural language processor. To put the difficulty in perspective, Siri spun out of a DARPA project that took over five years and over 300 top researchers from the best universities in the country.
Languages, Languages, Languages
Facebook has over a billion users who collectively speak hundreds of different languages. Facebook has said they’re beginning their launch with English. How long until all billion users’ languages are supported on the desktop? And since speech is significantly harder, how long until those users are supported on mobile? It’s one thing to support hundreds of languages through text and a much harder thing to support them through speech. This will be the problem they face for the next decade.
Facebook acknowledges that their future lies in mobile. Mobile begs for Graph Search to be powered by speech, something that Facebook simply cannot do yet. I have no doubt they will, but it most definitely won’t reach acceptable quality anytime soon. They’ve taken the first step, but they have a long journey ahead of them.