Siri: A Powerful Virtual Assistant For The iPhone

Editor’s note: The following guest post was written by Nova Spivack, CEO of Radar Networks, the company behind Twine.

A new paradigm for using the Internet is about to begin: Virtual Assistants (VAs) are coming to a mobile device near you.

This week, a stealth startup will demonstrate the first public version of its mobile virtual assistant, Siri. This may mark the beginning of the era of consumer-grade virtual assistants on the Web.

Siri is focused on mobile devices, particularly the iPhone and other smartphones. It has an unusually productive interface and user experience, and it is super useful: it is something I would really use every day. As a result, I would not be surprised if Siri becomes one of the top iPhone applications within a few months of its launch. (Disclosure: In the past, I worked on the DARPA-funded CALO project from which Siri sprang.)

The team at Siri has given me a sneak-preview of their technology and product, and here I will dive deep to try to uncover the real significance and technical underpinnings of what they are doing. In addition, I’ll delve into the implications of the virtual assistant (VA) trend and what it might mean for us in the future.

This is a two-part article. In Part One, here, I cover the basics of Siri as a product, and the Virtual Assistant paradigm compared to search. In Part Two I will go more deeply into the technological foundations and questions.

First Look at Siri, the Product

Siri is a virtual assistant that is focused on helping consumers complete tasks in their online lives, particularly in the mobile context. The version I looked at runs on the iPhone.

Typical use cases are booking dinner reservations, buying movie tickets, getting local information, or finding things to do in your local area.

According to Siri CTO Tom Gruber, Siri is integrated with the APIs of a “couple of dozen familiar big brand” partners, and part of its core technology involves orchestrating and completing tasks across multiple services at once for users.

On the iPhone you simply launch Siri and then proceed to interact using a familiar “chat” interface. You make requests or state goals to Siri, and then Siri comes back with answers, questions or suggested actions.

You can type to Siri in natural language, and it does a pretty solid job of parsing your requests. More impressive, though, is that you can simply push the talk button on your phone and speak to Siri, and it understands you. No need to type. It works surprisingly well. I also like the visual interface: Siri shows the progress of each dialogue in cartoon-style speech balloons, which makes the conversation very easy to follow.

It’s worth noting here that a speech interface to a virtual assistant on mobile phones could probably save lives if drivers used it instead of texting (recent research has found that 1 in 4 U.S. drivers admit to texting while they drive).

Siri is not the only company to offer a push-to-talk speech interface on the iPhone; Google provides one too. What makes Siri unique is that speech input takes place within a dialogue, a conversation with your virtual assistant. Siri usually responds to whatever you start with by giving an answer or asking a follow-up question.

Mobile devices pose numerous challenges, but they also provide useful information, and together these make them an ideal venue for a virtual assistant. The main challenges of the mobile platform are screen size, input constraints, and bandwidth. Siri addresses each of these issues.

Siri addresses the screen size issue by not forcing users to type lots of text or look at pages and pages of results. Instead, all interactions take place within compact and user-friendly chat balloons. This is a good fit for the small size of mobile screens.

Siri makes input easier as well. You can type if you want to, but Siri also lets you simply speak your questions or follow-up information.

As for bandwidth, Siri conserves it by reducing the need to surf. For example, if you want to book dinner, just tell Siri and it will do it for you, via OpenTable, without you having to surf through the entire OpenTable site to do so. This is a big timesaver.

Siri also takes advantage of information from your mobile device about your GPS location and the current time to understand your present context. This enables Siri to localize information.

For example, while you are out and about, you can use Siri to discover a great place to eat, get tickets to a show, plan your weekend, or get help finding your way around town. Simply ask Siri what movies are showing, and it will show you movies playing near your location that you can still get to in time. Or ask it about restaurants, and it will suggest places near your present location that you might like.

Beyond simply suggesting things to do, Siri can also do the legwork of making reservations, and orchestrating your plans, across multiple different services.

For example, suppose you want to book dinner and a movie – Siri can do it for you, as a single transaction, making sure that dinner is near enough to the movie that you can reasonably accomplish both without rushing or being late.
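Siri’s actual planning logic isn’t described in this preview. As a rough illustration of the kind of constraint the dinner-and-movie example implies, here is a minimal, hypothetical Python sketch that keeps only the restaurants from which you could finish dinner and still reach the theater before showtime. It assumes straight-line (haversine) distance, a fixed average travel speed, a fixed dinner length, and a safety buffer; the `Place` type, the function names, and all default values are illustrative inventions, not Siri’s.

```python
import math
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Place:
    """A named location (hypothetical type for this sketch)."""
    name: str
    lat: float
    lon: float


def haversine_km(a: Place, b: Place) -> float:
    """Straight-line (great-circle) distance between two places, in km."""
    earth_radius_km = 6371.0
    phi1, phi2 = math.radians(a.lat), math.radians(b.lat)
    d_phi = phi2 - phi1
    d_lambda = math.radians(b.lon - a.lon)
    h = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    return 2 * earth_radius_km * math.asin(math.sqrt(h))


def travel_time(a: Place, b: Place, speed_kmh: float = 20.0) -> timedelta:
    """Assumption: travel time = straight-line distance / fixed average speed."""
    return timedelta(hours=haversine_km(a, b) / speed_kmh)


def feasible_dinners(restaurants: list, theater: Place, showtime: datetime,
                     dinner_start: datetime,
                     dinner_length: timedelta = timedelta(minutes=90),
                     buffer: timedelta = timedelta(minutes=15)) -> list:
    """Names of restaurants from which you can finish dinner and still reach
    the theater at least `buffer` before the movie starts."""
    latest_arrival = showtime - buffer
    feasible = []
    for r in restaurants:
        # Dinner ends, then we travel to the theater.
        arrival = dinner_start + dinner_length + travel_time(r, theater)
        if arrival <= latest_arrival:
            feasible.append(r.name)
    return feasible
```

For instance, with dinner at 6:00 and an 8:00 movie, a restaurant next door to the theater passes, while one roughly 17 km away (about 50 minutes at the assumed speed) is filtered out. A real planner would presumably query a routing service for actual travel times and check live showtimes and table availability through partner APIs; the fixed-speed estimate here only keeps the sketch self-contained.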

Siri knows about events in your area, things to do, what’s happening. It also knows about your personal context (your location, your current time of day), your preferences, and your personal information that you share with it.

By combining knowledge of your local situation and your personal profile, Siri is able to help you complete tasks – like finding a cool thing to do on a Sunday afternoon – in ways that are uniquely targeted to your particular interests and personality.

Of course, it will require a lot more testing to determine how well Siri really does this, and how personalized it can be. From what I can tell, the current version is not very personalized; however, according to Siri CEO Dag Kittlaus, this is very much on their radar for future development.

According to Gruber:

“Siri deals with the hassle of accessing multiple sites to explore options, make choices, get reservations and buy tickets. It saves your favorites, keeps track of your bookings, and helps you remember what you liked about a place or event. It helps you invite your friends to things you arrange with Siri. It does things that a personal assistant would do for you using the Internet.”

“Siri is like having an assistant with an internet connection you can call for help when you’re out. For example, you might say ‘Hey, I’m at Market and Dolores. Is that modern art museum in the Mission area still open, and how do I get there? And isn’t there a cool Asian fusion restaurant around there? Can you book me a table at 8:00?’ You literally *say* what you want in your own words using voice. It’s not a voice recognition veneer on other products. It gets what you want to do on a whole new level.”

Limited by Design

Siri is limited by design: it’s not full artificial intelligence. But this is actually a strength, not a weakness. Instead of trying to solve the grand problems of general-purpose artificial intelligence, Siri focuses on a few important vertical domains, for example: restaurants, movies, events, local business, weather, and services and data on the Web that relate to them.

These limitations mean that Siri doesn’t handle general knowledge or tasks outside its focus area. So what can’t it do?

Siri can’t do general question answering, like Wolfram Alpha or True Knowledge, although the team says that in some cases it can provide answers to questions it knows about, and this feature could be improved in the future.

For now, Siri’s knowledge is extremely limited, narrowly confined to the kinds of tasks it helps complete. It wasn’t designed to be a knowledge assistant. It won’t help you organize information, and it won’t help you with your homework.

In addition, Siri doesn’t have much personality; it doesn’t try to be your chat friend or therapist (remember Eliza?), and it’s not particularly cute or anthropomorphic (like Microsoft’s ill-fated Bob) either. These are all pluses in my opinion. Finally, in terms of task completion, its core focus, it is still limited to fairly simple kinds of tasks (like making dinner and movie reservations for you). It can’t handle complex planning and purchasing decisions, like planning an entire vacation.

Where is Siri headed? The Siri team has started by taking on very common, frequent use cases. But the technology is built to scale to new domains as well as to a larger user base. So I would not be surprised to see many other things that assistants do coming online as Siri matures. (The CALO project, from which Siri comes, focused heavily on “office assistant” use cases such as scheduling, travel planning, meeting assistance, organizing, and learning.)

It’s important to keep in mind that Siri is just getting started, so don’t expect it to do your laundry or manage your finances. It won’t pass the Turing test either, and it’s not the beginning of Skynet (not unless Siri somehow mates with Wolfram Alpha…). It’s a simple, useful tool with an impressive amount of intelligence behind it. I am looking forward to the public release. Siri shows promise for making smartphones far more productive and useful for consumers.

Not a Google Killer – Task Completion vs. Search

Before I go too far, I want to state in no uncertain terms that Siri is not a “Google killer.” Siri is not trying to solve the general Web search problem (neither is Wolfram Alpha, for that matter); it’s trying to do something quite different. Siri is focused on completing tasks for you, not finding Web pages.

Siri is shifting the interaction paradigm for the Web from search to assistance. While both search and assistance depend on understanding user intent, the “assistant paradigm” derives user intent through conversation with the user, instead of just a single set of keywords.

Furthermore, while the goal of search is simply to provide a set of relevant Web pages, the goal of assistance is task completion: actually doing something for the user, such as booking dinner reservations or buying a ticket to a concert.

As Tom Gruber, CTO of Siri, explained to me:

The current interaction paradigm for the Internet is the search engine. The contract of the search engine with the user is this: you state your intent as search keywords, and it returns links to matching information sources. The measure of quality for a search engine is relevance: i.e., how well links that it returns provide the information needed by the user.
The assistant paradigm changes the contract. You express intent in a conversation, as a request or goal statement (“I need an X” or “I want to do Y”). The assistant asks you for clarifying information if needed, and guides you through the process of exploring options and making a choice. The measure of quality for an assistant is task completion: i.e., how well it helps you solve the problem that you expressed in the conversation.

Both interaction paradigms are important, but they serve different purposes.

When the task is to find information and the problem can be solved by reaching a web page, the search engine paradigm is optimal.

When the task is to solve a problem involving personal context, preference, or choice, and when applying multiple information sources to a task, the assistant paradigm is better suited.

Siri is not competing with Google. Siri is focused on task completion, not search.

However, because task completion often centers on commercial activities (buying or selling things), it is potentially as monetizable as search, or even more so. This is because task completion reveals consumer intent more explicitly than search does.

In search, user intent has to be guessed at from keywords and clicks. But in task completion, user intent is directly explained by the user, because the user can state their goals directly and refine their intent in a conversation. Search queries can be about anything, but assistants know what kind of information is relevant to a task. This is why conversation works for tasks: the user can state a goal, perhaps vaguely; the assistant can offer refinements; the user can choose among them; and the conversation quickly converges on a solution. “Conversion” in this context is not coercion; it is cooperation between the human user and the software assistant that is trying to help them.
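To make the contrast with a one-shot keyword query concrete, here is a toy, hypothetical Python sketch of the conversational contract described above: the assistant keeps asking for whatever a task still needs, then asks permission before acting. The slot names, questions, and the `ReservationTask` class are invented for illustration; this is not how Siri is implemented, only a minimal slot-filling loop under those assumptions.

```python
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical information a restaurant booking needs before it can proceed.
REQUIRED_SLOTS = ("cuisine", "party_size", "time")

QUESTIONS = {
    "cuisine": "What kind of food would you like?",
    "party_size": "How many people?",
    "time": "What time should I book?",
}


@dataclass
class ReservationTask:
    slots: dict = field(default_factory=dict)
    confirmed: bool = False

    def update(self, **info) -> None:
        """Merge whatever the user stated in this turn, ignoring blanks."""
        self.slots.update({k: v for k, v in info.items() if v is not None})

    def next_prompt(self) -> Optional[str]:
        """Ask for the first missing slot; once all are filled, ask permission."""
        for slot in REQUIRED_SLOTS:
            if slot not in self.slots:
                return QUESTIONS[slot]
        if not self.confirmed:
            s = self.slots
            return (f"Shall I book a table for {s['party_size']} "
                    f"at {s['time']} ({s['cuisine']})?")
        return None  # nothing left to ask: the booking can be executed
```

If the user opens with “Asian fusion at 8:00”, `next_prompt()` asks “How many people?”; after the user answers “2”, it asks “Shall I book a table for 2 at 8:00 (Asian fusion)?”, and only once `confirmed` is set does it return `None`. The quality measure here is whether the task gets completed, not how relevant a list of links is.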

Consumers use search engines even when they are not in a buying or doing mood, but with a service like Siri a higher percentage of user interactions have true commercial intent. By helping a consumer make a purchase, presumably Siri and other task completion assistants will share in the downstream revenues they help generate.

In conclusion, while Siri does not compete with Google in Google’s core market (search and advertising), one can easily imagine task completion, and the ensuing commerce transactions it generates, becoming a huge opportunity – one that I would expect Google, Microsoft, Yahoo and others to want to compete in eventually.

Virtual Assistants: The Paradigm

The idea of Virtual Assistants (VAs) was not invented by the Siri team. It’s an idea that has been around for decades. It has roots in Apple’s famous “Knowledge Navigator” video, and in the original intelligent agents and DARPA work that inspired the invention of the Semantic Web. Even luminaries such as Google’s own Peter Norvig have worked on agents in the past. What’s more, it’s been an elusive vision, evading the best efforts of several past startups, such as the once-super-hot General Magic with their Magic Cap operating system.

While General Magic had the right idea, they were too far ahead of their time. Ultimately the Web proved simpler to implement and adopt. Perhaps today the time is right, or at least better, for this to happen at last. In particular, several trends make it much easier to build an intelligent agent product today, including:

The increasing amount of structured data on the Web (in XML, and even RDF)
The wide availability of open APIs for services that provide information and commercial transactions
The global adoption of the Web and the simultaneously increasing need for smarter tools to help cope with it
The growing adoption of 3G smartphones and other mobile computing devices

There are many different kinds of intelligent software agents that have been studied and tested over the years. Some agents are designed purely to interact with other agents, and with other software. But others are designed to interact with humans to help them get things done. These “virtual assistants” are what Siri is focused on.

Instead of a Web where consumers have to shoulder the burden of manually searching for things themselves, we are moving to a Web where intelligent agents will assist consumers to meet their goals through a conversational dialogue.

The key to the virtual assistant paradigm is conversation. When we interact with a virtual assistant it will not be like using a search box on a search engine. Search boxes are not conversational. You type some keywords, and you get some results. The end. With Virtual assistants the user-interaction is framed as a dialogue with the assistant, in natural language, not keywordese.

VAs are like real-world assistants: they are two-way interactive; they may offer suggestions, and they may ask questions. This is very important, because it guards against the risk of our virtual assistants going off and doing things we don’t want them to do. In short, they only do what they are asked to do, and before they do it, they double-check by asking for permission. So there is no risk, for example, of VAs going off and trading stocks on your behalf, or buying things for you, without your permission.

Being first to a new market opportunity is not always best, as history has shown. However, I believe Siri has done a better job than most at creating a powerful technology base and a compelling user experience, well enough before the competition to have a real shot at leading this category.

Part Two: How Siri Works – The Technical Stuff

In the second part of this preview of Siri, I will provide my exclusive in-depth interview with Siri’s CTO, Tom Gruber, about the underlying technology behind Siri.