Editor’s Note: Norman Winarsky is the Vice President of Ventures and Bill Mark is the Vice President of the Information Computing Sciences Division at the renowned research and technology development organization SRI International. Norman and Bill helped found the Siri venture, of which Norman was also a Board member.
Since its launch in the iPhone 4S, Siri has become a phenomenon, and for good reason. Siri is a revolutionary consumer software product based on breakthroughs in speech and artificial intelligence technology.
Siri has appeared extensively in the media as a new consumer phenomenon, including in Dilbert and on Jon Stewart’s show. In November, Eric Schmidt testified to the U.S. Senate Judiciary Committee that Siri was potentially a major threat to Google. Siri has even been a major part of an episode of the CBS sitcom “The Big Bang Theory” and the subject of numerous parody Tumblr and Twitter accounts.
Without a doubt, Siri was a great achievement for Apple and Steve Jobs, helping to introduce virtual personal assistants to millions of consumers and changing forever the way we view our smartphones. The team also brilliantly designed Siri to go beyond being a mere tool, giving it a personality and human-like interaction characteristics.
Do you like me, Siri? Where can I bury a body, Siri?
There is no doubt in our minds that Apple will continue to advance the Siri technology and will create new breakthroughs in the virtual personal assistant (VPA) category overall. For example, it’s clear that Apple is capable of making a Siri API for application developers in the near term, enabling hundreds of thousands of applications to access their own assistant. Soon it will become de rigueur for all applications to offer spoken interaction and meaningful delegation. In fact, we consumers will be surprised and disappointed if or when they don’t.
Beyond the laudatory comments and requisite speculation, and because of our central role in creating Siri, we at SRI are often asked – what’s next?
As we always respond — Siri is just the first step in realizing the ultimate virtual personal assistant vision. This post first outlines what we think Siri’s legacy will be, and then gives the broad strokes of what will mark the next phase(s) of VPA innovation.
To start, Siri’s greatest effect will be the entirely new industry that it is creating before our eyes. At SRI, we see VPA technology as an essential element of future products in areas ranging from smart TVs, to health care assistance, to virtual tutors in education, and more. VPA is not just a fad, or a trend. It is in many ways the destiny of computing and a decades-long project, or more. As we speak, SRI is spinning out three new startups that are built on the VPA paradigm and our related R&D. They are already VC-funded and preparing their first products for wide use. We think we’ve only seen the tip of the iceberg.
Technologically speaking, Siri’s true impact is seen in the new bar it set for what we call “practical natural language understanding.” Using speech instead of keyboards to communicate with computers is an old dream, but it took more than thirty years to achieve the robustness and performance needed to make speech systems practical for consumers.
Developing software for limited-vocabulary spoken language recognition was the first step, and we are all familiar with the call center applications that marked the first efforts in this arena. However, developing software to enable computers to respond reliably to a broad range of spoken input is much more challenging. Siri required not just speech recognition, but also understanding of natural language, context, and ultimately, reasoning (itself the domain of most artificial intelligence research today).
Post-Siri, new speech-enhanced artificial intelligence research continues to be the subject of enormous investment at SRI and elsewhere, most notably by the U.S. Department of Defense, which is anxious to increase the performance of personnel dealing with complex systems across a wide array of use cases.
So with those forthcoming advancements in mind (which SRI cannot discuss in detail, unfortunately), what’s indeed next for VPA technology? What glimpses of the future can we nonetheless share?
We can tell you this: the next-generation VPA will enable you to have a much deeper relationship with your assistant. Siri has a conversational interface today, but these interactions seldom last more than one or two utterances. Tomorrow’s VPA conversations will be about more complex tasks with multiple steps and more nuance (exploring healthcare alternatives, planning a vacation, and buying clothes, to name a few scenarios).
This next wave of VPAs will also be able to maintain the context of the conversation for long periods of time, reason with clarity about what you discuss, provide answers to your questions, execute tasks for you, and all along the way learn from you and noticeably improve with use. The experience will be more personalized than what you experience with Siri today, and it will have greater depth. VPAs will also be more proactive, constantly discovering things that you might care about and even starting conversations with you about what they find.
Let’s illustrate these new VPA capabilities with a conversation between a real person named Lisa, and a virtual personal shopping assistant named Nina, with Lisa wanting to buy a purse:
Lisa: “Nina, I need a new purse.”
Nina: “Great! Do you want to buy something from Michael Kors like you did last time?”
Lisa: “Well, I’d like Michael Kors, but I don’t want to spend more than $400.”
Nina: “Last time you bought your Michael Kors purse from Nordstrom. Nordstrom has a Michael Kors sale right now…here are some purses you might like.”
Lisa: “I like the chocolate brown one, at $329. Is that the best price you found?”
Nina: “I saw a couple of offers at $310 from other retailers, but their return policy isn’t as generous as Nordstrom’s.”
Lisa: “Okay, let’s go with Nordstrom.”
The important part about this conversation is that it is natural, real, and helpful. Lisa is getting what she wants from Nina, an assistant that is familiar with her purchase history and the stores that she prefers.
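One way to picture what an assistant like Nina would need to carry from turn to turn is a small context object that remembers past purchases and any constraints stated so far. The sketch below is purely illustrative and is not how Siri or any SRI system works: the class and function names (`Offer`, `ShoppingContext`, `rank_offers`), the "Other Retailer" entry, and the $450 offer are all assumptions made up for the example, and "prefer retailers Lisa has used before" is a deliberately simplified stand-in for considerations such as return policy.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Offer:
    retailer: str
    brand: str
    price: float


@dataclass
class ShoppingContext:
    """Hypothetical state the assistant keeps across turns and sessions."""
    past_purchases: list = field(default_factory=list)  # (brand, retailer) pairs
    brand: Optional[str] = None
    max_price: Optional[float] = None

    def suggested_brand(self):
        # Default to the most recently purchased brand, if there is one.
        return self.past_purchases[-1][0] if self.past_purchases else None

    def trusted_retailers(self):
        return {retailer for _, retailer in self.past_purchases}


def rank_offers(context, offers):
    """Keep offers that fit the context; list trusted retailers first, then by price."""
    matching = [
        o for o in offers
        if (context.brand is None or o.brand == context.brand)
        and (context.max_price is None or o.price <= context.max_price)
    ]
    trusted = context.trusted_retailers()
    return sorted(matching, key=lambda o: (o.retailer not in trusted, o.price))


# Turn 1: the assistant recalls Lisa's last purchase.
context = ShoppingContext(past_purchases=[("Michael Kors", "Nordstrom")])
# Turn 2: Lisa confirms the brand and adds a budget; both persist in the context.
context.brand = context.suggested_brand()
context.max_price = 400.0

offers = [
    Offer("Nordstrom", "Michael Kors", 329.0),
    Offer("Other Retailer", "Michael Kors", 310.0),  # invented for illustration
    Offer("Nordstrom", "Michael Kors", 450.0),       # invented; over budget
]
ranked = rank_offers(context, offers)
print(ranked[0])
```

In this toy version the over-budget purse is filtered out, and the Nordstrom offer is listed ahead of the cheaper one only because Nordstrom appears in the purchase history. A real assistant would weigh many more signals and would update the context as it learns.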
Lisa expects Nina to know all about shopping, and to use that knowledge to help select just the right item, at a good price. Her virtual personal assistant will also learn from this conversation, and maintain the history and context for follow-on conversations, as well as future purchases.
That last point bears emphasis. As the VPA learns, it will become more and more valuable. This kind of capability is often invoked, but seldom delivered. “Learning in the wild” is another old dream that is only now starting to come true. The truly adaptive VPA is the ultimate “sticky” application.
And of course, Lisa likewise trusts the VPA more and more as Nina demonstrates increasing competence. That trust applies not only to personalized, accurate information delivery, but also to the protection of personal information. For any VPA, trust, especially with regard to security and privacy, will be a central requirement, and the next generation of VPA offerings will be graded on a steep curve.
It all sounds pretty good, right?
Luckily, a VPA that can interact with us with such great depth and nuance is not merely science fiction. SRI is building these capabilities today. We and our research partners are dedicating immense time and resources to making this future come true.
VPA is about augmenting humans in ways old and new – so that we can be our best selves, and accomplish more with less. At SRI, the theme of human augmentation goes back to the days of Doug Engelbart, inventor of the mouse, and pioneer of human-computer interaction. Doug’s quote from his 1962 article captures it well:
By ‘augmenting human intellect’ we mean increasing the capability of a man to approach a complex problem situation, to gain comprehension to suit his particular needs, and to derive solutions to problems.
We believe that VPA represents a new and especially productive vein of progress. VPA is the most elegant and effective way we have figured out yet for humans and machines to interact. We think that it will revolutionize the way we think about machines, just as Engelbart’s vision did with the “mother of all demos” nearly 50 years ago.