Fable Studio founder Edward Saatchi on designing virtual beings

'The end goal for us is an operating system, or OS'

In films, TV shows and books — and even in video games where characters are designed to respond to user behavior — we don’t perceive characters as beings with whom we can establish two-way relationships. But that’s poised to change, at least in some use cases.

Interactive characters — fictional, virtual personas capable of personalized interactions — are defining new territory in entertainment. In my guide to the concept of “virtual beings,” I outlined two categories of these characters:

  • virtual influencers: fictional characters with real-world social media accounts who build and engage with a mass following of fans.
  • virtual companions: AIs oriented toward one-to-one relationships, much like the tech depicted in the films “Her” and “Ex Machina.” They are personalized enough to engage us in entertaining discussions and respond to our behavior (in the physical world or within games) like a human would.

Part 3 of 3: designing virtual companions

In this discussion, Fable CEO Edward Saatchi addresses the technical and artistic dynamics of virtual companions: AIs created to establish one-to-one relationships with consumers. Saatchi believes such virtual beings will be the next paradigm for human-computer interaction after mobile.

Edward Saatchi: My company, Fable Studio, started six years ago as Oculus Story Studio. About two years ago, we pivoted to virtual beings: interactive, AI-powered characters. The end goal for us is an operating system, or OS.

So it’s not purely entertainment or content. And it’s not a pure OS, either. You wouldn’t say “the OS is nervous to go to school tomorrow,” but you may say that once your OS becomes a virtual being. The line between media and operating system will fall away, just as you see in movies like “Her.” It takes a lot of artistry to actually create that character.

TechCrunch: Let’s talk about the components required to make that happen. On the natural language processing side, there’s obviously speech-to-text, processing and coming up with a response. Then there’s the media element: creating a voice for this virtual companion that sounds human, or at least consistently in character with whatever it’s been designed for, and perhaps a visual element, with the character’s facial expressions conveying the correct emotion to humans.

How do you break down the technology components of making this happen?

Let’s start first with hardware and then software.

Hardware doesn’t need anything new. Yes, maybe edge computing around AI, but this is not an AR/VR concept. About six years ago, a lot of brilliant people left their work on smartphones, TVs, laptops and tablets for AR and VR ventures because they believed that would be the next major device category. That has failed. We are all coming back and looking at what our friends have done in the smartphone, tablet, TV and laptop world in the meantime, and it’s almost nothing. Yes, streaming, but that’s delivery of an old type of content. There has not been wild innovation. So I think you’ll see this “virtual beings as OS” idea come into focus as people in Silicon Valley decide to stop focusing on devices that nobody uses and instead focus on revolutionizing the use of devices we already have.

On the software side, you’ve heard of AGI — artificial general intelligence — which is very far off. It’s the singularity with a perfect AI that surpasses humans. We think of ACI — artificial coherent intelligence — which is about putting a stop to the trend of siloing in AI. Since 2012, we’ve had an incredible amount of siloing in AI: synthetic speech, procedural animation, object recognition, image recognition, navigation, GANs for creativity. What we’re starting to see is entrepreneurs stitching these technologies together into one application. There are probably 15 capabilities that we’re excited about and that already work quite well; in many cases they’re a lot better than NLP. By stitching these things together, we will hide some of the problems in NLP and encourage people to keep engaging with a virtual being because it does more than just talk.
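To make the “stitching” idea concrete, here is a minimal, hypothetical sketch of how several siloed capabilities (speech-to-text, image recognition, a dialogue model, speech synthesis) might be composed behind a single virtual-being loop. This is not Fable’s actual architecture; every class and method name is an illustrative assumption.

```python
from dataclasses import dataclass
from typing import Optional, Protocol


# Each capability is a separate, independently developed "silo".
# Hypothetical interfaces; real systems would back these with actual models.
class SpeechToText(Protocol):
    def transcribe(self, audio: bytes) -> str: ...

class ImageRecognizer(Protocol):
    def describe(self, frame: bytes) -> str: ...

class DialogueModel(Protocol):
    def reply(self, user_text: str, scene: Optional[str]) -> str: ...

class TextToSpeech(Protocol):
    def speak(self, text: str) -> bytes: ...


@dataclass
class VirtualBeing:
    """Stitches siloed AI capabilities into one coherent character loop."""
    stt: SpeechToText
    vision: ImageRecognizer
    dialogue: DialogueModel
    tts: TextToSpeech

    def respond(self, audio: bytes, camera_frame: Optional[bytes] = None) -> bytes:
        # 1. Turn the user's speech into text.
        user_text = self.stt.transcribe(audio)
        # 2. Optionally ground the reply in what the camera sees,
        #    so the being can do more than just talk.
        scene = self.vision.describe(camera_frame) if camera_frame else None
        # 3. Generate an in-character reply (today's weakest link: NLP).
        reply_text = self.dialogue.reply(user_text, scene)
        # 4. Render the reply in the character's consistent voice.
        return self.tts.speak(reply_text)
```

The point of the interfaces is that each silo can be swapped independently for whichever model currently works best, while the surrounding capabilities help mask the weaknesses of any single one.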

What’s the timeline in your mind to a “minimum viable virtual being?”

You already have it to some extent, but within two years. When you look at what AI Foundation showed off at One Young World with Richard Branson having an ongoing conversation, able to see things and recognize images — we’re already there at a basic level. It’s 20% good. Maybe he’s a little dead-eyed. But now, we’re going from 20% to 30%, then from 30% to 40%, and then we’re beyond the point of the uncanny valley where we just completely reject it. We will start to embrace it and feel good with the character in five or so years. 

How important do you think is it for a virtual companion to be visually or even physically represented? The different representations of this concept in film are interesting: the virtual companion in “Her” is audio-only, whereas in “Ex Machina” it is a physical robot with facial expressions.

So definitely not actuated intelligence, which is robots. Robots introduce so many problems that are in no way necessary for you to believe that the character is real, like navigation and plausible walking. These are not relevant problems to the success of a virtual being.

I don’t think a visual element is required at all. “Her” shows that. I believe lots of people who I’ve only conversed with over the phone or over Facebook messages are real.

What will actually work? I think that needs a visual element. If you go down the path of only text and voice, you’re drawing attention to how bad the NLP is. The visual is helpful because it draws attention away from the NLP. With the AI Foundation example, you’re impressed that the camera can see you’re holding up a photo of a dog and recognizes you with facial ID, so you don’t attack it so aggressively. It gives you things to do other than talk. While a visual isn’t necessary for an emotional connection, it is probably necessary to have a viable virtual being in the next few years.

For sure — so much of our communication, whether it’s with humans or even with animals, is through body language. 

A lot of discussion about virtual companions assumes the companion needs to trick us into thinking it’s a human. But it’s already clear from our interactions with animals and our emotional attachments to Pixar characters that we can develop an emotional connection without the character looking or sounding human.

I’ve certainly gone on a journey around this question. Cartoony is good for a long time. But should it be human-like, or post-human, post-gender, some transcendent being? Should it be a robot or alien or animal? A few months ago, my stance was that it should be human because I don’t want to hang out with a robot. You get a lot by making it a human. 

More recently, we’ve instead focused on a different issue: a family or tribe. Pixar movies aren’t just one character. The character is in the context of family and friends. That is what provides conflict and drama. The toys from “Toy Story” are recognizable to us because of the drama between them. So I’ve moved away from the idea of a single virtual being like Lucy towards a family or tribe. Whether it’s a tribe of humans or toys or dogs isn’t as important.

I may identify with different members of the virtual family based on my mood or as I age. Multiple characters allow for diversity of class, race, gender and age.

It’s a fascinating insight into what creates our emotional connection to characters in different stories. It’s not just their solo journey but how their interactions with others shape our understanding of who they are, what they’re going through, and why we should care about them.

Isn’t it also vastly more complicated to create a whole family or tribe with all the narratives in between them?

Not necessarily. It’s hard to create a story around one character alone; it’s easy to create a story for a character that is in a social context. What the hell is Lisa Simpson in isolation?

Brud is great, but its secondary characters around Lil Miquela have not been as successful, and I wonder if they created Miquela first and then added the others, as opposed to crafting them together from the start to be like the Kardashians.

Another important point here is that when we watch “The Simpsons,” we are entertained by it as media. The entertaining social interactions between virtual beings draw attention away from the reality that AI is not perfect yet. If we can watch virtual beings as much as we interact with them, it takes the pressure off the virtual being to be always animated and perfectly intelligent.

So, virtual companions will fit into our lives by being entertainment as much as utility. They’re not just more emotional Amazon Alexas responding to commands. 

How do you view the role of such beings among the landscape of entertainment experiences that people currently engage in?

You have the Miquela model where the character is being passively observed across Instagram, YouTube, Facebook. 

Then there are applications: learn a language or play a game or meditate with a virtual being. The line between utility and entertainment is fuzzy. Virtual beings as purely interactive entertainment is doomed though because we’re lazy — we don’t always want to interact.

This is where a virtual companion turns into an operating system, right? It’s starting in one area where maybe it’s more about entertainment or a specific skill set and then it expands to take on other tasks.

Yes, absolutely. You also won’t order your virtual being around in the manner you give a command to Amazon Alexa: you’ll ask it to turn off the lights the way you would ask a friend or family member to do so.

I used to think we should restrict the virtual being from doing servant tasks, that you shouldn’t be able to ask them to do tasks for you as your servant. But I realized that we all ask each other to do tasks, like “Could you turn off the lights?” or “Can you turn it up?” or “Can you set a timer?” or “Can you get the potatoes?” So it’s just that the context will be a little bit different.

Alexa is actually an unusual context: asking someone to do something with no give and take. I think a virtual being will do stuff for you, but they’ll probably also ask you to do some stuff for them. If you have them read bedtime stories to your kids, maybe sometimes they ask you to include their kids when you read a bedtime story.

Operating systems for electronics today are oligopolies: there are very few operating systems in the entire world, with Android and iOS dominating mobile devices. The movie “Her” conveyed the idea of one AI having personalized interactions with everyone. Is that the future you see, or is the future of “virtual beings as operating systems” much more fragmented?

I instinctively think it will be a small number of universes, but again, I don’t think it should be one character. So the Alexa universe would have different characters within it and you turn to specific ones depending on your mood or the skill set you’re seeking.

What chance do startups have to create these virtual being operating systems in light of the resources of incumbent tech giants like Apple, Google, Microsoft and Facebook?

If the next operating system is a virtual being, then a technology company can’t build it, because it requires artistry. And an artistic company can’t build it, because it requires technology. So you need a hybrid team. The big incumbents would be in a tough spot; startups may be better at building that hybrid.

One other thought is that people may be quite open to a virtual companion OS that only has a few apps. Too much choice can often be a negative. Alexa has something like 45,000 apps, but people only use a few. It’s hard to believe they wouldn’t be in a better spot if they had focused on three new apps a year that are cool and really work, just as Apple did before building the App Store.

Compared to an iOS update with relevant capabilities, I think people would be more excited by a new virtual being OS that only has a few apps but those apps are very good and designed for this paradigm.