An interview with Dr. Stuart Russell, author of ‘Human Compatible: Artificial Intelligence and the Problem of Control’

(UC Berkeley’s Dr. Stuart Russell’s new book, “Human Compatible: Artificial Intelligence and the Problem of Control,” goes on sale Oct. 8. I’ve written a review, “‘Human Compatible’ is a provocative prescription to re-think AI before it’s too late,” and the following is an interview I conducted with Dr. Russell in his UC Berkeley office on September 3, 2019.)

Ned Desmond: Why did you write Human Compatible?

Dr. Russell: I’ve been thinking about this problem – what if we succeed with AI? – on and off since the early 90s. The more I thought about it, the more I saw that the path we were on doesn’t end well.

(AI researchers) had mostly just been doing toy stuff in the lab, or games, none of which represented any threat to anyone. It’s a little like a physicist playing with tiny bits of uranium. Nothing happens, right? So we’ll just make more of it, and everything will be fine. But it just doesn’t work that way. When you start crossing over to systems that are more intelligent, operating on a global scale, and having real-world impact – like trading algorithms, for example, or social media content selection – then all of a sudden, you are having a big impact on the real world, and it’s hard to control. It’s hard to undo. And that’s just going to get worse and worse and worse.

Stuart Russell (Image credit: Peg Skorpinski)

Desmond: Who should read Human Compatible?

Dr. Russell: I think everyone, because everyone is going to be affected by this. As progress occurs towards human-level (AI), each big step is going to magnify the impact by another factor of 10, or another factor of 100. Everyone’s life is going to be radically affected by this. People need to understand it. More specifically, it would be policymakers, the people who run the large companies like Google and Amazon, and people in AI and related disciplines, like control theory, cognitive science and so on.

My basic view was that so much of this debate was going on without any understanding of what AI is. It’s just this magic potion that will make things intelligent. And in these debates, people don’t understand the building blocks, how it fits together, how it works, how you make an intelligent system. So chapter two (of Human Compatible) was sort of mammoth, and some people said, “Oh, this is too much to get through,” and others said, “No, you absolutely have to keep it.” So I compromised and put the pedagogical stuff in the appendices.

Desmond: Why did computer scientists tend to overlook the issue of uncertainty in the objective function for AI systems?

Dr. Russell: Funnily enough, in AI, we took uncertainty (in the decision-making function) to heart starting in the 80s. Before that, most AI people said let’s just work on cases where we have definite knowledge, and we can come up with guaranteed plans.

But by the 80s, it was obvious that this was inadequate. People started grudgingly to learn probability theory and learn about sequential decision making under uncertainty from operations research.

We just never noticed that there’s also going to be uncertainty in the objective itself. I can’t really explain why that is.

I mean, everyone was experiencing errors in the objective function. There are documented cases of people putting in some objective and some weird behavior coming out, and they say, “Oh, we got the wrong objective.”

Desmond: Where does the AI community come down on the question of AI’s dangers to humanity?

Dr. Russell: I think we are making progress among people who have been around awhile. The field right now is an interesting mix. There are people like me that have been around through the logic period of AI, the probability period of AI, and now it’s the deep learning period of AI.

But the vast majority of people who are going to conferences and working for Google and so on have only been around in the deep learning period. And they don’t really know much about uncertainty. They don’t know much about objectives. All they know is data and training the algorithm. It’s a little harder to reach that group. And what is the default response of any AI researcher to someone telling them that what they’re doing is going to destroy the world? What do you think? They come up with one excuse after another about why they don’t have to pay any attention.

I started giving talks on this in 2013, and I would gradually add objections to the talk as I heard them. But it just got to be too big a portion of the talk. I was up to 25 or 26.

But I’ve never found a situation where – if I sit down with someone for half an hour –  I can’t persuade them. What I hope is that most of those people will read the book.

Desmond: Can you speak on Elon Musk’s warnings about AI?

Dr. Russell: I think Musk is saying don’t build AI along the current lines. “Summoning the demon” is a colorful way to describe it, and I’ve used the analogy of giving “three wishes” to a genie. Your last wish is always, “Please undo the first two wishes.” So I think he’s just reiterating that point. And I think it’s a valid point.

Looking back, nuclear energy, nuclear weapons would be the closest analog. And how well did we do with that? I think you’d have to say that we got tails 12 times in a row, and if we hadn’t, we would all be dead. What should have happened with nuclear energy is that people had listened to those who, in 1915, were saying, look, (we) know how much energy is in there. They could do the math. They knew, for example, that radium could emit energy continuously for years and years and years.

Even then you could have started planning for the possibility that nuclear energy would be developed and therefore, it would be weaponized. So given that possibility, the scientific community could have agreed that the technology would be placed under the control of some international body, and there would be a ban on any kind of weaponization. Once it happened in 1933, there was already an arms race. There wasn’t much chance of a benign outcome. So you’ve got to do it before the incentives become too strong.

Desmond: Are we already past that point?

Dr. Russell: No, not yet. It’s interesting to see, for example, that the Chinese government is stating that AI could be a major risk to humanity. So they understand that point. Does the US government understand that point? I am guessing probably not right now.

(UK Prime Minister David) Cameron’s office asked me to go visit them when he was in office, and they were concerned that all of this doom talk would actually crimp the burgeoning AI industry, or lead to GMO-like bans. So they saw GMO as a failure in the sense that they ended up with regulations that are much stronger than their scientific advisors believed were necessary. And they didn’t want the same thing to happen to AI.

So I think we’re going to probably have to have some more catastrophes. And maybe what’s happened, and what’s still happening, with social media will come to be seen as such a catastrophe.

Corporations are intelligent agents that are pursuing this objective of profit. They are extremely smart. They’ve outwitted the human race on fossil fuels and they are outwitting us on social media.

The idea that Mark Zuckerberg is going to get up and say, “Yes, we messed up. We’re responsible for this. Here’s what we did. Here’s what we did wrong.” That would wipe $500 billion off the market cap; he would be fired by the board and they would put in someone more pliant.

Human Compatible book jacket (Image via Penguin)

Desmond: Can you speak on the “catastrophe” of social media?

Dr. Russell: So let’s assume that what those algorithms are doing is trying to maximize click-through rate: the probability that when I send you an ad, you click on it, and then I make some money. What I think is going on is that those algorithms are actually reinforcement learning algorithms, meaning that they view the click from the user as a reward, and what they’re trying to do is to get more of that reward. The way reinforcement learning algorithms work is that they will take any action sequence they can to get the maximum possible reward. And what they seem to have figured out is that they can feed people content that gradually modifies the person’s preferences – with news content, for example, they feed them stories that are always just a little bit more right-wing or left-wing than their current position. You keep gradually moving their current position and you change their beliefs. Even if you start out in the middle, they can just move you along the spectrum to where you are more extreme and (therefore) more predictable. So there’s an assumption here that people who are more extreme are more predictable in what they’ll click on.

Suppose instead they simply tried to learn what the user wants, treating the user as having fixed preferences – so not a reinforcement learning algorithm, but what we call a supervised learning algorithm. Each time the user clicks, it’s just evidence that the user is interested, and if he doesn’t click, that’s evidence that he is not interested. That’s a different type of system, right? It simply learns to send you stuff that matches your current interests. Now, you might want to add some random spread to that, because you also don’t want to narrow the person’s interests by always sending stuff that is exactly in the middle of what they care about. You could build systems that would have minimal effect on people’s preferences.
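
A minimal sketch of the contrast Dr. Russell is drawing, in Python. Everything in it is an illustrative assumption rather than anything from the interview or the book: a user’s “position” lives on a one-dimensional spectrum, more extreme users click more reliably, and a user’s position drifts slightly toward the content they are shown. Under those assumptions, a click-maximizing policy that exploits the drift pushes the user to an extreme, while a policy that treats preferences as fixed leaves them roughly where they started.

```python
import random

# Toy assumptions (illustrative only): a user has a position in [-1, 1];
# they tend to click on content near their position; more extreme users
# click more reliably; and their position drifts toward content they see.

def click_prob(user_pos, item_pos):
    closeness = max(0.0, 1.0 - abs(user_pos - item_pos))   # like-minded content gets clicked
    reliability = 0.5 + 0.5 * abs(user_pos)                 # extreme users are more predictable
    return closeness * reliability

def simulate(policy, steps=5000, drift=0.01, seed=0):
    rng = random.Random(seed)
    user_pos, clicks = 0.0, 0
    for _ in range(steps):
        item_pos = policy(user_pos, rng)
        if rng.random() < click_prob(user_pos, item_pos):
            clicks += 1
        # The key dynamic: shown content shifts the user's position a little.
        user_pos = max(-1.0, min(1.0, user_pos + drift * (item_pos - user_pos)))
    return user_pos, clicks

def extremizing_policy(user_pos, rng):
    # Click-maximizing style: serve content a bit more extreme than the user's
    # current position, exploiting the drift to make them more predictable.
    direction = 1.0 if user_pos >= 0 else -1.0
    return max(-1.0, min(1.0, user_pos + 0.2 * direction))

def matching_policy(user_pos, rng):
    # Fixed-preference style: serve content matching the user's current position,
    # with a little random spread so their interests aren't narrowed.
    return max(-1.0, min(1.0, user_pos + rng.gauss(0.0, 0.05)))

for name, policy in [("extremizing", extremizing_policy), ("matching", matching_policy)]:
    final_pos, clicks = simulate(policy)
    print(f"{name:12s} final position = {final_pos:+.2f}  clicks = {clicks}")
```

In this toy run, the extremizing policy ends with the user near +1.0 and collects more clicks; the matching policy keeps the user near the middle of the spectrum.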

And that would make more sense. And this is actually where I got this idea for a kind of FDA. We had an article with a couple of my colleagues in Wired last week, basically saying – it was a little bit tongue in cheek; we wouldn’t say the FDA should be regulating, and we don’t want 10-year clinical trials – but if you’re going to have systems that can have these dramatic effects on hundreds of millions of people, with nothing between the dudes chugging Red Bull in the lab and the billions of people who are on the receiving end, that’s just a recipe for disaster.

Desmond: How can we head off more AI catastrophes?

Dr. Russell: Eventually, you have to have regulation saying that to be sold, or to be certified to be connected to the internet, (an AI system) has to pass these tests. It has to conform to one of the accepted standard templates, or it has to come with a proof that it doesn’t get out of control, that it remains beneficial.

You wouldn’t want to say this is the only solution to the problem. You’d want to allow people to present new solutions, as long as (the solution) comes with proof that it is safe. It’s a little bit like the App Store, where you don’t get to put your app in the App Store unless it passes a whole bunch of safety tests.

Desmond: What would be the principles underlying AI regulation?

Dr. Russell: The approach that I’m proposing is one possible way to do it. You have systems that don’t know what the objective is; the true objective is a kind of latent variable, a hidden variable. It’s in the person, but the machine doesn’t know what it is, and it is trying to find out more so that it can help the human realize the objective.
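
A rough sketch of that idea, assuming nothing beyond what Dr. Russell says here: the machine holds a probability distribution over candidate objectives rather than a single fixed one, acts on expected value when the candidates roughly agree, and defers to the human when they don’t. The candidate objectives, actions and threshold below are invented for illustration; they are not from CHAI’s actual systems.

```python
# Hypothetical candidate objectives the human might have (values are made up).
candidate_objectives = {
    "wants_coffee": {"make_coffee": 1.0,  "tidy_desk": 0.2, "do_nothing": 0.0},
    "wants_quiet":  {"make_coffee": -0.5, "tidy_desk": 0.1, "do_nothing": 0.5},
}
belief = {"wants_coffee": 0.5, "wants_quiet": 0.5}   # uncertainty over the true objective
actions = ["make_coffee", "tidy_desk", "do_nothing"]

def expected_value(action):
    # Average the action's value over the machine's belief about the objective.
    return sum(p * candidate_objectives[obj][action] for obj, p in belief.items())

def disagreement(action):
    # How much the candidate objectives differ about this action.
    values = [candidate_objectives[obj][action] for obj in belief]
    return max(values) - min(values)

def choose_action(ask_threshold=1.0):
    best = max(actions, key=expected_value)
    # If the candidates disagree strongly about the best action, defer to the human.
    if disagreement(best) > ask_threshold:
        return "ask_the_human"
    return best

print(choose_action())   # -> "ask_the_human": making coffee might badly annoy a human who wants quiet
```

Updating `belief` from the human’s answers and behavior would be the “trying to find out more” part of the design.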

It becomes much easier to do that if you can bound the scope or effect of the system. The reason we don’t worry about AlphaGo taking over the world, besides the fact that it’s actually pretty stupid, is that the only thing it can do is put virtual pieces on a virtual Go board inside a computer.

Now, if it were a bit smarter, AlphaGo might be able to use patterns of pieces on the board to communicate, and to convince people to do things. But absent that, the scope of effect of AlphaGo is limited to the Go board. So a lot of systems are like that. But a lot of systems aren’t like that. Anything that can connect to the internet, anything that can communicate with people through screens, actually has almost unlimited scope of action. There is literally nothing it can’t do, because it can convince people to do things on its behalf, as well as possibly infiltrating robots and so on.

Desmond: Could you hardcode constraints into the objective function of AI, like Asimov’s rules?

Dr. Russell: Asimov’s rules always had their own set of problems, partly to do with what is the meaning of “harm” and partly to do with the fact that almost anything a machine does has some possibility of “harm”, like driving you to the airport.

You could try to hard code these constraints, but there are a number of problems. One is that the constraints you would like to hard code are at the level of things happening in the world, like, you can’t drive through a red light. What you can hard code inside the machine only has to do with what goes on inside the machine. You cannot constrain things happening in the world by putting a constraint inside the machine, right? I mean, the way it works for us is someone tells us you can’t drive through a red light, and then we have the apparatus that allows us to judge what is a red light – and that it’s not a little kid waving around a red flashlight in the middle of the road.

Let’s suppose you can get around that and write rules in such a way that the system can correctly identify all the red lights that are actually the ones you’re supposed to stop at. The problem you have then is what I call in the book the “loophole principle”: for any constraint that you try to impose on its behavior, (the AI) has an incentive not to follow that constraint. It will find some way that respects the letter of the law, but not the spirit. I give the example of tax law, which we’ve been writing for 6,000 years, trying to constrain the behavior of economic agents; but they have an incentive not to pay taxes, and so they find ways around it. They find loopholes.

Desmond: Do we need legislation to control AI?

Dr. Russell: The reason I’m not making recommendations for legislation is because we’re still doing research. The theory that we’ve developed so far kind of works for one-human-one-machine interaction. Complications come up when you have one machine and multiple humans, or multiple machines and multiple humans. For example, if you’ve got one machine and two humans, right, what incentive does each human have to be honest about their preferences? In Robert Nozick’s work there’s this notion of the utility monster, which is someone whose utility scale is much, much more extreme than those of ordinary people. So, you know, a raindrop falling on that person’s head is just a massive emotional catastrophe, and two raindrops is unbearable.
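
To make the utility-monster worry concrete, here is a toy aggregation example; the numbers and the 1000x scale are invented purely for illustration. A machine that naively maximizes the sum of reported utilities hands everything to whoever reports the most extreme scale, which is why the incentive to report preferences honestly matters once one machine serves several humans.

```python
# Toy example: splitting 10 units of a resource between two people by
# maximizing the sum of their *reported* utilities.

def ordinary_utility(units):
    return units             # 1 "util" per unit

def monster_utility(units):
    return 1000 * units      # reports a scale 1000x more extreme (made-up factor)

def best_split(total_units=10):
    # The machine picks the split with the highest total reported utility.
    splits = range(total_units + 1)
    give_monster = max(splits, key=lambda m: monster_utility(m) + ordinary_utility(total_units - m))
    return give_monster, total_units - give_monster

monster_share, ordinary_share = best_split()
print(f"monster gets {monster_share}, ordinary person gets {ordinary_share}")
# -> monster gets 10, ordinary person gets 0: exaggerating your utility scale wins everything.
```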

The other thing we have to do is we actually have to show how to build an AI system using not the Standard Model, but this new model (based on) uncertain preferences. So I think that’s really what we need to do over the next five years, here at Berkeley.

I think the personal assistant seems to be the right category. (The personal assistant) needs to be constantly uncertain and therefore deferential. You need that kind of demonstration system to convince people, the way AlphaGo convinced people that deep reinforcement learning is really cool.

I am sure that I don’t know how it’s going to work out, and we’ll learn new things, whether it’s about how to represent preferences or how fickle human preferences are. I’ve always thought doing experimental research is a good route to inventing new basic theoretical concepts.

Desmond: Is there really enough time to research alternatives to current AI design?

Dr. Russell: I would say more alarmism might be self-defeating. I also think you’ve got to have an alternative that people can embrace.

I’m writing the fourth edition of my textbook (“Artificial Intelligence: A Modern Approach”) right now. The first three editions all give the standard model, and the fourth edition says there was a standard model (for AI) and it’s wrong. And here’s the right way of thinking about this problem. Unfortunately, the next 17 chapters actually still talk about how to do things in the standard model, because we don’t have all of the new ways of doing things.

It’s sort of awkward. I mean, we had to do a new version of the book because deep learning happened since the last edition. We were always planning to do a new edition, but we don’t yet have the content.

One of the things I have come to appreciate over the years of doing research is that it’s not really about coming up with ingenious algorithms; it’s actually about coming up with the right definitions of the problems in the first place. Coming up with the right definition, particularly when you’re talking about a definition that involves interaction with a human, always starts to feel messy, hard to really write down in a nice mathematical framework. We’ve got to figure out stylized forms of interaction with humans that could form the basis of a mathematical framework.

The other thing I’m trying to do is to get more places to do this kind of research. So we (the Center for Human-Compatible AI – CHAI) have branches at Cornell and Michigan, and we’re adding branches at MIT and Princeton. We’ve got a big grant from the Open Philanthropy Project, which was set up by Dustin Moskovitz, one of the Facebook founders.

Desmond: Will AI develop in different ways in different countries and societies?

Dr. Russell: I imagine that there will be lots and lots of individual AI systems, but there’s no particular reason why one AI system can’t adapt itself to the preferences of people in all kinds of different cultures. Think about Facebook, right? It has hundreds of millions of members in pretty much every country and many different cultures, and it’s building profiles of each of those individual human beings.

And so there’s no reason that you have to have the vegan AI system, separate from the Oreo system, separate from the Hindu AI system. I think that’s an important thing to understand. I’m not proposing that there’s a value system that we should put into the machine. There’s no single human value system. And I don’t want the machine to have any values, other than what it is that humans want.

I think that leads into this more general question of what is the relationship that we’re going to have, assuming that we have machines that are more intelligent than us but, in some sense, totally subservient. We have no models for that relationship. The closest one that I know of is Jeeves and Bertie Wooster in the P.G. Wodehouse novels. The butler, Jeeves, is much, much more intelligent than his master, Bertie Wooster, the aristocrat. But (Jeeves) is in the background and arranges things just so. (The personal assistant) is a little bit like that.

I’ve been looking in science fiction for models of successful coexistence of machines and humans in the future. Maybe the closest is the Culture novels by Iain Banks. But there you can see him struggling with (the fact that) the machines are really way more intelligent than the humans. The humans are still in control; they’ve figured out how to make sure that the machines are only serving the interests of humans. And that’s fine. But there’s a real problem with what humans do when everything can be provided, when they have sort of technical mastery of everything.

There are some things that are interesting for humans to do, like, engaging in contact with newly discovered civilizations and different star systems and so on. So that’s a highly coveted career, and everyone else is kind of goofing off. They try to find things to do, like constructing new space habitats, but there’s no need to do that, because the machines can do it better anyway.