Scale AI CEO Alex Wang weighs in on software bugs and what will make AV tech good enough

Scale co-founder and CEO Alex Wang joined us at TechCrunch Sessions: Mobility 2021 this week to discuss his company’s role in the autonomous driving industry and how it’s changed in the five years since its founding. Scale helps AV players large and small establish reliable “ground truth” through data annotation and management, and along the way, the standards for what that means have shifted as the industry has matured.

Good data is the “good bones” of autonomous driving systems

Even if two autonomous driving algorithms are created more or less equal, their real-world performance can vary dramatically depending on the input data they consume. That’s where Scale’s value proposition to the industry starts, and Wang explained why:

If you think about a traditional software system, the thing that will separate a good software system from a bad software system is the code, the quality of the code. For an AI system, which all of these self-driving vehicles or autonomous vehicles are, it’s the data that really separates an amazing algorithm from a bad algorithm. And so one thing we saw was that being one of the stewards and shepherds of high-quality data was going to be incredibly important for the industry, and that’s what’s played out. We work with many of the great companies in the space, from Aurora to Nuro to Toyota to General Motors, and our work with all of them is ensuring that they have really a solid data foundation, so they can build the rest of their stacks on top of it. (Time stamp: 06:24)

Building a good data set is like constantly tending a precise and specialized garden.

What we see most of our customers operationalizing and building in is how do they have a process by which they can deal with these edge cases, just like they would deal with software bugs. There’s some triage process where they have to be able to reproduce the problems, and you have to be able to hand it over to the development team to be able to identify what data can we get that’s going to solve this problem and help us deal with it, and keep going in this constant lifecycle of tending and curating a data set — almost like a bonsai tree … It’s about constantly trying to build this incredible data set that’s gonna power a super high-quality algorithm. … Something that’s pretty known in the industry now is it’s garbage in, garbage out. If you have bad data going in, then you’re gonna have unsafe vehicles. (Time stamp: 09:47)
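To make the bug-triage analogy concrete, here is a minimal sketch of what that edge-case lifecycle could look like in code. Everything in it is hypothetical: the scene format, the embedding-based similarity search and the stubbed-out labeling step are illustrative stand-ins, not Scale’s actual tooling or API.

```python
import math

def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def mine_similar(failure, raw_pool, k=3):
    """Mine raw fleet logs for scenes that look like the failure,
    ranked by embedding similarity."""
    return sorted(raw_pool, key=lambda s: -cosine(s["emb"], failure["emb"]))[:k]

def triage(failure, dataset, raw_pool, regression_suite):
    """Treat a perception failure like a bug ticket:
    reproduce -> mine similar data -> label -> retrain -> regression-test."""
    # 1. "Reproduce the problem": the failure arrives as a logged scene.
    # 2. Mine the raw pool for similar scenes worth labeling.
    for scene in mine_similar(failure, raw_pool):
        scene["label"] = "needs_human_annotation"  # stub for the labeling step
        dataset.append(scene)
    # 3. The failure joins a regression suite so the fix can be verified
    #    after every retrain, and old edge cases stay fixed.
    regression_suite.append(failure)
    return dataset, regression_suite

# Example: a "traffic lights on the back of a truck" failure pulls the
# three most similar raw scenes into the labeling queue.
raw_pool = [{"id": i, "emb": [i * 0.1, 1 - i * 0.1]} for i in range(10)]
failure = {"id": "truck_with_lights", "emb": [0.2, 0.8]}
dataset, regression = triage(failure, [], raw_pool, [])
print([s["id"] for s in dataset])  # -> [2, 1, 3]
```

The shape of the loop is the point: each failure both grows the training set and joins a regression suite, so the data set is continually tended rather than built once.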

How Scale’s offerings have changed over the years

Building and maintaining a high-quality data set may be a continuous endeavor that’s never really complete, but that hasn’t stopped Scale from extending its expertise into other areas. In fact, Wang said the company has already branched out quite a bit and will continue to do so.

Our role has continued to evolve. As we work with our customers, and we solve one problem for them around data annotation and data labeling, it turns out they come to us with other problems that we can then help solve as well. Around data management, we launched a product called Nucleus, and a lot of our customers are thinking a lot about mapping and how to deploy more robust maps. So we built a product, I’m going to announce that probably later this month, but we’re helping to address that problem with our customers. So [there are] tons of different parts of the stack that we’re working with our customers to help address. (Time stamp: 07:16)

Bugs and the infinite variety of real-world driving

Freak events and one-off occurrences might not be a huge deal in the software we typically use on our computers and smartphones, but they definitely affect autonomous vehicle software, and these aren’t bugs you can shrug off.

There’s this video that circulated on Twitter last week, which was somebody driving a Tesla, and they were right behind this truck that had three traffic lights on the back. So [the car] kept seeing traffic lights, but it was actually [just] this truck in front of it on a street with tons and tons of traffic. I think this is one of the core problems of autonomy: it turns out, there’s a lot of situations that happen, and they’re really rare, and really random and very surprising. But you have to be able to deal with all of them really, really well. And it’s different from software … users just learn to deal with the bugs, like we just learned to deal with the fact that Zoom is buggy, or our Macs are buggy. But that just can’t happen in autonomy, because the safety requirements are so high. (Time stamp: 12:06)

When is AV tech “good enough”?

Figuring out … the right way to benchmark the technology versus the alternative — which is us all driving — figuring out what that looks like is a really big question for the industry. One thing that’s related to this is, I’m trying to remember the exact stat, but I think it’s something along the lines of like 90% of all accidents are caused by 10% of drivers. So in terms of the overall driving community, most of us are actually quite good drivers and quite safe. But then there’s a small percentage of drivers who you can attribute the vast majority of accidents to, and it’s an interesting question of what the bar for the technology is. Is the bar for the technology to be safer than all human drivers, which is a higher bar, or [for it to be] safer than the drivers [who] are actually causing lots of the accidents? (Time stamp: 14:16)
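Taking Wang’s admittedly fuzzy statistic at face value, a quick back-of-the-envelope calculation shows how far apart those two bars sit. The numbers below are purely illustrative arithmetic on the hedged “90% of accidents from 10% of drivers” figure, assuming accident rates are uniform within each group, not real crash data.

```python
# Back-of-the-envelope on Wang's hedged stat: roughly 90% of accidents
# caused by roughly 10% of drivers. Illustrative arithmetic, not real data.
drivers, accidents = 1.0, 1.0   # normalized driver population and accident total

risky_share, risky_accidents = 0.10, 0.90
safe_share, safe_accidents = 0.90, 0.10

avg_rate = accidents / drivers                 # baseline: 1.0x
risky_rate = risky_accidents / risky_share     # 9.0x the average
safe_rate = safe_accidents / safe_share        # about 0.11x the average

print(f"risky 10% of drivers: {risky_rate / avg_rate:.1f}x the average accident rate")
print(f"safe 90% of drivers: {safe_rate / avg_rate:.2f}x the average accident rate")

# Under these assumptions, an AV that is merely "safer than the average
# driver" could still have roughly nine times the accident rate of the
# typical driver in the safe 90%, which is why the choice of bar matters.
```

In other words, the gap between “safer than the worst drivers” and “safer than all human drivers” spans nearly two orders of magnitude in accident rate, which is exactly why the benchmarking question Wang raises is so contentious.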