Elon Musk wasn’t wrong about automating the Model 3 assembly line — he was just ahead of his time

Image Credits: Guus Schoonewille/AFP / Getty Images

In 2017, when Tesla announced ambitious production targets of 5,000 Model 3s per week and the start of what Musk called “production hell,” analysts were wary. But Musk insisted he could pull it off. He cited hyper-automation, meaning a robotic assembly line, as his secret weapon for increasing manufacturing speed and driving down costs. Fast-forward a year and a half, and Tesla delivered 91,000 vehicles in Q4 2018. But the ramp-up came with massive problems and a retreat from Musk’s original vision of a highly automated assembly line.

What happened?

Asked why the push toward automation didn’t pan out, Musk pointed to one major issue: robotic vision, the software that determines what assembly line robots can “see” and how they act on what they see. The robots simply couldn’t cope with parts in unexpected orientations, such as a misaligned nut or bolt, or with complicated maneuvering inside the car frame. Every such failure stopped the assembly line. In the end, it was far easier to replace robots with humans in many assembly tasks.

Today, computer vision (the umbrella term that covers robotic vision) is everywhere and represents the next frontier of AI, with groundbreaking applications across a variety of industries. The advances researchers and companies are making right now are impressive, and they supply the missing pieces needed to make Elon Musk’s vision of an automated car assembly line a reality. At their core, these advances will give computers and robots the ability to reliably handle the vast array of unexpected corner cases (those errant nuts and bolts) that occur in the real world.

A watershed moment in computer vision

Computer vision experienced a watershed moment in 2012, when a convolutional neural network (AlexNet) won the ImageNet image-recognition challenge by a wide margin. The field has gathered steam ever since. Before 2012, computer vision relied largely on hand-crafted solutions: algorithms built on manually defined rules that could mathematically describe image features, such as edges and gradients, reasonably well. A computer vision researcher hand-selected these features and then combined them to identify a specific object in an image, like a bicycle, a storefront or a face.
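To make “hand-crafted” concrete, here is a minimal Python sketch (an illustration, not any particular historical system) of the kind of rule a researcher might have written: a Sobel kernel, a classic manually designed edge filter, slid over a tiny grayscale image, plus a hand-picked threshold that decides whether a vertical edge is present. The image, the `correlate2d` and `has_vertical_edge` helpers and the threshold of 100 are hypothetical.

```python
# Hand-crafted feature extraction: a Sobel kernel whose weights were chosen
# by a researcher, not learned from data.
SOBEL_X = [
    [-1, 0, 1],
    [-2, 0, 2],
    [-1, 0, 1],
]


def correlate2d(image, kernel):
    """Valid-mode 2D cross-correlation (no padding) over a list-of-lists image."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    out = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            acc = 0
            for di in range(kh):
                for dj in range(kw):
                    acc += image[i + di][j + dj] * kernel[di][dj]
            row.append(acc)
        out.append(row)
    return out


def has_vertical_edge(image, threshold=100):
    """A hand-written rule: report an edge if any filter response exceeds the threshold."""
    responses = correlate2d(image, SOBEL_X)
    return any(abs(v) > threshold for row in responses for v in row)


# Hypothetical 5x6 grayscale image: dark left half, bright right half.
image = [[0, 0, 0, 255, 255, 255] for _ in range(5)]
print(has_vertical_edge(image))  # True
```

Every number here, from the kernel weights to the threshold, was chosen by a person. That is exactly what made such systems brittle: a bolt seen at an unfamiliar angle or in shadow can fall outside the rules someone thought to write.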

The rise of machine learning and advances in artificial neural nets changed all of that, allowing us to develop algorithms that automatically learn image features from massive amounts of training data. The net effect was twofold: (1) solutions became much more robust (e.g., a face could still be identified as a face even if it was oriented slightly differently or in shadow), and (2) building good solutions came to depend on large amounts of high-quality training data (because models learn features from that data, it must be accurate, sufficient in quantity and representative of the full diversity of situations the algorithm may later encounter).
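By contrast, a learned approach lets the data set the weights. The sketch below is a deliberately tiny stand-in for a neural network: a single linear 3-tap filter fitted by stochastic gradient descent on squared error. The training set is hypothetical (random 1D patches labeled with the brightness change across each patch), and the helper name `learn_filter` and its hyperparameters are illustrative assumptions.

```python
import random


def learn_filter(examples, taps=3, lr=0.01, epochs=200, seed=0):
    """Fit filter weights w so that sum(w[k] * patch[k]) approximates each label.

    examples: list of (patch, label) pairs, where patch is a list of `taps` floats.
    Uses plain stochastic gradient descent on squared error.
    """
    rng = random.Random(seed)
    w = [0.0] * taps
    data = list(examples)
    for _ in range(epochs):
        rng.shuffle(data)
        for patch, label in data:
            pred = sum(wk * xk for wk, xk in zip(w, patch))
            err = pred - label
            # Gradient of 0.5 * err**2 with respect to w[k] is err * patch[k].
            for k in range(taps):
                w[k] -= lr * err * patch[k]
    return w


# Hypothetical training data: random 1D patches labeled with the
# brightness change across the patch (right pixel minus left pixel).
rng = random.Random(42)
training = []
for _ in range(200):
    patch = [rng.uniform(-1, 1) for _ in range(3)]
    training.append((patch, patch[2] - patch[0]))

weights = learn_filter(training)
print([round(wk, 2) for wk in weights])  # close to [-1, 0, 1]
```

Nobody writes the edge-detecting weights by hand; they emerge from the labeled examples. Because these toy labels are noise-free and exactly linear in the inputs, the fit converges to the right-minus-left filter. With real images, what the model learns is only as good as the accuracy and diversity of its training data.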

Now in the lab: GANs, unsupervised learning and synthetic data

Next, new approaches such as GANs (Generative Adversarial Networks), unsupervised learning and synthetic ground truth could substantially reduce both the amount of training data needed to develop high-quality computer vision models and the time and effort required to collect that data. With these approaches, networks can bootstrap their own learning and identify corner cases and outliers with higher fidelity, far faster. Humans can then review those corner cases to refine the solution and reach a high-quality model much more quickly.
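GANs and synthetic data don’t fit in a short snippet, but the human-in-the-loop step at the end can be sketched. The Python below uses uncertainty sampling, a standard active-learning heuristic swapped in here for illustration (it is not how GANs work): it computes the entropy of a model’s class probabilities and queues the most uncertain samples for human review. The bolt image ids, the three classes and the 0.5-nat threshold are all hypothetical.

```python
import math


def entropy(probs):
    """Shannon entropy (in nats) of a class-probability vector."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def flag_corner_cases(predictions, max_entropy=0.5):
    """Return ids of samples whose prediction entropy exceeds max_entropy.

    predictions: dict mapping a sample id to the model's class probabilities.
    High entropy means the model is unsure, so a human should look at it.
    """
    flagged = [sid for sid, probs in predictions.items() if entropy(probs) > max_entropy]
    # Most uncertain first, so reviewers see the hardest cases early.
    return sorted(flagged, key=lambda sid: entropy(predictions[sid]), reverse=True)


# Hypothetical classifier outputs for three bolt images: [upright, tilted, missing]
predictions = {
    "bolt_001": [0.97, 0.02, 0.01],
    "bolt_002": [0.40, 0.35, 0.25],
    "bolt_003": [0.70, 0.25, 0.05],
}
print(flag_corner_cases(predictions))  # ['bolt_002', 'bolt_003']
```

The confidently classified bolt is left alone, while the two ambiguous images go to a person, whose labels can then be fed back into training.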

These new approaches are rapidly expanding the envelope of computer vision in terms of applications, robustness and reliability. Not only do they hold the promise of solving Mr. Musk’s manufacturing challenges, they will also dramatically extend the boundaries of myriad critical applications.

Looking at these advances, one thing quickly becomes clear: Elon Musk wasn’t wrong. His vision (robotic and otherwise) was simply a year or two ahead of reality. AI, computer vision and robotics are all nearing a tipping point of accuracy, reliability and efficacy. For Tesla, that means the next ramp-up through “production hell” (likely for the Model Y) will see a vastly different assembly line at its Fremont and Shanghai factories, one that more successfully pairs robotics with computer vision.
