Featured Article

Embodied AI, superintelligence and the master algorithm

What will take us from potential to reality in the next 18 months?


Boston Dynamics' Spot robot, shown during the second day of Mobile World Congress (MWC) Barcelona, on June 29, 2021 in Barcelona, Spain. (Photo by Joan Cros/Corbis via Getty Images)
Image Credits: Joan Cros Garcia-Corbis / Getty Images

Chris Nicholson

Contributor

Chris Nicholson is the founder and CEO of Pathmind, a company applying deep reinforcement learning to industrial operations and supply chains.


Superintelligence, roughly defined as an AI algorithm that can solve all problems better than people, will be a watershed for humanity and tech.

Even the best human experts have trouble making predictions about highly probabilistic, wicked problems. And yet those wicked problems surround us. We are all living through immense change in complex systems that impact the climate, public health, geopolitics and basic needs served by the supply chain.

Just determining the best way to distribute COVID-19 vaccines without the help of an algorithm is practically impossible. We need to get smarter in how we solve these problems — fast.

Superintelligence, if achieved, would help us make better predictions about challenges like natural disasters, fragile supply chains and geopolitical conflict, and come up with better strategies to solve them. The last decade has shown how much AI can improve the accuracy of our predictions. That’s why there is an international race toward superintelligence among corporations and governments.

Highly credible research labs like DeepMind and OpenAI say that the path to superintelligence is visible. Last month, DeepMind said reinforcement learning (RL) could get us there, and RL is at the heart of embodied AI.

What is embodied AI?

Embodied AI is AI that controls a physical “thing,” like a robot arm or an autonomous vehicle. It is able to move through the world and affect a physical environment with its actions, similar to the way a person does. In contrast, most predictive models live in the cloud doing things such as classifying text or images, steering flows of bits without ever moving a body through three-dimensional space.

For those who work in software, including AI researchers, it is too easy to forget the body. But any superintelligent algorithm needs to control a body because so many of the problems we confront as humans are physical. Firestorms, coronaviruses and supply chain breakdowns need solutions that aren’t just digital.

All the crazy Boston Dynamics videos of robots jumping, dancing, balancing and running are examples of embodied AI. They show how far we’ve come from early breakthroughs in dynamic robot balancing made by Trevor Blackwell and Anybots more than a decade ago. The field is moving fast and, in this revolution, you can dance.

What’s blocked embodied AI up until now?

Challenge 1: One of the challenges when controlling machines with AI is the high dimensionality of the world — the sheer range of things that can come at you.

What does high dimensionality mean? One way to think about it: it’s the number of signals you have to pay attention to in order to get what you want. In chess, you only have to pay attention to the pieces on the board. That’s relatively low dimensionality. The weather doesn’t matter. The pieces are fixed. The pawns will not sprout wings and fly.

But what if you are building agricultural robots to solve a farming problem? You have to pay attention to what’s happening in the field. But you also have to know how to respond to a thousand different weather types and keep in mind that some things, like locusts, do sprout wings before they come for you. If we cannot solve the basic problem of producing food for ourselves, we will not make a superintelligence. Embodied AI is the gate it must pass through.
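To make the contrast concrete, here is a rough sketch in Python comparing the two observation spaces. The chess state is small and closed; the field robot’s sensor suite (the channels below are hypothetical) is enormous and never quite complete:

```python
import numpy as np

# Chess: the full state is just the board. One plane per piece type per
# color (6 x 2 = 12) over an 8 x 8 grid -- a small, closed observation.
chess_observation = np.zeros((12, 8, 8))
print("chess dims:", chess_observation.size)  # 768 signals, nothing else matters

# A field robot (hypothetical sensor suite): cameras, weather, soil probes,
# plus slots for things you did not plan for, like a pest outbreak.
camera = np.zeros((3, 480, 640))   # RGB frame from one camera
weather = np.zeros(16)             # temperature, wind, humidity, forecast...
soil = np.zeros(64)                # moisture and nutrient probes across the field
pests = np.zeros(8)                # detections of locusts, fungus, etc.
field_observation = np.concatenate([camera.ravel(), weather, soil, pests])
print("field dims:", field_observation.size)  # ~1 million signals, and still incomplete
```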

Challenge 2: It’s hard to know what worked. Sometimes our environment only reveals the consequences of our choices many years later. Like people, AI agents don’t learn as well if they lack feedback, which is the case when you can’t see the results of what you do. Humans have developed culture, principles and platitudes to pass long-term lessons down to short-lived individuals. Robots don’t have culture outside their software. So how do they learn the long-term effects of decisions made in a fast-paced environment?
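One standard RL answer to that delayed-feedback problem is the discounted return: rewards that arrive far in the future still credit today’s decision, just with less weight. A minimal sketch (the reward numbers below are invented):

```python
def discounted_return(rewards, gamma=0.99):
    """Sum of future rewards, each discounted by how far away it is."""
    total = 0.0
    for steps_ahead, reward in enumerate(rewards):
        total += (gamma ** steps_ahead) * reward
    return total

# A decision whose only payoff arrives 500 steps later still gets credit,
# just less of it, which is how an agent links causes to distant effects.
delayed = [0.0] * 500 + [10.0]
print(discounted_return(delayed))  # ~0.66 instead of 10.0
```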

Challenge 3: It’s hard to learn how to handle situations that have never occurred before. To navigate the evolving present, we have to extract principles and theories from our experience about how the world works and apply those theories to new situations. A lot of machine learning is constrained to learn from the narrow corridor of history, i.e., the data that was collected. That is why machine learning algorithms that learn from the past are doomed to repeat it.

But we are much closer to making embodied AI work

There are three positive indicators that we are on the road to making embodied AI work: advances in deep reinforcement learning, more compute on the edge, and AI training based on simulations and historical data.

Advances in deep RL

Advances in deep RL, a family of AI algorithms often used in robotics, have moved us much closer to solving embodied AI because deep RL tackles sequential decision-making problems, where long delays separate actions from the outcomes they produce.
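For a sense of what that looks like in code, here is a minimal, self-contained sketch of tabular Q-learning on an invented toy corridor; real deep RL replaces the table with a neural network and the corridor with a robot or a simulator, but the update rule is the same idea:

```python
import random

# Toy corridor: states 0..4, action 0 moves left, action 1 moves right.
# Reward only arrives at the far end, so value has to propagate backward.
N_STATES, N_ACTIONS, GOAL = 5, 2, 4
q = [[0.0] * N_ACTIONS for _ in range(N_STATES)]

def step(state, action):
    next_state = max(0, min(GOAL, state + (1 if action == 1 else -1)))
    reward = 1.0 if next_state == GOAL else 0.0
    return next_state, reward, next_state == GOAL

alpha, gamma, epsilon = 0.1, 0.95, 0.1
for episode in range(500):
    state, done = 0, False
    while not done:
        values = q[state]
        if random.random() < epsilon or values[0] == values[1]:
            action = random.randrange(N_ACTIONS)   # explore, or break ties randomly
        else:
            action = 0 if values[0] > values[1] else 1
        next_state, reward, done = step(state, action)
        # Q-learning update: nudge the estimate toward reward + discounted future value.
        target = reward + gamma * max(q[next_state])
        q[state][action] += alpha * (target - q[state][action])
        state = next_state

print([round(max(row), 2) for row in q[:GOAL]])  # values rise as states get closer to the goal
```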

In the last few years, the advances in RL have been staggering, thanks to research labs such as DeepMind and OpenAI, robotics-focused startups like Covariant, and AI-driven companies like Tesla.


When Peter Thiel and Elon Musk backed DeepMind, it was still working on Atari games, which it beat with an RL algorithm called deep Q-learning. That work would ultimately lead to Google’s acquisition of the company, as well as to Musk’s subsequent efforts to warn the world about the dangers of AI and his creation of OpenAI with Sam Altman.

Researchers such as Pieter Abbeel, Peter Chen and Rocky Duan, after spending time at OpenAI, would go on to found Covariant (formerly known as “Embodied Intelligence”) and use deep RL to solve some of the hardest problems in robotics, which they are now bringing to manufacturing and industrial control.

DeepMind’s claims about superintelligence in May, and its recent achievements in protein folding with AlphaFold, are more signals that deep RL is the most promising sector of AI right now.

Faster compute

In addition to algorithmic breakthroughs, embodied AI is benefiting from faster, cheaper and more compact compute at the edge. Companies like Nvidia, AMD, Qualcomm and Intel are making it possible to process more sensor data locally, which keeps latency low and responses fast.
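To see why local processing matters, here is an illustrative (not benchmarked) sketch that times a small PyTorch control policy running on-device; compare the sub-millisecond result with the tens of milliseconds a round trip to a cloud endpoint typically costs before any model even runs:

```python
import time

import torch
import torch.nn as nn

# A small control policy of the kind that fits comfortably on an edge module.
policy = nn.Sequential(
    nn.Linear(64, 128), nn.ReLU(),
    nn.Linear(128, 128), nn.ReLU(),
    nn.Linear(128, 8),   # e.g., 8 actuator commands
)
policy.eval()

observation = torch.randn(1, 64)  # stand-in for one batch of sensor readings

with torch.no_grad():
    policy(observation)                      # warm-up
    start = time.perf_counter()
    for _ in range(1_000):
        policy(observation)
    per_call_s = (time.perf_counter() - start) / 1_000

print(f"local inference: ~{per_call_s * 1e3:.3f} ms per decision")
# A network round trip to a cloud endpoint is typically 20-100 ms before the
# model even runs; for a robot arm or a vehicle, only the local path is fast enough.
```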

Simulation training

Finally, one underreported aspect of reinforcement learning is that it is mostly trained in simulations. That is, it escapes the narrow corridor of history and trains on what might happen in the future. All those video games? Simulations. Protein folding? A molecular simulation. Simulations allow these algorithms to live a thousand lives, experiencing things we have never seen. That’s the real secret: They don’t just contain more intelligence; they contain it because, running on parallel compute, they have lived longer than we have.
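Mechanically, “living a thousand lives” just means stepping many copies of a simulator side by side so the agent gathers in minutes what a single physical robot would take months to experience. A dependency-free sketch with an invented toy simulator:

```python
import random

class ToyFarmSim:
    """Invented stand-in for a physics or agronomy simulator."""
    def __init__(self, seed):
        self.rng = random.Random(seed)
        self.crop_health = 1.0

    def step(self, action):
        # Weather and pests vary per simulated world; each copy sees a
        # slightly different future, including ones history never recorded.
        shock = self.rng.uniform(-0.1, 0.05)
        self.crop_health = max(0.0, self.crop_health + shock + 0.02 * action)
        return self.crop_health  # reward signal for the agent

# One agent, a thousand parallel "lives".
sims = [ToyFarmSim(seed=i) for i in range(1000)]
experience = []
for day in range(100):
    for sim in sims:
        action = random.choice([0, 1])          # placeholder policy
        experience.append((day, action, sim.step(action)))

print(len(experience), "transitions gathered")   # 100,000 from simulated worlds
```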

What will take us from potential to reality in the next 18 months?

A few years ago, prominent researchers were still claiming that RL “didn’t work.” We have since seen case after case where it does. Things are moving fast. What the 1990s were for the internet, the 2020s will be for AI and robotics.

We already have the algorithms and the compute for those systems. Now we need the data and the bandwidth. Over the next 18 months, we will see advances on three fronts:

  1. Deep RL needs to get the right observations from the world, and IoT data is what it will use to do so. To get that data, machines have to be wired with sensors. In a sense, deep RL is the real AI of Things, or AIoT. Companies like PTC, Siemens, ABB and Rockwell Automation are helping the major players in the manufacturing industry wire up their physical plants and gather the data they need to monitor their operations. A lot of that is what they call Industrial IoT, or IIoT. Unicorns such as Samsara have developed a single pane of glass to track that data. (A rough sketch of how those sensor readings might feed an agent follows this list.)
  2. Deep RL needs higher bandwidth to move data and decisions between the sensors and the compute in the cloud. That is the purpose of 5G, and we will increasingly see private 5G networks implemented in the manufacturing and logistics facilities that rely more and more on robots to get the work done. Those 5G implementations are on the way.
  3. As RL agents gain more experience, both from simulations and from deployment in machines themselves, such as autonomous vehicles and robot arms, they will perform better and better against benchmarks. That flywheel of performance will accelerate a broader replatforming of IoT workloads to the cloud.
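As flagged in the first item above, the glue between IIoT and deep RL is unglamorous: sensor readings are packed into the fixed-size observation vector a policy expects, and the policy’s action travels back out as setpoints. A hedged sketch with invented tag names (a real plant would read these from an MQTT or OPC UA feed rather than a dictionary):

```python
import numpy as np

# Hypothetical snapshot of readings from a wired-up production line.
sensor_snapshot = {
    "conveyor_speed_mps": 0.82,
    "motor_temp_c": 61.5,
    "vibration_rms": 0.014,
    "queue_length": 37,
}

OBSERVATION_KEYS = sorted(sensor_snapshot)  # fixed ordering the agent was trained on

def to_observation(snapshot):
    """Pack raw IIoT readings into the flat vector a deep RL policy consumes."""
    return np.array([float(snapshot[k]) for k in OBSERVATION_KEYS], dtype=np.float32)

obs = to_observation(sensor_snapshot)
print(obs.shape, obs)

# The agent's output travels the other way: an action vector becomes
# setpoints written back to the controllers, closing the AIoT loop.
action = np.clip(np.random.randn(2), -1.0, 1.0)   # placeholder policy output
setpoints = {"conveyor_speed_mps": 0.8 + 0.2 * action[0],
             "cooling_fan_duty": 0.5 + 0.5 * action[1]}
print(setpoints)
```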

In the next year and a half, we’re going to see increasing adoption of these technologies, which will trigger a broader industry shift, much as Tesla triggered the transition to EVs. By exposing AI to more real-world data and challenges, by getting it into more robot bodies, we can accelerate both the digital transformation of industry and the intelligence gains of AI itself.

