Perceptron: AI that sees with sound, learns to walk and predicts seismic physics

Research in the field of machine learning and AI, now a key technology in practically every industry and company, is far too voluminous for anyone to read it all. This column, Perceptron, aims to collect some of the most relevant recent discoveries and papers — particularly in, but not limited to, artificial intelligence — and explain why they matter.

This month, engineers at Meta detailed two recent innovations from the depths of the company’s research labs: an AI system that compresses audio files and an algorithm that can accelerate protein-folding AI performance by 60x. Elsewhere, scientists at MIT revealed that they’re using spatial acoustic information to help machines better envision their environments, simulating how a listener would hear a sound from any point in a room.

Meta’s compression work doesn’t exactly reach unexplored territory. Last year, Google announced Lyra, a neural audio codec trained to compress low-bitrate speech. But Meta claims that its system is the first to work for CD-quality, stereo audio, making it useful for commercial applications like voice calls.

An architectural drawing of Meta’s AI audio compression model. Image Credits: Meta

Meta’s AI-based compression system, called Encodec, can compress and decompress audio in real time on a single CPU core at rates of around 1.5 kbps to 12 kbps. Compared to MP3 at 64 kbps, Encodec can achieve roughly 10x compression without a perceptible loss in quality.
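
To put those bitrates in perspective, here is a back-of-the-envelope sketch in Python. It assumes the "roughly 10x" figure is measured against a 64 kbps MP3 stream and that 1 kbps means 1,000 bits per second. The function and variable names are illustrative and are not part of Encodec.

```python
def size_kilobytes(bitrate_kbps: float, seconds: float) -> float:
    """Size of an audio stream in kilobytes (1 kB = 1,000 bytes, 8 bits per byte)."""
    return bitrate_kbps * seconds / 8


MP3_KBPS = 64.0            # MP3 reference bitrate quoted in the article
COMPRESSION_RATIO = 10.0   # the "roughly 10x" figure (assumed to be relative to MP3)

# Bitrate implied by a 10x reduction from 64 kbps.
encodec_kbps = MP3_KBPS / COMPRESSION_RATIO

# Compare one minute of audio at each bitrate.
one_minute = 60
mp3_size_kb = size_kilobytes(MP3_KBPS, one_minute)
encodec_size_kb = size_kilobytes(encodec_kbps, one_minute)

print(f"Implied Encodec bitrate: {encodec_kbps:.1f} kbps")
print(f"One minute of MP3:       {mp3_size_kb:.0f} kB")
print(f"One minute of Encodec:   {encodec_size_kb:.0f} kB")
```

Under those assumptions, the implied Encodec bitrate is about 6.4 kbps, which falls inside the 1.5 kbps to 12 kbps range Meta cites.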

The researchers behind Encodec say that human evaluators preferred the quality of audio processed by Encodec versus Lyra-processed audio, suggesting that Encodec could eventually be used to deliver better-quality audio in situations where bandwidth is constrained or at a premium.

As for Meta’s protein folding work, it has less immediate commercial potential. But it could lay the groundwork for important scientific research in the field of biology.

Protein structures predicted by Meta’s system. Image Credits: Meta

Meta says its AI system, ESMFold, predicted the structures of around 600 million proteins from bacteria, viruses and other microbes that haven’t yet been characterized. That’s nearly triple the 220 million structures that Alphabet-backed DeepMind managed to predict earlier this year, which covered nearly every protein from known organisms in DNA databases.

Meta’s system isn’t as accurate as DeepMind’s. Of the ~600 million proteins it generated, only a third were “high quality.” But it’s 60 times faster at predicting structures, enabling it to scale structure prediction to much larger databases of proteins.

Not to give Meta outsize attention, but the company’s AI division also detailed this month a system designed to reason mathematically. Researchers at the company say that their “neural problem solver” learned from a dataset of successful mathematical proofs to generalize to new, different kinds of problems.

Meta isn’t the first to build such a system. OpenAI developed its own, built on the Lean proof assistant, which it announced in February. Separately, DeepMind has experimented with systems that can solve challenging mathematical problems in the studies of symmetries and knots. But Meta claims that its neural problem solver was able to solve five times more International Math Olympiad problems than any previous AI system and bested other systems on widely used math benchmarks.

Meta notes that math-solving AI could benefit the fields of software verification, cryptography and even aerospace.

Turning our attention to MIT’s work, research scientists there developed a machine learning model that can capture how sounds in a room will propagate through the space. By modeling the acoustics, the system can learn a room’s geometry from sound recordings, which can then be used to build visual renderings of a room.
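
To give a rough sense of what "how a listener would hear a sound from any point in a room" involves, here is a deliberately simple Python sketch. It is not MIT's model, which is learned from recordings. It only covers the direct path between a source and a listener, using two textbook assumptions: sound travels at roughly 343 meters per second in air, and loudness falls off in proportion to distance. The function name and the room coordinates are hypothetical.

```python
import math

SPEED_OF_SOUND_M_S = 343.0  # approximate speed of sound in dry air at about 20 °C


def direct_path(source, listener):
    """Return (delay in seconds, relative gain) for the direct path only.

    Positions are (x, y, z) tuples in meters. Echoes off walls, which a real
    acoustic model has to capture, are ignored in this toy version.
    """
    distance = math.dist(source, listener)
    delay_s = distance / SPEED_OF_SOUND_M_S
    # Inverse-distance falloff, clamped so the gain never exceeds 1.0 up close.
    gain = 1.0 / max(distance, 1.0)
    return delay_s, gain


# Hypothetical room: a speaker in one corner, a listener 5 meters away.
speaker = (0.0, 0.0, 1.5)
listener = (3.0, 4.0, 1.5)
delay_s, gain = direct_path(speaker, listener)
print(f"Arrival delay: {delay_s * 1000:.1f} ms, relative gain: {gain:.2f}")
```

A learned model has to account for much more than this, including reflections and obstacles, which is why the geometry of the room can be recovered from how sound behaves in it.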

The researchers say the tech could be applied to virtual and augmented reality software or robots that have to navigate complex environments. In the future, they plan to enhance the system so that it can generalize to new and larger scenes, such as entire buildings or even whole towns and cities.

At Berkeley’s robotics department, two separate teams are accelerating the rate at which a quadrupedal robot can learn to walk and do other tricks. One team combined the best of numerous recent advances in reinforcement learning to allow a robot to go from blank slate to robust walking on uncertain terrain in just 20 minutes of real time.

“Perhaps surprisingly, we find that with several careful design decisions in terms of the task setup and algorithm implementation, it is possible for a quadrupedal robot to learn to walk from scratch with deep RL in under 20 minutes, across a range of different environments and surface types. Crucially, this does not require novel algorithmic components or any other unexpected innovation,” write the researchers.

Instead, they select and combine some state-of-the-art approaches and get amazing results. You can read the paper here.

Robot dog demo from EECS professor Pieter Abbeel’s lab in Berkeley, California in 2022. (Photo courtesy Philipp Wu/Berkeley Engineering)

Another locomotion learning project, from (TechCrunch’s pal) Pieter Abbeel’s lab, was described as “training an imagination.” They set up the robot with the ability to attempt predictions of how its actions will work out, and though it starts out pretty helpless, it quickly gains more knowledge about the world and how it works. This leads to a better prediction process, which leads to better knowledge, and so on in a feedback loop until it’s walking in less than an hour. It learns just as quickly to recover from being pushed or otherwise “perturbed,” as the lingo has it. Their work is documented here.
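
The predict, act, learn loop described above can be sketched in a few lines of Python. This is a toy under stated assumptions, not the lab's algorithm: the "robot" is a single number on a line, the hidden rule of its world (`true_step`) is made up for illustration, and every function name here is hypothetical. The point is only to show how a learned model of the world and the actions chosen with it improve each other.

```python
def true_step(position, action):
    """Hidden 'real world' (assumed): each unit of action moves the robot 0.5 units."""
    return position + 0.5 * action


def fit_effect(transitions):
    """Least-squares estimate of how far one unit of action moves the robot."""
    num = sum(a * (nxt - pos) for pos, a, nxt in transitions)
    den = sum(a * a for _, a, _ in transitions)
    return num / den if den else 1.0  # naive guess before any experience


def imagine_and_act(position, target, effect, candidates):
    """Pick the action whose imagined outcome lands closest to the target."""
    return min(candidates, key=lambda a: abs(position + effect * a - target))


candidates = [x / 10 for x in range(-20, 21)]  # allowed actions: -2.0 to 2.0
transitions = []                               # remembered (position, action, next position)
position, target = 0.0, 3.0
errors = []

for step in range(20):
    effect = fit_effect(transitions)                         # update the world model
    action = imagine_and_act(position, target, effect, candidates)
    next_position = true_step(position, action)              # act in the real world
    transitions.append((position, action, next_position))    # learn from the result
    position = next_position
    errors.append(abs(position - target))

print(f"Learned effect per unit of action: {fit_effect(transitions):.2f}")
print(f"Distance from target after {len(errors)} steps: {errors[-1]:.2f}")
```

The first action is chosen with a wrong model, but the outcome of that action corrects the model, and the corrected model produces better actions. Real robots face noisy, high-dimensional versions of the same loop.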

Work with a potentially more immediate application came earlier this month out of Los Alamos National Laboratory, where researchers developed a machine learning technique to predict the friction that occurs during earthquakes, providing a way to forecast them. Using a language model, the team says it was able to analyze the statistical features of seismic signals emitted from a fault in a laboratory earthquake machine to project the timing of the next quake.

“The model is not constrained with physics, but it predicts the physics, the actual behavior of the system,” said Chris Johnson, one of the research leads on the project. “Now we are making a future prediction from past data, which is beyond describing the instantaneous state of the system.”

Image Credits: Dreamstime

It’s challenging to apply the technique in the real world, the researchers say, because it’s not clear whether there’s sufficient data to train the forecasting system. But all the same, they’re optimistic about the applications, which could include anticipating damage to bridges and other structures.

Last up this month is a note of caution from MIT researchers, who warn that neural networks being used to simulate actual neural networks should be carefully examined for training bias.

Neural networks are of course based on the way our own brains process and signal information, reinforcing certain connections and combinations of nodes. But that doesn’t mean that the synthetic and real ones work the same. In fact, the MIT team found, neural network-based simulations of grid cells (part of the nervous system) only produced similar activity when they were carefully constrained to do so by their creators. If allowed to govern themselves, the way the actual cells do, they didn’t produce the desired behavior.

That doesn’t mean deep learning models are useless in this domain — far from it, they’re very valuable. But, as professor Ila Fiete said in the school’s news post: “they can be a powerful tool, but one has to be very circumspect in interpreting them and in determining whether they are truly making de novo predictions, or even shedding light on what it is that the brain is optimizing.”