DeepMind claims early progress in AI-based predictive protein modelling

Google-owned AI specialist DeepMind has claimed a “significant milestone” in demonstrating the usefulness of artificial intelligence for the complex task of predicting the 3D structures of proteins based solely on their genetic sequence.

Understanding protein structures is important in disease diagnosis and treatment, and could improve scientists’ understanding of the human body — as well as potentially helping to support protein design and bioengineering.

In a blog post about the project to use AI to predict how proteins fold, which is now two years in, DeepMind writes: “The 3D models of proteins that AlphaFold [DeepMind’s AI] generates are far more accurate than any that have come before — making significant progress on one of the core challenges in biology.”

There are various scientific methods for predicting the native 3D state of protein molecules (i.e. how the protein chain folds to arrive at the native state) from the sequence of amino acid residues encoded in DNA.

But modelling the 3D structure is a highly complex task, given the vast number of shapes a protein chain could in principle adopt, since folding depends on factors such as the interactions between amino acids.
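To give a rough sense of that scale, here is a back-of-the-envelope sketch in the spirit of Levinthal's classic argument. It assumes, purely for illustration, that each residue independently adopts one of three backbone conformations; real chains are far more constrained and far more continuous than this toy model, and the function name is hypothetical.

```python
# Toy model (an illustrative assumption, not a physical one): each residue
# independently adopts one of a fixed number of discrete conformations.

def count_conformations(num_residues: int, states_per_residue: int) -> int:
    """Return the number of distinct chain conformations under the toy model."""
    return states_per_residue ** num_residues


if __name__ == "__main__":
    n = count_conformations(100, 3)
    # Scientific notation; the exact integer for 3**100 has 48 digits.
    print(f"{n:.2e}")
```

Even for a modest 100-residue chain, exhaustively enumerating shapes is hopeless, which is why prediction methods rely on learned or physical scoring to guide a search instead.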

There’s even a crowdsourced game (Foldit) that tries to leverage human intuition to predict workable protein forms.

DeepMind says its approach rests upon years of prior research in using big data to try to predict protein structures.

Specifically, it’s applying deep learning approaches to genomic data.

“Fortunately, the field of genomics is quite rich in data thanks to the rapid reduction in the cost of genetic sequencing. As a result, deep learning approaches to the prediction problem that rely on genomic data have become increasingly popular in the last few years. DeepMind’s work on this problem resulted in AlphaFold, which we submitted to CASP [Community Wide Experiment on the Critical Assessment of Techniques for Protein Structure Prediction] this year,” it writes in the blog post.

“We’re proud to be part of what the CASP organisers have called ‘unprecedented progress in the ability of computational methods to predict protein structure,’ placing first in rankings among the teams that entered (our entry is A7D).”

“Our team focused specifically on the hard problem of modelling target shapes from scratch, without using previously solved proteins as templates. We achieved a high degree of accuracy when predicting the physical properties of a protein structure, and then used two distinct methods to construct predictions of full protein structures,” it adds.

DeepMind says both methods relied on deep neural networks trained to predict a protein’s properties from its genetic sequence.

“The properties our networks predict are: (a) the distances between pairs of amino acids and (b) the angles between chemical bonds that connect those amino acids. The first development is an advance on commonly used techniques that estimate whether pairs of amino acids are near each other,” it explains.
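The difference between predicting distances and predicting whether residues are “near each other” can be illustrated with a small sketch. It assumes each residue is reduced to a single 3D point and uses an 8 ångström contact cutoff, a threshold commonly used in contact prediction; the function names and sample coordinates are hypothetical, not DeepMind's code.

```python
import math
from typing import List, Sequence, Tuple

Coord = Tuple[float, float, float]  # one representative point per residue (assumption)


def distance_matrix(coords: Sequence[Coord]) -> List[List[float]]:
    """Pairwise Euclidean distances between residue positions, in angstroms."""
    n = len(coords)
    return [[math.dist(coords[i], coords[j]) for j in range(n)] for i in range(n)]


def contact_map(dists: List[List[float]], threshold: float = 8.0) -> List[List[bool]]:
    """Binary 'near each other' map: True wherever the distance is below threshold."""
    return [[d < threshold for d in row] for row in dists]


if __name__ == "__main__":
    coords = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0), (7.9, 0.0, 0.0)]
    d = distance_matrix(coords)
    c = contact_map(d)
    # Both pairs (0, 1) and (0, 2) count as "contacts", even though one is
    # about four times further apart: the binary map discards that detail.
    print(c[0][1], c[0][2], round(d[0][1], 1), round(d[0][2], 1))
```

A contact map collapses every close pair to the same value, whereas a distance matrix keeps how close each pair actually is, which is the extra information the networks described here aim to predict.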

“We trained a neural network to predict a separate distribution of distances between every pair of residues in a protein. These probabilities were then combined into a score that estimates how accurate a proposed protein structure is. We also trained a separate neural network that uses all distances in aggregate to estimate how close the proposed structure is to the right answer.”
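One way to turn per-pair distance distributions into a single score is to sum the negative log-probabilities of a proposed structure's distances. The sketch below assumes each predicted distribution is a histogram over fixed distance bins; the bin edges, the epsilon, and all names are hypothetical choices, and the second network that estimates overall accuracy is not reproduced.

```python
import math
from bisect import bisect_right
from typing import Dict, List, Sequence, Tuple

Coord = Tuple[float, float, float]

# Hypothetical bin edges in angstroms (10 bins); real systems pick their own binning.
BIN_EDGES = [2.0, 4.0, 6.0, 8.0, 10.0, 12.0, 14.0, 16.0, 18.0, 20.0, 22.0]


def bin_index(distance: float, edges: Sequence[float] = BIN_EDGES) -> int:
    """Map a distance to its histogram bin; out-of-range values are clamped."""
    idx = bisect_right(edges, distance) - 1
    return min(max(idx, 0), len(edges) - 2)


def structure_score(
    coords: Sequence[Coord],
    predicted: Dict[Tuple[int, int], List[float]],
    eps: float = 1e-9,
) -> float:
    """Negative log-likelihood of the structure's pairwise distances under the
    predicted per-pair distributions. Lower means better agreement."""
    total = 0.0
    for (i, j), probs in predicted.items():
        d = math.dist(coords[i], coords[j])
        total -= math.log(probs[bin_index(d)] + eps)
    return total


if __name__ == "__main__":
    # A made-up prediction that residues 0 and 1 sit 4-6 angstroms apart.
    probs = [0.01] * 10
    probs[1] = 0.91
    predicted = {(0, 1): probs}
    good = [(0.0, 0.0, 0.0), (5.0, 0.0, 0.0)]
    bad = [(0.0, 0.0, 0.0), (15.0, 0.0, 0.0)]
    print(structure_score(good, predicted) < structure_score(bad, predicted))
```

Summing log-probabilities treats each pair's prediction as independent evidence, which keeps the score cheap to evaluate for any candidate structure.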

It then used new methods to construct predictions of protein structures, searching for structures that matched its predictions.

“Our first method built on techniques commonly used in structural biology, and repeatedly replaced pieces of a protein structure with new protein fragments. We trained a generative neural network to invent new fragments, which were used to continually improve the score of the proposed protein structure,” it writes.
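The fragment-replacement idea can be sketched as a simple greedy search. This is a toy under stated assumptions: the structure is a flat list of backbone angles, the “generative network” is replaced by a hypothetical random sampler, and a change is kept only when the score improves (plain hill-climbing, not a claim about how DeepMind accepts moves).

```python
import random
from typing import Callable, List, Sequence, Tuple


def propose_fragment(length: int, rng: random.Random) -> List[float]:
    """Hypothetical stand-in for a generative model: random angles in degrees."""
    return [rng.uniform(-180.0, 180.0) for _ in range(length)]


def fragment_search(
    angles: Sequence[float],
    score: Callable[[List[float]], float],
    frag_len: int = 3,
    steps: int = 2000,
    seed: int = 0,
) -> Tuple[List[float], float]:
    """Repeatedly swap a random window of angles for a proposed fragment,
    keeping the swap only if the score (lower is better) improves."""
    if len(angles) < frag_len:
        raise ValueError("structure is shorter than one fragment")
    rng = random.Random(seed)
    current = list(angles)
    best = score(current)
    for _ in range(steps):
        start = rng.randrange(0, len(current) - frag_len + 1)
        candidate = (
            current[:start]
            + propose_fragment(frag_len, rng)
            + current[start + frag_len:]
        )
        s = score(candidate)
        if s < best:
            current, best = candidate, s
    return current, best


if __name__ == "__main__":
    target = [60.0] * 9  # made-up "ideal" angles for the toy score

    def toy_score(a: List[float]) -> float:
        return sum((x - t) ** 2 for x, t in zip(a, target))

    start = [0.0] * 9
    _, final = fragment_search(start, toy_score)
    print(toy_score(start), final)
```

Working with whole fragments rather than single angles lets each move make a larger, locally coherent change, which is the appeal of fragment-based search.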

“The second method optimised scores through gradient descent — a mathematical technique commonly used in machine learning for making small, incremental improvements — which resulted in highly accurate structures. This technique was applied to entire protein chains rather than to pieces that must be folded separately before being assembled, reducing the complexity of the prediction process.”
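A minimal sketch of gradient descent applied to a whole chain at once follows. It swaps in a simple squared-error potential between current and target pairwise distances, optimised directly over Cartesian coordinates with a fixed step size; the potential, the learning rate, and the names are assumptions for illustration, not DeepMind's implementation.

```python
import math
import random
from typing import List, Sequence, Tuple


def loss_and_grad(
    coords: List[List[float]], target: Sequence[Sequence[float]]
) -> Tuple[float, List[List[float]]]:
    """Sum over pairs of (current distance - target distance)^2, and its gradient."""
    n, dim = len(coords), len(coords[0])
    loss = 0.0
    grad = [[0.0] * dim for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            diff = [coords[i][k] - coords[j][k] for k in range(dim)]
            r = math.sqrt(sum(c * c for c in diff)) or 1e-9  # avoid divide-by-zero
            err = r - target[i][j]
            loss += err * err
            for k in range(dim):
                g = 2.0 * err * diff[k] / r  # d(err^2)/d(x_i[k])
                grad[i][k] += g
                grad[j][k] -= g
    return loss, grad


def gradient_descent(coords, target, lr: float = 0.01, steps: int = 500):
    """Move every residue at once, by small steps against the gradient."""
    x = [list(p) for p in coords]
    for _ in range(steps):
        _, grad = loss_and_grad(x, target)
        for i in range(len(x)):
            for k in range(len(x[i])):
                x[i][k] -= lr * grad[i][k]
    return x, loss_and_grad(x, target)[0]


if __name__ == "__main__":
    true = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (3.8, 3.8, 0.0), (0.0, 3.8, 3.8)]
    target = [[math.dist(a, b) for b in true] for a in true]
    rng = random.Random(1)
    start = [[c + rng.uniform(-1.0, 1.0) for c in p] for p in true]
    print(loss_and_grad(start, target)[0], gradient_descent(start, target)[1])
```

Because every coordinate is updated in the same step, the chain is refined as a single object rather than as separately folded pieces, mirroring the reduction in complexity the quote describes.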

DeepMind describes the results achieved thus far as “early signs of progress in protein folding” using computational methods — claiming they demonstrate “the utility of AI for scientific discovery”.

It also emphasizes, though, that it is still early days before the deep learning approach can have any kind of “quantifiable impact”.

“Even though there’s a lot more work to do before we’re able to have a quantifiable impact on treating diseases, managing the environment, and more, we know the potential is enormous,” it writes. “With a dedicated team focused on delving into how machine learning can advance the world of science, we’re looking forward to seeing the many ways our technology can make a difference.”