Collecting genome data is reliable, fast and cheap. Interpreting that data, however, is unreliable, slow and expensive, when it is possible at all.
Today, genome interpretation is a burgeoning science, but it’s not yet a technology. A stricken patient has their genes sequenced and their mutations identified. But then, it can take a highly trained, and highly paid, expert many hours to make a judgment call on a single unfamiliar mutation.
All too often, the result is no diagnosis, no therapy and gut-wrenching uncertainty. The problem is made worse because there are not enough knowledgeable experts to handle the rising tide of genome data, and there never will be: exponential growth in the number of human experts is not a viable option.
Genome interpretation is already a pain point for doctors, hospitals, diagnostic labs, pharmaceutical companies and insurance providers. That means it’s also a pain point for everyday patients and their families, whether they know it or not.
The capability gap between the collection of genome data and the interpretation of it is widening faster than ever. If that gap is allowed to continue growing unabated, it represents a shameful lost opportunity to avoid heartache and struggle for millions of people.
How will computer-aided genome interpretation be used to improve the lives of patients? Dozens of ventures are attempting to answer this question and, when the dust settles, healthcare will look dramatically different than it does now.
There are exciting entrepreneurial opportunities in genome-driven personalized medicine, arising from huge potential value and extreme uncertainty over the next five years. We can think of them as rungs on the ladder of information value.
First Rung: Genetic Data Generation And Secure Data Storage
These entrepreneurial opportunities provide the raw material for genomic medicine: whole genome sequences, exome sequences, gene panels and rich phenotype information such as an individual’s predisposition to disease.
This data can be used to determine the set of mutations that a patient has, compared to a reference genome, or it can be used to determine the mutations that tumor cells have, compared to healthy cells. Large databases form crucial resources that support higher rungs on the ladder.
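To make the two comparisons concrete, here is a toy sketch in Python. It assumes the sequences are short strings that are already aligned and of equal length, and it only detects single-base substitutions; real pipelines work from sequencing reads and also handle insertions, deletions and sequencing errors. The function names `substitutions` and `tumor_specific` are hypothetical, chosen only for illustration.

```python
def substitutions(reference, sample):
    """Return the set of (position, reference_base, sample_base) differences.

    Assumption: both sequences are pre-aligned and the same length,
    so position i in one corresponds to position i in the other.
    Positions are 0-based.
    """
    if len(reference) != len(sample):
        raise ValueError("sequences must be pre-aligned to the same length")
    return {
        (pos, ref_base, sample_base)
        for pos, (ref_base, sample_base) in enumerate(zip(reference, sample))
        if ref_base != sample_base
    }


def tumor_specific(reference, healthy, tumor):
    """Substitutions found in tumor cells but not in the same patient's healthy cells."""
    return substitutions(reference, tumor) - substitutions(reference, healthy)


# Made-up example sequences, not real genomic data.
reference = "ACGTACGTAC"
healthy = "ACGTTCGTAC"  # differs from the reference at position 4
tumor = "ACGTTCGAAC"    # carries the same difference plus one more at position 7

patient_mutations = substitutions(reference, healthy)
tumor_mutations = tumor_specific(reference, healthy, tumor)
```

The first call answers "what does this patient carry relative to the reference genome?"; the second subtracts those inherited differences to leave only the changes that arose in the tumor.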
Examples include the sequencers developed and in development at Illumina, PacBio and Oxford Nanopore, the data storage systems in development at Google Genomics and DNAnexus, and the genotype-phenotype data being generated at 23andMe and Human Longevity.
The uncertainties here mainly involve rapidly dropping costs of genome sequencing and phenotyping technologies on the one hand, and increasing concerns about patient confidentiality on the other.
Second Rung: Data Organization, Brokering And Visualization
The value added here is in sharing and comparing the data of individual patients, as well as integrating diverse kinds of large-scale datasets. Pertinent datasets may be public or private, and may have conditions attached, such as those involving confidentiality, non-competition and complex licensing.
Brokering “data trades” in a technologically streamlined manner is crucial. These opportunities do not produce actionable information, but they provide important support for higher rungs on the ladder.
Examples include NextBio, SolveBio and DNAstack. Here, there is uncertainty in the gain in value that can be achieved by combining and sharing genomic data, since without proper interpretation and without addressing patient confidentiality the data may not be actionable.
Third Rung: Software To Bridge The Genotype-Phenotype Gap
This is the most challenging, yet potentially highest-value, entrepreneurial opportunity. Currently, there is a lack of technologies that can reliably link genotype to phenotype and address the crucial question of how genetic modifications, whether natural or therapeutic, impact molecular and biological processes involved in disease. Bridging this gap would be highly disruptive in several verticals, including genetic testing, drug targeting, patient stratification, precision medicine and insurance.
A recent study showed that the success rate of drugs in phase three clinical trials could be doubled when even the most simplistic genome interpretation data is taken into account. Imagine what could be achieved if accurate systems for genome interpretation were broadly available.
Bridging the genotype-phenotype gap is the most difficult challenge on the ladder, because it addresses a very complex, multi-faceted task.
The genome is a digital recipe book for building cells, written in a language that no human will ever fully understand. Our only window into this tiny, complex world is through high-throughput experiments such as DNA and RNA sequencing, proteomics assays, single-cell experiments and gene-editing screens using CRISPR-Cas9.
Identifying valuable experiments is one way entrepreneurs on this rung can create value, but only if they have the computational know-how to make sense of the data. Machine learning is by far the best technology at our disposal for using such data to discover how the underlying biology works. It will play a crucial role in bridging the genotype-phenotype gap.
For this rung, there is no uncertainty about the transformative nature of the technologies and their value. The uncertainty lies in how successful we can be, from a technological standpoint, in bridging the gap. Do we have enough data? The right type of data? The right machine learning algorithms?
Fourth Rung: Diagnostics, Therapies, Precision Medicine And Insurance
These opportunities derive their value from directly addressing the needs of patients. Going forward, this rung will increasingly benefit from the lower rungs on the ladder, and companies that fail to leverage the full stack of the ladder will be left behind. Currently, companies on the fourth rung struggle to make full use of genomic data because good systems for genome interpretation are not yet available.
For instance, the reliability of the current generation of computational tools for genome interpretation is unclear, according to the American College of Medical Genetics and Genomics, the widely accepted oversight body. This will inevitably change as systems for genome interpretation improve and are proven.
Examples of diagnostic companies include Counsyl, Invitae, Myriad and Human Longevity’s Health Nucleus; examples of pharmaceutical companies that are increasingly using these systems include the big pharmas, plus data-driven companies such as 23andMe and Capella Biosciences. Risks here include the uncertainties involved in obtaining regulatory approval and sidestepping the dreaded 10-year drug development cycle.
A Way Forward
Bridging the genotype-phenotype gap is one of the most important outstanding challenges for which machine learning is truly needed. Facebook, Google and DeepMind have made amazing progress in helping computers catch up to humans in understanding images, speech and language, but humans already do these tasks every day and we excel at them. Genome interpretation is different: it is not a part of our daily lives, yet it is, in a sense, more urgent.
The gap between our ability to merely collect genetic information and our ability to interpret it at scale is widening faster than ever. Closing that gap will change the lives of hundreds of millions of people.
Our objective in this industry should be a tenfold improvement in the scale, speed and, most of all, accuracy of genome interpretation. I believe we can do this in three to five years by accelerating the pace of development in computational methods for genome interpretation, and especially machine learning.
Genome interpretation is a software problem that will require the concerted efforts of genome biologists, machine learning experts and software engineers.