AI is ready to take on a massive healthcare challenge

Which disease results in the highest total economic burden per annum? If you guessed diabetes, cancer, heart disease or even obesity, you guessed wrong. Reaching a mammoth financial burden of $966 billion in 2019, the cost of rare diseases far outpaced diabetes ($327 billion), cancer ($174 billion), heart disease ($214 billion) and other chronic diseases.

It’s not surprising that rare diseases didn’t come to mind. By definition, a rare disease affects fewer than 200,000 people. Collectively, however, there are thousands of rare diseases, and they affect around 400 million people worldwide. About half of rare-disease patients are children, and the typical patient, young or old, weathers a diagnostic odyssey lasting five years or more, undergoing countless tests and seeing numerous specialists before ultimately receiving a diagnosis.

No longer a moonshot challenge

Shortening that diagnostic odyssey and reducing the associated costs was, until recently, a moonshot challenge, but is now within reach. About 80% of rare diseases are genetic, and technology and AI advances are combining to make genetic testing widely accessible.

Whole-genome sequencing, an advanced genetic test that allows us to examine a person’s entire DNA, now costs under $1,000, and market leader Illumina is targeting a $100 genome in the near future.

The remaining challenge is interpreting that data in the context of human health, and it is not trivial. A typical human genome contains around 5 million unique genetic variants, and from those we need to identify a single disease-causing variant. Recent advances in cognitive AI allow us to interrogate a person’s whole-genome sequence and identify disease-causing mechanisms automatically, augmenting human capacity.
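
To make that concrete, here is a minimal sketch in Python of the kind of variant prioritization such a system performs. Everything here, from the Variant fields to the GENE_PHENOTYPES map and the scoring weights, is hypothetical and hard-coded for illustration; production systems draw on curated databases and far richer evidence.

```python
from dataclasses import dataclass

@dataclass
class Variant:
    gene: str
    population_frequency: float  # fraction of the population carrying it
    predicted_impact: str        # e.g., "loss_of_function", "missense"

# Hypothetical map from genes to phenotypes they are known to cause.
GENE_PHENOTYPES = {
    "POC1B": {"retinal degeneration"},
    "CFTR": {"chronic lung infection"},
}

def prioritize(variants, patient_phenotypes, max_frequency=0.001):
    """Rank candidates: rare, damaging and phenotype-relevant first."""
    def score(v):
        s = 0
        if v.population_frequency < max_frequency:
            s += 1  # rare in the general population
        if v.predicted_impact == "loss_of_function":
            s += 1  # likely disrupts the gene product
        if GENE_PHENOTYPES.get(v.gene, set()) & patient_phenotypes:
            s += 2  # gene already linked to the patient's symptoms
        return s
    return sorted(variants, key=score, reverse=True)

ranked = prioritize(
    [Variant("POC1B", 0.0002, "loss_of_function"),
     Variant("TTN", 0.03, "missense")],
    patient_phenotypes={"retinal degeneration"},
)
print(ranked[0].gene)  # POC1B ranks first
```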

A shift from narrow to cognitive AI

The path to a broadly usable AI solution required a paradigm shift from narrow to broader machine learning models. Scientists interpreting genomic data review thousands of data points, collected from different sources, in different formats.

An analysis of a human genome can take as long as eight hours, and there are only a few thousand qualified scientists worldwide. When we reach the $100 genome, analysts expect that 50 million to 60 million people will have their DNA sequenced every year. How will we analyze the data generated in the context of their health? That’s where cognitive intelligence comes in.
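
Back-of-the-envelope arithmetic shows why manual analysis cannot keep up. Using the figures above, plus an assumed 2,000-hour working year for a full-time scientist, the gap looks like this:

```python
# Rough scaling arithmetic based on the figures quoted above.
genomes_per_year = 55_000_000   # midpoint of the 50M-60M projection
hours_per_genome = 8            # upper bound for one manual analysis
hours_per_analyst_year = 2_000  # assumption: one full-time scientist

analysts_needed = genomes_per_year * hours_per_genome / hours_per_analyst_year
print(f"{analysts_needed:,.0f} full-time analysts required")  # 220,000
```

Even if eight hours per genome is pessimistic, the answer is orders of magnitude more analysts than the few thousand who exist today.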

Cognitive intelligence, or cognitive computing solutions, blend artificial intelligence technologies like neural networks, machine learning, and natural language processing, and are able to mimic human intelligence. These AI models are capable of making decisions in complex, context-rich situations.

Most artificial intelligence models we use today are narrow models, designed to perform a single task. Amazon uses a narrow AI model to determine the freshness of produce before shipping it out. This is a highly useful model for the specific purpose of increasing customer satisfaction and reducing waste.

However, using narrow AI to solve the problem of interpreting genomic data is close to impossible. The analysis task performed by humans is quite complex. For one thing, it’s highly context dependent and has dozens of variable input parameters. Another problem is that every patient case requires gathering scientific evidence, which can reside in recent publications as well as in dozens of databases; collecting it is an additional time-consuming task that calls for its own machine learning solution.

Adding to the complexity, clinical evidence resides in multiple sources. One particularly challenging case analyzed on the Emedgene platform involved a 19-year-old patient with retinal degeneration. Following a negative first review by a clinical lab, the cognitive AI model identified variants in a gene, POC5, which is part of the centriolar protein gene family.

Two genes in that family, CEP290 and POC1B, are associated with retinitis pigmentosa, the most common form of inherited retinal degeneration, which led the cognitive AI to suggest POC5 as a candidate. It connected the variants, the gene and the patient’s phenotypes with other known genes, a task that would take a human many hours to complete. Subsequent animal studies confirmed that mutations in POC5 result in decreased visual motor response.
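
Emedgene’s actual model is proprietary, but the guilt-by-association reasoning described above can be sketched in a few lines of Python. The gene-family and disease-gene tables below are hard-coded stand-ins for relationships a real system would mine from the literature and curated databases:

```python
# Toy guilt-by-association reasoning over illustrative, hard-coded data.
GENE_FAMILIES = {
    "centriolar protein": {"CEP290", "POC1B", "POC5"},
}
KNOWN_DISEASE_GENES = {
    "CEP290": {"retinitis pigmentosa"},
    "POC1B": {"retinitis pigmentosa"},
}

def suggest_candidate(gene, patient_phenotype):
    """Flag a gene if relatives in its family cause the patient's phenotype."""
    for family, members in GENE_FAMILIES.items():
        if gene not in members:
            continue
        support = sorted(g for g in members
                         if patient_phenotype in KNOWN_DISEASE_GENES.get(g, set()))
        if support:
            return (f"{gene}: candidate via {family} family; "
                    f"{', '.join(support)} already linked to {patient_phenotype}")
    return f"{gene}: no family-level support found"

print(suggest_candidate("POC5", "retinitis pigmentosa"))
```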

Narrow artificial intelligence models would be difficult to train to address context-rich patient cases with multiple steps in the workflow. We could take a “brute force” approach and throw very large data sets at these problems. But as always in medicine, patient data is siloed, and any organization attempting to train a model would be hard-pressed to access large data sets today, even if we assume enough patients have been sequenced to date.

A supervised learning approach faces another hurdle: annotating scientific data at scale is quite challenging, because the same biomedical professionals we need to analyze genomic data are the ones most qualified to do the annotation.

Then we have the additional challenge of finding and organizing the genomic data needed for interpretation, which would require a different set of machine learning technologies entirely. Just to illustrate how hard this problem is, let’s take a quick look at the M protein. There’s Protein M, as expected, but M protein is also the virulence factor produced by strep, and myeloma protein is sometimes referred to as M protein, too.

Finally, MYOM2, the gene that encodes Protein M, is also sometimes referred to as Protein M. Entity disambiguation in the biomedical literature is difficult. Add the challenge of identifying positive, negative or other types of relationships between entities, and of adopting hierarchical ontologies of medical terms, and you arrive at a large, complex problem in its own right.
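
As a toy illustration (not how any production system works), a dictionary of candidate senses plus context-word overlap is enough to show why the surface mention “M protein” cannot be resolved on its own:

```python
# Toy disambiguation of "M protein" by surrounding context words.
# Real pipelines use trained biomedical entity-linking models.
M_PROTEIN_SENSES = {
    "myeloma protein": {"myeloma", "serum", "immunoglobulin"},
    "streptococcal M protein": {"virulence", "streptococcus", "strep"},
    "Protein M (MYOM2)": {"myomesin", "muscle", "sarcomere", "myom2"},
}

def disambiguate(context: str) -> str:
    """Pick the sense whose cue words best overlap the context."""
    words = set(context.lower().split())
    best = max(M_PROTEIN_SENSES,
               key=lambda sense: len(M_PROTEIN_SENSES[sense] & words))
    return best if M_PROTEIN_SENSES[best] & words else "unresolved"

print(disambiguate("elevated M protein in the serum of myeloma patients"))
# -> myeloma protein
```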

When facing the complex challenge of scaling genomic interpretation, cognitive AI systems seem to be the solution. A combination of machine learning algorithms that, in concert, can perform the analysis, find and organize scientific and genomic data, and provide evidence-backed interpretation of a patient’s genome can reduce the manual labor involved, enabling precision medicine programs to scale.

Deloitte’s tech trends report puts AI technologies that predict, prescribe, augment and automate firmly in the disruptive category. They recommend using cognitive technology to illuminate insights and connections among disparate data, especially in areas where human decision-making is nonscalable. The global cognitive computing market is expected to reach $72.26 billion by 2027, according to Fortune Business Insights.

No self-driving algorithms

In contrast to self-driving cars, machine learning models in genomics aren’t expected to diagnose patients independently any time soon. They are intended to help strained precision medicine programs cope with a growing patient load.

We can use cognitive models to deliver an accurate first-line analysis of human DNA in the context of clinical care, subject to human oversight, and still reduce analysis time significantly. This transition decouples the growth of genomic medicine from the need for ever more biomedical professionals.

From rare to common

While this combination of technologies allows us to speed up diagnostics for the large rare-disease patient population, it can also be repurposed to improve our understanding, and treatment, of common diseases. Until now, most discoveries of rare variants with implications for common diseases have been serendipitous.

For example, the discovery that individuals with loss-of-function mutations in the PCSK9 gene exhibit low levels of serum LDL and abnormally good cardiovascular health led to the development of PCSK9 inhibitors, which are more effective than statins at lowering serum cholesterol levels. The same technology that allows us to implement precision medicine at scale for rare-disease diagnosis can be extended to systematically identify the genetics underlying desirable drug responses.

Now that we can transform rare disease diagnostics, we should. These tech advances — both in sequencing and artificial intelligence — allow us to embed genomic medicine in every hospital and health system. More than 400 million rare-disease patients are waiting, and countless patients with common diseases could benefit.