Bio-Hackers Pore Through A Child’s DNA For The Source of A Mysterious Disease

When Max Good was born, it was clear that something was off.

At first, he was a colicky baby. But then he wouldn’t make eye contact, revealing that he was almost 100 percent blind at birth. Occasionally, he would have seizures. Then feeding became so difficult that his parents, Paul and Janis, eventually had to insert a tube up his nose.

“For his entire life, we’ve been trying to figure out what’s up,” said his father Paul Good. “He’s never been mistaken for a normal, healthy child and we’ve run every test under the sun.”

Initially, doctors gave Max a cerebral palsy diagnosis. But cerebral palsy captures a set of symptoms or movement disorders that arise from damage to the motor control centers of the brain. It doesn’t describe an exact cause. It’s like how the word “cancer” actually describes a multitude of ways in which normal cell death and division break down for different kinds of cells.

“I’m going to live for decades with this kid. He’s my son. I have to deal with whatever fate throws at us,” said Paul. “But I don’t like the idea of not knowing what this is.”

So after running tests for years — MRIs, EKGs, metabolic testing, you name it — Paul Good decided to explore Max’s DNA on the recommendation of his brother Otavio.

With sequencing costs falling faster than Moore’s Law would suggest, it’s becoming possible for people to get their entire genomes sequenced for a few thousand dollars. In fact, Illumina, the leading company in the space, announced a machine a few weeks ago that could bring full sequencing costs down to the symbolic milestone of $1,000. It’s important to stress that full sequencing is different from the tests that well-publicized companies like 23andMe offer, which are SNP (single nucleotide polymorphism) tests that examine only small parts of a person’s genome.

The dirt-cheap costs mean that it’s possible for computer scientists and hackers to go and literally sort through gigabytes and gigabytes of a person’s raw genomic data. This code is made up of strings of As, Ts, Cs and Gs, the four nucleobases found within DNA. It opens the next challenge: how do you leverage computer science, statistics and mathematics to make sense of this new flood of data?

“That’s the secret sauce. How do you take all of this stuff and narrow it down to a ranked list of conditions that you can go after?” said Mohammed Rahman, a bioinformatics researcher who once worked for pharmaceutical giant Novartis and partnered with Max’s uncle Otavio Good to parse through the toddler’s data.

To put it lightly, Rahman and Otavio Good are a little bit mad scientist crazy. Otavio Good had a drone deliver a ring to his wedding ceremony, won a DARPA challenge with his friends and wife to build a program that could put shredded pieces of paper back together and made Word Lens, that augmented reality app that uses computer vision to translate texts in other languages on your phone’s screen.

The pair got Max and his parents each exome sequenced. (Exome sequencing is more comprehensive than 23andMe’s SNP tests, but still isn’t as exhaustive as whole genome sequencing.) They paid $7,000 to a company in Southern California called Ambry Genetics to test Max twice and get his parents’ data.

Legally, there were some small hurdles. The Goods live in Maryland, where direct-to-consumer genetic testing is prohibited. So Paul had to drive into Virginia to pick up saliva kits from a friend.

Then Rahman and Otavio Good got back a hard drive full of Max’s data — about 30 gigabytes worth.

“We were super excited about getting the rawest, raw information. We didn’t want them to do any processing. We wanted to do it ourselves,” Rahman said. (Normally, Ambry would include some interpretation on top of the sequencing, but Rahman and Otavio Good wanted to go DIY. They’re also doing this as a project out of personal interest, not as a business or commercial service.)

When you get exome sequenced, you don’t get back a long, uninterrupted string of 3 billion base pairs, which is roughly the normal length of the human genome. Max’s sequencing tests cut his DNA into tiny, little strips that were each 130 base pairs long. It then became a big data problem to go and put them all back together. For statistical purposes, Rahman and Otavio Good wanted 30 overlaps to ensure that one strip accurately led to another.
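To make the "30 overlaps" idea concrete, here is a minimal Python sketch of how one might count the number of strips covering each position in a stretch of DNA. Only the 130-base strip length and the threshold of 30 come from the description above; the function names and the toy numbers are assumptions for illustration, not the pair's actual software.

```python
def coverage_depth(region_length, read_starts, read_length=130):
    """Count how many reads overlap each position of a region.

    Assumption: reads are described only by their start position and
    all share the same length (130 bases, as in the article).
    """
    # Difference array: +1 where a read starts, -1 just past where it ends.
    delta = [0] * (region_length + 1)
    for start in read_starts:
        end = min(start + read_length, region_length)
        delta[start] += 1
        delta[end] -= 1
    depth, running = [], 0
    for change in delta[:region_length]:
        running += change
        depth.append(running)
    return depth


def poorly_covered(depth, minimum=30):
    """Return positions with fewer than `minimum` overlapping reads."""
    return [pos for pos, d in enumerate(depth) if d < minimum]


# Toy usage: three reads over a 300-base region (hypothetical values).
depth = coverage_depth(300, [0, 50, 100])
thin_spots = poorly_covered(depth, minimum=1)
```

In this toy case the last 70 positions are covered by no strip at all, which is the kind of gap a real pipeline would flag before trusting any mutation found there.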

Then the two of them built software that pieced together Max’s strips and compared his data against that of his parents and the Human Reference Genome. The reference genome is a mosaic of DNA data from volunteers put together by the National Institutes of Health. It’s supposed to be an “average” of sorts, and the 38th and latest version was released just before Christmas.
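The comparison step can be pictured with a short Python sketch. It assumes the reads have already been assembled into per-position calls, with two bases per person (one per chromosome copy) and one base for the reference. The data layout and the function name are hypothetical and chosen for illustration; the article does not describe how the real software stores its data.

```python
def compare_trio(reference, child, mother, father):
    """List positions where the child differs from the reference.

    Assumed layout (hypothetical): `reference` maps position -> base,
    the other three map position -> pair of bases.
    """
    findings = []
    for pos, ref_base in sorted(reference.items()):
        # Bases the child carries that the reference does not.
        child_alleles = set(child.get(pos, ())) - {ref_base}
        for allele in sorted(child_alleles):
            in_mother = allele in mother.get(pos, ())
            in_father = allele in father.get(pos, ())
            if in_mother and in_father:
                origin = "both parents"
            elif in_mother:
                origin = "mother"
            elif in_father:
                origin = "father"
            else:
                origin = "neither parent"
            findings.append((pos, ref_base, allele, origin))
    return findings


# Toy usage with made-up positions and bases.
reference = {10: "A", 20: "C", 30: "G"}
child = {10: ("A", "T"), 20: ("C", "C"), 30: ("A", "A")}
mother = {10: ("A", "T"), 20: ("C", "C"), 30: ("G", "G")}
father = {10: ("A", "A"), 20: ("C", "C"), 30: ("G", "G")}
findings = compare_trio(reference, child, mother, father)
```

A difference found in neither parent could be a new mutation, but it could just as easily be a sequencing error, which is one reason overlapping reads matter.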

The problem isn’t as simple as looking for basic differences between Max’s genome and the human reference genome.

“If more than 1 percent of the people have the mutation, then it’s probably harmless. It’s probably not this rare thing,” Otavio Good said.

“Basically we’re looking for a needle in the haystack. We’re filtering things down and throwing out stuff that’s not relevant,” Rahman added.
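The 1 percent rule of thumb translates into a very small filter. The Python sketch below assumes a lookup table of how common each mutation is in the general population. That table, the variant labels and the function name are hypothetical, since the article does not say which frequency data the pair used.

```python
def filter_rare(variants, population_frequency, max_frequency=0.01):
    """Keep variants seen in at most `max_frequency` of the population.

    Assumption: `population_frequency` maps a variant label to the
    fraction of people who carry it (a hypothetical table).
    """
    rare = []
    for variant in variants:
        # Variants missing from the table are treated as unseen (frequency 0).
        frequency = population_frequency.get(variant, 0.0)
        if frequency <= max_frequency:
            rare.append((variant, frequency))
    # Rarest first, so the most unusual candidates lead the list.
    return sorted(rare, key=lambda item: item[1])


# Toy usage with made-up variant labels and frequencies.
candidates = filter_rare(
    ["variant_1", "variant_2", "variant_3"],
    {"variant_1": 0.25, "variant_2": 0.002},
)
```

Here the variant carried by a quarter of the population is thrown out, and the two rare ones stay on the list for a closer look.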

To try and figure out which one of Max’s mutations might lead to serious diseases, Rahman and Otavio Good have run an additional library against their data. They’re using the Kyoto Encyclopedia of Genes and Genomes, which is a painstakingly curated database of genes and the proteins they code for that’s been supported by the Japanese government for the last two decades.

From that, they’ve generated a list of 100 or so diseases in the Kyoto database that might match Max’s mutations. The most helpful part of the process is how they’ve been able to rule out genetic disorders that don’t match Max’s data.
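One simple way to turn a set of mutated genes into a ranked list of candidate diseases is sketched below in Python. It assumes a plain table linking each disease to its associated genes. The table contents, gene labels and ranking rule are made up for illustration and do not reflect the Kyoto database's real format or the pair's actual scoring.

```python
def rank_diseases(mutated_genes, disease_genes):
    """Rank diseases by how many of their associated genes carry a mutation.

    Assumption: `disease_genes` maps a disease name to a list of gene
    labels (a hypothetical stand-in for a curated database).
    """
    ranked = []
    for disease, genes in disease_genes.items():
        hits = sorted(set(genes) & set(mutated_genes))
        if hits:
            ranked.append((disease, hits))
    # Most matching genes first; ties broken alphabetically by disease name.
    ranked.sort(key=lambda item: (-len(item[1]), item[0]))
    return ranked


# Toy usage with placeholder names.
ranking = rank_diseases(
    {"GENE_A", "GENE_C"},
    {
        "Disease X": ["GENE_A", "GENE_B"],
        "Disease Y": ["GENE_A", "GENE_C"],
        "Disease Z": ["GENE_D"],
    },
)
```

Diseases with no matching gene never appear in the output, which mirrors the ruling-out step: a condition whose genes look normal drops off the list.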

Originally, they were fixated on a condition called Maple Syrup Urine Disease, which is more prevalent among the Mennonite population and matched Max’s symptoms. But they found it didn’t match Max’s DNA.

Another time, Max’s parents thought that his symptoms matched a disease originating from a mutation on the FoxG1 gene. Janis and Paul immediately sent the link to Otavio.

“Within a matter of hours, we had an e-mail back from them with a screenshot of Max’s DNA showing that his FoxG1 gene was normal,” Paul said. “The concept of that blew me away.”

Right now, the pair are looking at a mutation behind a condition called Carbamoyl Phosphate Synthetase I Deficiency, which occurs in about 1 out of every 800,000 newborns. In Max’s DNA, it looks like a protein involved in the production of this enzyme is unnaturally halted two-thirds of the way through (pictured below). (DNA includes stop and start code instructions for proteins. If a stop code appears too early, it can make the resulting protein fold incorrectly.)

[Image: cps1]
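Spotting a stop code that arrives too early is easy to sketch in Python. DNA is read three letters at a time, and in the standard genetic code the triplets TAA, TAG and TGA mean "stop." The sequences below are toy examples, not Max's data, and the function names are hypothetical. A real analysis would also need the gene's correct reading frame, which this sketch simply assumes.

```python
# The three stop codons of the standard genetic code.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def first_stop_codon(coding_sequence):
    """Return the index of the first stop codon, counted in codons, or None."""
    # Assumption: the sequence starts in the correct reading frame.
    for i in range(0, len(coding_sequence) - 2, 3):
        if coding_sequence[i:i + 3].upper() in STOP_CODONS:
            return i // 3
    return None


def premature_stop_fraction(coding_sequence):
    """Return how far through the sequence a premature stop falls, or None.

    A stop in the very last codon is the normal ending, so it is not reported.
    """
    total_codons = len(coding_sequence) // 3
    stop_index = first_stop_codon(coding_sequence)
    if stop_index is None or stop_index == total_codons - 1:
        return None
    return stop_index / total_codons


# Toy usage: the second sequence has a stop codon halfway through.
normal = "ATGGCTGCTGCTGCTTAA"
mutated = "ATGGCTGCTTGAGCTTAA"
normal_result = premature_stop_fraction(normal)
mutated_result = premature_stop_fraction(mutated)
```

For the toy sequences, the normal one reports nothing unusual while the mutated one reports a stop halfway through. A result near two-thirds would correspond to the kind of truncation described above.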

Now that they’ve identified a handful of suspicious mutations, Otavio Good and Rahman have to wait for additional tests to check Max’s ammonia build-ups and to see whether they’ve read his DNA properly.

This is one of the bottlenecks that still makes bioinformatics cumbersome compared to other purely software-based problems.

“One day, we’re ripping through code, and the next day we have to wait months for tests,” Otavio Good said.

From an ethical perspective, the Goods’ quest is fairly different from what got 23andMe in hot water with the Food and Drug Administration late last year. After the FDA raised concerns about 23andMe’s accuracy, the company stopped offering health analysis to consumers who order tests today. (They are working on finding a resolution with the agency though and consumers can still get ancestry analysis.)

In 23andMe’s case, the health analysis product pointed to future conditions that may or may not materialize. That raised concerns that certain patients might overreact if they, for example, tested positive for the breast cancer risk factor BRCA1 (especially if the FDA wasn’t confident in testing accuracy). Would they screen more aggressively or do something extreme like get a prophylactic mastectomy?

In the Goods’ case, Max already has an existing condition that the family is trying to find the cause for. Of course, there is also the chance that Rahman and the Goods could stumble upon some genetic risks for adult onset diseases, which would be a different outcome from what they were originally seeking to answer.

As full genome sequencing costs continue to fall, we’ll likely see more and more companies offering interpretation at lower prices. Just six weeks ago, the FDA gave its first authorization for a next-gen sequencer, the Illumina MiSeqDx. The company has been holding dozens of ‘Understand Your Genome’ events where it educates doctors and medical professionals and gives them full genome sequencing results.

But there will always be individualists who want to take and manipulate data on their own.

“This is a bit out there,” Paul Good said. “But this is just information. We’re not subjecting Max to horrible tests that are going to hurt him. For us, we’re letting science run wild to see what we can learn.”