Researchers who leverage public data for life-changing medical breakthroughs have long promised that donors can remain anonymous. The truth is, we’re going to have to choose between innovation and privacy, as clever statisticians are discovering ways to identify individuals who never actually reveal their names.
This week, the prestigious journal Science published a paper detailing how researchers were able to identify 50 male participants who had anonymously donated their DNA for scientific research to a public database by cross-referencing their genetic information with data from their relatives and other public demographic information.
“We have been pretending that by removing enough information from databases that we can make people anonymous. We have been promising privacy, and this paper demonstrates that for a certain percent of a population, those promises are empty,” said John Wilbanks of Sage Bionetworks, a nonprofit that advocates for open medical data (but was not involved in the study).
The Wall Street Journal made this user-friendly graphic to explain how the researchers uncovered the private data (below). In essence, altruistic citizens had donated genetic material to a public research project, the 1000 Genomes Project. Using a computer algorithm, the researchers correlated information from the subject’s Y chromosome with other genealogy databases, and matched it to their last names. Or, in the words of the researchers, “Surnames are paternally inherited in most human societies, resulting in their co-segregation with Y chromosome haplotypes.”
After combing everything from obituaries to other public genetic databases, the researchers believe they can identify up to 12 percent of U.S. caucasian males*.
The trade-off between privacy and medical breakthroughs could not be more serious. The Info/Law Harvard Privacy blog surmised that opening up private healthcare databases could have saved us from 90,000 unnecessary heart attacks and 25,000 deaths. FDA drug risk researcher Dr. Richard Platt admits:
The Vioxx debacle is a haunting illustration of the importance of large-scale data research. If researchers had had access to 7 million longitudinal patient record, a statistically significant relationship between Vioxx and heart attack would have been revealed in under three years. If researchers had had access to 100 million longitudinal patient records, the relationship would have been discovered in just three months.
The Health Insurance Portability and Accountability Act (HIPAA) mandates certain privacy laws that have come under fire from open data advocacy groups. This latest discovery will make HIPAA reform all the more difficult.
In the near term, we’re likely going to have to choose between opening up medical records and donating medical information or privacy. Personally, I would find it difficult to look someone in the eye, whose life I know I could save, and tell them my privacy was more important. To be sure, that is the moral hypothetical that we must all ask ourselves. It is why, even though I have my own skeletons/embarrassing secrets, I tend to support a more transparent approach to science.
*Why Caucasians? “Our results are presumably mostly relevant to socio-economic groups with high participation in these services—namely, upper- and middle-class U.S. Caucasians,” write the researchers. Namely, donating genetic information is kind of a yuppie thing.
[Image Credit: Flickr User Alan Clever]