Daphne Koller doesn’t mind hard work. She joined Stanford University’s computer science department in 1995, spending the next 18 years there in a full-time capacity before co-founding the online education giant Coursera, where she spent the following four years and remained co-chairman until last month. Koller then spent a little less than two years at Alphabet’s longevity lab, Calico, as its first chief computing officer.
It was there that Koller was reminded of her passion for applying machine learning to improve human health. She was also reminded of what she doesn’t like, which is wasted effort, something that the drug development industry — slow to understand the power of computational methods for analyzing biological data sets — has been plagued by for years.
In fairness, those computational methods have also gotten a whole lot better recently. Little wonder that last year, Koller spied the opportunity to start another company, a drug developer called Insitro that has since raised $100 million in Series A funding from investors including GV, Andreessen Horowitz and Bezos Expeditions. And notably, the company recently partnered with Gilead Sciences to find medicines to treat a liver disease called nonalcoholic steatohepatitis (NASH), a choice driven by all the related human data that Gilead has amassed over time.
Later, Insitro may target even bigger epidemics, including perhaps Alzheimer’s disease or Type 2 diabetes. Certainly, it has reason to feel optimistic about what it can accomplish. As Koller told a group of rapt attendees at an event hosted by this editor a few days ago, “We’re now at a moment in history where a confluence of technologies emerged all at around the same time to allow really large and interesting and disease-relevant data sets to be produced in biology. In parallel, we see . . . machine learning technologies that are able to make sense of that data and come up with novel insights that can hopefully cure disease.”
It all sounds like talk we’ve heard before in recent years, but coming from Koller, one gets the sense that we’re finally getting close, despite the mysteries of human biology. What follows are some excerpts from Koller’s interview with journalist Sarah McBride of Bloomberg. You can also watch their conversation below.
On why Insitro struck a partnership with Gilead (beyond the fact that it could prove lucrative, with up to $1 billion in milestones attached to successfully developing targets for NASH):
There are fairly broad categories that our technology is well-suited for. We’re really interested in creating what you might call disease-in-a-dish models — places where diseases are complex, where we really haven’t had a good model system, where typical animal models that have been used [for years, including testing on mice] just aren’t very effective — and creating those “in vitro” models to generate very large amounts of data that can be interpreted using machine learning.
There’s a whole slew of diseases that lend themselves to this type of approach. NASH was one of them, so partly it was the suitability of our technology to this disease, and partly it was that Gilead was just a really good partner for it because they have a whole bunch of human data from some of the clinical trials that have been running [which give us] access to two complementary data sources. One is what happens to the disease in large human cohorts, and one is what happens when you look at what the disease does in vitro, in the dish, then see if we can use what we see in the dish using machine learning to predict what we see in the human.
On how Insitro views data differently than big pharma companies:
Pharma companies say, “We have lots of data.” And you say, “What kinds of data do you have?” And it turns out they have dribs and drabs of data, each stored on a separate spreadsheet in someone else’s laptop. There’s metadata that isn’t even recorded. For them, it’s like, “Yeah, I did the experiment and obviously I recorded what I had to because it doesn’t make sense to throw it away,” but they don’t think of it as something you build a company on top of.
We come at it a completely different way. We say, “This is the problem that you’d like to solve. If only we had a model that could tell us the result of this experiment without having to do the experiment, because it’s costly or complicated or even impossible [because it would involve perturbing a living human’s gene].” Well, machine learning has gotten really good at building predictive models if you give it the right data to train the model. So we’re in the business of actually building data for the sole purpose of training machine learning models. We think of [these models] like little crystal balls that would allow you to avoid doing [these more expensive or complicated] experiments.
On the impact of the National Institutes of Health’s “All of Us” research program, which is an effort to gather data from one million or more people living in the U.S. to accelerate research and improve health in part by logging individual differences in lifestyle, environment, and biology:
I would say if anything that the U.S. is a little late to the game on this one. There have been a number of national cohorts that have already been generated in different countries; the two that are currently best developed are in Iceland and in the U.K., but there’s also one in Finland and one in Ireland and even in Estonia, where they’ve taken a large population from within that country and measured their genetics, but also measured a whole lot of properties about those people, including blood biomarkers and urine biomarkers and behavioral aspects and physical aspects and imaging. And so what you have now (in these countries) is a data set that tells you, “Nature perturbed this gene,” and, “We see this effect on the human.”
[In the U.K., specifically, where they started their program five years ago and recruited 500,000 volunteers who agreed to physical and cognitive and blood pressure testing and images of the brain and the abdomen, among other things] it’s an incredibly rich data set [from which] discoveries are coming along on pretty much a weekly basis.
… This is valuable not just primarily for gene therapies but just as a way of identifying targets that actually make a difference, because most drugs that go into clinical trials fail. And by most, I mean 95%. And most drugs fail because they are targeting the wrong things. They are targeting proteins or genes that do not affect the disease they are supposed to affect. The recent, very visible failures of Alzheimer’s drug trials — actually several of them in a row — were almost certainly because the protein they were targeting, called amyloid beta, is just not the right causal factor in the disease.
On what researchers can do now with stem cells that would have been impossible even a few years ago:
[There are now] tools that have enabled the creation of not only large amounts of data but large amounts of biologically relevant data. So we used to do experiments on cancer cell lines . . . but it’s not a very disease-relevant model. Today, we can take a small sample of skin cells and use what’s called the Yamanaka factor to reprogram those cells to stem cell status, which are the cells that exist effectively in the womb. And those cells are capable of differentiating themselves into neural cells or liver cells or cardiac cells, and those are very disease relevant because they represent human biology; you can take those cells now from patients and from healthy people and see if there are differences in how they appear.
If you use this text as a jumping-off point, you’ll want to start listening at around the 13-minute mark. It’s worth the time to hear what she has to say, including about cystic fibrosis, spinal muscular atrophy in babies, and why the “mouse models” we’ve long relied on for a wide range of seemingly ubiquitous diseases “range from bad to really, really bad.”