Empowering a new wave of health tech startups — with data

Welcome to The TechCrunch Exchange, a weekly startups-and-markets newsletter. It’s inspired by the daily TechCrunch+ column where it gets its name. Want it in your inbox every Saturday? Sign up here.

Spotting new trends is one of my favorite parts of my job. But I like it even more when several trends converge into a transformative wave. That’s exactly what’s happening in health tech right now, as the sector benefits not only from the rise of open data, but also from the democratization of data analytics and privacy-preserving synthetic data. Let’s explore. — Anna

Data for health tech builders

When I heard that synthetic healthcare data startup Syntegra would release free datasets with realistic-but-fake patient data, it caught my attention. Not because it was releasing open data, not because it was sharing synthetic data and not because it’s a health tech company — but because it was doing all of those things at once.

You wouldn’t be wrong to think open data is cool in itself. For instance, I loved hearing earlier this week about the launch of BigScience’s large language model, BLOOM, a free and open alternative to OpenAI’s GPT-3. But the expectation is that BLOOM will mostly be used by researchers. In contrast, the datasets made available by Syntegra and Tuva Health are meant for health tech startups.

“For too long, health tech builders haven’t been able to build their tech stack or product because they don’t have access to realistic healthcare data on their timelines and in their development environments,” Syntegra wrote on the five datasets’ download page.

In an interview with TechCrunch, the synthetic data company confirmed that its target audience for this project is health tech startups and data scientists. The latter are worth mentioning, because working with the data requires less domain knowledge than I expected.

Indeed, Syntegra’s datasets are released in partnership with Tuva Health, a startup from Y Combinator’s Winter 2022 batch that makes “open source software that cleans and transforms messy healthcare data.” Thanks to Tuva’s tools, Syntegra’s cleaned and normalized datasets are available not only in common healthcare data formats I had never heard of, but also in an analytics-ready format.

Dhruv Vasishtha, an angel investor in Tuva Health, commented on the broader trend at play. “Just as there has been a rise of purpose-built healthcare SaaS,” he wrote on Twitter, “we’re seeing a small but mighty explosion of healthcare data infra accelerating time to patient impact.”

A previous lack of commoditized data infrastructure, Vasishtha added, meant requiring “young value-based organizations, especially tech-enabled startups, to invest capital and time into re-building commodity infrastructure only delaying patient impact and taking away precious resources.”

“This is where Tuva comes in,” Vasishtha said. Another investor in Tuva Health, Nikhil Krishnan, highlighted the fact that the startup “is bringing dbt to healthcare.” A quick search confirmed that “several of the repositories in the Tuva Project are in fact dbt projects.”

The company behind dbt, dbt Labs, describes it as a data transformation tool, but more importantly, dbt is making data analytics accessible to a much wider range of users. This connects the dots with another trend I love: data democratization.

Balancing data access and privacy

Better ways to parse data are great, but they would be nothing without data access in the first place. This is particularly challenging in the healthcare space, where data is typically safeguarded — and therefore out of reach, even for use cases that would ultimately benefit patients.

However, some startups are already working on improving healthcare data access. For instance, this very column already wondered whether data can fix healthcare. To answer this question, Alex talked to Truveta, a young company started by former Microsoft executives.

Truveta, in a previous TechCrunch article, explained that it “wants to collect privacy-safe medical data from around the United States on a regular basis.” This sounds like a great mission, but also a very ambitious one. Wouldn’t it be easier to generate synthetic data?

When we looked into the synthetic data space recently, we heard that healthcare was one of the main use cases for this type of data for three reasons: because it emulates real data, but purportedly without endangering privacy; because it might be able to simulate edge cases that are hard to encounter at scale; and last but not least, because it can be plentiful while still being of high quality.

For all these reasons, synthetic data is particularly promising for the health sector, Syntegra founder and CEO Michael D. Lesh told TechCrunch. “For decades, the promise of healthcare data has been undermined by necessary and strict privacy barriers, meaning patients have missed out on innovation that could — and should — save lives. Synthetic data has the power to change that.”

Lesh cares about saving lives. A medical doctor, he previously founded a company that developed a novel device to prevent stroke in patients with the heart condition atrial fibrillation. His hope is that Syntegra’s new datasets and its partnership with Tuva Health will now make it “much easier to use to develop solutions that provide real patient impact.”

Syntegra’s freely released datasets include synthetic versions of claims data and electronic health records (EHR). But the startup has more to offer, at a cost, to companies with larger and more custom needs.

“In addition to expanding populations, synthetic data can also be augmented and customized to fit the exact needs of the user, including addressing areas of bias (such as an uneven ratio of males to females) or filling in missing gaps in the data,” a company spokesperson wrote in an email.

Later this summer, Syntegra plans to launch an API to let users query healthcare data. By doing so, it will be joining a wave of startups that are developing health-focused APIs, such as Vivanta, a Mexico-based “Plaid for health.”

Whether it’s APIs or synthetic data for healthcare, the common thread is to empower more people to build health tech solutions. A better claims system? Less waiting time or bias? It is too early to tell, but I am curious to see what will come out of this. And of course, how patients will benefit.