How healthtech startups can achieve true value

Healthtech is apparently in a golden age. Just a few weeks ago, Livongo and Health Catalyst raised a combined $500 million through IPOs with a joint valuation reaching $3.5 billion. Deals such as these are catalyzing a record-breaking 2019, with digital health deal activity expected to surpass the $8.1 billion invested in 2018.

Amidst such abundance, the digital health ecosystem is thriving: as of 2017, more than 300,000 mobile applications and 340 consumer wearable devices existed, with 200 new mobile applications added daily. No theme has been more important to this fundraising than artificial intelligence and machine learning (AI/ML), a space that captured more than one-quarter of healthtech funding in 2018.

Yet, how many of these technologies will prove valuable in medical, ethical, or financial terms?

Our research group at Stanford addressed this question by taking a deeper dive into the saying that, in AI/ML, “garbage in equals garbage out.” We did this by distinguishing digital health algorithms leveraging AI/ML from their underlying training data, and by documenting the numerous consequences for the outputs of these technologies when the inputs resemble, well, “garbage.”

For example, the utility of genetic risk scores provided by companies such as 23andMe and AncestryDNA (which have estimated valuations of $1.75 billion and $2.6 billion, respectively) may be limited by diagnostic biases stemming from the underrepresentation of diverse populations in their underlying data.

Responding to such observations, we offer a variety of recommendations to the developers, inventors, and founders spearheading the advancement of digital health, as well as to the funders backing them, to ensure that their innovations are valuable to the stakeholders they target.

Healthtech startups still have to prove their value for patients


But first, a bit of context. Despite lofty deal activity and valuations, it is unclear whether digital health has lived up to the hype thus far, since evidence of the benefits of health innovations is scarce.

Take the recent study by physician-researcher John Ioannidis and colleagues, which showed that the most valuable healthcare startups often publish little meaningful evidence supporting the benefits of their innovations. Within this cohort, digital health companies such as Clover Health, Oscar Health, and ZocDoc were among the weakest performers, with five of six companies publishing at most one paper in total.

There is also emerging evidence of potential harm wrought by these types of innovations. Researchers have discovered measurement errors and validation failures across many domains in digital health, including products made by prominent companies such as Omron, Garmin, Misfit, and Withings.

Researchers have also documented major safety and reliability concerns. For instance, an application produced by Pfizer inaccurately measured blood glucose levels, leading the application to recommend inappropriate doses of insulin, recommendations that could be life-threatening.

Additionally, when it comes to AI/ML technologies specifically, researchers have identified intrinsic racial and socioeconomic biases. AI/ML-based melanoma detection, for instance, often performs markedly worse in individuals with darker skin. Far from improving health outcomes for all patients, these ingrained biases could worsen health disparities if left uncorrected.

These collective failures are, as our group wrote in a briefing accompanying our research article,

“…leading to speculation that a reproducibility crisis could be engulfing the field. Such a reproducibility crisis would imply poor medical outcomes and, in turn, investment outcomes. In other words, it would imply the imminent bursting of dually scientific and financial bubbles.”

In sum, companies building apps for melanoma detection (like SkinVision, which recently raised $7.5 million in a Series A), or for any other medical indication, will need to fix the deficiencies embedded in their datasets if they hope to create sustainable value.

What makes digital health technologies ‘valuable’ to patients and practitioners?


Digital health still has tremendous potential to change the practice of medicine and improve the lives of many patients. If we view the value of digital health technologies in line with the Institute for Healthcare Improvement’s (IHI) “Triple Aim,” then we can categorize value creation in this space according to three pillars: effectiveness, equity, and economics.

Effectiveness relates to the ability of health interventions to produce positive outcomes across real-world settings, meaning the theoretical benefits of these interventions must survive the imperfect conditions of the real world. Algorithms trained on single-source datasets, generated under idealized circumstances that do not resemble real life, are unlikely to be so capable.

For example, considerable differences in step counts, intensity scores, and calculated metabolic rates have been noted between Fitbit devices. If algorithms are trained only on idealized trial data, without accounting for real-world imperfections, they will not be valuable for living, breathing humans.

To avoid observation bias and overfitting, the datasets used to train algorithms can be drawn from multiple settings under different conditions. Variability can also be simulated and introduced into training data to improve predictive power across settings.
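To make the idea of simulated variability concrete, here is a minimal sketch, assuming a wearable that reports per-minute step counts. The function `augment_step_counts` and its parameters (`scale_sd` for device-level calibration drift, `dropout_p` for dropped readings) are hypothetical illustrations, not any vendor's method; the point is only that each perturbed copy imitates a different device or setting.

```python
import random


def augment_step_counts(counts, n_copies=3, scale_sd=0.1, dropout_p=0.05, seed=0):
    """Return perturbed copies of a per-minute step-count series.

    Hypothetical illustration: each copy simulates a different device by
    (1) applying one random gain to the whole series (calibration drift) and
    (2) zeroing a few readings at random (dropped samples).
    """
    rng = random.Random(seed)  # seeded so the augmentation is reproducible
    copies = []
    for _ in range(n_copies):
        gain = max(0.0, rng.gauss(1.0, scale_sd))  # per-device calibration error
        noisy = []
        for c in counts:
            if rng.random() < dropout_p:
                noisy.append(0)  # simulated missing reading
            else:
                noisy.append(max(0, round(c * gain)))  # scaled, non-negative count
        copies.append(noisy)
    return copies


# Toy series invented for illustration, not real device data.
original = [80, 95, 0, 110, 102]
augmented = augment_step_counts(original)
training_set = [original] + augmented  # train on clean and perturbed versions
```

Simulation of this kind complements, rather than replaces, collecting data from genuinely different sites: it can only mimic the kinds of error its author thought to model.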

Equity relates to the ability of health interventions to avoid producing negative outcomes in different patient populations. Effectiveness in one demographic group ought not to lead to harm in another, such as underdiagnosing melanoma in dark-skinned individuals. Algorithms trained on homogeneous datasets will not provide valuable predictions for populations outside the default.

For example, a breast cancer detection algorithm produced by Paige.AI (which raised a $25 million Series A in February) has yielded positive results in a sample from Memorial Sloan Kettering (MSK). Yet patient groups excluded from this training data (such as those without access to care at MSK) may not benefit from the technology if it fails to account for inter-population biological differences (say, differences in breast density between African-American and white populations).

In contrast, algorithms trained on heterogeneous data can avoid the false positives and false negatives that result from sampling bias. Dataset diversity, often referred to as “representativeness,” is a prerequisite to real-world value across clinical settings.
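One practical way to check whether sampling bias is producing uneven errors is to break a classifier’s false-negative and false-positive rates out by patient subgroup instead of reporting a single overall accuracy. The sketch below assumes binary labels and predictions plus a subgroup label per patient; `subgroup_rates` is a hypothetical helper, and the toy labels are invented purely for illustration, not results from any product or study.

```python
from collections import defaultdict


def subgroup_rates(y_true, y_pred, groups):
    """Per-subgroup false-negative and false-positive rates for a binary classifier."""
    counts = defaultdict(lambda: {"tp": 0, "fn": 0, "fp": 0, "tn": 0})
    for t, p, g in zip(y_true, y_pred, groups):
        # Classify each case as a true/false positive/negative within its subgroup.
        key = ("tp" if p else "fn") if t else ("fp" if p else "tn")
        counts[g][key] += 1
    rates = {}
    for g, c in counts.items():
        pos = c["tp"] + c["fn"]  # patients who actually have the condition
        neg = c["fp"] + c["tn"]  # patients who do not
        rates[g] = {
            "fnr": c["fn"] / pos if pos else None,  # share of missed diagnoses
            "fpr": c["fp"] / neg if neg else None,  # share of false alarms
            "n": pos + neg,
        }
    return rates


# Invented toy labels: two subgroups, A and B, of four patients each.
y_true = [1, 1, 0, 0, 1, 1, 0, 0]
y_pred = [1, 1, 0, 1, 0, 1, 0, 0]
groups = ["A", "A", "A", "A", "B", "B", "B", "B"]
rates = subgroup_rates(y_true, y_pred, groups)
# Group A: no missed positives (fnr 0.0); group B: half of positives missed (fnr 0.5).
```

Both groups here score the same overall accuracy (3 of 4 correct), yet group B’s errors are all missed diagnoses. That is exactly the kind of disparity an aggregate metric hides.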

Since underrepresented groups are less likely to use the platforms where data are typically collected (a phenomenon that has been called “digital redlining”), researchers and innovators must intentionally build heterogeneity into trials, solicit and/or impute diversity in their datasets, and correct for underrepresentation when it persists.
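One common statistical correction for underrepresentation is reweighting: each training example is weighted by its subgroup’s target population share divided by that subgroup’s observed share in the dataset, so underrepresented groups count for more during training. The sketch below assumes the target shares are known (for example, from census or patient-panel figures); `reweight` is a hypothetical helper, and the 80/20 split is a toy example, not real data.

```python
from collections import Counter


def reweight(groups, target_share):
    """Per-sample weights so each group's total weight matches its target share.

    weight(g) = target_share[g] / observed_share[g]
    Every group present in `groups` must appear in `target_share`.
    """
    n = len(groups)
    observed = Counter(groups)  # how many samples each group contributes
    return [target_share[g] / (observed[g] / n) for g in groups]


# Toy dataset: group B is 20% of samples but assumed to be 50% of the target population.
sample_groups = ["A"] * 8 + ["B"] * 2
weights = reweight(sample_groups, {"A": 0.5, "B": 0.5})
# Each A sample gets weight 0.625 and each B sample 2.5, so both groups total 5.0.
```

Many training libraries accept per-sample weights of this kind (for example, the `sample_weight` argument to `fit` in many scikit-learn estimators). Reweighting cannot create information that was never collected, though: a group that is entirely absent from the data cannot be weighted up, which is why deliberate recruitment matters.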

Economics refers to the ability of health interventions to be efficiently integrated into the practice of medicine. To be adopted by time-constrained providers and financially constrained health systems, digital health technologies must account for (a) feasibility of introduction into clinical practice (i.e., the “first use” of new technologies) and (b) usability after being introduced into clinical practice (i.e., the “continued use” of new technologies).

Implementation data must also be gathered and assessed, not just biomedical data. The Apple Watch, for instance, can capture irregular cardiac rhythms that might otherwise have been missed, but unless the surrounding protocols make sense in clinical practice, that technological capability will not add value. Without feasibility and usability, “death by 1,000 clicks” is all too common, no matter how positive the hypothetical outcomes.

Ensuring value creation in digital health 

Debugging algorithms alone isn’t enough to achieve the outcomes patients, inventors, and investors want from digital health. As we conclude in the article, innovators in digital health must focus on building datasets capable of delivering effectiveness, equity, and economic viability in order to “help realize the potential of big data for a personalized medicine era.” This level of rigor can help stakeholders rest assured that the golden age of healthtech is not, in fact, a gilded one.