Early in the pandemic, a number of researchers, startups and institutions developed AI systems that they claimed could diagnose COVID-19 from the sound of a person’s cough. At the time, we ourselves were enthusiastic about the prospect of AI that could be wielded as a weapon against the virus; in one headline, we endorsed cough-scrutinizing AI as “promising.”
But a recent study (first reported on by The Register) suggests that some cough-analyzing algorithms are less accurate than we — and the public — were led to believe. It serves as a cautionary tale for machine learning tech in healthcare, whose flaws aren’t always immediately apparent.
Researchers from The Alan Turing Institute and the Royal Statistical Society, commissioned by the U.K. Health Security Agency, conducted an independent review of audio-based AI tech as a COVID-19 screening tool. Together with members from the University of Oxford, King’s College London, Imperial College London and University College London, they found that even the most accurate cough-detecting model performed worse than a model based on user-reported symptoms and demographic data, such as age and gender.
“The implications are that the AI models used by many apps add little or no value over and above the predictive accuracy offered by user-reported symptoms,” the co-authors of the report told TechCrunch in an email interview.
For the study, the researchers examined data from more than 67,000 people recruited through the National Health Service’s Test and Trace and REACT-1 programs, which asked participants to send back nose and throat swab test results for COVID-19 along with recordings of them coughing, breathing and talking. Using the audio recordings and test results, the researchers trained an AI model, attempting to see whether coughs could serve as an accurate biomarker.
Ultimately, they found that they could not. The AI model’s diagnostic accuracy wasn’t much better than chance when controlling for confounders.
Partly to blame was recruitment bias in the Test and Trace system, which required participants to have at least one COVID-19 symptom in order to take part. But professor Chris Holmes, lead author of the study and program director for health and medical science at The Alan Turing Institute, says the findings show coughs are a poor predictor of COVID-19 in general.
“It’s disappointing that this technology doesn’t work for COVID-19,” he told TechCrunch in an emailed statement. “Finding new ways to quickly and easily diagnose viruses like COVID-19 is really important to stop its spread.”
The study is a blow to commercial efforts like Fujitsu’s Cough in a Box, an app funded by the U.K.’s Department of Health and Social Care to collect and analyze audio recordings of COVID-19 symptoms. And it puts some scientific claims in doubt. One paper co-authored by researchers at the Massachusetts Institute of Technology pegged the accuracy of a cough-analyzing COVID-19 algorithm at 98.5% — a percentage that in retrospect seems dubiously high.
That isn’t to suggest the Turing Institute study is the last word on cough detection where it concerns COVID-19. Holmes leaves open the possibility that the tech may work for other respiratory viruses in the future.
But it wouldn’t be the first time healthcare AI has overpromised and underdelivered.
In 2018, STAT reported that IBM’s Watson supercomputer spit out erroneous cancer treatment advice, the result of training on a small number of synthetic cases. In a more recent example, a 2021 audit of healthcare system provider Epic’s AI algorithm for identifying patients with sepsis found that it missed nearly 70% of cases.