It’s time to build against pandemics

We’re a few years out from the call to action Bill Gates made in his TED Talk on preparing for pandemics back in 2015, yet the state of scalable software for important workflows like data collection and contact-tracing has greatly lagged expectations during the current pandemic.

The Trump administration’s letter to health agencies regarding data-sharing guidelines asked for daily Excel uploads, and manual contact-tracing efforts without software have proven difficult given the scale of the current pandemic. 

Everything is being built right now. 

Research universities are helping build models used by the CDC for case prediction, and that’s brought to light the dire issues around incomplete data sharing between health institutions and governments. 

Dozens of contact-tracing apps are springing up, surfacing design decisions around privacy, exposing the need for newer technologies beyond Bluetooth for close-range proximity detection and leading companies like Google and Apple to strike partnerships to power cross-platform mobile capabilities.

The good news is that the current efforts are taking seriously the need for better software and driving necessary innovation to help society better prepare for pandemics.

How can detailed case data be shared by hospitals with governments to better predict case and mortality numbers, and used to better allocate medical and labor resources? 

How can software help local and state governments make better policies, and help digitize contact tracing while appeasing privacy concerns?

Software has the ability to power many of these capabilities, and it is creating new opportunities for startups to vet the newly formed appetite for better data and digitized workflows on the part of health agencies, local and state governments and other organizations involved in fighting pandemics.

Better data collection and algorithms for case prediction

The models used by the CDC and government organizations in the U.S. to predict the future number of cases and deaths are relying on fairly rudimentary data today. 

The data in most cases only contains case and death numbers by region. Further, this data is being provided by the CDC only at the state and country levels, and researchers are using data from Johns Hopkins as the gold standard to obtain additional county-level data. 

This data may look sufficient at first glance, but in practice, accurate modeling requires much more granular data.

Each reported case should ideally include information on how long the patient has already been infected, the severity of their symptoms and much more, in order for models to be able to incorporate a patient’s level and time period of virality. 

Models should also be able to factor in variables such as a person’s age, population density and mobility in a given area, inflow and outflow of travelers in a region and policies around quarantine restrictions in an area.

The current models are, however, quite basic, and even general parameters such as the level of mobility in a particular region are taken into account only by some of them.

It’s no surprise that the variance of the predictions made by the models, most of which are built by teams at various research universities, has been quite high, with only a few consistently outperforming simple baseline benchmarks.

The chart below shows the ranked performance of the best-performing models, based on predictions of cumulative deaths for June 27 as a national aggregate of state-by-state predictions, made by each model during preceding weeks. 

Each column shows the average state-by-state error of the predictions made in a given week. The last column, for example, shows the error of the prediction a model made on June 22 for June 27, and the first column shows the error of the prediction it made on May 18 for June 27.

Image Credits: COVID-19 Projections

Overall, only four models consistently perform better than the baseline model, which uses simple math to come up with predictions.
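To make the comparison concrete, here is a minimal sketch of how an average state-by-state error could be computed and compared with a baseline. The exact error metric behind the chart is not specified here, so the use of absolute percentage error is an assumption, and the state names and death counts are made up for illustration.

```python
from statistics import mean


def mean_abs_pct_error(predicted, actual):
    """Average absolute percentage error over states present in both dicts.

    Assumes every actual count is greater than zero.
    """
    states = predicted.keys() & actual.keys()
    if not states:
        raise ValueError("no overlapping states to score")
    return mean(abs(predicted[s] - actual[s]) / actual[s] for s in states)


# Hypothetical cumulative death counts on the target date (not real data).
actual = {"A": 1000, "B": 500, "C": 200}
# Hypothetical predictions made earlier for that same date.
model = {"A": 1100, "B": 450, "C": 210}
baseline = {"A": 1300, "B": 400, "C": 260}

model_err = mean_abs_pct_error(model, actual)
baseline_err = mean_abs_pct_error(baseline, actual)
beats_baseline = model_err < baseline_err
```

Repeating this calculation for the predictions made in each preceding week would give one row of a chart like the one above.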

Interestingly, the YYG model, which is one of the best-performing models approved by the CDC for use, has been built single-handedly by a data scientist named Mr. Youyang Gu.

Mr. Gu’s model uses only death data and a simple grid search algorithm, but has consistently ranked among the top four models amid a sea of models built by much larger teams at research universities, which illustrates the lack of robustness in models at large.
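For readers unfamiliar with the term, a grid search simply tries every combination of candidate parameter values and keeps the one that best fits the observed data. The sketch below is not Mr. Gu’s model. It fits a toy exponential-growth curve to synthetic daily death counts, and the function names, parameter ranges and data are all assumptions chosen for illustration.

```python
import math
from itertools import product


def simulate(initial, growth, days):
    """Daily deaths under constant exponential growth (a toy model)."""
    return [initial * math.exp(growth * t) for t in range(days)]


def squared_error(predicted, observed):
    """Sum of squared differences between two equal-length series."""
    return sum((p - o) ** 2 for p, o in zip(predicted, observed))


def grid_search(observed, initials, growths):
    """Try every (initial, growth) pair and keep the lowest-error one."""
    best = None
    for initial, growth in product(initials, growths):
        err = squared_error(simulate(initial, growth, len(observed)), observed)
        if best is None or err < best[0]:
            best = (err, initial, growth)
    return best


# Synthetic "observed" series generated from known parameters (not real data).
observed = simulate(10.0, 0.05, 14)

# Hypothetical parameter grid: 4 starting values x 16 growth rates.
initials = [5.0, 10.0, 15.0, 20.0]
growths = [0.01 * k for k in range(-5, 11)]

err, initial, growth = grid_search(observed, initials, growths)
```

On this synthetic series the search recovers the parameters used to generate it. Real death data is far noisier, which is part of why forecasts degrade as the horizon grows.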

For most of the models, the errors increase steadily from a week before to more than a month before the date for which the prediction is being made. For predictions made a few months out, the error rates are much higher, making them difficult to rely on for policy making. 

For models to perform better, the data collection process between the various entities needs to improve so that more complex models, which factor in more data attributes, can be built.

The current software stack for data collection is pretty much nonexistent. When Vice President Mike Pence addressed a letter on March 29 calling upon hospital administrators for data, he asked them to share data daily in an Excel spreadsheet, with a list of data elements attached.

There’s a clear need to unify the data collection process using software across different institutions, from hospitals to commercial labs to local governments, in a much more efficient manner, in place of manual efforts involving Excel. 

Software can help make this process real-time, enable retroactive correction of data as new data is presented, provide accessibility features for various entities involved and help collect much more granular metadata for better modeling. 

There’s further a need for software to help hospitals and local governments take such data and create real-time assessment and modeling to inform policy. Palantir is leveraging its products to help governments and health agencies, and has signed a number of contracts, including with the VA. Beyond tools such as Palantir’s Gotham, which help tie different data sets together, there’s also a need for platforms that can model data using infectious disease algorithms and help account for factors such as business openings and population density.

Though governments have traditionally been slow to adopt new software, some are now showing willingness to adopt technology, such as the County of Santa Clara, which has an online dashboard that shows cases by ZIP code and city within the county.

If software tools enabling better data collection and modeling are built and used successfully by a set of early adopters, it’s likely that other health and government institutions will follow, given the high-value impact software can have.

Building apps for case prevention and tracing

Beyond tools for modeling spread, there’s a need for software to help with contact-tracing efforts.

Contact tracing using manual methods is effective, as was seen during the 2014 Ebola outbreak, but it is difficult to scale during a pandemic. It further requires many steps, as evident in the contact-tracing workflow published by the CDC, and is prone to incomplete data points around transmission. 

East Asian countries have been some of the first to adopt widespread use of contact-tracing apps, despite existing privacy concerns. Contact-tracing apps typically use Bluetooth to determine whether two phones have been in close proximity, and use an infected person’s interaction data to send an anonymous status update to the people who have been in close proximity to that person.

China has been using an app powered by Alibaba’s AliPay system to track people’s exposure to infected individuals and give color-coded signals. South Korea has a popular app called Corona 100m with similar functionality, which has been downloaded more than 1 million times.

In the U.S., more than a dozen apps are being built, but most are either prototypes or being used by a small set of users right now, and there are a number of technical challenges they need to overcome, in addition to concerns over privacy.

The issue around Android and Apple phones not being able to communicate over Bluetooth is being resolved with a partnership between Google and Apple. 

A persistent challenge is Bluetooth’s inaccuracy in measuring the distance between two phones. Separately, given that only a fraction of the U.S. population might ultimately adopt a contact-tracing app, there’s a need for innovation in network theory and algorithms so that apps can provide valuable insights to users even if only a fraction of the people in their networks also use the app.

A team of researchers led by CMU professor and world-renowned mathematician Dr. Po-Shen Loh is tackling both of these challenges with their contact-tracing app NOVID.

NOVID addresses the issues with using Bluetooth by leveraging ultrasound technology to provide more accurate measurement of distance, and is one of the few apps using this novel approach.

The app works by monitoring a user’s activity over a two-week period and observing which entities come in contact with the user in locations where the user goes consistently, such as an office or home. These entities are “in-network” for the user, and if any entity within this group self-reports a positive COVID-19 result, the user is notified that someone in their regular network has self-reported being positive.

The app preserves anonymity by registering only entities that are regularly in the same place as the user, thereby not tracking all interactions a user is having, and not using GPS or personal information. This serves to give the user a general understanding that someone in their regular routine may be infected, without having to track their interactions outside of those regular locations. 

NOVID is innovating on the network theory front by releasing a new feature this month that provides network detection up to 12 hops away, where a hop refers to an entity being in a location regularly frequented by a user, as shown in a screenshot of their new app build below. 

Image Credits: NOVID

The goal of this approach is that even if only about 10% of a user’s network uses the app, the user should still see a trickle of cases many hops out, even if none are in their immediate network, and that can serve as a proxy for how the virus is spreading outward through their network.
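One way to picture hop counting is as a breadth-first search over a graph in which anonymous IDs are connected when they regularly share a location. The sketch below is an illustration under that assumption and is not NOVID’s actual implementation. The function name, the graph and the IDs are hypothetical.

```python
from collections import deque


def hops_to_nearest_case(contacts, user, positives, max_hops=12):
    """Breadth-first search over a regular-contact graph.

    Returns the number of hops from `user` to the nearest self-reported
    positive within `max_hops`, or None if there is none in range.
    """
    seen = {user}
    queue = deque([(user, 0)])
    while queue:
        node, dist = queue.popleft()
        if node != user and node in positives:
            return dist
        if dist == max_hops:
            continue  # do not look beyond the hop limit
        for neighbor in contacts.get(node, ()):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, dist + 1))
    return None


# Hypothetical graph: an edge means two anonymous IDs regularly share a place.
contacts = {
    "u": ["a", "b"],
    "a": ["u", "c"],
    "b": ["u"],
    "c": ["a", "d"],
    "d": ["c"],
}

nearest = hops_to_nearest_case(contacts, "u", {"d"})
out_of_range = hops_to_nearest_case(contacts, "u", {"d"}, max_hops=2)
```

In this toy graph the nearest positive is three hops from the user, and it goes undetected when the search is limited to two hops. That is the intuition behind extending detection to many hops when adoption is sparse.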

The NOVID app, along with most other apps being deployed, is in early stages of adoption, but the innovative approaches being undertaken with respect to new design and technology are compelling. If one or more of these new apps prove to work effectively, local and state governments could benefit greatly in mitigating spread by encouraging their use and digitizing contact-tracing efforts.

The Holy Grail of software eating pandemics is still a long way off, but entire ecosystems involved in fighting pandemics, from health agencies to governments, are engaged in a unique manner for the first time.

Beyond the role software can play in enabling better data sharing and contact tracing, there are many other problems that software can play a critical role in resolving, such as managing medical supplies by need, creating policy beta-testing capabilities for government officials, triangulating effectiveness of supportive medicines and much more.

These times present unique opportunities for entrepreneurs and organizations to build software that addresses many of the problems being faced right now and help prepare the country for fighting against future pandemics. As Marc Andreessen pointed out in his note in April, it’s time to build.