A Popular COVID-19 Model Just Got Rosier. Don’t Read Too Much Into It

April 7, 2020 4:10 pm


The designers of a widely cited model of U.S. COVID-19 deaths announced Monday that they expect the deadly virus to kill just 81,000 Americans, a substantial decline from previous projections.

The model, built by the Institute for Health Metrics and Evaluation at the University of Washington, offers an increasingly rosy picture of the pandemic, with its latest revision cutting projected deaths by over 10,000. The new figure is below even the White House's smallest projection and is much more optimistic than other models projecting deaths into the millions.

While the federal government has pointed to the IHME model as guiding its decision-making, many state and local executives remain unconvinced by these optimistic projections. Washington, D.C., mayor Muriel Bowser, for example, has opted to set aside the IHME model, which projects that hospital resource use in the city has already peaked, in favor of a different model projecting peak infections in July.

Which projection is more accurate likely means the difference between life and death for thousands. That raises the question: Should lawmakers and Americans trust IHME's increasingly rosy assumptions?

The answer is best summarized by an old statistical adage: All models are wrong, but some are useful. The IHME model has certain advantages that less optimistic models do not have, but it has its own issues, which may explain its oddly optimistic output. Although a nation eager to return to work demands certainty, models based on uncertain data can't provide it—in fact, putting too much stock in any one of them is a certain recipe for disaster.

For epidemiologists—scientists who study epidemics—models are a key tool for understanding the spread and deadliness of diseases like COVID-19. One popular approach is to build something called a "SIR" (susceptible-infected-recovered) model. SIR models use simple facts about the disease to project at a given time how many people in a population are susceptible to, infected with, or recovered from a disease. Most models of the novel coronavirus, including the widely cited projection produced by the team at Imperial College London, are more complicated variations on this simple form.
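To make the idea concrete, here is a minimal sketch of a basic SIR model in Python. It is not the code behind any model discussed in this article. The function name `simulate_sir`, the transmission rate `beta`, the recovery rate `gamma`, and every number in the example are illustrative assumptions, not estimates for COVID-19.

```python
def simulate_sir(population, initial_infected, beta, gamma, days, steps_per_day=10):
    """Step a basic SIR model forward in time with small Euler steps.

    beta  = assumed infections caused per infected person per day
            when everyone else is susceptible
    gamma = assumed fraction of infected people who recover per day
    Returns a list of (susceptible, infected, recovered), one entry per day.
    """
    s = float(population - initial_infected)
    i = float(initial_infected)
    r = 0.0
    dt = 1.0 / steps_per_day
    history = [(s, i, r)]
    for _ in range(days):
        for _ in range(steps_per_day):
            # New infections require contact between susceptible and infected people.
            new_infections = beta * s * i / population * dt
            new_recoveries = gamma * i * dt
            s -= new_infections
            i += new_infections - new_recoveries
            r += new_recoveries
        history.append((s, i, r))
    return history


if __name__ == "__main__":
    # Hypothetical inputs chosen only to show the shape of the output.
    history = simulate_sir(1_000_000, 10, beta=0.3, gamma=0.1, days=365)
    peak_day = max(range(len(history)), key=lambda d: history[d][1])
    print("Day of peak infections:", peak_day)
    print("Share ever infected:", round(history[-1][2] / 1_000_000, 2))
```

Small changes to `beta` or `gamma` move the peak and the final toll considerably, which is one reason models built on this form can disagree so widely.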

IHME's model, however, works differently. Rather than modeling the dynamics of a population infected with the coronavirus, the IHME team took a more "top-down" approach. They looked at the cumulative count of COVID-19 deaths over time and then asked what sort of line (in this case, an s-shaped curve called a "sigmoid") best fit those data. Their assumption was that locales with rising deaths would follow the same trend as those that had already hit their "peak" rate of deaths.
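The curve-fitting idea can be sketched in a few lines of Python. This is a simplified illustration, not IHME's implementation: the error-function form of the sigmoid, the brute-force grid search, the function names, and the synthetic death counts are all assumptions made for the example.

```python
import math


def erf_sigmoid(t, total, peak_day, width):
    """Cumulative deaths on day t for an s-shaped curve that levels off at `total`.

    peak_day is the day on which daily deaths are highest;
    width controls how quickly the curve rises and flattens.
    """
    return total / 2.0 * (1.0 + math.erf((t - peak_day) / width))


def sum_squared_error(days, observed, total, peak_day, width):
    """Total squared gap between the candidate curve and the observed counts."""
    return sum(
        (erf_sigmoid(t, total, peak_day, width) - y) ** 2
        for t, y in zip(days, observed)
    )


def fit_sigmoid(days, observed, totals, peak_days, widths):
    """Try every combination of candidate parameters and keep the best fit."""
    best = None
    for total in totals:
        for peak_day in peak_days:
            for width in widths:
                err = sum_squared_error(days, observed, total, peak_day, width)
                if best is None or err < best[0]:
                    best = (err, total, peak_day, width)
    return best[1:]


if __name__ == "__main__":
    # Synthetic, noise-free "observations" covering only the days before the peak.
    days = list(range(25))
    observed = [erf_sigmoid(t, 1000, 30, 10) for t in days]
    total, peak_day, width = fit_sigmoid(
        days, observed, range(500, 2001, 100), range(20, 41), range(5, 16)
    )
    print("Projected total deaths:", total)
    print("Projected peak day:", peak_day)
```

With clean synthetic data the fit recovers the curve exactly. Real death counts are noisy and incomplete, so early data can be consistent with many different curves, and each new batch of data can shift the projected total.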

Understanding this explains Monday's revision, which simply involved adding more data to refine curves. "As we obtain more data and more precise data, the forecasts we at IHME created have become more accurate," IHME director Dr. Christopher Murray said. It also explains why IHME is producing completely different projections from most other models—different method, different outputs. The obvious question is: Which is better?

The IHME team contends that its curve-fitting approach offers a number of advantages over traditional SIR models. Standard models, they argue, assume random mixing in the population and fail to take account of the effect of social distancing, shutdowns, and other policies meant to stop people from interacting. Because the IHME model is fit to observed deaths, it incorporates the effects of these policies by construction.

SIR models also generally track the number of cases, but our picture of how many cases of COVID-19 there truly are is limited by a lack of testing capacity. IHME's model, by contrast, uses death counts, which are thought to be more reliable. IHME further claims that its model is more practical for modeling actual hospital and administrative needs, while others are useful for "motivating action to prevent … worst-case scenarios."

Does all of this mean that the IHME model is a perfect representation of reality? Not quite: Critics argue its approach is limited by data quality and that it relies on conspicuously optimistic assumptions about social distancing.

Both of these problems stem from IHME's reliance on Chinese data. Because China is the only country to claim to have gone through a full wave of SARS-CoV-2 infection, the IHME model is strongly determined by the trend in deaths in China, and particularly in the city of Wuhan. There is strong evidence, however, that official Chinese figures massively undercount total deaths, with even residents of Wuhan calling the government's numbers woefully inaccurate. If those figures are inaccurate, then IHME's curve would also be inaccurate.

Monday's update, which incorporates more data from locales in Spain and Italy that have peaked, partially addresses this concern. But deaths are likely undercounted in those countries, as well as in the United States, as many individuals dying at home without being tested go uncounted. This means the data on which the IHME model is built—tracking the trend in deaths—may not match the reality of the disease's spread.

A bigger issue, however, is that the IHME model makes very strong assumptions about how effective and widespread social distancing will be. It assumes that just 3 percent of the population will end up infected by SARS-CoV-2, compared with ranges of 25 to 70 percent in other models. Such an outcome is only possible if social distancing measures are comprehensive and remain in place through the end of May.

Murray has acknowledged this issue, noting on Monday that "our forecasts assume that social distancing remains in place until the end of May." Absent extended and extremely effective social distancing, Murray said, "The U.S. will see greater death tolls, the death peak will be later, the burden on hospitals will be much greater, and the economic costs will continue to grow."

These assumptions explain why the D.C. government is not optimistic about IHME's projection that the peak has already passed there. The model preferred by Bowser's office—the University of Pennsylvania's CHIME—is much less certain about public compliance with social distancing requirements. The result is a higher death count and a longer epidemic.

This does not mean that the CHIME model is more likely to be right. In fact, the biggest takeaway from the COVID modeling dispute is that the conclusions of models are strongly dependent on both their assumptions and the quality of data that go into them. With questionable information on the novel coronavirus and no expert consensus about assumptions, it makes little sense for policymakers or the public to put too much stock in any one model or to read a lot into day-to-day changes. After all, all models are wrong, and only some turn out to be useful.
