Summary: Statistical tests need to be paired with proper data and study design to yield valid results. A recent review paper on Long Covid in children provides a useful example of how researchers can get this wrong. We use causal diagrams to decompose the problem and illustrate where errors were made.
The paper in question does not actually say any of these things, but rather concludes that “the true incidence of this syndrome in children and adolescents remains uncertain.” However, the challenges of accurate science journalism are not the topic for our article today. Rather, we will describe a critical flaw in the statistical analysis in this review, as an exercise in better understanding how to interpret statistical tests.
A key contribution of the review is that it separates those studies that use a “control group” from those that do not. The authors suggest we should focus our attention on the studies with a control group, because “in the absence of a control group, it is impossible to distinguish symptoms of long COVID from symptoms attributable to the pandemic.” The National Academy of Sciences warns that “use of an inappropriate control group can make it impossible to draw meaningful conclusions from a study.” As we will see, this is, unfortunately, what happened in this review. But first, let’s do a brief recap of control groups and statistical tests.
Control groups and RCTs
When assessing the impact of an intervention, such as the use of a new drug, the gold standard is a Randomised Controlled Trial (RCT). In an RCT, a representative sample is selected and randomly split into two groups, one of which receives the medical intervention (e.g. the drug), and one of which doesn’t (normally that group gets a placebo instead). This can, when things go well, show clearly whether the drug made a difference. Generally, a “p value” is calculated, which is the probability that an effect at least as large as the one seen in the data would be observed by chance if there were truly no difference between cases and controls (i.e. if the null hypothesis were true), along with a “confidence interval”, which is the range of effect sizes consistent with the data after accounting for random variation. If the p value is less than some threshold (often 0.05), the result is considered “statistically significant”. Without an RCT, it can be harder to tell whether two groups differ because of the intervention, or because of some other difference between the groups.
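As a concrete illustration of what a p value is, here is a minimal permutation test in Python, one simple way to compute one for a two-group comparison. The trial outcomes below are invented purely for illustration:

```python
import random

random.seed(0)

def two_group_p_value(treated, control, n_permutations=10_000):
    """Approximate a two-sided p-value by permutation: how often does
    randomly relabelling the pooled outcomes produce a group difference
    at least as large as the one actually observed?"""
    observed = abs(sum(treated) / len(treated) - sum(control) / len(control))
    pooled = treated + control
    hits = 0
    for _ in range(n_permutations):
        random.shuffle(pooled)
        t, c = pooled[:len(treated)], pooled[len(treated):]
        if abs(sum(t) / len(t) - sum(c) / len(c)) >= observed:
            hits += 1
    return hits / n_permutations

# Invented trial outcomes: 1 = symptoms improved, 0 = did not improve
treated = [1] * 30 + [0] * 20   # 60% improved on the drug
control = [1] * 15 + [0] * 35   # 30% improved on placebo
print(two_group_p_value(treated, control))  # small: unlikely under the null
```

In practice you would usually reach for a standard test (a chi-squared or Fisher’s exact test, say), but the permutation version makes the definition of the p value explicit: it directly counts how often pure chance reproduces the observed difference.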
We can represent this analysis as a diagram like so:
This is an example of a (simplified and informal) causal diagram. The black arrows show the direct relationships we can measure or control – in this case, our selection of control group vs experimental group is used to decide who gets the drug, and we then measure the outcome (e.g. do symptoms improve) for each group based on our group selection. Because the selection was random (since this is an RCT), we can infer the dotted line: how much does taking the drug change the outcome? If the size of the control or experimental group is small, then it is possible that the difference in outcomes between the two groups is entirely due to random chance. To handle that, we pop the effect size and sample size into statistical software such as R and it will tell us the p value and confidence interval of the effect.
Because RCTs are the gold standard for assessing the impact of a medical intervention, they are used whenever possible. Nearly all drugs on the market have been through multiple RCTs, and most medical education includes some discussion of the use and interpretation of RCTs.
Control groups and observational studies
Sometimes, as discussed in The Planning of Observational Studies of Human Populations, “it is not feasible to use controlled experimentation”, but we want to investigate a causal relationship between variables, in which case we may decide to use an observational study. For instance, studying “the relationship between smoking and health”, risk factors for “injuries in motor accidents”, or “effects of new social programmes”. In cases like these, it isn’t possible to create a true “control group” like in an RCT, since we cannot generally randomly assign people, for instance, to a group that are told to start smoking.
Instead, we have to try to find two groups that are as similar as possible, but differ only in the variable under study – for instance, a group of smokers and a group of non-smokers that are of similar demographics, health, etc. This can be challenging. Indeed, the question “does smoking cause cancer” remained controversial for decades, despite many attempts at observational studies.
Researchers have noted that “results from observational studies can confuse the effect of interest with other variables’ effects, leading to an association that is not causal. It would be helpful for clinicians and researchers to be able to visualize the structure of biases in a clinical study”. They suggest using causal diagrams for this purpose, including to help avoid confounding bias in epidemiological studies. So, let’s give that a try now!
Structure of the Long Covid review
In How Common Is Long COVID in Children and Adolescents? the authors suggest we focus on studies of Long Covid prevalence that include a control group. The idea is that we take one group that has (or had) COVID, and one group that didn’t, and then see if they have Long Covid symptoms a few weeks or months later. Here’s what the causal diagram would look like:
Here we are trying to determine if COVID infection causes Long Covid symptoms. Since COVID infection is the basis of the Control group selection, and we can compare the Long Covid symptoms for each group, that would allow us to infer the answer to our question. The statistical tests reported in the review paper only apply if this structure is correct.
However, it’s not quite this simple. We don’t directly know who has a COVID infection; instead, we have to infer it using a test (e.g. serology, PCR, or rapid antigen). It is so easy nowadays to run a statistical test on a computer that it can be quite tempting to just use the software and report what it says, without carefully checking that the statistical assumptions implicitly being made are met by the data and study design.
We might hope that we could modify our diagram like so:
In this case, we could still directly infer the dotted line (i.e. “does COVID infection cause Long Covid symptoms?”), since there is just one unknown relationship, and all the arrows go in the same direction.
But unfortunately, this doesn’t work either. The link between test results and infection is not perfect. Some researchers, for instance, have estimated that PCR tests may miss half, or even 90% of infections. Part of the reason is that “thresholds for SARS-CoV-2 antibody assays have typically been determined using samples from symptomatic, often hospitalised, patients”. Others have found that 36% of infections do not seroconvert, and that children in particular may serorevert. It appears that false negative test results may be more common in children – tests are most sensitive when used for middle-aged men.
To make things even more complicated, research shows that “Long-COVID is associated with weak anti-SARS-CoV-2 antibody response.”
Putting this all together, here’s what our diagram now looks like, using red arrows here to indicate negative relationships:
This shows that test results are associated not just with COVID infection, but also with Age and Long Covid symptoms, and that the association between COVID infection and test result is imperfect and not fully understood.
Because of this, we can’t now directly infer the relationship between COVID infection and Long Covid symptoms. We would first need to fully understand and account for the confounders and uncertainties. Simply reporting the results of a statistical test does not give meaningful information in this case.
In particular, we can see that the issues we have identified all bias the data in the same direction: they result in infected cases being incorrectly placed in the control group.
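We can see the direction of this bias with a small simulation. All the rates below (infection rate, symptom rates, test sensitivity) are made-up illustrative numbers, not estimates from the studies discussed:

```python
import random

random.seed(1)

def group_symptom_rates(n=100_000, infection_rate=0.3,
                        symptoms_if_infected=0.10,
                        symptoms_if_not=0.02,
                        test_sensitivity=0.5):
    """Return (symptom rate in the test-positive group,
               symptom rate in the test-negative 'control' group)."""
    pos_sym = pos_n = neg_sym = neg_n = 0
    for _ in range(n):
        infected = random.random() < infection_rate
        rate = symptoms_if_infected if infected else symptoms_if_not
        has_symptoms = random.random() < rate
        # False negatives: infected children the test misses land
        # in the control group.
        tests_positive = infected and random.random() < test_sensitivity
        if tests_positive:
            pos_n += 1
            pos_sym += has_symptoms
        else:
            neg_n += 1
            neg_sym += has_symptoms
    return pos_sym / pos_n, neg_sym / neg_n

print(group_symptom_rates(test_sensitivity=1.0))  # clean separation
print(group_symptom_rates(test_sensitivity=0.5))  # contaminated control group
```

With a perfect test, the two rates separate cleanly; with 50% sensitivity, the “control” group’s symptom rate is inflated by misclassified cases and the apparent difference between groups shrinks, which is exactly the direction of bias described above.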
The review claims that “all studies to date have substantial limitations or do not show a difference between children who had been infected by SARS-CoV-2 and those who were not”. This claim appears to be made on the basis of p-values, which are shown for each control group study in the review. All but one study did actually find a statistically significant difference between the groups being compared (at p<0.05, which is the usual cut-off for such analyses).
Regardless of what the results actually show, p-values are not being used in an appropriate way here. The American Statistical Association (ASA) has released a “Statement on Statistical Significance and P-Values” with six principles underlying the proper use and interpretation of the p-value. In particular, note the following principles:
P-values do not measure the probability that the studied hypothesis is true, or the probability that the data were produced by random chance alone.
Scientific conclusions and business or policy decisions should not be based only on whether a p-value passes a specific threshold.
A p-value, or statistical significance, does not measure the size of an effect or the importance of a result.
A p-value is lower when there is more data, or a stronger relationship in the data (and vice versa). A high p-value does not necessarily mean that there is no relationship in the data – it may simply mean that not enough data has been collected.
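This dependence on sample size is easy to demonstrate. The sketch below runs a standard two-proportion z-test on made-up proportions: the same 10-percentage-point difference goes from unremarkable to “highly significant” purely because the sample grows:

```python
import math

def two_proportion_p(p1, p2, n):
    """Two-sided z-test p-value for comparing proportions p1 and p2
    observed in two groups of equal size n."""
    pooled = (p1 + p2) / 2
    se = math.sqrt(pooled * (1 - pooled) * (2 / n))
    z = abs(p1 - p2) / se
    return math.erfc(z / math.sqrt(2))  # two-sided normal tail probability

# Identical 10-percentage-point effect, three different sample sizes
for n in (20, 100, 500):
    print(n, round(two_proportion_p(0.30, 0.20, n), 4))
```

The printed p-value falls from well above 0.05 at n=20 to far below it at n=500, even though the underlying effect is identical in every case.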
Because p-values “do not measure the size of an effect or the importance of a result”, they don’t actually tell us about the prevalence of Long Covid. The use of p-values in studying drug efficacy is very common, since we often do want to answer the question “does this drug help at all?” But to assess what the range of prevalence levels may be, we instead need to look at confidence intervals, which unfortunately are not shown at all in the review.
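To illustrate what a confidence interval adds over a bare p-value, here is a minimal percentile-bootstrap sketch. The counts (12 symptomatic children out of 150) are invented for illustration and do not come from any study in the review:

```python
import random

random.seed(0)

def bootstrap_prevalence_ci(outcomes, n_boot=5_000, alpha=0.05):
    """Percentile bootstrap confidence interval for a prevalence:
    resample the data with replacement many times and take the
    middle (1 - alpha) span of the resampled prevalence estimates."""
    n = len(outcomes)
    estimates = sorted(
        sum(random.choice(outcomes) for _ in range(n)) / n
        for _ in range(n_boot)
    )
    lo = estimates[int(n_boot * alpha / 2)]
    hi = estimates[int(n_boot * (1 - alpha / 2))]
    return lo, hi

# Hypothetical study: 12 of 150 children report persistent symptoms
outcomes = [1] * 12 + [0] * 138
print(bootstrap_prevalence_ci(outcomes))
```

Unlike a p-value, the interval directly answers the question readers of a prevalence review care about: what range of prevalence figures is plausibly consistent with the data?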
Furthermore, we should not look at p-values out of context, but instead need to also consider the likelihood of alternative hypotheses. The alternative hypothesis provided in the review is that the symptoms may be due to “lockdown measures, including school closures”.
One of the included control group studies stood out as an outlier, in which 10% of Swiss children with negative tests were found to have Long Covid symptoms, many times higher than other similar studies. Was this because of the confounding effects discussed in the previous section, or was it due to lockdowns and school closures? Switzerland did not have a full lockdown, and schools were only briefly closed, reopening nearly a year before the Long Covid symptom tests in the study. On the other hand, Switzerland may have had a very high number of cases. Wikipedia notes that “the Swiss government has had an official policy of not testing people with only mild symptoms”, and has still recorded nearly 900 thousand cases in a population of just 8 million people.
In a statistical design, an alternative hypothesis should not be considered the null hypothesis unless we are quite certain it represents the normal baseline behaviour. But assuming that the symptoms found in the control group are due to pandemic factors other than infection is itself a hypothesis that needs careful testing and does not seem to be fully supported by the data in the study. It is not an appropriate design to use this as the base case, as was done in the review.
Conclusion and next steps
The problems with control group definition, incorrect use of statistical tests, and statistical design do not change the key conclusion of the review: “the true incidence of this syndrome in children and adolescents remains uncertain.” So, how do we resolve this uncertainty?
The review has a number of suggestions for future research to improve our understanding of Long Covid prevalence in children. As we’ve seen in this article, we also need to more carefully consider and account for confounding bias. It is often possible, mathematically, to infer an association even in more complex causal relationships such as the one we see above. However, doing so requires a full and accurate understanding of all of the relationships in the causal structure.
Furthermore, a more complete and rigorous assessment of confounders needs to be completed. We’ve only scratched the surface in this article on one aspect: bias in the control group. Bias in the “Long Covid symptoms” node also needs to be considered. For instance: are all Long Covid symptoms being considered; is there under-reporting due to difficulties of child communication or understanding; is there under-reporting due to gender bias; are “on again / off again” variable symptoms being tracked correctly; and so forth.
Whatever the solution turns out to be, it seems that for a while at least, the prevalence of Long Covid in children will remain uncertain. How parents, doctors, and policy makers respond to this risk and uncertainty will be a critical issue for children around the world.
Many thanks to Hannah Davis, Dr Deepti Gurdasani, Dr Rachel Thomas, Dr Zoë Hyde, and Dr Nisreen Alwan MBE for invaluable help with research and review for this article.
As the evidence continues to mount of alarming long term physiological impacts of covid, and tens of millions are unable to return to work, we might expect leaders to take covid more seriously. Yet we are seeing concerted efforts to downplay the long-term health effects of covid using strategies straight out of the climate denial playbook, such as funding contrarian scientists, misleading petitions, social media bots, and disingenuous debate tactics that make the science seem murkier than it is. In many cases, these minimization efforts are being funded by the same billionaires and institutions that fund climate change denialism. Dealing with many millions of newly disabled people will be very expensive for governments, social service programs, private insurance companies, and others. Thus, many have a significant financial interest in distorting the science around long term effects of covid to minimize the perceived impact.
In topics ranging from covid-19 to HIV research to the long history of wrongly assuming women’s illnesses are psychosomatic, we have seen again and again that medicine, like all science, is political. This shows up in myriad ways, such as: who provides funding, who receives that funding, which questions get asked, how questions are framed, what data is recorded, what data is left out, what categories included, and whose suffering is counted.
Scientists often like to think of their work as perfectly objective, perfectly rational, free from any bias or influence. Yet by failing to acknowledge the reality that there is no “view from nowhere”, they miss their own blind spots and make themselves vulnerable to bad-faith attacks. As one climate scientist recounted of the last three decades, “We spent a long time thinking we were engaged in an argument about data and reason, but now we realize it’s a fight over money and power… They [climate change deniers] focused their lasers on the science and like cats we followed their pointer and their lead.”
The American Institute for Economic Research (AIER), a libertarian think tank funded by right wing billionaire Charles Koch which invests in fossil fuels, energy utilities, and tobacco, is best known for its research denying the climate crisis. In October 2020, a document called the Great Barrington Declaration (GBD) was developed at a private AIER retreat, calling for a “herd immunity” approach to covid, arguing against lockdowns, and suggesting that young, healthy people have little to worry about. The three scientists who authored the GBD have prestigious pedigrees and are politically well-connected, speaking to White House Officials and having found favor in the British government. One of them, Sunetra Gupta of Oxford, had released a wildly inaccurate paper in March 2020 claiming that up to 68% of the UK population had been exposed to covid, and that there were already significant levels of herd immunity to coronavirus in both the UK and Italy (again, this was in March 2020). Gupta received funding from billionaire conservative donors, Georg and Emily von Opel. Another one of the authors, Jay Bhattacharya of Stanford, co-authored a widely criticized pre-print in April 2020 that relied on a biased sampling method to “show” that 85 times more people in Santa Clara County California had already had covid compared to other estimates, and thus suggested that the fatality rate for covid was much lower than it truly is.
Half of the social media accounts advocating for herd immunity seem to be bots, characterized as engaging in abnormally high levels of retweets & low content diversity. An article in the BMJ recently advised that it is “critical for physicians, scientists, and public health officials to realize that they are not dealing with an orthodox scientific debate, but a well-funded sophisticated science denialist campaign based on ideological and corporate interests.”
This myth of perfect scientific objectivity positions modern medicine as completely distinct from a history where women were diagnosed with “hysteria” (a roaming uterus) for a variety of symptoms, where Black men were denied syphilis treatment for decades as part of a “scientific study”, and where multiple sclerosis was “called hysterical paralysis right up to the day they invented a CAT scan machine” and demyelination could be seen on brain scans.
However, there is not some sort of clean break where bias was eliminated and all unknowns were solved. Black patients, including children, still receive less pain medication than white patients for the same symptoms. Women are still more likely to have their physical symptoms dismissed as psychogenic. Nearly half of women with autoimmune disorders report being labeled as “chronic complainers” by their doctors in the 5 years (on average) they spend seeking a diagnosis. All this impacts what data is recorded in their charts, what symptoms are counted.
Medical data are not objective truths. Like all data, the context is critical. It can be missing, biased, and incorrect. It is filtered through the opinions of doctors. Even blood tests and imaging scans are filtered through the decisions of what tests to order, what types of scans to take, what accepted guidelines recommend, what technology currently exists. And the technology that exists depends on research and funding decisions stretching back decades, influenced by politics and cultural context.
One may hope that in 10 years we will have clearer diagnostic tests for some illnesses which remain contested now, just as the ability to identify multiple sclerosis improved with better imaging. In the meantime, we should listen to patients and trust in their ability to explain their own experiences, even if science can’t fully understand them yet.
Science does not just progress inevitably, independent of funding and politics and framing and biases. A self-fulfilling prophecy often occurs in which doctors:
label a new, poorly understood, multi-system disease as psychogenic,
use this as justification to not invest much funding into researching physiological origins,
and then point to the lack of evidence as a reason why the illness must be psychogenic.
This is largely the experience of ME/CFS patients over the last several decades. Myalgic encephalomyelitis (ME/CFS) involves dysfunction of the immune system, autonomic nervous system, and energy metabolism (including mitochondrial dysfunction, hypoacetylation, reduced oxygen uptake, and impaired oxygen delivery). ME/CFS is more debilitating than many chronic diseases, including chronic renal failure, lung cancer, stroke, and type-2 diabetes. An estimated 25–29% of patients are homebound or bedbound. ME/CFS is often triggered by viral infections, so it is not surprising that we are seeing some overlap between ME/CFS and long covid. ME/CFS disproportionately impacts women, and a now discredited 1970 paper identified a major outbreak in 1958 amongst nurses at a British hospital as “epidemic hysteria”. This early narrative of ME/CFS as psychogenic has been difficult to shake. Even as evidence continues to accumulate of immune, metabolic, and autonomic nervous system dysfunction, some doctors persist in believing that ME/CFS must be psychogenic. It has remained woefully underfunded: from 2013–2017, NIH funding was at only 7.3% of the level that would be commensurate with its disease burden. Note that the below graph is on a log scale: ME/CFS is at 7%, depression and asthma are at 100%, and diseases like cancer and HIV are closer to 1000%.
Portraying patients as unscientific and irrational is the other side of the same coin as the myth that medicine is perfectly rational. Patients who push back when symptoms they know are physiological get dismissed as psychogenic, who reject treatments derived from flawed studies, or who distrust medical institutions based on their experiences of racism, sexism, and misdiagnosis are labeled “militant” or “irrational”, and placed in the same category as conspiracy theorists and those peddling disinformation.
On an individual level, receiving a psychological misdiagnosis lengthens the time it will take to get the right diagnosis, since many doctors will stop looking for physiological explanations. A study of 12,000 rare disease patients covered by the BBC found that “while being misdiagnosed with the wrong physical disease doubled the time it took to get to the right diagnosis, getting a psychological misdiagnosis extended it even more – by 2.5 up to 14 times, depending on the disease.” This dynamic holds true at the disease level as well: once a disease is mis-labeled as psychogenic, many doctors will stop looking for physiological origins.
We are seeing increasing efforts to dismiss long covid as psychogenic in high profile platforms such as the WSJ and New Yorker. The New Yorker’s first feature article on long covid, published last month, neglected to interview any clinicians who treat long covid patients nor to cite the abundant research on how covid causes damage to many organ systems, yet interviewed several doctors in unrelated fields who claim long covid is psychogenic. In response to a patient’s assertion that covid impacts the brain, the author spent an entire paragraph detailing how there is currently no evidence that covid crosses the blood-brain barrier, but didn’t mention the research on covid patients finding cognitive dysfunction and deficits, PET scans similar to those seen in Alzheimer’s patients, neurological damage, and shrinking grey matter. This leaves a general audience with the mistaken impression that it is unproven whether covid impacts the brain, and is a familiar tactic from bad-faith science debates.
The New Yorker article set up a strict dichotomy between long covid patients and doctors, suggesting that patients harbor a “disregard for expertise”; are less “concerned about what is and isn’t supported by evidence”; and are overly “impatient.” In contrast, doctors appreciate the “careful study design, methodical data analysis, and the skeptical interpretation of results” that medicine requires. Of course, this is a false dichotomy: many patients are more knowledgeable about the latest research than their doctors, some patients are publishing in peer-reviewed journals, and there are many medical doctors that are also patients. And on the other hand, doctors are just as prone as the rest of us to biases, blind spots, and institutional errors.
In 1987, 40,000 Americans had already died of AIDS, yet the government and pharmaceutical companies were doing little to address this health crisis. AIDS was heavily stigmatized, federal spending was minimal, and pharmaceutical companies lacked urgency. The activists of ACT UP used a two pronged approach: creative and confrontational acts of protest, and informed scientific proposals. When the FDA refused to even discuss giving AIDS patients access to experimental drugs, ACT UP protested at their headquarters, blocking entrances and lying down in front of the building with tombstones saying “Killed by the FDA”. This opened up discussions, and ACT UP offered viable scientific proposals, such as switching from the current approach of conducting drug trials on a small group of people over a long time, and instead testing a large group of people over a short time, radically speeding up the pace at which progress occurred. ACT UP used similar tactics to protest the NIH and pharmaceutical companies, demanding research on how to treat the opportunistic infections that killed AIDS patients, not solely research for a cure. The huge progress that has happened in HIV/AIDS research and treatment would not have happened without the efforts of ACT UP.
Across the world, we are at a pivotal time in determining how societies and governments will deal with the masses of newly disabled people due to long covid. Narratives that take hold early often have disproportionate staying power. Will we inaccurately label long covid as psychogenic, primarily invest in psychiatric research that can’t address the well-documented physiological damage caused by covid, and financially abandon the patients who are now unable to work? Or will we take the chance to transform medicine to better recognize the lived experiences and knowledge of patients, to center patient partnerships in biomedical research for complex and multi-system diseases, and strengthen inadequate disability support and services to improve life for all people with disabilities? The decisions we collectively make now on these questions will have reverberations for decades to come.
If you haven’t already read the terrible New Yorker long covid article, I don’t recommend doing so. Here is the letter I sent to the editor. Feel free to reuse or modify as you like. The below are just a subset of the many issues with the article. If you are looking for a good overview of long covid and patient advocacy, please instead read Ed Yong’s Long-Haulers Are Fighting for their Future.
Dear New Yorker editors,
I was disturbed by the irresponsible description of a suicide (in violation of journalism guidelines), undisclosed conflicts of interest, and omission of relevant medical research and historical context in the article, “The Struggle to Define Long Covid,” by Dhruv Khullar.
Irresponsible description of a suicide
The article contained a description of a patient’s suicide with sensationalistic details, including about the method used. This is a violation of widely accepted journalism standards on how to cover suicide. Over 100 studies worldwide have found that risk of suicide contagion is real and responsible reporting can help mitigate this. Please consider these resources developed in collaboration with a host of organizations, including the CDC: Recommendations – Reporting on Suicide
Conflicts of Interest
Both Khullar and one of his sources, oncologist Vinay Prasad, have funding from Arnold Ventures, the private investment fund created by ex-Enron hedge fund billionaire John Arnold, for projects that help hospitals reduce costs by cutting care. At a minimum, your article should be updated to note this conflict of interest.
Omission of relevant history
Khullar repeatedly discussed how patients distrust doctors and may accuse doctors of “gaslighting,” yet he never once mentioned why this would be the case: the long history of medical racism, sexism, and ableism, all of which are ongoing problems in medicine and well-documented in dozens of peer-reviewed studies. He also failed to mention the history of many other physiological conditions (such as multiple sclerosis) being dismissed as psychological until scientists caught up and found the physiological origins. His description of the AIDS activist group ACT UP didn’t acknowledge that AIDS research would not have progressed in the incredible way that it has if ACT UP had not “clashed with scientists.”
Khullar tries to portray his main subject (and by extension, long covid patients in general) as unscientific, focusing on statements that are not substantiated by academic research, rather than highlighting the many aspects of long covid that are already well researched and well supported (or the research on related post-viral illnesses, including POTS and ME/CFS), or even all the long covid patients who have published in peer-reviewed journals. For example, when his main subject talks about long covid impacting the brain, Khullar hyper-focuses on explaining that there is no evidence of covid crossing the blood-brain barrier, ignoring the larger point (which is supported by numerous studies) that long covid causes changes to the brain, even if the exact mechanism is unknown. This is a tactic commonly used by those trying to make science appear murkier than it is, and a general audience will leave this article with harmful misunderstandings.
I hope that you will update Khullar’s article to address these inaccuracies, financial conflicts of interest, and irresponsible suicide coverage. In its current state, many readers of this article will walk away with misconceptions about long covid, causing them to underestimate its severity and how much has already been confirmed by research.
Rachel Thomas, PhD
To contact the New Yorker for yourself
Feel free to reuse or modify my post above if it is helpful to you. From the New Yorker website:
Please send letters to [email protected], and include your postal address and daytime phone number. Letters may be edited for length and clarity, and may be published in any medium. All letters become the property of The New Yorker.
Medicine is Political
The myth that medicine is perfectly rational, objective, and apolitical (in contrast to irrational activist patients) pervaded Khullar’s article. I put together a twitter thread with links to other history, research, and resources to try to debunk this. Please read more here:
Medicine, like all of science, is political: - which questions get asked - which projects get funded - how debates get framed - who the researchers are - context of data (what categories, what labels, which biases, what is left out) - whose suffering is counted 1/
Summary: By using better masks, monitoring and improving indoor air quality, and rolling out rapid tests, we could quickly halt the current outbreaks in the Australian states of New South Wales (NSW) and Victoria. If we fail to do so, and open up before 80% of all Australians are vaccinated, we may have tens of thousands of deaths, and hundreds of thousands of children with chronic illness which could last for years.
Pandemics either grow exponentially or disappear exponentially; they don’t just stay at some constant level. The reproduction number R is the average number of people each infected person transmits the virus to. If R is greater than 1.0 in a region, the pandemic grows exponentially and becomes out of control (as we see in NSW now); if R is less than 1.0, the virus dies out.
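A toy geometric projection makes the knife-edge at R=1.0 concrete. All the numbers here, including the starting caseload and the 5-day generation interval, are illustrative assumptions, not modelled estimates for NSW or Victoria:

```python
def projected_cases(r, days, initial=100, generation_days=5):
    """Geometric projection: cases multiply by R once per generation
    interval, i.e. by r ** (1 / generation_days) each day."""
    series = [initial]
    for _ in range(days):
        series.append(series[-1] * r ** (1 / generation_days))
    return series

print(round(projected_cases(1.3, 30)[-1]))  # R > 1: exponential growth
print(round(projected_cases(0.7, 30)[-1]))  # R < 1: outbreak dies out
```

After a month, the two trajectories have diverged by an order of magnitude, starting from identical caseloads; a small change in R near 1.0 is the difference between an outbreak ending and one exploding.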
No Australian state or territory is currently using any of the three best “bang for your buck” public health interventions: better masks, better ventilation, or rapid testing. Any of these on their own (combined with the existing measures being used in Vic) would likely be enough to get R<1. The combination of them would probably kill off the outbreaks rapidly. At that point life can largely return to normal.
Stopping delta is not impossible. Other jurisdictions have done it, including Taiwan and China. New Zealand appears to be well on the way too. There’s no reason Australia can’t join them.
Scientists have found that using better masks is the single best way to decrease viral transmission in a close indoor setting. They showed that if all teachers and students wear masks with good fit and filtration, transmission is reduced by a factor of around 300 times. The CDC has found that two free and simple techniques to enhance the fit of surgical masks, “double masking” and “knot and tuck”, both decrease virus exposure by a factor of more than ten compared to wearing a cloth or surgical mask alone. For more information, see my article (with Zeynep Tufekci) in The Atlantic.
We now know that covid is airborne. That means that we need clean air. A recent study has shown that the key to managing this is to monitor CO2 levels in indoor spaces. That’s because CO2 levels are a good proxy for how well air is being circulated. Without proper ventilation, CO2 levels go up, and if there are infected people around, virus levels go up too.
CO2 monitors can be bought in bulk for around $50. Standards should be published for acceptable maximum CO2 levels in classrooms, workplaces, and public indoor spaces, and education provided on how to improve air quality. Where CO2 levels cannot be controlled, air purifiers with HEPA filtration should be required.
Better ventilation can decrease the probability of infection by a factor of 5-10 compared to indoor spaces which do not have good airflow.
Rapid antigen lateral flow tests are cheap, and provide testing results within 15-30 minutes. They have very few false positives. A Brisbane-based company, Ellume, has an FDA approved rapid test, and is exporting it around the world. But we’re not using it here in Australia.
If every workplace and school required daily rapid tests, around 75% of cases in these locations would be identified. Positive cases would isolate until they have results from a follow-up PCR test. Using this approach, transmission in schools and workplaces would be slashed by nearly three quarters, bringing R well under 1.0.
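The arithmetic behind that claim is simple (a back-of-envelope sketch with assumed numbers, not taken from the cited research): if testing catches a given fraction of infectious people before they transmit, transmission from those settings scales down by the same fraction.

```python
# Back-of-envelope sketch (assumed numbers, for illustration only):
# if daily rapid tests catch a fraction of infectious people before they
# transmit, transmission from those people is removed, scaling R down.

def effective_R(R: float, fraction_caught: float) -> float:
    """R after removing transmission from detected-and-isolated cases."""
    return R * (1 - fraction_caught)

# Catching ~75% of cases would cut transmission by nearly three quarters:
effective_R(2.0, 0.75)  # 0.5, well under the critical threshold of 1.0
```

In reality testing only covers some settings and isolation is imperfect, so this overstates the effect somewhat, but the direction and rough magnitude hold.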
In the UK every child was tested twice a week in the last school term. Recent research suggests that daily rapid tests could allow more students to stay at school.
Hitting a vaccination target
The Grattan Institute found we need to vaccinate at least 80% of the total population (including children) this year, and continue the vaccination rollout to 90% throughout 2022. Clinical trials for the vaccine in kids are finishing this month. If we can quickly ramp up the roll-out to kids, and maintain the existing momentum of vaccinations in adults, we may be able to achieve the 80% goal by the end of the year.
It’s important to understand, however, that no single intervention (including vaccination) will control covid. Many countries with high vaccination rates today have high covid death rates, due to waning immunity and unvaccinated groups. The point of all of these interventions is to reduce R. When R is under 1 and cases are under control, restrictions are not needed; otherwise, they are needed.
We must get R under 1.0
Over 200,000 children will develop chronic illness
The Doherty Report predicts that over three hundred thousand children will get symptomatic covid, and over 1.4 million kids will be infected, in the next 6 months if restrictions are reduced when 70% of adults are vaccinated. This may be a significant under-estimate: a recent CDC study predicts that 75% of school-kids would get infected in three months in the absence of vaccines and masks.
New research has found that one in seven infected kids may go on to develop “long covid”, a debilitating illness which can impact patients for years. Based on this data, we are looking at two hundred thousand kids (or possibly far more) with chronic illness. The reality may be even worse than this, since that research uses PCR tests to find infected kids, but PCR testing strategies have been shown to fail to identify covid in kids about half the time. Furthermore, this study looked at the alpha variant. The delta variant appears to be about twice as severe.
It’s too early to say when, or if, these children will recover. Some viruses such as polio led to life-long conditions, which weren’t discovered until years later. Long covid has a lot of similarities to myalgic encephalomyelitis, which for many people is a completely debilitating life-long condition.
In regions which have opened up, such as Florida, schools were “drowning” in cases within one week of starting term. In the UK, lawsuits are now being filed based on the risks being placed on children.
Delta rips through unvaccinated populations. For instance, in England delta took hold during May 2021. English schools took a cautious approach, placing school children in “bubbles” which did not mix. After school, children were required to go directly home and not mix with anyone else. Nonetheless, within three months, more kids were getting infected than ever before. Cases in July 2021 were around double those of the previous worst month, December 2020.
The Doherty Model greatly underestimates risks
The Doherty Model, which is being used as a foundation for Australian reopening policy, has many modeling and reporting issues which result in the Doherty Report greatly underestimating risks. (These issues are generally a result of how the report was commissioned, rather than being mistakes made by those doing the modeling.)
The Doherty Model has to work with incomplete data, such as the very limited information we have about the behavior of the delta variant. The recommended practice in this kind of situation is not to make a single assumption about each premise in a model, but instead to model uncertainty, by including a range of possible values for each uncertain premise. The Doherty Model does not do this. Instead, “point estimates” (a single guess for each premise) are used, and the model produces a single output for each scenario.
This is a critical deficiency. By failing to account for uncertainty in inputs, or uncertainty in future changes (such as new variants), the model also fails to account for uncertainty in outputs. What’s the probability that the hospitalizations are far more rapid than in their single modeled outcome, such that Australian ICUs are overloaded? We don’t know, because that work hasn’t been done.
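The recommended alternative can be sketched in a few lines: sample each uncertain premise from a plausible range, run the model many times, and report the spread of outcomes rather than one number. Everything below is a hypothetical illustration (the toy model and all ranges are made up; this is not the Doherty Model):

```python
import random

# Hypothetical sketch of modelling input uncertainty: instead of a single
# point estimate per premise, sample each premise from a plausible range
# and report the spread of outputs. All numbers are invented for
# illustration; this is not the Doherty Model.

random.seed(0)

def toy_model(R: float, hosp_rate: float) -> float:
    # Stand-in for a real epidemic model's peak-hospitalisation output.
    return 10_000 * R * hosp_rate

outputs = sorted(
    toy_model(random.uniform(1.2, 2.5),    # uncertain transmissibility
              random.uniform(0.02, 0.08))  # uncertain severity
    for _ in range(10_000)
)

# Report a central 90% interval of outcomes, not a single number.
low, high = outputs[500], outputs[9500]
```

With this approach, the question “what’s the probability ICUs are overloaded?” becomes answerable: count the fraction of sampled runs that exceed capacity.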
The Doherty Model makes a critical error in how it handles the Delta variant: “we will assume that the severity of Delta strains approximates Alpha strains”. We now know that this assumption is incorrect: latest estimates are that “People who are infected with the highly contagious Delta variant are twice as likely to be hospitalized as those who are infected with the Alpha variant”.
The model also fails to correctly estimate the efficacy of Test, Trace, Isolate, and Quarantine (TTIQ). It assumes that TTIQ will be “optimal” for “hundreds of daily cases”, and “partial” for thousands of cases. However, in NSW optimal TTIQ was no longer maintained after just 50 cases, and the majority of cases were no longer isolating after 100 daily cases.
The Doherty Model assumes that vaccines are equally distributed throughout the country. This is mentioned in the report, and has also been confirmed by talking directly with those doing the modeling. However, there are groups where that’s not true. For instance, indigenous communities are only around ⅛ vaccinated. In this group, if restrictions are removed, then R will return towards 5.0 (the reproduction number of delta without vaccines or restrictions). As a result, nearly the entire population will be infected within months.
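The claim that nearly the whole group gets infected follows from the standard final-size relation of a simple SIR model (a textbook sketch I'm adding for illustration, not the Doherty Model's machinery): the fraction ultimately infected, z, satisfies z = 1 − exp(−R0·z).

```python
import math

# Standard SIR final-size relation: the fraction ultimately infected, z,
# satisfies z = 1 - exp(-R0 * z). Solve by fixed-point iteration.

def final_size(R0: float, iterations: int = 200) -> float:
    z = 0.5  # initial guess
    for _ in range(iterations):
        z = 1 - math.exp(-R0 * z)
    return z

final_size(5.0)  # ~0.99: with R0 near 5, almost the whole group is infected
```

This ignores the “within months” timing, which depends on the generation interval, but it shows why an unvaccinated community facing R near 5 ends up almost entirely infected.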
The same thing will happen with kids. The Doherty model fails to model school mixing, but instead makes a simplifying assumption that children have some random chance of meeting random other children each day. In practice however, they have a 100% chance of mixing with exactly the same children every day, at school.
The Doherty Model misses the vast majority of cases. That’s because it entirely ignores all cases after 180 days (when most cases occur). Another model has estimated the full impact of covid without such a time limitation. It finds that there would be around 25,000 deaths in Australia in the absence of restrictions.
A major problem with the National Plan based on the Doherty Report is that it goes directly from vaccination rate to actions, and bakes in all the model assumptions. It can’t take into account unanticipated changes, such as more transmissible variants, or mass infections of hospital staff.
It would be far better to decide actions in terms of measurements that reflect changing current conditions — that is, R and remaining health-care Capacity. The Doherty Institute models could be reported as estimated R and Capacity at 70% and 80% vaccination rates of adults, which is 56% and 64% of the full population.
Reducing transmission restrictions when R>1 or there is insufficient remaining capacity would be madness regardless of the vaccination rate.
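That decision rule is simple enough to write down directly. The 20% headroom threshold below is an assumed illustrative value, not a figure from the article or the Doherty Report:

```python
def should_ease_restrictions(R: float, spare_capacity: float) -> bool:
    """Ease restrictions only when the outbreak is shrinking AND the
    health-care system has headroom. The 20% spare-capacity threshold
    is an assumed illustrative value, not a recommendation."""
    return R < 1.0 and spare_capacity > 0.20

should_ease_restrictions(0.8, 0.5)  # True: shrinking outbreak, spare beds
should_ease_restrictions(1.2, 0.5)  # False: R > 1, whatever the vaccination rate
```

Note that the vaccination rate appears nowhere in the rule: vaccination matters because it lowers R, not as a trigger in its own right.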
“Live with covid” means mass hospitalizations and ongoing outbreaks
Based on current projections, in the best case scenario there will be over 2000 people hospitalized with covid in NSW in one month’s time, with over 350 in ICU. This will be a big stretch on the state’s resources. The same will happen in other states that fail to control outbreaks prior to achieving at least 80% vaccination rates across all populations, including children and indigenous communities.
Even when most adults are vaccinated, covid doesn’t go away. Immunity wanes after a few months, and there will continue to be groups where fewer people have been vaccinated. We can estimate the longer term impact of covid by looking at other countries. In the UK, 75% of 16+ residents are vaccinated. There are currently 700 covid deaths and 250,000 cases per week in the UK. If our death rate is proportionate, that would mean 266 Australians dying per week even after we get to 75% vaccinated (along with thousands of long covid cases, with their huge economic and societal cost). By comparison, there were 9 weekly deaths from flu in Australia in 2019.
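The per-capita scaling behind that figure is straightforward. The population numbers below are approximate 2021 values I've assumed for illustration; the article's own inputs may differ slightly, which is why it quotes ~266:

```python
# Scale UK weekly covid deaths to Australia's population size.
# Populations are approximate 2021 figures, assumed for illustration.
UK_POPULATION = 67_000_000
AU_POPULATION = 25_700_000
UK_WEEKLY_DEATHS = 700

au_weekly_deaths = UK_WEEKLY_DEATHS * AU_POPULATION / UK_POPULATION
# roughly 268 deaths per week, in line with the ~266 quoted above
```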
We are now hearing political leaders in Victoria and NSW giving up on getting the outbreaks under control. But we haven’t yet deployed the three easiest high-impact public health interventions we have at our disposal: better masks, better ventilation, and rapid tests. Any one of these (along with the existing measures) would be likely to neutralize the outbreaks; their impacts combined will be a powerful weapon.
If we don’t do this, then covid will leave hundreds of thousands of Australian children with chronic illness, and kill thousands of Australians. This is entirely avoidable.
Acknowledgements: Thanks to Dr Rachel Thomas for many discussions about this topic and for draft review. Thanks also to the many Australian scientists with whom I consulted during development of this article.