# Inaccuracies, irresponsible coverage, and conflicts of interest in The New Yorker

If you haven’t already read the terrible New Yorker long covid article, I don’t recommend doing so. Here is the letter I sent to the editors. Feel free to reuse or modify it as you like. The points below are just a subset of the many issues with the article. If you are looking for a good overview of long covid and patient advocacy, please read Ed Yong’s Long-Haulers Are Fighting for their Future instead.

Dear New Yorker editors,

I was disturbed by the irresponsible description of a suicide (in violation of journalism guidelines), undisclosed conflicts of interest, and omission of relevant medical research and historical context in the article, “The Struggle to Define Long Covid,” by Dhruv Khullar.

## Irresponsible description of a suicide

The article described a patient’s suicide in sensationalistic detail, including the method used. This violates widely accepted journalism standards on how to cover suicide. Over 100 studies worldwide have found that the risk of suicide contagion is real and that responsible reporting can help mitigate it. Please consider these resources, developed in collaboration with a host of organizations including the CDC: Recommendations – Reporting on Suicide

## Conflicts of Interest

Both Khullar and one of his sources, oncologist Vinay Prasad, have funding from Arnold Ventures, the private investment fund created by ex-Enron hedge fund billionaire John Arnold, for projects that help hospitals reduce costs by cutting care. At a minimum, your article should be updated to note this conflict of interest.

## Omission of relevant history

Khullar repeatedly discussed how patients distrust doctors and may accuse doctors of “gaslighting,” yet he never once mentioned why this would be the case: the long history of medical racism, sexism, and ableism, all of which are ongoing problems in medicine and well-documented in dozens of peer-reviewed studies. He also failed to mention the history of many other physiological conditions (such as multiple sclerosis) being dismissed as psychological until scientists caught up and found the physiological origins. His description of the AIDS activist group ACT UP didn’t acknowledge that AIDS research would not have progressed in the incredible way that it has if ACT UP had not “clashed with scientists.”

## Omission of relevant medical research

Khullar omitted the extensive medical research on how long covid patients, including those with mild or no symptoms, have been shown to suffer physiological issues including neurological damage, complement-mediated thrombotic microangiopathy, GI immune system damage, corneal nerve fibre loss, immunological dysfunction, increased risk of kidney outcomes, dysfunction in T cell memory generation, cognitive dysfunction and deficits possibly setting the stage for Alzheimer’s, and ovarian failure.

Khullar tries to portray his main subject (and by extension, long covid patients in general) as unscientific. He focuses on statements that are not substantiated by academic research. He does not highlight the many aspects of long covid that are already well researched and well supported, or the research on related post-viral illnesses such as POTS and ME/CFS. He does not even mention the many long covid patients who have published in peer-reviewed journals. For example, when his main subject talks about long covid impacting the brain, Khullar hyperfocuses on explaining that there is no evidence of covid crossing the blood-brain barrier. In doing so he ignores the larger point, supported by numerous studies, that long covid causes changes to the brain even if the exact mechanism is unknown. This is a tactic commonly used by those trying to make science appear murkier than it is, and a general audience will leave this article with harmful misunderstandings.

I hope that you will update Khullar’s article to address these inaccuracies, financial conflicts of interest, and irresponsible suicide coverage. In its current state, many readers of this article will walk away with misconceptions about long covid, causing them to underestimate its severity and how much has already been confirmed by research.

Sincerely,

Rachel Thomas, PhD

## To contact the New Yorker for yourself

Feel free to reuse or modify my post above if it is helpful to you. From the New Yorker website:

Please send letters to [email protected], and include your postal address and daytime phone number. Letters may be edited for length and clarity, and may be published in any medium. All letters become the property of The New Yorker.

## Medicine is Political

The myth that medicine is perfectly rational, objective, and apolitical (in contrast to irrational activist patients) pervaded Khullar’s article. I put together a Twitter thread with links to further history, research, and resources to try to debunk this. Please read more here:

# Australia can, and must, get R under 1.0

Summary: By using better masks, monitoring and improving indoor air quality, and rolling out rapid tests, we could quickly halt the current outbreaks in the Australian states of New South Wales (NSW) and Victoria. If we fail to do so, and open up before 80% of all Australians are vaccinated, we may have tens of thousands of deaths, and hundreds of thousands of children with chronic illness which could last for years.

## We can get R under 1.0

Pandemics either grow exponentially or disappear exponentially; they don’t just stay at some constant level. The reproduction number R is the average number of people each infected person transmits the virus to. If R is greater than 1.0 in a region, the pandemic grows exponentially and gets out of control (as we see in NSW now). If R is less than 1.0, the virus dies out.
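To make the threshold concrete, here is a minimal sketch that treats spread as discrete generations of infection, with each generation multiplying case numbers by R. The following values are illustrative assumptions, not estimates for any Australian state:

- a starting count of 1,000 cases;
- ten generations of spread;
- three R values (1.3, 1.0, and 0.8).

```python
def cases_after(initial_cases: float, r: float, generations: int) -> float:
    """Cases in the final generation, assuming each generation multiplies cases by R."""
    return initial_cases * r ** generations


# Illustrative (hypothetical) values: 1,000 starting cases, 10 generations of transmission
for r in (1.3, 1.0, 0.8):
    print(f"R={r}: {cases_after(1000, r, 10):,.0f} cases in generation 10")
```

With R = 1.3, cases grow roughly fourteen-fold over ten generations. With R = 0.8, they shrink to about a tenth. Small differences in R on either side of 1.0 compound quickly.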

No Australian state or territory is currently using any of the three best “bang for your buck” public health interventions: better masks, better ventilation, or rapid testing. Any of these on their own (combined with the existing measures being used in Vic) would likely be enough to get R<1. The combination of them would probably kill off the outbreaks rapidly. At that point life can largely return to normal.

Stopping delta is not impossible. Other jurisdictions have done it, including Taiwan and China. New Zealand appears to be well on the way too. There’s no reason Australia can’t join them.

### Masks

Scientists have found that using better masks is the single best way to decrease viral transmission in a close indoor setting. They showed that if all teachers and students wear masks with good fit and filtration, transmission is reduced by a factor of around 300. The CDC has found that two free and simple techniques for improving the fit of surgical masks, “double masking” and “knot and tuck”, both reduce virus exposure by a factor of more than ten compared to wearing a cloth or surgical mask alone. For more information, see my article (with Zeynep Tufekci) in The Atlantic.

### Ventilation

We now know that covid is airborne. That means that we need clean air. A recent study has shown that the key to managing this is to monitor CO2 levels in indoor spaces. That’s because CO2 levels are a good proxy for how well air is being circulated. Without proper ventilation, CO2 levels go up, and if there are infected people around virus levels go up too.

CO2 monitors can be bought in bulk for around $50. Standards should be communicated for the acceptable maximum levels of CO2 in classrooms, workplaces, and public indoor spaces. These should come with education on how to improve air quality. Where CO2 levels cannot be controlled, air purifiers with HEPA filtration should be required.
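A CO2 standard could be enforced with very simple tooling. The sketch below flags rooms whose peak reading exceeds a limit. The 800 ppm threshold, the room names, and the readings are hypothetical placeholders, since the article does not say what the acceptable maximum should be.

```python
from typing import Dict, List

# Hypothetical threshold: the article does not specify one, so this is a placeholder value
MAX_CO2_PPM = 800


def rooms_needing_action(readings: Dict[str, List[float]], limit: float = MAX_CO2_PPM) -> List[str]:
    """Return rooms whose peak CO2 reading exceeds the limit (a proxy for poor ventilation)."""
    return sorted(room for room, ppm in readings.items() if max(ppm) > limit)


# Hypothetical readings from CO2 monitors, in parts per million
readings = {
    "classroom_a": [450, 620, 780],
    "classroom_b": [500, 950, 1400],
    "staff_room": [420, 480, 510],
}
print(rooms_needing_action(readings))
```

Flagged rooms would be the ones to ventilate better. Where CO2 cannot be brought down, they would be the ones to fit with HEPA air purifiers.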

Better ventilation can decrease the probability of infection by a factor of 5-10 compared to indoor spaces which do not have good airflow.

### Rapid tests

Rapid antigen lateral flow tests are cheap and provide results within 15-30 minutes. They have very few false positives. A Brisbane-based company, Ellume, has an FDA-approved rapid test and is exporting it around the world. But we’re not using it here in Australia.

If every workplace and school required daily rapid tests, around 75% of cases in these locations would be identified. Positive cases would isolate until they have results from a follow-up PCR test. Using this approach, transmission in schools and workplaces would be slashed by nearly three quarters, bringing R well under 1.0.
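The “nearly three quarters” figure follows from simple arithmetic. If a share of cases is caught and isolated before they pass the virus on, R is scaled by the share that slips through.

The sketch below applies that to a hypothetical starting R of 1.5. It also shows one way to combine interventions, by treating each as an independent divisor of R. Both the independence assumption and the 2x and 1.5x divisors for masks and ventilation are illustrative. The divisors are deliberately smaller than the factors quoted above. Real interventions overlap, so multiplying them likely overstates their combined effect.

```python
from math import prod


def r_with_testing(r: float, detection_rate: float) -> float:
    """Scale R by the share of cases NOT caught (assumes caught cases stop transmitting)."""
    return r * (1 - detection_rate)


def r_with_factors(r: float, *reduction_factors: float) -> float:
    """Divide R by each reduction factor, assuming the interventions act independently."""
    return r / prod(reduction_factors)


R_START = 1.5  # hypothetical current R, for illustration only

print(round(r_with_testing(R_START, 0.75), 3))      # daily rapid tests catching ~75% of cases
print(round(r_with_factors(R_START, 2.0, 1.5), 3))  # hypothetical 2x from masks, 1.5x from ventilation
```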

In the UK every child was tested twice a week in the last school term. Recent research suggests that daily rapid tests could allow more students to stay at school.

### Hitting a vaccination target

The Grattan Institute found we need to vaccinate at least 80% of the total population (including children) this year, and continue the vaccination rollout to 90% throughout 2022. Clinical trials for the vaccine in kids are finishing this month. If we can quickly ramp up the roll-out to kids, and maintain the existing momentum of vaccinations in adults, we may be able to achieve the 80% goal by the end of the year.

It’s important to understand, however, that no single intervention (including vaccination) will control covid. Many countries with high vaccination rates today have high covid death rates, due to waning immunity and unvaccinated groups. The point of all of these interventions is to reduce R. When R is under 1 and cases are under control, restrictions are not needed; otherwise, they are needed.

## We must get R under 1.0

### Over 200,000 children will develop chronic illness

The Doherty Report predicts that over three hundred thousand children will get symptomatic covid, and over 1.4 million kids will be infected, in the next 6 months if restrictions are reduced when 70% of adults are vaccinated. This may be a significant under-estimate: a recent CDC study predicts that 75% of school-kids would get infected in three months in the absence of vaccines and masks.

New research has found that one in seven infected kids may go on to develop “long covid”, a debilitating illness which can impact patients for years. Based on this data, we are looking at two hundred thousand kids (or possibly far more) with chronic illness. The reality may be even worse than this, since that research uses PCR tests to find infected kids, but PCR testing strategies have been shown to fail to identify covid in kids about half the time. Furthermore, this study looked at the alpha variant. The delta variant appears to be about twice as severe.
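The headline estimate of two hundred thousand children is the direct product of the two figures above, as this short calculation shows. It takes both inputs at face value. It does not try to adjust for missed PCR detections or for delta’s greater severity.

```python
infected_children = 1_400_000  # Doherty Report: children infected over the next 6 months
long_covid_rate = 1 / 7        # "one in seven infected kids may go on to develop long covid"

expected_long_covid = infected_children * long_covid_rate
print(f"{expected_long_covid:,.0f} children with long covid")
```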

It’s too early to say when, or if, these children will recover. Some viruses such as polio led to life-long conditions, which weren’t discovered until years later. Long covid has a lot of similarities to myalgic encephalomyelitis, which for many people is a completely debilitating life-long condition.

In regions which have opened up, such as Florida, schools were “drowning” in cases within one week of starting term. In the UK, lawsuits are now being filed based on the risks being placed on children.

Delta rips through unvaccinated populations. For instance, in England delta took hold during May 2021. English schools took a cautious approach, placing school children in “bubbles” which did not mix. After school, children were required to go directly home and not mix with anyone else. Nonetheless, within three months, more kids were getting infected than ever before. Cases in July 2021 were around double those in the previous worst month, December 2020.

### The Doherty Model greatly underestimates risks

The Doherty Model, which is being used as a foundation for Australian reopening policy, has many modeling and reporting issues which result in the Doherty Report greatly underestimating risks. (These issues are generally a result of how the report was commissioned, rather than being mistakes made by those doing the modeling.)

The Doherty Model has to work with incomplete data, such as the very limited information we have about the behavior of the delta variant. The recommended practice in this kind of situation is to not make a single assumption about the premises in a model, but to instead model uncertainty, by including a range of possible values for each uncertain premise. The Doherty Model does not do this. Instead, “point estimates”, that is, a single guess for each premise, are used. And a single output is produced by the model for each scenario.

This is a critical deficiency. By failing to account for uncertainty in inputs, or in future changes (such as new variants), the model also fails to account for uncertainty in outputs. What is the probability that hospitalizations rise far more rapidly than in the model’s single outcome, so that Australian ICUs are overloaded? We don’t know, because that work hasn’t been done.
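The difference between point estimates and modeled uncertainty can be shown with a toy model. Below, a hypothetical ICU-demand formula is fed either a single guess for each premise or random draws from a range. Every number, every range, and the formula itself are illustrative assumptions and bear no relation to the Doherty Model’s internals.

The point is the kind of output each approach gives. Sampling produces a distribution, from which the probability of exceeding ICU capacity can be read. A single point estimate cannot provide that.

```python
import random

ICU_CAPACITY = 500  # hypothetical number of available ICU beds


def toy_icu_demand(infections: float, hosp_rate: float, icu_share: float) -> float:
    """Toy model (NOT the Doherty Model): ICU patients = infections x hosp. rate x ICU share."""
    return infections * hosp_rate * icu_share


# Point-estimate approach: a single guess for each premise yields a single answer
point = toy_icu_demand(100_000, 0.02, 0.2)

# Uncertainty approach: sample each premise from a range and examine the spread of outcomes.
# The hospitalization range reaches double the point guess, echoing delta's higher severity.
rng = random.Random(0)  # fixed seed so the run is reproducible
draws = [
    toy_icu_demand(
        rng.uniform(50_000, 200_000),  # infections
        rng.uniform(0.01, 0.04),       # hospitalization rate
        rng.uniform(0.15, 0.25),       # share of hospitalized patients needing ICU
    )
    for _ in range(10_000)
]
p_overload = sum(d > ICU_CAPACITY for d in draws) / len(draws)

print(f"Point estimate: {point:.0f} ICU patients vs capacity {ICU_CAPACITY}")
print(f"Share of simulated outcomes exceeding capacity: {p_overload:.0%}")
```

With these made-up numbers, the point estimate sits below capacity, while a large share of the simulated outcomes exceed it. That probability is exactly the quantity a point-estimate model cannot report.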

The Doherty Model makes a critical error in how it handles the Delta variant: “we will assume that the severity of Delta strains approximates Alpha strains”. We now know that this assumption is incorrect. The latest estimates are that “People who are infected with the highly contagious Delta variant are twice as likely to be hospitalized as those who are infected with the Alpha variant”.

The model also fails to correctly estimate the efficacy of Test, Trace, Isolate, and Quarantine (TTIQ). It assumes that TTIQ will be “optimal” for “hundreds of daily cases”, and “partial” for thousands of cases. However, in NSW optimal TTIQ was no longer maintained after just 50 cases, and the majority of cases were no longer isolating after 100 daily cases.

The Doherty Model assumes that vaccines are equally distributed throughout the country. This is mentioned in the report, and has also been confirmed by talking directly with those doing the modeling. However, there are groups where that’s not true. For instance, Indigenous communities are only around ⅛ vaccinated. In this group, if restrictions are removed, then R will return towards 5.0 (the reproduction number of delta without vaccines or restrictions). As a result, nearly the entire community will be infected within months.
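The claim that R returns towards 5.0 can be checked with a common simplification: vaccination reduces R in proportion to coverage multiplied by the vaccine’s effectiveness against transmission. The sketch relies on three assumptions:

- that simplification;
- the R of 5.0 stated above;
- a hypothetical effectiveness of 80%.

It ignores prior infection and mixing with other groups.

```python
def effective_r(r0: float, coverage: float, effectiveness: float) -> float:
    """R after vaccination, assuming a vaccinated person is protected with probability `effectiveness`."""
    return r0 * (1 - coverage * effectiveness)


R0_DELTA = 5.0       # reproduction number of delta without vaccines or restrictions (from the text)
EFFECTIVENESS = 0.8  # hypothetical vaccine effectiveness against transmission

print(round(effective_r(R0_DELTA, 1 / 8, EFFECTIVENESS), 2))  # community around 1/8 vaccinated
print(round(effective_r(R0_DELTA, 0.8, EFFECTIVENESS), 2))    # for comparison: 80% vaccinated
```

Under these assumptions, a community that is one-eighth vaccinated still has an R of about 4.5. Even 80% coverage leaves R around 1.8. This is consistent with the earlier point that no single intervention, vaccination included, controls covid on its own.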

The same thing will happen with kids. The Doherty model fails to model school mixing, but instead makes a simplifying assumption that children have some random chance of meeting random other children each day. In practice however, they have a 100% chance of mixing with exactly the same children every day, at school.

The Doherty Model misses the vast majority of cases. That’s because it entirely ignores all cases after 180 days (when most cases occur). Another model has estimated the full impact of covid without such a time limitation. It finds that there would be around 25,000 deaths in Australia in the absence of restrictions.

A major problem with the National Plan based on the Doherty Report is that it goes directly from vaccination rate to actions, and bakes in all the model assumptions. It can’t take into account unanticipated changes, such as more transmissible variants, or mass infections of hospital staff.

It would be far better to decide actions in terms of measurements that reflect changing current conditions — that is, R and remaining health-care Capacity. The Doherty Institute models could be reported as estimated R and Capacity at 70% and 80% vaccination rates of adults, which is 56% and 64% of the full population.

Reducing transmission restrictions when R>1 or there is insufficient remaining capacity would be madness regardless of the vaccination rate.
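Both the conversion from adult to whole-population coverage and the proposed decision rule fit in a few lines. The sketch makes three assumptions:

- Adults make up 80% of the population, the ratio implied by the 56% and 64% figures.
- No children are vaccinated.
- Sufficient capacity means at least 20% spare. This threshold is a hypothetical stand-in, because the article does not say how much remaining capacity counts as sufficient.

```python
ADULT_SHARE = 0.8  # implied by "70% of adults = 56% of the full population"


def population_coverage(adult_coverage: float, adult_share: float = ADULT_SHARE) -> float:
    """Share of the whole population vaccinated, assuming no children are vaccinated."""
    return adult_coverage * adult_share


def can_relax_restrictions(r: float, spare_capacity: float, min_spare_capacity: float = 0.2) -> bool:
    """Relax only when R < 1.0 AND enough health-care capacity remains (hypothetical threshold).
    The vaccination rate is deliberately not an input: it matters through its effect on R."""
    return r < 1.0 and spare_capacity >= min_spare_capacity


print(round(population_coverage(0.7), 2), round(population_coverage(0.8), 2))
print(can_relax_restrictions(r=1.2, spare_capacity=0.5))
print(can_relax_restrictions(r=0.8, spare_capacity=0.5))
```

Because the rule reads only current measurements, it responds to unanticipated changes such as a more transmissible variant, with no need to revise a vaccination target.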

### “Live with covid” means mass hospitalizations and ongoing outbreaks

Based on current projections, in the best-case scenario there will be over 2,000 people hospitalized with covid in NSW in one month’s time, with over 350 in ICU. This will be a big stretch on the state’s resources. The same will happen in other states that fail to control outbreaks before achieving at least 80% vaccination rates across all populations, including children and Indigenous communities.

Even when most adults are vaccinated, covid doesn’t go away. Immunity wanes after a few months, and there will continue to be groups where fewer people have been vaccinated. We can estimate the longer term impact of covid by looking at other countries. In the UK, 75% of 16+ residents are vaccinated. There are currently 700 covid deaths and 250,000 cases per week in the UK. If our death rate is proportionate, that would mean 266 Australians dying per week even after we get to 75% vaccinated (along with thousands of long covid cases, with their huge economic and societal cost). By comparison, there were 9 weekly deaths from flu in Australia in 2019.
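The figure of 266 weekly deaths comes from scaling UK deaths by relative population. The population figures below are approximate assumptions: about 67 million for the UK and 25.5 million for Australia. With them, the calculation lands on the article’s number.

```python
UK_WEEKLY_DEATHS = 700           # from the text
UK_POPULATION = 67.0e6           # approximate (assumption)
AUS_POPULATION = 25.5e6          # approximate (assumption)
AUS_WEEKLY_FLU_DEATHS_2019 = 9   # from the text

# Assume deaths scale in proportion to population
aus_weekly_deaths = UK_WEEKLY_DEATHS * AUS_POPULATION / UK_POPULATION
print(f"{aus_weekly_deaths:.0f} weekly deaths, "
      f"about {aus_weekly_deaths / AUS_WEEKLY_FLU_DEATHS_2019:.0f}x the 2019 weekly flu toll")
```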

## Conclusion

We are now hearing political leaders in Victoria and NSW giving up on getting the outbreaks under control. But we haven’t yet deployed the three easiest high-impact public health interventions we have at our disposal: better masks, better ventilation, and rapid tests. Any one of these (along with the existing measures) would be likely to neutralize the outbreaks; their impacts combined will be a powerful weapon.

If we don’t do this, then covid will leave hundreds of thousands of Australian children with chronic illness, and kill thousands of Australians. This is entirely avoidable.

Acknowledgements: Thanks to Dr Rachel Thomas for many discussions about this topic and for draft review. Thanks also to the many Australian scientists with whom I consulted during development of this article.

# 11 Short Videos About AI Ethics

I made a playlist of 11 short videos (most are 6-13 mins long) on Ethics in Machine Learning. This is from my ethics lecture in Practical Deep Learning for Coders v4. I thought these short videos would be easier to watch, share, or skip around.

What are Ethics and Why do they Matter? Machine Learning Edition: Through 3 key case studies, I cover how people can be harmed by machine learning gone wrong, why we as machine learning practitioners should care, and what tech ethics are.

All machine learning systems need ways to identify & address mistakes. It is crucial that all machine learning systems are implemented with ways to correctly surface and correct mistakes, and to provide recourse to those harmed.

The Problem with Metrics, Feedback Loops, and Hypergrowth: Overreliance on metrics is a core problem both in the field of machine learning and in the tech industry more broadly. As Goodhart’s Law tells us, when a measure becomes the target, it ceases to be a good measure, yet the incentives of venture capital push companies in this direction. We see out-of-control feedback loops, widespread gaming of metrics, and people being harmed as a result.

Not all types of bias are fixed by diversifying your dataset. The idea of bias is often too general to be useful. There are several different types of bias, and different types require different interventions to try to address them. Through a series of case studies, we will go deeper into some of the various causes of bias.

Humans are biased too, so why does machine learning bias matter? A common objection to concerns about bias in machine learning models is to point out that humans are really biased too. This is correct, yet machine learning bias differs from human bias in several key ways that we need to understand and which can heighten the impact.

What You Need to Know about Disinformation: With a particular focus on how machine learning advances can contribute to disinformation, this covers some of the fundamental things to understand.

Foundations of Ethics: We consider different lenses through which to evaluate ethics, and what sort of questions to ask.

Tech Ethics Practices to Implement at your Workplace: Practical tech ethics practices you can implement at your workplace.

How to Address the Machine Learning Diversity Crisis: Only 12% of machine learning researchers are women. Based on research studies, I outline some evidence-based steps to take towards addressing this diversity crisis.

Advanced Technology is not a Substitute for Good Policy: We will look at some examples of what incentives cause companies to change their behavior or not (e.g. being warned for years of your role in an escalating genocide vs. threat of a hefty fine), how many AI ethics concerns are actually about human rights, and case studies of what happened when regulation & safety standards came to other industries.

You can find the playlist of 11 short videos here. And here is a longer, full-length free fast.ai course on practical data ethics.

# Getting Specific about AI Risks (an AI Taxonomy)

The term “Artificial Intelligence” is a broad umbrella, referring to a variety of techniques applied to a range of tasks. This breadth can breed confusion. Success in using AI to identify tumors on lung x-rays, for instance, may offer no indication of whether AI can be used to accurately predict who will commit another crime or which employees will succeed, or whether these latter tasks are even appropriate candidates for the use of AI. Misleading marketing hype often clouds distinctions between different types of tasks and suggests that breakthroughs on narrow research problems are more broadly applicable than is the case. Furthermore, the nature of the risks posed by different categories of AI tasks varies, and it is crucial that we understand the distinctions.

One source of confusion is that in fiction and the popular imagination, AI has often referred to computers achieving human consciousness: a broad, general intelligence. People may picture a super-smart robot, knowledgeable on a range of topics, able to perform many tasks. In reality, the advances happening in AI right now are narrow: a computer program that can do one task, or class of tasks, well. For example, one software program analyzes mammograms to identify likely breast cancer. A completely different program scores essays written by students, although it is fooled by gibberish that uses sophisticated words. These are separate programs, and fundamentally different from the depictions of human-like AI in science fiction movies and books.

It is understandable that the public may often assume that since companies and governments are implementing AI for high-stakes tasks like predictive policing, determining healthcare benefits, screening resumes, and analyzing video job interviews, it must be because of AI’s superior performance. However, the sad reality is that often AI is being implemented as a cost-cutting measure: computers are cheaper than employing humans, and this can cause leaders to overlook harms caused by the switch, including biases, errors, and a failure to vet accuracy claims.

In a talk entitled “How to recognize AI snake oil”, Professor Arvind Narayanan created a useful taxonomy of three types of tasks AI is commonly being applied to right now:

• Perception: facial recognition, reverse image search, speech to text, medical diagnosis from x-rays or CT scans
• Automating judgement: spam detection, automated essay grading, hate speech detection, content recommendation
• Predicting social outcomes: predicting job success, predicting criminal recidivism, predicting at-risk kids

The above 3 categories do not cover every use of AI, and there are certainly innovations that span them. However, this taxonomy is a useful heuristic for considering differences in accuracy and in the nature of the risks we face. For perception tasks, some of the biggest ethical concerns stem from how accurate AI can be (e.g. the state’s ability to accurately surveil protesters has chilling implications for our civil rights). In contrast, for predicting social outcomes, many of the products are total junk, which is harmful in a different way.

The first area, perception, which includes speech to text and image recognition, is the area where researchers are making truly impressive, rapid progress. However, even within this area, that doesn’t mean that the technology is always ready to use, or that there aren’t ethical concerns. For example, facial recognition often has much higher error rates on dark-skinned women, due to unrepresentative training sets. Even when accuracy is improved to remove this bias, the use of facial recognition by police to identify protesters (which has happened numerous times in the USA) is a grave threat to civil rights. Furthermore, how a computer algorithm performs in a controlled, academic setting can be very different from how it performs when deployed in the real world. For example, Google Health developed a computer program that identifies diabetic retinopathy with 90% accuracy when used on high-quality eye scans. However, when it was deployed in clinics in Thailand, many of the scans were taken in poor lighting conditions, and over 20% of all scans were rejected by the algorithm as low quality, creating great inconvenience for the many patients that had to take another day off of work to travel to a different clinic to be retested.

Improvements are being made in category 2, automating judgement. Even so, the technology is still faulty, and there are limits to what is possible here because culture and language usage are always evolving. Widely used essay grading software rewards “nonsense essays with sophisticated vocabulary,” and is biased against African-American students, giving their essays lower grades than expert human graders do. The software can measure sentence length, vocabulary, and spelling, but cannot recognize creativity or nuance. Content from LGBTQ YouTube creators was mislabeled as “sexually explicit” and demonetized, harming their livelihoods. As Ali Alkhatib wrote, “The algorithm is always behind the curve, executing today based on yesterday’s data… This case [of YouTube demonetizing LGBTQ creators] highlights a shortcoming with a commonly offered solution to these kinds of problems, that more training data would eliminate errors of this nature: culture always shifts.” This is a fundamental limitation of this category. Language is always evolving, and new slurs and forms of hate speech develop, just as new forms of creative expression do.

Narayanan labels the third category, of trying to predict social outcomes, as “fundamentally dubious.” AI can’t predict the future, and to label a person’s potential is deeply concerning. Often, these approaches are no more accurate than simple linear regression. Social scientists spent 15 years painstakingly gathering a rich longitudinal dataset on families containing 12,942 variables. When 160 teams created machine learning models to predict which children in the dataset would have adverse outcomes, the most accurate submission was only slightly better than a simple benchmark model using just 4 variables, and many of the submissions did worse than the simple benchmark. In the USA, there is a black box software program with 137 inputs used in the criminal justice system to predict who is likely to be re-arrested, yet it is no more accurate than a linear classifier on just 2 variables. Not only is it unclear that there have been meaningful AI advances in this category, but more importantly the underlying premise of such efforts raises crucial questions about whether we should be attempting to use algorithms to predict someone’s future potential at all. Together with Matt Salganik, Narayanan has further developed these ideas in a course on the Limits to Prediction (check out the course pre-read, which is fantastic).

Narayanan’s taxonomy is a helpful reminder that advances in one category don’t necessarily mean much for a different category, and he offers the crucial insight that different applications of AI create different fundamental risks. The overly general term artificial intelligence, misleading hype from companies pushing their products, and confusing media coverage often cloud distinctions between different types of tasks and suggest that breakthroughs on narrow problems are more broadly applicable than they are. Understanding the types of technology available, as well as the distinct risks they raise, is crucial to addressing and preventing harmful misuses.

Read Narayanan’s How to recognize AI snake oil slides and notes for more detail.

This post was originally published on the USF Center for Applied Data Ethics (CADE) blog.

# fastdownload: the magic behind one of the famous 4 lines of code

## Background

At fast.ai we focus on making important technical topics more accessible. That means the libraries we create do as much as possible for the user, without limiting what’s possible.

fastai is famous for needing just four lines of code to get world-class deep learning results with vision, text, tabular, or recommendation system data:

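```python
from fastai.vision.all import *

path = untar_data(URLs.PETS)
def label_func(fname): return fname[0].isupper()  # cat breeds have capitalized file names
dls = ImageDataLoaders.from_name_func(path, get_image_files(path/"images"), label_func, item_tfms=Resize(224))
learn = cnn_learner(dls, resnet34, metrics=error_rate)
learn.fine_tune(1)
```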


There have been many pages written about most of these: the flexibility of the Data Block API, the power of cnn_learner, and the state-of-the-art transfer learning provided by fine_tune.

But what about untar_data? This first line of code, although rarely discussed, is actually a critical part of the puzzle. Here’s what it does:

1. If required, download the URL to a special folder (by default, ~/.fastai/archive). If it was already downloaded earlier, skip this step.
2. Check that the size and hash of the downloaded archive match the expected values. If they don’t, download it again.
3. If required, extract the downloaded file to another special folder (by default, ~/.fastai/data). If it was already extracted earlier, skip this step.
4. Return a Path object pointing at the location of the extracted archive.

Thanks to this, users don’t have to worry about where their archives and data can be stored, whether they’ve downloaded a URL before or not, and whether their downloaded file is the correct version. fastai handles all this for the user, letting them spend more of their time on the actual modeling process.
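The steps untar_data performs can be sketched with the Python standard library alone. This is a simplified, hypothetical illustration, not the fastai or fastdownload implementation. The following are all choices made for this example:

- the function names;
- the (size, SHA-256) form of the expected checks;
- the archive/data folder layout under a base directory;
- the assumption that every download is a tarball.

```python
import hashlib
import shutil
import tarfile
import urllib.request
from pathlib import Path
from typing import Optional, Tuple


def file_checks(path: Path) -> Tuple[int, str]:
    """Return (size in bytes, SHA-256 hex digest) of a file."""
    return path.stat().st_size, hashlib.sha256(path.read_bytes()).hexdigest()


def simple_untar_data(url: str, base: Path, expected: Optional[Tuple[int, str]] = None) -> Path:
    """Hypothetical sketch of the untar_data steps. Only use with archives you trust."""
    archive_dir, data_dir = base / "archive", base / "data"
    archive_dir.mkdir(parents=True, exist_ok=True)
    data_dir.mkdir(parents=True, exist_ok=True)
    fname = url.rstrip("/").split("/")[-1]
    archive = archive_dir / fname
    dest = data_dir / fname.split(".")[0]
    fresh = False

    # Step 1: download only if the archive is not already cached
    if not archive.exists():
        urllib.request.urlretrieve(url, archive)
        fresh = True

    # Step 2: if the size or hash is wrong, fetch a new copy; give up if it is still wrong
    if expected is not None and file_checks(archive) != expected:
        urllib.request.urlretrieve(url, archive)
        fresh = True
        if file_checks(archive) != expected:
            raise ValueError(f"{fname} does not match its expected size and hash")

    # Step 3: extract only if needed (re-extract when the archive was just replaced)
    if fresh and dest.exists():
        shutil.rmtree(dest)
    if not dest.exists():
        with tarfile.open(archive) as tar:  # auto-detects gzip/bz2/xz compression
            tar.extractall(dest)

    # Step 4: return the location of the extracted contents
    return dest
```

For example, calling this with a hypothetical URL such as `https://example.com/pets.tgz` and a base of `Path.home()/".myproject"` would do two things. It would download the archive to `~/.myproject/archive/pets.tgz` and return `~/.myproject/data/pets`. A second call finds both already present and goes straight to step 4.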

With fastdownload, your user just calls a single method, FastDownload.get, passing the required URL. The file will be downloaded and extracted to the directories you choose, and the path to the extracted contents is returned. If that URL has already been downloaded, the cached archive or contents are used automatically. However, if the size or hash of the archive differs from what it should be, the user is informed and a new version is downloaded.

fastdownload adds a file, download_checks.py, to your Python module, containing the file sizes and hashes for your archives. Because it’s a regular Python file, it will automatically be included in your package when you upload it to PyPI or a conda channel.
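As a hypothetical sketch of the idea behind download_checks.py (the real file’s format may differ), the code below records an archive’s size and SHA-256 hash in a plain Python file. The names `checks_for` and `write_checks`, and the `{url: (size, hash)}` layout, are assumptions made for illustration.

```python
import hashlib
import pprint
from pathlib import Path
from typing import Dict, Tuple


def checks_for(url: str, archive: Path) -> Dict[str, Tuple[int, str]]:
    """Record the size in bytes and SHA-256 digest of a downloaded archive, keyed by URL."""
    data = archive.read_bytes()
    return {url: (len(data), hashlib.sha256(data).hexdigest())}


def write_checks(checks: Dict[str, Tuple[int, str]], module_dir: Path) -> Path:
    """Save the checks as plain Python source, so packaging tools include it like any module."""
    out = module_dir / "download_checks.py"
    out.write_text("checks = " + pprint.pformat(checks) + "\n")
    return out
```

Because the output is ordinary Python source, a package could load it with a normal import and compare each download against the stored values.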
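
Here’s all you need to provide a function that works just like untar_data:

```python
from fastdownload import FastDownload

# Files are stored under ~/.myapp, configurable via ~/.myapp/config.ini
def untar_data(url): return FastDownload(base='~/.myapp').get(url)
```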

You can modify the locations that files are downloaded to by creating a config file ~/.myapp/config.ini (if you don’t have one, it will be created for you). The values in this file can be absolute or relative paths (relative paths are resolved relative to the location of the ini file).