# Some thoughts on zero-day threats in AI, and OpenAI's GPT-2

There’s been a lot of discussion in the last couple of days about OpenAI’s new language model. OpenAI made the unusual decision to not release their trained model (the AI community is usually extremely open about sharing them). On the whole, the reaction has been one of both amazement and concern, and has been widely discussed in the media, such as this thoughtful and thorough coverage in The Verge. The reaction from the academic NLP community, on the other hand, has been largely (but not exclusively) negative, claiming that:

1. This shouldn’t be covered in the media, because it’s nothing special
2. OpenAI had no reason to keep the model to themselves, other than to try to generate media hype through claiming their model is so special it has to be kept secret.

In addition, the history of technology has repeatedly shown that the hard thing is not, generally, solving a specific engineering problem, but showing that a problem can be solved. So showing what is possible is, perhaps, the most important step in technology development. I’ve been warning about potential misuse of pre-trained language models for a while, and even helped develop some of the approaches people are using now to build this tech; but it’s not until OpenAI actually showed what can be done in practice that the broader community has woken up to some of the concerns.

But what about the second issue: should OpenAI release their pretrained model? This one seems much more complex. We’ve already heard from the “anti-model-release” view, since that’s what OpenAI has published and also discussed with the media. Catherine Olsson (who previously worked at OpenAI) asked on Twitter whether anyone had yet seen a compelling explanation of the alternative view.

I’ve read a lot of the takes on this, and haven’t yet found one that really qualifies. A good-faith explanation would need to engage with what OpenAI’s researchers actually said, which takes a lot of work, since their team have written a lot of research on the societal implications of AI (both at OpenAI, and elsewhere). The most in-depth analysis of this topic is the paper The Malicious Use of Artificial Intelligence. The lead author of this paper now works at OpenAI, and was heavily involved in the decision around the model release. Let’s take a look at the recommendations of that paper:

1. Policymakers should collaborate closely with technical researchers to investigate, prevent, and mitigate potential malicious uses of AI
2. Researchers and engineers in artificial intelligence should take the dual-use nature of their work seriously, allowing misuse-related considerations to influence research priorities and norms, and proactively reaching out to relevant actors when harmful applications are foreseeable.
3. Best practices should be identified in research areas with more mature methods for addressing dual-use concerns, such as computer security, and imported where applicable to the case of AI.
4. Actively seek to expand the range of stakeholders and domain experts involved in discussions of these challenges.

An important point here is that an appropriate analysis of potential malicious use of AI requires a cross-functional team and deep understanding of history in related fields. I agree. So what follows is just my one little input to this discussion. I’m not ready to claim that I have the answer to the question “should OpenAI have released the model”. I will also try to focus on the “pro-release” side, since that’s the piece that hasn’t had much thoughtful input yet.

## A case for releasing the model

OpenAI said that their release strategy is:

Due to concerns about large language models being used to generate deceptive, biased, or abusive language at scale, we are only releasing a much smaller version of GPT-2 along with sampling code.

So specifically we need to be discussing scale. Their claim is that a larger scale model may cause significant harm without time for the broader community to consider it. Interestingly, even they don’t claim to be confident of this concern:

This decision, as well as our discussion of it, is an experiment: while we are not sure that it is the right decision today, we believe that the AI community will eventually need to tackle the issue of publication norms in a thoughtful way in certain research areas.

Let’s get specific. How much scale are we actually talking about? I don’t see this explicitly mentioned in their paper or blog post, but we can make a reasonable guess. The new GPT-2 model has (according to the paper) about ten times as many parameters as their previous GPT model. Their previous model took 8 GPUs 1 month to train. One would expect that they can train their model faster by now, since they’ve had plenty of time to improve their algorithms, but on the other hand, their new model probably takes more epochs to train. Let’s assume that these two balance out, so we’re left with the difference of 10x in parameters.
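As a rough sketch of that estimate, the snippet below turns the figures quoted above into a GPU-month budget. It assumes, as the text does, that training compute grows linearly with parameter count, and it additionally assumes perfect linear scaling across GPUs; the function and constant names are illustrative only.

```python
# Figures taken from the paragraph above.
GPT1_GPUS = 8        # GPUs used to train the previous GPT model
GPT1_MONTHS = 1      # wall-clock months that training took
PARAM_RATIO = 10     # GPT-2 has roughly 10x as many parameters


def gpu_months_needed(base_gpus: int, base_months: float, scale: float) -> float:
    """GPU-months required, assuming compute grows linearly with parameters."""
    return base_gpus * base_months * scale


def months_to_train(total_gpu_months: float, gpus: int) -> float:
    """Wall-clock months on `gpus` GPUs, assuming perfect linear scaling."""
    return total_gpu_months / gpus


budget = gpu_months_needed(GPT1_GPUS, GPT1_MONTHS, PARAM_RATIO)
print(f"Estimated budget: {budget:.0f} GPU-months")
for n in (8, 80):
    print(f"{n} GPUs: about {months_to_train(budget, n):.0f} month(s)")
```

Under these assumptions the budget is 80 GPU-months, which can be spent as 80 GPUs for one month or 8 GPUs for ten months.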

If you’re in a hurry and you want to get this done in a month, then you’re going to need 80 GPUs. You can grab a server with 8 GPUs from the AWS spot market for $7.34/hour. That’s around $5300 for a month. You’ll need ten of these servers, so that’s around $50k to train the model in a month. OpenAI have made their code available, and described how to create the necessary dataset, but there’s still going to be plenty of trial and error, so in practice it might cost twice as much.

If you’re in less of a hurry, you could just buy 8 GPUs. With some careful memory handling (e.g. using gradient checkpointing) you might be able to get away with buying RTX 2070 cards at $500 each; otherwise you’ll be wanting the RTX 2080 Ti at $1300 each. So for 8 cards, that’s somewhere between $4k and $10k for the GPUs, plus probably another $10k or so for a box to put them in (with CPUs, HDDs, etc). So that’s around $20k to train the model in 10 months (again, you’ll need some extra time and money for the data collection, and some trial and error).

Most organizations doing AI already have 8 or more GPUs available, and can often get access to far more (e.g. AWS provides up to $100k credits to startups in its AWS Activate program, and Google provides dozens of TPUs to any research organization that qualifies for their research program).
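The cost arithmetic above can be reproduced with a few lines of Python. This is only a back-of-the-envelope sketch using the prices quoted in the text; the 30-day month, the flat $10k for the box, and all the names below are assumptions made for illustration.

```python
HOURS_PER_MONTH = 24 * 30        # assumption: a 30-day month of continuous use

# Cloud option: figures quoted in the text.
SPOT_PRICE_PER_HOUR = 7.34       # USD per hour for one 8-GPU server
SERVERS = 10                     # ten 8-GPU servers = 80 GPUs
TRIAL_AND_ERROR_FACTOR = 2       # "it might cost twice as much"

# Purchase option: figures quoted in the text.
CARDS = 8
BOX_COST = 10_000                # rough cost of CPUs, HDDs, chassis, etc.


def cloud_cost(price_per_hour: float, servers: int, months: float = 1.0) -> float:
    """Rental cost in USD for `servers` machines running non-stop."""
    return price_per_hour * HOURS_PER_MONTH * servers * months


def owned_cost(card_price: float, cards: int = CARDS, box: float = BOX_COST) -> float:
    """Up-front hardware cost in USD for a single multi-GPU box."""
    return card_price * cards + box


per_server = cloud_cost(SPOT_PRICE_PER_HOUR, 1)
cloud_total = cloud_cost(SPOT_PRICE_PER_HOUR, SERVERS)
cloud_with_retries = cloud_total * TRIAL_AND_ERROR_FACTOR
owned_low = owned_cost(500)      # 8 x RTX 2070
owned_high = owned_cost(1300)    # 8 x RTX 2080 Ti

print(f"One server for a month:      ${per_server:,.0f}")
print(f"Ten servers for a month:     ${cloud_total:,.0f}")
print(f"Allowing for trial and error: ${cloud_with_retries:,.0f}")
print(f"Buying 8 GPUs plus a box:    ${owned_low:,.0f} to ${owned_high:,.0f}")
```

The doubled cloud figure comes out a little over $100k, in line with the "$100k or so" used in the rest of this post, while the purchase route lands between roughly $14k and $20k before data collection and experimentation costs.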

So in practice, the decision not to release the model has a couple of outcomes:

1. It’ll probably take at least a couple of months before another organization has successfully replicated it, so we have some breathing room to discuss what to do when this is more widely available
2. Small organizations that can’t afford to spend $100k or so are not able to use this technology at the scale being demonstrated.

Point (1) seems like a good thing. If suddenly this tech is thrown out there for anyone to use without any warning, then no-one can be prepared at all. (In theory, people could have been prepared because those within the language modeling community have been warning of such a potential issue, but in practice people don’t tend to take it seriously until they can actually see it happening.) This is what happens, for instance, in the computer security community, where if you find a flaw the expectation is that you help the community prepare for it, and only then do you release full details (and perhaps an exploit). When this doesn’t happen, it’s called a zero day attack or exploit, and it can cause enormous damage. I’m not sure I want to promote a norm that zero-day threats are OK in AI.

On the other hand, point (2) is a problem. The most serious threats are most likely to come from folks with resources to spend $100k or so on (for example) a disinformation campaign to attempt to change the outcome of a democratic election. In practice, the most likely exploit is (in my opinion) a foreign power spending that money to dramatically escalate existing disinformation campaigns, such as those that have been extensively documented by the US intelligence community.

The only practical defense against such an attack is (as far as I can tell) to use the same tools both to identify and to push back against such disinformation. These kinds of defenses are likely to be much more powerful when wielded by the broader community of those impacted. Large groups of individuals have repeatedly been shown to be more powerful at creating than at destroying, as we see in projects such as Wikipedia or open source software.

In addition, if these tools aren’t in the hands of people without access to large compute resources, then they remain abstract and mysterious. What can they actually do? What are their constraints? For people to make informed decisions, they need to have a real understanding of these issues.

## Conclusion

So, should OpenAI release their trained model? Frankly, I don’t know. There’s no question in my mind that they’ve demonstrated something fundamentally qualitatively different to what’s been demonstrated before (despite not showing any significant algorithmic or theoretic breakthroughs). And I’m sure it will be used maliciously; it will be a powerful tool for disinformation and for influencing discourse at massive scale, and probably only costs about $100k to create.

By releasing the model, this malicious use will happen sooner. But by not releasing the model, there will be fewer defenses available and less real understanding of the issues from those that are impacted. Those both sound like bad outcomes to me.