AI Safety and the Age of Dislightenment

Abstract

Proposals for stringent AI model licensing and surveillance will likely be ineffective or counterproductive, concentrating power in unsustainable ways, and potentially rolling back the societal gains of the Enlightenment. The balance between defending society and empowering society to defend itself is delicate. We should advocate for openness, humility and broad consultation to develop better responses aligned with our principles and values — responses that can evolve as we learn more about this technology with the potential to transform society for good or ill.

Executive summary

Artificial Intelligence is moving fast, and we don’t know what might turn out to be possible. OpenAI CEO Sam Altman thinks AI might “capture the light cone of all future value in the universe”. But things might go wrong, with some experts warning of “the risk of extinction from AI”.

This has led many to propose an approach to regulating AI, set out in the whitepaper “Frontier AI Regulation: Managing Emerging Risks to Public Safety” (which we’ll refer to as “FAR”) and in the Parliament version of the EU AI Act, that goes as follows:

Create standards for development and deployment of AI models, and
Create mechanisms to ensure compliance with these standards.

Other experts, however, counter that “There is so much attention flooded onto x-risk (existential risk)… that it ‘takes the air out of more pressing issues’ and insidiously puts social pressure on researchers focused on other current risks.”

Important as current risks are, does the threat of human extinction mean we should go ahead with this kind of regulation anyway?

Perhaps not. As we’ll see, if AI turns out to be powerful enough to be a catastrophic threat, the proposal may not actually help. In fact it could make things much worse, by creating a power imbalance so severe that it leads to the destruction of society. These concerns apply to all regulations that try to ensure the models themselves (“development”) are safe, rather than just how they’re used. The effects of these regulations may turn out to be impossible to undo, and therefore we should be extremely careful before we legislate them.

The kinds of model development that FAR and the AI Act aim to regulate are “foundation models” — general-purpose AI which can handle (to varying degrees of success) nearly any problem you throw at them. There is no way to ensure that any general-purpose device (like, say, a computer, or a pen) can’t ever be used to cause harm. Therefore, the only way to ensure that AI models can’t be misused is to ensure that no one can use them directly. Instead, they must be limited to a tightly controlled narrow service interface (like ChatGPT, an interface to GPT-4).

But those with full access to AI models (such as those inside the companies that host the service) have enormous advantages over those limited to “safe” interfaces. If AI becomes extremely powerful, then full access to models will be critical to those who need to remain competitive, as well as to those who wish to cause harm. They can simply train their own models from scratch, or exfiltrate existing ones through blackmail, bribery, or theft. This could lead to a society where only groups with the massive resources to train foundation models, or the moral disregard to steal them, have access to humanity’s most powerful technology. These groups could become more powerful than any state. Historically, large power differentials have led to violence and subservience of whole societies.

If we regulate now in a way that increases centralisation of power in the name of “safety”, we risk rolling back the gains made from the Age of Enlightenment, and instead entering a new age: the Age of Dislightenment. Instead, we could maintain the Enlightenment ideas of openness and trust, such as by supporting open-source model development. Open source has enabled huge technological progress through broad participation and sharing. Perhaps open AI models could do the same. Broad participation could allow more people with a wider variety of expertise to help identify and counter threats, thus increasing overall safety — as we’ve previously seen in fields like cyber-security.

There are interventions we can make now, including the regulation of “high-risk applications” proposed in the EU AI Act. By regulating applications we focus on real harms and can make those most responsible directly liable. Another useful approach in the AI Act is to regulate disclosure, to ensure that those using models have the information they need to use them appropriately.

AI impacts are complex, and as such there is unlikely to be any one panacea. We will not truly understand the impacts of advanced AI until we create it. Therefore we should not be in a rush to regulate this technology, and should be careful to avoid a cure which is worse than the disease.

The big problem

The rapid development of increasingly capable AI has many people asking to be protected, and many offering that protection. The latest is a white paper titled “Frontier AI Regulation: Managing Emerging Risks to Public Safety” (FAR). Many authors of the paper are connected to OpenAI and Google, and to various organizations funded by investors of OpenAI and Google. FAR claims that “government involvement will be required to ensure that such ‘frontier AI models’ are harnessed in the public interest”. But can we really ensure such a thing? At what cost?

There’s one huge, gaping problem which FAR fails to address.¹

Anyone with access to the full version of a powerful AI model has far more power than someone who can only access that model through a restricted service. But very few people will have access to the full model. If AI does become enormously powerful, then this huge power differential is unsustainable.

While superficially seeming to check off various safety boxes, the regulatory regime being advanced in FAR ultimately leads to a vast amount of power being placed in the hands of the entrenched companies (by virtue of them having access to the raw models), giving them an information asymmetry against all other actors, including governments seeking to regulate or constrain them. It may lead to the destruction of society.

Here’s why: because these models are general-purpose computing devices, it is impossible to guarantee they can’t be used for harmful applications. That would be like trying to make a computer that can’t be misused (such as for emailing a blackmail threat). The full original model is vastly more powerful than any “ensured safe” service based on it can ever be. The full original model is general-purpose: it can be used for anything. But if you give someone a general-purpose computing device, you can’t be sure they won’t use it to cause harm.

So instead, you give them access to a service which provides a small window into the full model. For instance, OpenAI provides public access to a tightly controlled and tuned text-based conversational interface to GPT-4, but does not provide full access to the GPT-4 model itself.

If you control a powerful model that mediates all consumption and production of information,² and it’s a proprietary secret, you can shape what people believe, how people act — and censor whatever you please.

The ideas being advanced in FAR ultimately lead to the frontier of AI becoming inaccessible to everyone who doesn’t work at a small number of companies, whose dominance will be enshrined by virtue of these ideas. This is an immensely dangerous and brittle path for society to go down.

The race

So let’s recap what happens under these regulatory proposals. We have the world’s most powerful technology, rapidly developing all the time, but only a few big companies have access to the most powerful version of that technology that allows it to be used in an unrestricted manner.

What happens next?

Obviously, everyone who cares about power and money now desperately needs to find a way to get full access to these models. After all, anyone that doesn’t have full access to the most powerful technology in history can’t possibly compete. The good news for them is that the models are, literally, just a bunch of numbers. They can be copied trivially easily, and once you’ve got them, you can pass them around to all your friends for nothing. (FAR has a whole section on this, which it calls “The Proliferation Problem”.)
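To make the "just a bunch of numbers" point concrete, here is a minimal sketch in Python. The four-element `weights` list and the file names are hypothetical stand-ins; a real foundation model has billions of values, but copying one works the same way as copying any other file.

```python
import os
import shutil
import struct
import tempfile

# Hypothetical stand-in for model weights: a short list of floats.
weights = [0.25, -1.5, 3.0, 0.125]


def save_weights(path, values):
    """Write the weights to disk as raw little-endian 64-bit floats."""
    with open(path, "wb") as f:
        f.write(struct.pack(f"<{len(values)}d", *values))


def load_weights(path):
    """Read raw little-endian 64-bit floats back from disk."""
    with open(path, "rb") as f:
        data = f.read()
    return list(struct.unpack(f"<{len(data) // 8}d", data))


with tempfile.TemporaryDirectory() as tmp:
    original = os.path.join(tmp, "model.bin")
    duplicate = os.path.join(tmp, "model_copy.bin")
    save_weights(original, weights)
    # Copying a model is nothing more than copying a file.
    shutil.copyfile(original, duplicate)
    copied = load_weights(duplicate)

print(copied == weights)
```

The copy is bit-for-bit identical to the original, which is why access controls on the service say nothing about what happens once the underlying file has left the building.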

There are plenty of data exfiltration experts around who know how to take advantage of blackmail, bribery, social engineering, and various other methods which experience tells us are highly effective. Those with the discretion not to use such unsavory methods, but with access to resources, can also join the ranks of the AI-capable by spending $100m or so.³ Even the smallest company on the Fortune Global 2000 has $7 billion annual revenue, making such an expenditure well within its budget. Most national governments could also afford such a bill. Of course, none of these organizations could make these models directly available to the public without contravening the requirements of the proposed regulations, but by definition at least some people in each organization will have access to the power of the full model.
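The arithmetic behind "well within budget" can be checked in a couple of lines. This sketch uses only the two figures quoted above, the rough $100m training cost and the $7 billion annual revenue, and treats both as approximate.

```python
# Figures taken from the text above; both are rough estimates.
training_cost_usd = 100_000_000        # "$100m or so"
smallest_revenue_usd = 7_000_000_000   # smallest Fortune Global 2000 company

# Share of one year's revenue that a single training run would consume.
share = training_cost_usd / smallest_revenue_usd

print(f"{share:.1%} of annual revenue")
```

On these figures, a training run costs under 2% of one year's revenue for even the smallest company on the list.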

Those who crave power and wealth, but fail to get access to model weights, now have a new goal: get themselves into positions of power at organizations that have big models, or get themselves into positions of power at the government departments that make these decisions. Organizations that started out as well-meaning attempts to develop AI for societal benefit will soon find themselves part of the corporate profit-chasing machinery that all companies join as they grow, run by people that are experts at chasing profits.

The truth is that this entire endeavor, this attempt to control the use of AI, is pointless and ineffective. Not only is “proliferation” of models impossible to control (because digital information is so easy to exfiltrate and copy), it turns out that restrictions on the amount of compute for training models are also impossible to enforce. That’s because it’s now possible for people all over the world to virtually join up and train a model together. For instance, Together Computer has created a fully decentralized, open, scalable cloud for AI, and recent research has shown it is possible to go a long way with this kind of approach.

Graphics processing units (GPUs), the actual hardware used for training models, are the exact same hardware used for playing computer games. There is more compute capacity in the world currently deployed for playing games than for AI. Gamers around the world can simply install a small piece of software on their computers to opt into helping train these open-source models. Organizing such a large-scale campaign would be difficult, but not without precedent, as seen in the success of projects such as Folding@Home and SETI@Home.

And developers are already thinking about how to ensure that regular people can continue to train these models. For instance, in a recent interview with Lex Fridman, Comma.ai founder George Hotz explained how his new company, Tiny Corp, is working on the “Tiny Rack”, which he says is based on the premise: “What’s the most power you can get into your house without arousing suspicion? And one of the answers is an electric car charger.” So he’s building an AI model training system that uses the same amount of power as a car charger.

The AI safety community is well aware of this problem, and has proposed various solutions.⁴ For instance, one recent influential paper by AI policy expert Yo Shavit, which examines surveillance mechanisms that can be added to computer chips, points out that:

“As advanced machine learning systems’ capabilities begin to play a significant role in geopolitics and societal order, it may become imperative that (1) governments be able to enforce rules on the development of advanced ML systems within their borders, and (2) countries be able to verify each other’s compliance with potential future international agreements on advanced ML development.”

Any approach to this must ensure that every manufacturer of such chips is required to include that surveillance capability in their chips, since obviously if a single company failed to do so, then everyone that wanted to train their own powerful models would use that company’s chips. Shavit notes that “exhaustively enforcing such rules at the hardware-level would require surveilling and policing individual citizens’ use of their personal computers, which would be highly unacceptable on ethical grounds”. The reality, however, is that such rules would be required for centralization and control to be effective, since personal computers can be used to train large models by simply connecting them over the internet.

When the self-described pioneer of the AI Safety movement, Eliezer Yudkowsky, proposed airstrikes on unauthorized data centers and the threat of nuclear war to ensure compliance from states failing to control unauthorized use of computation capability, many were shocked. But bombing data centers and global surveillance of all computers are the only ways to ensure the kind of safety compliance that FAR proposes.⁵

Regulate usage, not development

Alex Engler points out an alternative approach to enforced safety standards or licensing of models, which is to “regulate risky and harmful applications, not open-source AI models”. This is how most regulations work: through liability. If someone does something bad, then they get in trouble. If someone creates a general-purpose tool that someone else uses to do something bad, the tool-maker doesn’t get in trouble. “Dual use” technologies like the internet, computers, and pen and paper, are not restricted to only be available to big companies, and anyone is allowed to build a computer, or make their own paper. They don’t have to ensure that what they build can only be used for societal benefit.

This is a critical distinction: the distinction between regulating usage (that is, actually putting a model into use by making it part of a system — especially a high risk system like medicine), vs development (that is, the process of training the model).

The reason this distinction is critical is because these models are, in fact, nothing but mathematical functions. They take as input a bunch of numbers, and calculate and return a different bunch of numbers. They don’t do anything themselves — they can only calculate numbers. However, those calculations can be very useful! In fact, computers themselves are merely calculating machines (hence their name: “computers”). They are useful at the point they are used — that is, connected to some system that can actually do something.
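As an illustration of "numbers in, numbers out", here is a deliberately tiny sketch of a model as a pure function. The function name `tiny_model` and the weights are hypothetical and chosen only to keep the arithmetic easy to follow; real foundation models stack many such steps, but the principle is the same.

```python
def tiny_model(inputs, weights, bias):
    """Multiply the inputs by each row of weights, then add that row's bias."""
    return [
        sum(w * x for w, x in zip(row, inputs)) + b
        for row, b in zip(weights, bias)
    ]


# Hypothetical parameters: two inputs, two outputs.
weights = [[1.0, 2.0], [0.5, -1.0]]
bias = [0.0, 1.0]

outputs = tiny_model([3.0, 4.0], weights, bias)
print(outputs)  # [11.0, -1.5]
```

Calling the function changes nothing in the world: it only returns a list of numbers. Any effect, good or bad, comes from whatever system is built to act on that output.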

FAR addresses this distinction, claiming “Improvements in AI capabilities can be unpredictable, and are often difficult to fully understand without intensive testing. Regulation that does not require models to go through sufficient testing before deployment may therefore fail to reliably prevent deployed models from posing severe risks.” This is a non sequitur. Because models cannot cause harm without being used, developing a model cannot be a harmful activity.⁶ Furthermore, because we are discussing general-purpose models, we cannot ensure safety of the model itself; it’s only possible to try to secure the use of a model.

Another useful approach to regulation is to consider securing access to sensitive infrastructure, such as chemical labs. FAR briefly considers this idea, saying “for frontier AI development, sector-specific regulations can be valuable, but will likely leave a subset of the high severity and scale risks unaddressed.” But it does not study it further, resting on an assumed “likely” subset of remaining risks to promote an approach which, as we’ve seen, could undo centuries of cultural, societal, and political development.

If we are able to build advanced AI, we should expect that it could at least help us identify the sensitive infrastructure that needs hardening. If it’s possible to use such infrastructure to cause harm then it seems very likely that it can be identified — if AI can’t identify it, then it can’t use it. Now of course, actually dealing with an identified threat might not be straightforward; if it turns out, for instance, that a benchtop DNA printer could be used to produce a dangerous pathogen, then hardening all those devices is going to be a big job. But it’s a much smaller and less invasive job than restricting all the world’s computing devices.

This leads us to another useful regulatory path: deployment disclosure. If you’re considering connecting an automated system which uses AI to any kind of sensitive infrastructure, then we should require disclosure of this fact. Furthermore, certain types of connection and infrastructure should require careful safety checks and auditing in advance.

The path to centralization

Better AI can be used to improve AI. This has already been seen many times, even in the earlier era of less capable and less well-resourced algorithms. Google has used AI to improve how data centers use energy, to create better neural network architectures, and to create better methods for optimizing the parameters in those networks. Model outputs have been used to create the prompts used to train new models, to create the model answers for these prompts, and to explain the reasoning for answers.

As models get more powerful, researchers will find more ways to use them to improve the data, models, and training process. There is no reason to believe that we are anywhere near the limits of the technology. There is no data which we can use to make definitive predictions about how far this can go, or what happens next.

Those with access to the full models can build new models faster and better than those without. One reason is that they can fully utilize powerful features like fine-tuning, activations, and the ability to directly study and modify weights.⁷ One recent paper, for instance, found that fine-tuning allows models to solve challenging problems with orders of magnitude fewer parameters than foundation models.

This kind of feedback loop results in centralization: the big companies get bigger, and other players can’t compete. The result is less competition, and therefore higher prices, less innovation, and lower safety (since there’s a single point of failure, and a larger profit motive which encourages risky behavior).

There are other powerful forces towards centralization. Consider Google, for instance. Google has more data than anyone else on the planet. More data leads directly to better foundation models. Furthermore, as people use their AI services, they are getting more and more data about these interactions. They use AI to improve their products, making them more “sticky” for their users and encouraging more people to use them, resulting in them getting still more data, which improves their models and products based on them further. Also, they are increasingly vertically integrated, so they have few powerful suppliers. They create their own AI chips (TPUs), run their own data centers, and develop their own software.

Regulation of frontier model development encourages greater centralization. Licensing, in particular, is an approach proposed in FAR which is a potent centralization force. Licensing the development of frontier models requires that new entrants apply for permission before being allowed to develop a model as good as, or better than, the current state of the art. This makes it even harder to compete with entrenched players. And it opens up an extremely strong path to regulatory capture, since it results in an undemocratic licensing board having the final say in who has access to build the most powerful technology on the planet. Such a body would be, as a result, potentially the most powerful group in the world.

Open source, and a new era of AI enlightenment

The alternative to craving the safety and certainty of control and centralization is to once again take the risk we took hundreds of years ago: the risk of believing in the power and good of humanity and society. Just as thinkers of the Enlightenment asked difficult questions like “What if everyone got an education? What if everyone got the vote?”, we should ask the question “What if everyone got access to the full power of AI?”

To be clear: asking such questions may not be popular. The counter-enlightenment was a powerful movement for a hundred years, pushing back against “the belief in progress, the rationality of all humans, liberal democracy, and the increasing secularization of society”. It relied on a key assumption, as expounded by French philosopher Joseph de Maistre, that “Man in general, if reduced to himself, is too wicked to be free.”

We can see from the results of the Enlightenment that this premise is simply wrong. But it’s an idea that just won’t go away. Sociologists have for decades studied and documented “elite panic” — the tendency of elites to assume that regular people will respond badly to disasters and that they must therefore be controlled. But that’s wrong too. In fact, it’s more than wrong, as Rebecca Solnit explains: “I see these moments of crisis as moments of popular power and positive social change. The major example in my book is Mexico City, where the ’85 earthquake prompted public disaffection with the one-party system and, therefore, the rebirth of civil society.”

What does it look like to embrace the belief in progress and the rationality of all humans when we respond to the threat of AI mis-use? One idea which many experts are now studying is that open source models may be the key.

Models are just software — they are mathematical functions embodied as code. When we copy software, we don’t usually call it “proliferation” (as FAR does). That word is generally associated with nuclear weapons. When we copy software, we call it “installing”, or “deploying”, or “sharing”. Because software can be freely copied, it has inspired a huge open source movement which considers this sharing a moral good. When all can benefit, why restrict value to a few?

This idea has been powerful. Today, nearly every website you use is running an open source web server (such as Apache), which in turn is installed on an open source operating system (generally Linux). Most programs are compiled with open source compilers, and written with open source editors. Open source documents like Wikipedia have been transformative. Initially, these were seen as crazy ideas that had plenty of skeptics, but in the end, they proved to be right. Quite simply, much of the world of computers and the internet that you use today would not exist without open source.

What if the most powerful AI models were open source? There will still be Bad Guys looking to use them to hurt others or unjustly enrich themselves. But most people are not Bad Guys. Most people will use these models to create, and to protect. How better to be safe than to have the massive diversity and expertise of human society at large doing their best to identify and respond to threats, with the full power of AI behind them? How much safer would you feel if the world’s top cyber-security, bio-weapons, and social engineering academics were working with the benefits of AI to study AI safety, and you could access and use all of their work yourself, compared to if only a handful of people at a for-profit company had full access to AI models?

In order to gain access to the better features of full model access, and reduce the level of commercial control of what has previously been an open research community with a culture of sharing, the open-source community has recently stepped in and trained a number of quite capable language models. As of July 2023, the best of these are at a similar level to the second-tier cheaper commercial models, but not as good as GPT-4 or Claude. They are rapidly increasing in capability, and are attracting increasing investment from wealthy donors, governments, universities, and companies that are seeking to avoid concentration of power and ensure access to high quality AI models.

However, the proposals for safety guarantees in FAR are incompatible with open source frontier models. FAR proposes “it may be prudent to avoid potentially dangerous capabilities of frontier AI models being open sourced until safe deployment is demonstrably feasible”. But even if an open-source model is trained in the exact same way from the exact same data as a regulatorily-approved closed commercial model, it can still never provide the same safety guarantees. That’s because, as a general-purpose computing device, anybody could use it for anything they want — including fine-tuning it using new datasets and for new tasks.

Open source is not a silver bullet. This still requires care, cooperation, and deep and careful study. By making the systems available to all, we ensure that all of society can both benefit from their capabilities and work to understand and counter their potential harms. Stanford and Princeton’s top AI and policy groups teamed up to respond to the US government’s request for comment on AI accountability, stating that:

“For foundation models to advance the public interest, their development and deployment should ensure transparency, support innovation, distribute power, and minimize harm… We argue open-source foundation models can achieve all four of these objectives, in part due to inherent merits of open-source (pro-transparency, pro-innovation, anti-concentration)”

Furthermore they warn that:

“If closed-source models cannot be examined by researchers and technologists, security vulnerabilities might not be identified before they cause harm… On the other hand, experts across domains can examine and analyze open-source models, which makes security vulnerabilities easier to find and address. In addition, restricting who can create FMs would reduce the diversity of capable FMs and may result in single points of failure in complex systems.”

The idea that access to the best AI models is critical to studying AI safety is, in fact, fundamental to the origin story of two of the most advanced AI companies today: OpenAI and Anthropic. Many have expressed surprise that the executives of these companies have loudly warned of the potential existential risks of AI, yet they’re building those very models themselves. But there’s no conflict here: they’ve explained that they do this because they don’t believe it’s possible to properly understand and mitigate AI risks if you don’t have access to the best available models.

Access to open source models is at grave risk today. The European AI Act may effectively ban open source foundation models, based on similar principles to those in FAR. Technology innovation policy analyst Alex Engler, in his article “The EU’s attempt to regulate open-source AI is counterproductive”, writes:

“The Council’s attempt to regulate open-source could create a convoluted set of requirements that endangers open-source AI contributors, likely without improving use of GPAI. Open-source AI models deliver tremendous societal value by challenging the domination of GPAI by large technology companies and enabling public knowledge about the function of AI.”

First, do no harm

FAR concludes that “Uncertainty about the optimal regulatory approach to address the challenges posed by frontier AI models should not impede immediate action”. But perhaps it should. Indeed, AI policy experts Patrick Grady and Daniel Castro recommend exactly this, advising policymakers not to be in a hurry to take regulatory action:

‘The fears around new technologies follow a predictable trajectory called “the Tech Panic Cycle.” Fears increase, peak, then decline over time as the public becomes familiar with the technology and its benefits. Indeed, other previous “generative” technologies in the creative sector such as the printing press, the phonograph, and the Cinématographe followed this same course. But unlike today, policymakers were unlikely to do much to regulate and restrict these technologies. As the panic over generative AI enters its most volatile stage, policymakers should take a deep breath, recognize the predictable cycle we are in, and put any regulation efforts directly aimed at generative AI temporarily on hold.’

Instead, perhaps regulators should consider the medical guidance of Hippocrates: “do no harm”. Medical interventions can have side effects, and the cure can sometimes be worse than the disease. Some medicines may even damage immune response, leaving a body too weakened to be able to fight off infection.

So too with regulatory interventions. Not only can the centralization and regulatory capture impacts of “ensuring safety” cause direct harm to society, but they can even result in decreased safety. If just one big organization holds the keys to vast technological power, we find ourselves in a fragile situation where the rest of society does not have access to the same power to protect ourselves. A fight for power could even be the trigger for the kind of misuse of AI that leads to societal destruction.

The impact of AI regulations will be nuanced, complex, and hard to predict. The balance between defending society and empowering society to defend itself is precariously delicate. Rushing to regulate seems unlikely to walk that tightrope successfully.

We have time. The combined capabilities of all of human society are enormous, and for AI to surpass that capability is a big task. Ted Sanders, an OpenAI technical expert who has won numerous technology forecasting competitions, along with Ari Allyn-Feuer, Director of AI at GSK, completed an in-depth 114-page analysis of the timeframes associated with AI development, concluding that “we estimate the likelihood of transformative artificial general intelligence (AGI) by 2043 and find it to be <1%”.

Importantly, the more time passes, the more we learn. Not just about the technology, but about how society responds to it. We should not rush to implement regulatory changes which put society on a dystopian path that may be impossible to get off.

Concerns about the safety of advanced language models are not new. In early 2019 I wrote “Some thoughts on zero-day threats in AI, and OpenAI’s GPT-2”, a reaction to OpenAI’s controversial and (at the time) unusual decision to not release the weights of their new language model. In considering this decision, I pointed out that:

The most in-depth analysis of this topic is the paper The Malicious Use of Artificial Intelligence. The lead author of this paper now works at OpenAI, and was heavily involved in the decision around the model release. Let’s take a look at the recommendations of that paper:

Policymakers should collaborate closely with technical researchers to investigate, prevent, and mitigate potential malicious uses of AI.

Researchers and engineers in artificial intelligence should take the dual-use nature of their work seriously, allowing misuse-related considerations to influence research priorities and norms, and proactively reaching out to relevant actors when harmful applications are foreseeable.

Best practices should be identified in research areas with more mature methods for addressing dual-use concerns, such as computer security, and imported where applicable to the case of AI.

Actively seek to expand the range of stakeholders and domain experts involved in discussions of these challenges.

“The Malicious Use of Artificial Intelligence” was written by 26 authors from 14 institutions, spanning academia, civil society, and industry. The lead author is today the Head of Policy at OpenAI. It’s interesting to see how far OpenAI, as co-creators of FAR, has moved from these original ideas. The four recommendations from the Malicious Use paper are full of humility: they recognise that effective responses to risks involve “proactively reaching out to relevant actors”, learning from “research areas with more mature methods for addressing dual-use concerns, such as computer security”, and seeking to “expand the range of stakeholders and domain experts involved in discussions”. The focus was not on centralization and control, but on outreach and cooperation.

The idea that the robot apocalypse may be coming is striking and engaging. FAR warns that we must “guard against models potentially being situationally aware and deceptive”, linking to an article claiming that our current path “is likely to eventually lead to a full-blown AI takeover (i.e. a possibly violent uprising or coup by AI systems)”. It’s the kind of idea that can push us to do something, anything, that makes us feel safer. To push back against this reaction requires maturity and a cool head.

The ancient Greeks taught us about the dangers of Hubris: excessive pride, arrogance, or overconfidence. When we are over-confident that we know what the future has in store for us, we may well over-react and create the very future we try to avoid. What if, in our attempts to avoid an AI apocalypse, we centralize control of the world’s most powerful technology, dooming future society to a return to a feudal state in which the most valuable commodity, compute, is owned by an elite few? We would be like King Oedipus, prophesied to kill his father and marry his mother, who ends up doing exactly that as a result of actions designed to avoid that fate. Or Phaethon, so confident in his ability to control the chariot of the sun that he avoids the middle path laid out by Helios, his father, and in the process nearly destroys Earth.

“The Malicious Use of Artificial Intelligence” points towards a different approach, based on humility: one of consultation with experts across many fields, cooperation with those impacted by technology, in an iterative process that learns from experience.

If we did take their advice and learn from computer security experts, for instance, we would learn that a key idea from that field is that “security through obscurity” — that is, hiding secrets as a basis for safety and security — is ineffective and dangerous. Cyber-security experts Arvind Narayanan, director of Princeton’s Center for Information Technology Policy, and Sayash Kapoor, in a recent analysis, detailed five “major AI risks” that would be caused by licensing and similar regulations where “only a handful of companies would be able to develop state-of-the-art AI”:

Monoculture may worsen security risks

Monoculture may lead to outcome homogenization

Defining the boundaries of acceptable speech

Influencing attitudes and opinions

Regulatory capture.

How did we get here?

Everyone I know who has spent time using tools like GPT-4 and Bard has been blown away by their capabilities — including me! Despite their many mistakes (aka “hallucinations”), they can provide all kinds of help on nearly any topic. I use them daily for everything from coding help to playtime ideas for my daughter.

As FAR explains:

“Foundation models, such as large language models (LLMs), are trained on large, broad corpora of natural language and other text (e.g., computer code), usually starting with the simple objective of predicting the next “token.” This relatively simple approach produces models with surprisingly broad capabilities. These models thus possess more general-purpose functionality than many other classes of AI models”

It goes on to say:

“In focusing on foundation models which could have dangerous, emergent capabilities, our definition of frontier AI excludes narrow models, even when these models could have sufficiently dangerous capabilities. For example, models optimizing for the toxicity of compounds or the virulence of pathogens could lead to intended (or at least foreseen) harms and thus may be more appropriately covered with more targeted regulation. Our definition focuses on models that could — rather than just those that do — possess dangerous capabilities”

Therefore, the authors propose “safety standards for responsible frontier AI development and deployment” and “empowering a supervisory authority to identify and sanction non-compliance; or by licensing the deployment and potentially the development of frontier AI”. They propose doing this in order to “ensure that” models “are harnessed in the public interest”.

Let’s say these proposals are accepted and this regulation is created. What happens next? Well, there are two possibilities:

(1) The growth of AI capabilities hits a limit, such that whilst AI may turn out to be a highly significant technology, we don’t get to a super-intelligence that could destroy society, or
(2) AI continues to develop in capability until it’s by far the most powerful technological force in human history. OpenAI CEO Sam Altman’s prediction turns out to be prescient, that people with this technology can “maybe capture the light cone of all future value in the universe”.

In the case of (1), there’s little more to discuss. The regulations proposed in FAR would, at worst, be unnecessary, and perhaps lead to some regulatory capture of a fairly valuable product space. That would be a shame, but we can live with it. But this isn’t the case that FAR’s proposals are designed to handle — for the risks of misuse of regular technology like that, we already have plenty of simple, well-understood approaches, generally based on liability for misuse (that is, if you do something bad using some technology, you get in trouble; the folks that made the technology don’t generally get in trouble too, unless they were negligent or otherwise clearly and directly contributed to the bad thing).

Therefore we should focus on (2) — the case where AI turns out to be a very big deal indeed. To be clear, no one is certain this is going to happen, but plenty of people that have studied AI for a long time think it’s a real possibility.

Humanity’s most powerful technology

We are now in the era of “general-purpose artificial intelligence” (GPAI) thanks to “universal” or “foundation” models, such as OpenAI’s GPT-4, Google’s Bard, and Anthropic’s Claude. These models are general-purpose computing devices. They can answer (with varying degrees of success) nearly any question you can throw at them.

As foundation models get more powerful, we should expect researchers to find more ways to use them to improve the data, models, and training process. Current models, dataset creation techniques, and training methods are all quite simple – the basic ideas fit in a few lines of code. There are a lot of fairly obvious paths to greatly improve them, and no reason to believe that we are anywhere near the limits of the technology. So we should expect to see increasingly fast cycles of technological development over the coming months and years. There is no data which we can use to make definitive predictions about how far this can go, or what happens next. Many researchers and AI company executives believe that there may be no practical limit.

But these models are expensive to train. Thanks to technological advances, it’s getting cheaper to train a model of a given size, but the models themselves are getting bigger and bigger. GPT-4 may have cost around $100m to train. All the most powerful current models, GPT-4, Bard, and Claude, have been trained by large companies in the US (OpenAI, Google, and Anthropic respectively) and China.

Building together

There are already a great many regulatory initiatives in place, including The White House Office of Science and Technology Policy’s Blueprint for an AI Bill of Rights, the National Institute of Standards and Technology’s AI Risk Management Framework, and Biden’s Executive Order 14091 to protect Americans against algorithmic discrimination.

The AI community has also developed effective mechanisms for sharing important information, such as Datasheets for Datasets, Model Cards for Model Reporting, and Ecosystem Graphs. Regulation could require that datasets and models include information about how they were built or trained, to help users deploy them more effectively and safely. This is analogous to nutrition labels: whilst we don’t ban people from eating too much junk food, we endeavor to give them the information they need to make good choices. The proposed EU AI Act already includes requirements for exactly this kind of information.

Whilst there is a lot of good work we can build on, there’s still much more to be done. The world of AI is moving fast, and we’re learning every day. Therefore, it’s important that we ensure the choices we make preserve optionality in the future. It’s far too early for us to pick a single path and decide to hurtle down it with unstoppable momentum. Instead, we need to be able, as a society, to respond rapidly and in an informed way to new opportunities and threats as they arise. That means involving a broad cross-section of experts from all relevant domains, along with members of impacted communities.

The more we can build capacity in our policy making bodies, the better. Without a deep understanding of AI amongst decision makers, they have little choice but to defer to industry. But as Marietje Schaake, international policy director at Stanford University’s Cyber Policy Center, said, “We need to keep CEOs away from AI regulation”:

“Imagine the chief executive of JPMorgan explaining to Congress that because financial products are too complex for lawmakers to understand, banks should decide for themselves how to prevent money laundering, enable fraud detection and set liquidity to loan ratios. He would be laughed out of the room. Angry constituents would point out how well self-regulation panned out in the global financial crisis. From big tobacco to big oil, we have learnt the hard way that businesses cannot set disinterested regulations. They are neither independent nor capable of creating countervailing powers to their own.”

We should also be careful not to allow engaging and exciting sci-fi scenarios to distract us from immediate real harms. Aidan Gomez, the co-creator of the transformer neural network architecture, which powers all the top language models including GPT-4, warns:

“There are real risks with this technology. There are reasons to fear this technology, and who uses it, and how. So, to spend all of our time debating whether our species is going to go extinct because of a takeover by a superintelligent AGI is an absurd use of our time and the public’s mindspace… I would really hope that the public knows some of the more fantastical stories about risk [are unfounded]. They’re distractions from the conversations that should be going on.”

The dislightenment

What if, faced with a new power, with uncertainty, with a threat to our safety, we withdraw to the certainty of centralization, of control, of limiting power to a select few? This is the Dislightenment. The roll-back of the principles that brought us the Age of Enlightenment.

We would create a world of “haves” and “have-nots”. The “haves” (big companies, organized crime, governments, and everyone that convinces their friends and family members to get a copy of the weights for them, and everyone that accesses darknet sites where hackers distribute those weights, and everyone that copies them…) can build better and better models, models which can (according to FAR) be used for mass propaganda, bio and cyber threat development, or simply for the purpose of ensuring you beat all of your competition and monopolize the most strategic and profitable industries.

The “have-nots” would provide little value to society, since they can only access AI through narrow portals which provide limited (but “safe”) applications.

The push for commercial control of AI capability is dangerous. Naomi Klein, who coined the term “shock doctrine” as “the brutal tactic of using the public’s disorientation following a collective shock… to push through radical pro-corporate measures”, is now warning that AI is “likely to become a fearsome tool of further dispossession and despoilation”.

Once we begin down this path, it’s very hard to turn back. It may, indeed, be impossible. Technology policy experts Anja Kaspersen, Kobi Leins, and Wendell Wallach, in their article “Are We Automating the Banality and Radicality of Evil?”, point out that deploying bad solutions (such as poorly designed regulation) can take decades to undo, if the bad solution turns out to be profitable to some:

“The rapid deployment of AI-based tools has strong parallels with that of leaded gasoline. Lead in gasoline solved a genuine problem—engine knocking. Thomas Midgley, the inventor of leaded gasoline, was aware of lead poisoning because he suffered from the disease. There were other, less harmful ways to solve the problem, which were developed only when legislators eventually stepped in to create the right incentives to counteract the enormous profits earned from selling leaded gasoline.”

With centralization, we will create “haves” and “have-nots”, and the “haves” will have access to a technology that makes them vastly more powerful than everyone else. When massive power and wealth differentials are created, they are captured by those that most want power and wealth, and history tells us violence is the only way such differentials can be undone. As John F Kennedy said, “Those who make peaceful revolution impossible will make violent revolution inevitable.” Perhaps, with the power of AI and the creation of the surveillance needed to maintain control, even violence will be an ineffective solution.

If we do start in this direction, let’s do it with eyes open, understanding where it takes us.

The fragility of the Age of Enlightenment

Through most of human history, the future was scary. It was unsafe. It was unknown. And we responded in the most simple and obvious way: by collectively placing our trust in others more powerful than us to keep us safe. Most societies restricted dangerous tools like education and power to an elite few.

But then something changed. A new idea took hold in the West. What if there is another way to be safe: to trust in the overall good of society at large, rather than put faith in a powerful elite? What if everyone had access to education? To the vote? To technology? This—though it would take a couple more centuries of progress for its promises to be fully realized—was the Age of Enlightenment.

Now that so many of us live in liberal democracies it’s easy to forget how fragile and rare this is. But we can see nations around the world now sliding into the arms of authoritarian leaders. As Hermann Göring said, “The people can always be brought to the bidding of the leaders. That is easy. All you have to do is tell them they are being attacked…”

Let’s be clear: we are not being attacked. Now is not the time to give up the hard-won progress we’ve made towards equality and opportunity. No one can guarantee your safety, but together we can work to build a society, with AI, that works for all of us.

Appendix: Background

This document started out as a red team review of Frontier AI Regulation: Managing Emerging Risks to Public Safety. Although red-teaming isn’t common for policy proposals (it’s mainly used in computer security), it probably should be, since such proposals can have risks that are difficult to foresee without careful analysis. Following the release of the Parliament Version of the EU AI Act (which included sweeping new regulation of foundation model development), along with other similar private regulatory proposals from other jurisdictions that I was asked to review, I decided to expand our analysis to cover regulation of model development more generally.

I’ve discussed these issues during the development of this review with over 70 experts from the regulatory, policy, AI safety, AI capabilities, cyber-security, economics, and technology transition communities, and have looked at over 300 academic papers. Eric Ries and I recorded a number of expert interviews together, which we will be releasing in the coming weeks.

Our view is that the most important foundation for society to successfully transition to an AI future is for all of society to be involved, engaged, and informed. Therefore, we are working to build a cross-disciplinary community resource, to help those working on responses to the potential opportunities and threats of advanced AI. This resource will be called “AI Answers”. The review you’re reading now is the first public artifact to come out of the development of this project. If you’re a policy maker or decision maker in this field, or do research in any area that you feel has results possibly useful to this field, we want to hear from you!

Acknowledgments

Eric Ries has been my close collaborator throughout the development of this article and I’m profoundly appreciative of his wisdom, patience, and tenacity. Many thanks for the detailed feedback from our kind reviewers: Percy Liang, Marietje Schaake, Jack Clark, Andrew Maynard, Vijay Sundaram, and Brian Christian. Particularly special thanks to Yo Shavit, one of the authors of FAR, who was very generous with his time in helping me strengthen this critique of his own paper! I’m also grateful for the many deep conversations with Andy Matuschak, whose thoughtful analysis was critical in developing the ideas in this article. I’d also like to acknowledge Arvind Narayanan, Sayash Kapoor, Seth Lazar, and Rich Harang for the fascinating conversations that Eric and I had with them. Thank you to Jade Leung from OpenAI and Markus Anderljung from Governance.ai for agreeing to the review process and for providing pre-release versions of FAR for us to study.

Footnotes

Although to be fair to the authors of the paper, it’s not a problem I’ve seen mentioned or addressed anywhere.
As will happen if AI continues to develop in capability, without limit.
The cost of frontier models may continue to rise. Generative AI startup inflection.ai recently raised $1.3 billion, and plans to spend most of it on GPUs. But hundreds of companies could still afford to train a model even at that cost. (And even if they couldn’t, the implication is that theft then becomes the only way to compete. It doesn’t mean that models won’t proliferate.)
Although they are not discussed in FAR.
At least, in the case that AI turns out to be powerful enough that such regulation is justified in the first place.
This doesn’t mean that model development should be done without consideration of ethics or impact. Concepts like open source, responsible innovation, informed dialogue and democratic decision making are all an important part of model development. But it does mean we do not need to ensure safety at the point of development.
The only commercially available models that provide fine-tuning and activations, as of July 2023, are older, less capable models, and weights are not available for any major commercial model. OpenAI plans to provide some fine-tuning and activations features for GPT-4 down the track, but they will have had more than a year’s head start over everyone else at that point. Regardless, without access to the weights, developers’ ability to fully customize and tune models remains limited.