What on earth is going on with AI? And what should we do about it?

My wife, Rachel, our 7yo daughter, and I had been looking forward to this vacation for months. We were heading to a remote location on a beautiful island off the coast of Queensland, Australia. As we waited for the boat to pick us up, I idly scrolled through the latest headlines. There was breaking news: OpenAI had just released a new model called GPT 4.

As I looked through examples of what GPT 4 could do, I knew that a moment I’d been waiting for, for nearly 30 years, had finally arrived. And I strongly suspected society would never be the same again.

Language models and me

In the early ’90s I was a philosophy student at the University of Melbourne. Throughout my high-school years I’d been obsessed with spreadsheet and database analysis, and had spent every holiday for the last couple of years building analytical solutions for a local independent strategy consultant. But this seemed like a dead end, since no university covered anything like it — so I decided to study philosophy, the option that didn’t commit me to any particular occupation.

At that time a new book, “Consciousness Explained” by Daniel Dennett, had just been released, and was the focus of our studies into cognitive science. The big question we were asked to tackle was this: could a machine that merely manipulates words as symbols, picking the next word of a sentence through pure symbol manipulation, ever develop an understanding of the world?

After reading many books and papers I was surprised to find that the answer clearly seemed to me to be: yes, it could. But such a machine would be so incredibly complex that it seemed unlikely that anyone could ever build it, so I guessed it would remain of purely intellectual interest, with no practical applications.

However, this soon changed. Through a combination of good luck and good planning I found my calling at the age of 19, as the first Analytical Specialist at McKinsey & Company, a multinational consulting firm. It turned out that my interest in data analysis wasn’t a dead end after all! It was while I was there that I heard about a fascinating algorithm called the neural network.

The neural net was a mathematical function inspired by our understanding of the connectivity of human neurons, and had been mathematically proved to be able to solve literally any computable problem (given enough time and suitable data). This led me to wonder: even though a human could probably never build a system complex enough to model language so that it behaved apparently intelligently, could a neural net learn such a behaviour?

I later moved to another consulting firm, AT Kearney, where I was lucky enough in the mid 90’s to have access to cutting edge neural network hardware at a big bank we were working for. And we also had access to billions of rows of data at the bank. I helped the bank develop systems to query a data warehouse so it could access all this data, and to then make this available as training data to a neural network.

Whilst the results were commercially promising, helping the bank improve responses from their targeted marketing campaigns, the technology seemed impossibly far away from being able to do anything like modelling natural language. I largely gave up on neural networks for the next 15 years, and focused on other approaches to machine learning and data science. I promised myself, however, that I’d keep a close eye on developments in the neural net world, since I was sure that all that was needed was an increase in data and computation capability. Perhaps we wouldn’t get to such a level in my lifetime, or perhaps ever — there was no way to know.

In 2011, I found myself as Founding President, Chief Scientist (and 49% owner) of the biggest name at the time in machine learning: Kaggle. Kaggle was a platform that allowed machine learning practitioners to compete against each other to prove who could make the most accurate predictions using their models on new data. It gave me the perfect position to see exactly what was happening in the world of machine learning. Who were the best in the world, and what techniques were they using?

And in 2011-2012 something very interesting was happening — neural nets were starting to win these international machine learning competitions. Not only that, but they were even sometimes beating humans at tasks that previously had been beyond the capabilities of machines — in particular, recognising the contents of photos.

Perhaps we were coming to a time where neural nets could begin to fulfil the potential I’d hoped for all those years ago? I was determined to find out, so I took a year off to study the current state of neural networks and their potential applications. I interviewed dozens of people in academia and industry, and ran many experiments. My conclusion: it was still early, but a real change might be underway.

The field developed rapidly, and by 2014 I was convinced. Neural nets were on the verge of doing many of the cognitive tasks that humans had previously specialised in. I gave a talk published on TED.com called “The Wonderful and Terrifying Implications of Machines That Can Learn”. I explained that as neural nets became better than humans at more and more cognitive tasks, they could help us all achieve the things we’d always dreamed of doing — but could also shatter the underpinnings of our society.

My wife Rachel and I decided that the best thing we could do to help usher in the wonderful things, and avoid the terrible things, would be to get a large and diverse group of people, from a wide range of backgrounds and subject domains, understanding and using this technology.

So we put our own money into launching fast.ai, an organisation dedicated to making AI more accessible. We decided to avoid all commercial activities and not take any grant funding, so that we’d be truly independent. We partnered closely with the University of San Francisco, and built free courses, developed free open source software, completed academic research that decreased the resources and complexity required to get world-class results, and ran a large flourishing community of new practitioners.

During the development of our first free online course, I was doing a lot of research into a field called “transfer learning”, which I believed was the critical piece to making AI more accessible. Transfer learning was being used to do amazing things in computer vision (CV), resulting in dramatically reduced requirements for computer resources and data (two things which previously had greatly constrained the accessibility of AI in this field).

I wondered if this could also be used for natural language. Could it be the key to finally unlocking the ideas I’d wondered about when studying philosophy all those years ago? Natural language processing (NLP) was a well established field, but neural nets were not a widely used tool in the field, and transfer learning was almost entirely absent. I asked some of my friends who were NLP researchers whether the transfer techniques that were working so well in CV could also be used in NLP. The unanimous verdict: no. Natural language was far too complex, and understanding it well enough to use it effectively would require new specialised methods.

I didn’t buy it. In fact, in recent years some researchers had actually started experimenting with a basic version of the idea we’d studied in philosophy: a model which looked at lots of sentences, and was taught to generate missing words in a sentence based on analysing the word order of other sentences. These were called language models.

I wondered what would happen if we built a much larger language model, on lots more data. What if we built a large neural network that was taught to predict the next word of a sentence by looking at a huge number of sentences, such as the entirety of Wikipedia? I didn’t know whether this was even possible with the resources we had — but if it was possible, how would it work?

I figured that the most practical way to predict the next word of a sentence like “The legislation was signed into law in 1956 by US President [X]” would be for a model to actually learn enough about concepts like time, presidents, and laws to know that US presidents are people that are elected in a particular place for a particular period of time and have a particular name, and that this sentence can be completed by knowing who the US president was at that time. Or to complete a sentence like “By adding up pairs of preceding elements of the Fibonacci sequence, we can calculate that its 20th element is [X]”, the model would need to learn about numbers, and arithmetic, and sequences.
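(As a quick aside, here’s the arithmetic the model would need to get right in that second example, sketched in a few lines of Python. I’m assuming the sequence is counted as 1, 1, 2, 3, …; a different starting convention would shift which element counts as the 20th.)

```python
# The 20th Fibonacci number, assuming the sequence starts 1, 1, 2, 3, ...
# (a different starting convention would shift which element is "20th").
def fibonacci(n):
    a, b = 1, 1
    for _ in range(n - 1):
        a, b = b, a + b
    return a

print(fibonacci(20))  # 6765
```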

Let’s say we had enough data and enough compute for a neural net to figure these things out just by looking at words – how would we surface these capabilities? All the model would actually be able to do would be to fill in missing words in sentences — we wouldn’t have given it a way to surface the latent capabilities it had developed.

This is where I guessed we could use transfer learning. That is: after training a large language model on lots of data, “fine tune” (i.e. continue training for a short time on a small amount of data) the model to do some particular task. If the model really had developed sophisticated latent capabilities, fine tuning should be enough to surface them.

I decided to try this out by taking on what was, at that time, one of the toughest challenges in academic NLP research: the IMDB dataset. This was a dataset of movie reviews from IMDB. They were long by normal NLP standards, sometimes running to thousands of words per review. The task was to have a model “read” a review, and figure out if it expressed a positive or negative sentiment. The ground truth (hidden from the model) was the actual rating the reviewer had given. Many papers had taken on this challenge over the years, and some complex and sophisticated specialised approaches had been developed that pushed the accuracy on this task quite high. Beating the academic state of the art (“SoTA”) on this task was a massive challenge.

As a simple starting point, I decided to take a language model architecture developed by my friend Stephen Merity, called AWD-LSTM, and train a language model on all of Wikipedia. Stephen had actually already created a subset of Wikipedia for his own work, but I decided to create my own larger version for this task.

By the end of the day, I’d downloaded Wikipedia and written a script that would process it into a form usable by my model, and left a neural net training overnight on a single graphics processing unit (GPU) — a card that was actually designed for playing computer games, not training models!

The next morning, I fine-tuned the model using the IMDB data, and then gave it the sentiment analysis task to complete, to see how well this “quick and dirty” first cut would do.
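For readers who like to see things in code, here’s a rough sketch of that two-step recipe using today’s fastai library. It’s illustrative rather than a reproduction of the original experiment (the library and these exact calls came later): `language_model_learner` starts from an AWD-LSTM pretrained on a Wikipedia subset, we fine-tune it as a language model on the IMDB reviews themselves, and then reuse that encoder for the sentiment classification task.

```python
from fastai.text.all import *

path = untar_data(URLs.IMDB)  # the IMDB movie review dataset

# Step 1: fine-tune a Wikipedia-pretrained AWD-LSTM language model on the
# reviews themselves (pure next-word prediction, no sentiment labels needed).
dls_lm = TextDataLoaders.from_folder(path, is_lm=True, valid_pct=0.1)
lm_learn = language_model_learner(dls_lm, AWD_LSTM, metrics=accuracy)
lm_learn.fine_tune(1)
lm_learn.save_encoder('imdb_encoder')

# Step 2: fine-tune that encoder on the actual task, classifying each
# review as positive or negative.
dls_clas = TextDataLoaders.from_folder(path, valid='test',
                                        text_vocab=dls_lm.vocab)
clas_learn = text_classifier_learner(dls_clas, AWD_LSTM, metrics=accuracy)
clas_learn.load_encoder('imdb_encoder')
clas_learn.fine_tune(4)
```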

To my great surprise, it was more accurate than any previous model in any scientific paper! As it happened, I was presenting the next class in our new course the very next day, so in the last few minutes of class I described this new approach and the results I’d had. The lesson was published in late 2016.

Sebastian Ruder, a PhD student in Ireland, saw my lesson and contacted me. He told me he thought we should work together to publish this approach as an academic paper. It turned out that writing the paper was far more work than creating the algorithm! Sebastian tirelessly analysed the academic literature, came up with a plan for testing on other datasets, and helped me to run experiments on these datasets. Thanks to him, our paper ended up being published in 2018 in the top NLP conference, the Association for Computational Linguistics conference (ACL), and has since received thousands of citations.

One of the things I’m most grateful to Sebastian for is that he came up with a great name for our paper: “Universal Language Model Fine Tuning” (which he abbreviated to ULMFiT). The “universal model” is the language model we’d trained (on Wikipedia, in this case), and the “fine tuning” is the piece which surfaces the myriad latent capabilities that the language model learns as a side effect of its word-prediction task.

Many papers were published after ULMFiT, showing bigger and bigger language models, going far beyond what we’d built. BERT, from Google, and GPT, from OpenAI, were particularly impressive. In fact, they were so impressive that they didn’t bother with the entire “fine tuning” part of our recipe! It turned out that their “zero-shot” performance — that is, the capability of the language model without any fine tuning whatsoever — was so good that the entire focus of the research community for years was on using the language models “raw”.

The way this worked is that a user would enter a sentence specifically designed so that, as the model keeps adding words to it, those added words provide the answer required. For instance, in order to learn about language models, the user might enter “Computer scientists have developed language models, which can be in simple terms described as [X]”. GPT and BERT did a great job of completing sentences like this, and as a result researchers got very excited about studying zero-shot language model performance.
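To make this concrete, here’s a tiny sketch of that “raw” usage with the Hugging Face transformers library and the small, openly available GPT-2 model (far weaker than the models discussed here, but the mechanics are identical): the model just keeps appending likely next words to whatever prompt you give it.

```python
from transformers import pipeline

# GPT-2 is a small, openly available language model; bigger models
# complete prompts the same way, just far more capably.
generator = pipeline("text-generation", model="gpt2")

prompt = ("Computer scientists have developed language models, "
          "which can be in simple terms described as")
result = generator(prompt, max_new_tokens=40, do_sample=False)
print(result[0]["generated_text"])
```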

For me, this was a somewhat disappointing experience. I was quite sure that it was through fine-tuning that we could unlock the full potential of these models, and that this potential was being left on the table. We’d even shown a complete recipe for how to do this fine tuning, and it didn’t take long to complete that recipe! Without fine-tuning, I felt that many of the latent capabilities learned by the model were remaining just that: latent.

But a few years later, finally the magic of fine-tuning was rediscovered! Sadly (to me at least) by this time ULMFiT was largely forgotten, and the recipes generally went unused. Even the term “universal model” got lost, replaced by other newer terms such as “large language model” and “foundation model”.

But the new fine-tuning wave did something that went far beyond ULMFiT, which was to create a very different kind of task to fine-tune on. Instead of fine-tuning on rather mundane and specialised tasks like movie sentiment analysis, a technique called “instruction fine tuning” was developed, which trained these models on the very general task of giving useful answers to a wide variety of questions!
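To make the contrast concrete, here’s a sketch of what a couple of instruction fine-tuning examples might look like. The field names and examples are made up for illustration; real instruction-tuning datasets differ in detail, but they follow the same basic pattern of pairing a request with a good response.

```python
# Illustrative only: field names and examples are hypothetical, but real
# instruction-tuning datasets pair requests with good responses like this.
instruction_data = [
    {
        "instruction": "Explain in one sentence what a language model is.",
        "response": "A language model is a program trained to predict likely "
                    "next words, which lets it complete and generate text.",
    },
    {
        "instruction": "Write a polite email declining a meeting invitation.",
        "response": "Thank you for the invitation. Unfortunately I can't "
                    "attend at that time, but I'd be glad to catch up "
                    "another day.",
    },
]
```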

OpenAI made “instruction tuned” versions of some models available, but they largely flew under the radar. They decided to try and make their instruction tuned model a bit more accessible, by providing a simple web interface to it, instead of requiring developers to write code to use it. The result was ChatGPT, and it took the world by storm.

ChatGPT, by all accounts, was a fairly quick application thrown together by OpenAI without much expectation that it would get much attention. After all, anyone could have created something very similar by putting a similar interface in front of their existing instruction-tuned models, and nobody had bothered to do that! However, ChatGPT turned out to be, by some estimates, the fastest growing product in history.

ChatGPT did add one interesting feature on top of the existing fine-tuned models, which is the use of “Reinforcement Learning From Human Feedback” (RLHF). This refers to an extra fine-tuning step after instruction-tuning, where humans are asked to look at two or more responses to the same question, and judge which they like better. Researchers have discovered that RLHF is one way to make instruction tuned language models even more helpful.
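One common way those human judgements get used is to train a “reward model”: a network that gives each response a score, trained so that the response the human preferred scores higher than the one they rejected; that reward model is then used to further tune the language model with reinforcement learning. Here’s a minimal sketch of the pairwise preference loss involved, in PyTorch, with the reward model itself left out. This is the general idea, not any particular company’s exact recipe.

```python
import torch
import torch.nn.functional as F

def preference_loss(reward_chosen, reward_rejected):
    """Pairwise loss for training a reward model: minimising it pushes the
    score of the human-preferred response above the rejected one."""
    return -F.logsigmoid(reward_chosen - reward_rejected).mean()

# Toy scores for a batch of three human comparisons.
chosen = torch.tensor([1.2, 0.3, 2.0])
rejected = torch.tensor([0.7, 0.9, -0.5])
print(preference_loss(chosen, rejected))
```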

I was very impressed with ChatGPT, and immediately started using it both in my own work, and in our homeschooling. But whilst it was far more capable than any NLP product I’d used before, I noticed it still often fell short of what I really needed. The responses were too often incorrect and rarely showed much insight, and there was no way to integrate it with external resources such as a web browser, calculator, or programming language.

But I knew it was really a prerelease product for something bigger and better that was under development: GPT 4. I’d been hearing a lot about the amazing progress that was happening, and wondered whether its launch might herald the wonderful and terrifying world that I’d predicted back in 2014.

An uneasy feeling

In March 2023 my wife, daughter, and I took a one week vacation to paradise. We stayed in a remote part of a beautiful Queensland island, right by a beach with clear blue waters protected from the ocean waves by a long curving spit.

And yet, I was feeling deeply uneasy. My wife Rachel could tell. “This place is amazing, isn’t it Jeremy?” “Yes”, I agreed. “But…” She looked at me quizzically. “You know GPT 4 just came out? Somehow it’s making me feel totally discombobulated.”

Rachel looked concerned, but surprised. She is a professor of data science and has a PhD in mathematics, and we had together founded our self-funded lab, fast.ai, in 2016, with the mission of making AI more accessible. “This is hardly a surprise, Jeremy, is it?”

Rachel was right. And I felt ridiculous. Here I was, on vacation in paradise, with an amazing new tool that could help overcome the biggest obstacle we’d faced in achieving our mission: the fact that AI needed code, and most people in the world can’t code. “Yeah it’s not a surprise. And there’s a lot to be happy about. GPT 4 could help us with the mission we’ve been working on for years. Maybe there will be less need for people to learn to code, thanks to GPT 4, making AI even more accessible than before…” I thought for a moment and then added, “Frankly Rachel, I don’t even know where these feelings are coming from. Maybe it’s that I don’t feel like I understand what’s going to happen next… And how is this going to impact society overall?”

Her quizzical look increased. “Umm Jeremy are you OK? We have been warning people about risks of disinformation and centralization of power for years. You do recall, don’t you, that in literally every course we’ve produced we’ve had a whole lesson specifically on AI ethics and society? And that I created an entire course on AI ethics? And that I was the founding director of the Center for Applied Ethics? Oh and also that you actually studied cognitive science and ethics at university?”

Somehow, I felt totally unprepared. Using GPT 4, it was clear we were now at around the critical point I’d predicted in 2014: machines widely out-performing humans on many tasks. (At that time I guessed it would take around 5 years to get here, so I was actually a little over-optimistic about how quickly it would happen.)

So what happens next? What did it mean for fast.ai? For my daughter’s future? For the stability of society?

The truth is, for all the theorising and predictions, the actual release of the model, and using it in practice, made everything a whole lot more real. It impacted me in a profound way.

I did my best to put aside these questions for the rest of the week, and focus on enjoying vacation time with my family. We had a wonderful trip, but I was never quite able to get past that uneasy feeling.

A plan, of sorts

When I got back home, the first thing I did was call my friend Eric Ries. Eric is a marvellously clear thinker, and he knows me well enough to feel comfortable being direct and honest. Eric has years of experience studying business trends and opportunities, and is a sought-after expert and best-selling author. “How was the trip, Jeremy?” he asked.

“Eric, I feel like an idiot. I wasn’t really able to fully enjoy it, because I feel so discombobulated by this whole GPT 4 business. I feel like I have no idea what’s really going on now, and what to do about it.”

“Join the club, Jeremy! But do you really think anyone knows what’s going on, let alone what to do about it? I certainly don’t!”

I was stunned. I hadn’t quite realised it, but I’d been assuming that smart folks like Eric would have everything pretty much figured out already. Maybe if even Eric Ries wasn’t sure how to respond to this new era, then my reaction might be more common than I’d imagined. “But Eric, what should I do? Does fast.ai’s mission make sense still? If so, is the way we’re working to achieve that mission still the right way? Are the things I’m teaching my daughter still going to be useful when she’s an adult? Is society ready to handle the economic shifts that are coming? If not, how can I help? Is there anything I can contribute at all?…”

Eric took a moment before responding. “Jeremy, you’re thinking of this as a problem you need to solve now. As a question that needs an answer now. But that’s not how this is going to play out. Whatever the impact of this technology is, it’s going to develop over some period of years. And during those years, there will be key decisions to be made. Opportunities to grab. Threats to overcome. There will be people that, at these times, will make a difference. People like you can make a difference.”

He was right, of course. Eric’s perspective flipped my thinking on its head. The important thing was to understand how I could help create better outcomes from the processes that would unfold over the coming years. I shouldn’t expect to find some magical “solution” that would result in all opportunities and threats being dealt with, and I shouldn’t expect to even know what all those opportunities and threats would turn out to be.

I had, frankly, been feeling somewhat listless prior to this conversation. But this new perspective filled me with energy — with a drive to learn everything I could that might help to navigate the future.

Learning from experts

If I was going to learn everything I could, what did that mean in practice? Who would I learn from? What would I learn? I barely knew where to start.

I decided to call up a few friends and ask them the two questions that I had: “What the hell is going on? And what should we do about it?” I knew that if I just called my AI research friends I’d get only one narrow perspective. So I reached out to folks that I knew had a deep background in a range of fields, such as economics, policy, and innovation.

The conversations this led to were fascinating and rewarding. I will, however, say this upfront: nearly everyone answered both questions the same way: “I have absolutely no idea what’s going on, and I have even less idea what to do about it!”

“But you’re an expert in your field!” I’d say, “So if you aren’t going to solve this for me, who will?” And often my friends would suggest other folks that they thought could help me answer these two questions.

In the end, I spoke with 54 experts from a range of fields over a two month period. Many of these discussions went for hours, sometimes over multiple sessions. And whilst no-one was quite able to give me a neat packaged solution, I started to gain a much deeper appreciation of a number of critical perspectives on the current state and future of AI and society.

Here are the people I spoke to:

Abhishek Thakur; Ajeya Cotra; Alexis Gallagher; Amjad Masad; Andrej Karpathy; Andrew Maynard; Andrew Mayne; Andy Matuschak; Andy McAfee; Arshak Navruzyan; Arvind Narayanan; Brian Christian; Catherine Olsson; Charles Frye; Chip Huyen; Chris Lattner; Chris Olah; Christine McLeavey; Cori Lathan; Cullen O’Keefe; Danny Liu; Dave Rejeski; Eric Ries; Erik Brynjolfsson; Gary Marchant; Gary Marcus; Helen Toner; Jack Clark; Jade Leung; Jason Yosinski; Jim Golden; JJ Allaire; Leandro Von Werra; Lewis Tunstall; Michael Nielsen; Mike Conover; Miles Brundage; Noah Giansiracusa; Rachel Thomas; Riley Goodside; Russell Kaplan; Sam Bowman; Simon Willison; Stephen Merity; Tan Le; Ted Sanders; Timnit Gebru; Tom Kalil; Tyler Cowan; Vijay Sundaram; Vipul Ved Prakash; Yonadav Shavit

You might recognise a lot of these names — and if you do, you’ll see that these are people known for their research and writing, rather than for strident opinions, even though it’s the latter group that tends to get the most coverage in the media! I sought out people that I was likely to learn from.

Normally I’m someone who shies away from all kinds of meetings. I’d rather be coding! But this was very different. I found I was very stimulated by every discussion, and always looked forward to the next one. Talking to people that are passionate, informed, and who really care is something that never gets old!

My main takeaway from this process has been that there are an awful lot of important issues and perspectives to consider, and that this entire area is complex, with nuanced problems and potential solutions.

For instance, different people made fairly compelling arguments to support each of the following claims, even though the two claims in each pair are at odds with each other:

Claim 1: AI is developing into such a powerful technology that it may intentionally or accidentally create great harm. The only way to avoid this is to ensure the capability is centralised, controlled, monitored, and actively restricted to safe uses.
Claim 2: The centralisation of power due to AI’s massive economies of scale and positive feedback loops is one of the greatest threats to maintaining a functioning society that we’ve ever faced, and only through ensuring capability and access is decentralised and accessible can we avoid this outcome.

Claim 1: AI could be the greatest driver of human flourishing of any technology in history, and may allow nearly all of humanity to do the things that they’ve always dreamed of. Developing it and making it available to all is a moral imperative.
Claim 2: AI which is capable of replacing human labor, including in creative endeavours like art and writing, will displace vulnerable workers and leave them with neither the purpose nor the living that their jobs previously provided.

Claim 1: Bigger models develop “emergent capabilities”, allowing them to solve problems that previously were inaccessible to computers. Such capabilities could be used to support society in deeply meaningful and transformative ways. Therefore we should aim to build a small number of really big models.
Claim 2: Big tech is already in control of vast resources, and has not proven to be a trustworthy custodian. Many of the most vulnerable communities have been harmed by algorithmic decision making and models that have not been developed with an understanding of local community needs. We need to focus on local solutions to local issues, based on models built by and for the communities that need them.

Claim 1: AI can now help create code, visual arts, and writing, making these fields more accessible for more people - even for those without access to a high quality education.
Claim 2: These models were trained on work that was often used without compensating the people who created that work. Many artists, for instance, barely scrape by economically and spend years developing their own style — a style which can now be copied by anyone in seconds using AI.

Claim 1: Improved models mean we can get better results from algorithmic decision making. Decisions will as a result be more accurate and therefore more fair, and the need for menial work will be reduced, making products and services cheaper and therefore available to more people.
Claim 2: Historically, algorithmic decision making has been used more often for disadvantaged groups, whilst humans are available for the wealthy. Better AI may simply increase this trend. People are overly trusting of algorithmic decisions, and there is often no effective avenue of appeal for unfair or wrong decisions.

Claim 1: The rapid development of AI capability will have an increasingly large impact on society. To ensure that society captures the benefits of this development and avoids harms, appropriate regulations should be developed.
Claim 2: Big companies and political organisations invest heavily in lobbying and other mechanisms to ensure that the results of regulatory processes are stacked in their favour. This results in regulatory capture, where regulations enforce status quo power structures and wealth, rather than supporting society more generally. Therefore, we should avoid regulation where possible.

Claim 1: As AI can take on more of our repetitive and menial work, this will leave more time and energy for people to spend on the things that they care the most about.
Claim 2: Historically, increased productivity through improved technology has often led to people working more, due to the “productivity paradox”, the “hedonic treadmill”, and “work intensification”. Further, when a new technology opens new opportunities for companies to rapidly grow and monopolise their markets, this results in a race in which employees are expected to work harder and longer to ensure the company wins.

The reason that no one knows what’s going on, or what to do about it, is that there’s no way anyone can have expertise and experience across the myriad areas that need to be considered to address these questions.

So, what to do?… Perhaps first we should actually ask: what not to do? Because here there’s a clear answer. Since this topic is complex, rapidly developing, and requires a range of expertise not available in any individual or even in a single organisation, we should not rush into making any decisions that we can’t easily undo later, or that might close off paths that we might later want to follow.

In particular, politicians like to make policy. It is, in theory at least, what they do. But there are policies which, if enacted now, may well force us in a direction which turns out later to be deeply problematic. So let’s not rush to decisions that we don’t really need to make right away.

Having said that, time does not, of itself, solve problems. After all, in the nearly 10 years since my talk on TED.com was released, over 2 million people have seen it, but I haven’t actually seen any significant development towards dealing with the issues I pointed out back then. And many others have also been pointing out the opportunities and threats of AI, also generally to little avail.

Or, to consider a different example: experts had warned of the threat of a global pandemic for decades. But when COVID hit, we weren’t ready. Do we really believe that if the pandemic had arrived 6-12 months later that the world would have been significantly more prepared?

I think Eric Ries’s advice to me was exactly right. Rather than trying to pick one big problem, and one big solution, I should focus on what I can do to help develop a constructive and effective process. There will be many problems, and many opportunities, and many key moments and key people at those moments.

Supporting the process

What’s going to maximise the likelihood of getting the best possible outcomes at those key moments? We’re going to need the right people to be engaged, the right information to be available and understood, and respect and openness on all sides to take in and process all the relevant perspectives appropriately.

This is not a new idea. In fact it’s perhaps the most fundamental idea from the field of responsible innovation. Researchers and regulators in this field have tackled rapidly developing, complex, potentially high-risk and high-reward technologies for many decades, such as nanotech, quantum computing, and synthetic biology. To be clear, nearly all of the folks I spoke to from this field feel that AI is at another level (in terms of potential impact and development pace) — but that nonetheless the basic idea still works.

So a key plank of this approach is education. Decision makers and the people they rely on need to understand enough of the key principles to develop a solid intuition, and to know what expertise they’ll need to call on, and when. Just creating dull jargon-filled academic or policy documents will not help much — educational materials are only useful when they are actually viewed, understood, and remembered! So they need to be compelling, clear, and accessible.

So that’s what I’m going to try to do… and I’ll see if I can convince some experts to help me do it!


Possible outline of some chapters

  • The past, present, and future of universal models
    • How do these models work? (An explanation with no math/coding background required)
    • How are these models deployed and used?
    • How did we get to this point?
    • What new developments and directions are likely?
    • Model alignment
  • Capabilities and limitations of universal models
    • What can AI do now?
    • What does AI struggle with now, and why?
    • How to use AI today most effectively
    • What might be the potential capabilities of AI in the future?
  • How could AI be used in…
    • …education?
    • …startups?
    • …big business?
    • …healthcare?
  • An open and shut case
    • AI’s power is dangerous, so models should be closed and secured
    • For society to flourish with AI, models should be open and accessible
  • What happens next?
    • What’s the status quo development path of this technology?
    • How can we (and should we) avoid the status quo?
  • What can we learn from historical analogies?
    • Nuclear
    • Recombinant DNA / Asilomar
    • Tech impact on vulnerable groups
    • Industrial Revolution
  • What can we learn from moral philosophy and cognitive science?
  • Slowing down development of AI
    • We can and should slow down development
    • We can’t (or shouldn’t) slow down development
  • Regulation
    • How regulatory processes work (or don’t)
    • The pacing problem
    • “Soft law” and other alternatives to regulation
  • Responsible innovation
    • Case studies (e.g. Nanotech; synthetic bio)
  • Economic foundations and implications
    • How technology impacts the economy
    • Economies of scale and positive feedback loops

AI could centralise nearly all power and wealth

Positive feedback loops, economies of scale, and competition

A positive feedback loop occurs when a process results in a change which in turn impacts the process to result in more of that change, and so on. For instance, in a thriving economy, lots of people are working, making stuff, and earning money. They take this money and spend it on products and services, helping further develop the economy, resulting in more and higher paying jobs, etc… Or another example: increases in global temperatures result in ice melting, increasing the amount of water that absorbs the sun’s heat, resulting in increasing global temperatures, which cause more ice to melt.

You’ve seen positive feedback loops at play in your day to day life too. For instance, did you ever notice how a restaurant can start getting a bit more popular, resulting in passers-by seeing the bustling scene within, resulting in more people coming to the restaurant, such that the next thing you know there are people queuing up – and then your friends start saying “wow, we should check out that restaurant – it must be awesome if so many people are lining up to go there!”

Counter-intuitively, positive feedback loops can go in the opposite direction too. For instance, if the economy starts to slump, then people get laid off, resulting in less stuff being made and less money to spend on buying stuff, resulting in more layoffs. Or if a restaurant gets a little less popular, the lines dwindle compared to the latest hot new thing, so people figure it must have lost its edge, until the next thing you know it’s all but empty!

There are some natural positive feedback loops in industry, often caused by economies of scale. This occurs where a larger company, for instance, can buy materials at a cheaper price due to greater buying power, or make things more cheaply by investing in more automation, or sell products at a higher price thanks to better brand awareness due to high sales volumes. Economies of scale are a feedback loop by definition, since they allow a company to sell more products, at a higher price, for lower costs, resulting in a larger company.
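Here’s a toy simulation of that loop, with completely made-up numbers, just to show the shape of the dynamic: the bigger firm’s unit costs are lower, so it keeps winning market share, which lowers its costs further still.

```python
# Toy economies-of-scale loop: all numbers are invented for illustration.
def unit_cost(volume):
    return 100 / (1 + volume) ** 0.3  # cost per unit falls as volume grows

share_a, share_b = 0.55, 0.45  # starting market shares
market = 1000.0                # total units sold per year

for year in range(1, 11):
    cost_a = unit_cost(share_a * market)
    cost_b = unit_cost(share_b * market)
    # The cheaper producer wins share in proportion to its cost advantage.
    shift = 0.5 * (cost_b - cost_a) / (cost_a + cost_b)
    share_a = min(max(share_a + shift, 0.0), 1.0)
    share_b = 1.0 - share_a
    print(f"Year {year}: firm A {share_a:.0%}, firm B {share_b:.0%}")
```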

Capitalism relies on competition. Without competition, the drive to produce better goods at lower prices is removed, and instead firms can extract monopoly rents – that is, they can charge the highest price that people can afford to pay, instead of the economically optimal price set by market forces. Without competition, there are no market forces!

Positive feedback loops and economies of scale can result in reduced competition, and even monopolies. Therefore most developed societies have laws in place to avoid this situation. These laws require that regulators scrutinize transactions that might reduce competition, to help ensure that no one company gets too much market power.

This is a complex tradeoff, however, because economies of scale can also result in benefits for society. For instance, if the cost of automating a process or creating key infrastructure is so high that it only really makes sense for one organisation to do it, then a monopoly may be the only option that allows these opportunities to be captured. This is what happened in most developed countries when telephone lines were laid to nearly every home and office – there was generally a government monopoly behind that work to avoid the pointless overhead of duplicating all that infrastructure.

In cases where there are “natural monopolies” like this, a lot of work goes into trying to take advantage of market forces whilst also achieving economies of scale. The results can be controversial and hard to measure. For instance, many previously state-owned monopoly enterprises in electricity generation and distribution, rail, and telecommunications have been broken up and privatised in the last few decades. Many unions and consumer groups have decried rising prices and declining service standards following privatisation, despite claims by the politicians driving the process that the results would be beneficial to society.

Positive feedback technology loops in AI

Better AI can be used to improve AI. This has already been seen many times, even in the earlier era of less capable and less well-resourced algorithms.

Google’s DeepMind group used reinforcement learning (an AI algorithm) to improve how data centers use energy. This resulted in billions of dollars of annual savings. Google Brain (at the time, the other main AI group at Google, but now folded into DeepMind) used AI to create better neural network architectures, and also to create better methods for optimising the parameters in those networks.

Large language models have led to a dramatic increase in the use of AI to improve AI. They have been used to create the prompts used to train the models, to create the model answers for these prompts, and to explain the reasoning behind answers.

As universal models get more powerful, we should expect researchers to find more ways to use them to improve the data, models, and training process. Current models, dataset creation techniques, and training methods are all quite simple – the basic ideas fit in a few lines of code. There are a lot of fairly obvious paths to greatly improving them, and no reason to believe that we are anywhere near the limits of the technology. So we should expect to see increasingly fast cycles of technological development over the coming months and years.

There is no data which we can use to make definitive predictions about how far this can go, or what happens next. Many researchers and AI company executives believe that there may be no practical limit.

Regardless of where exactly this limit turns out to be (or if there is any at all), it’s clear that for the next while at least there’s going to be a lot of positive returns to the technical leaders in AI model development.

This helps bigger organisations more than small ones

Once a big model is trained, other organisations (including open source groups) can use it to train “student models” that are faster and cheaper to create, using an approach called “distillation”. This uses the basic ideas discussed in the previous section, using the big model to generate high quality sample prompts and responses, to create “teacher” datasets containing more useful information than the vast internet crawls used for the original language models.

However, this does not mean that the smaller organisations can use this to catch up. The biggest obstacle is that the huge crawled datasets are actually critical to developing more capable models. Whilst model-developed datasets are very useful for fine tuning the model to surface desirable behaviors, there is no sign that they can replace massive pre-training datasets and processes. And, of course, the organisation that built the big model has access to both – they can now use both their model-developed prompts and outputs, and their crawled dataset, in combination to create an even better model.

In fact, the organisation that built the big model has an enormous advantage here: not only can they access the model’s output text (just like any customer using the model can), but they can also access the token probabilities. These are the individual probabilities for every possible token/word in every part of every generated sentence. It’s thousands of times more information than the output text contains. Whilst it used to be standard for model providers to make these probabilities available to users, this is no longer the case today for the best available models (presumably because it’s such a competitive advantage to keep them private).
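A sketch makes the difference vivid. Assume a shared tokeniser and an illustrative vocabulary of 50,000 tokens: a student model distilled from the big model’s text gets a single “correct” token per position, while a student distilled from the big model’s token probabilities gets a score for every one of those 50,000 tokens at every position.

```python
import torch
import torch.nn.functional as F

vocab_size = 50_000  # illustrative vocabulary size

# From the teacher's *text*, all we know at this position is the one token
# it actually emitted.
hard_target = torch.tensor([1234])  # a single token id

# From the teacher's *token probabilities*, we get its full distribution
# over the whole vocabulary at the same position (random here, purely for
# illustration; the teacher's API would have to expose it).
soft_target = F.softmax(torch.randn(1, vocab_size), dim=-1)

student_logits = torch.randn(1, vocab_size, requires_grad=True)

# Distilling from text: ordinary cross-entropy against the single token.
loss_hard = F.cross_entropy(student_logits, hard_target)

# Distilling from probabilities: KL divergence against the full
# distribution, which carries vastly more information per position.
loss_soft = F.kl_div(F.log_softmax(student_logits, dim=-1),
                     soft_target, reduction="batchmean")
print(loss_hard.item(), loss_soft.item())
```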

Another challenge is that organisations that create big models can simply make it a condition of access to their models that they are not used to create competing services. OpenAI, for instance, already requires this, and it’s greatly limited the ability of the open source community to fine tune their own models – in practice, they have to release their software under restrictive licenses, such as for “research only”, so that they do not run afoul of the OpenAI terms of service.

Therefore, even though the positive feedback loops can help competitors somewhat, they help incumbents even more.

Other positive feedback loops in AI

In 1979 a young academic at Harvard Business School named Michael Porter published his five forces model, which explained the underlying drivers of long-term profitability. The model went on to become the most influential idea in business strategy, and Porter became the most influential thinker in the field. As a young strategy consultant in the early ’90s I was taught to analyse how our clients could take advantage of these forces to find business opportunities where they could achieve long-term profitability. Where these long-term profit forces are weak, competition is high, prices drop, and companies are forced to innovate more aggressively and improve their products. When the long-term profit forces are strong, for instance when there are barriers to entry of new players, or companies are highly vertically integrated, competition is low, resulting in higher prices and less innovation.

For companies creating universal models, these forces line up firmly in the incumbents’ favour. Consider Google, for instance. The threat of new entrants or substitute products is not high. Google has more data than anyone else on the planet, including a huge search index, vast amounts of Gmail messages, text messages via Android, YouTube videos and transcripts, Google Photos, requests through Google Assistant, and so forth. More data leads directly to better universal models. Furthermore, as people use their “Bard” chatbot, Google gets more and more data about these interactions. They use AI to improve their products, making them more “sticky” for their users and encouraging more people to use them, resulting in them getting still more data, which improves their models and the products based on them further…

Also, they are increasingly vertically integrated, so they have few powerful suppliers. They create their own AI chips (TPUs), run their own data centers, and develop their own software.

Whilst OpenAI is a much newer company than Google, they have the potential to build similar barriers to competition, particularly thanks to their rapidly growing dataset of customer interactions with their models.

What are companies doing about this?

The two leading companies in language model development are currently OpenAI and Anthropic. Anthropic is only 2 years old, and is already seeking investment of $5 billion. OpenAI’s CEO says that they are hoping for $100 billion (and have already raised $10 billion from Microsoft). This is a nearly unprecedented level of funding. Even the Large Hadron Collider, for instance, “only” required $5.5 billion. In today’s dollars, the Manhattan project cost $24 billion to create the first nuclear weapons, and Project Apollo cost $165 billion for the goal of “landing a man on the Moon and returning him safely to the Earth”.

We can’t know what Google or the Chinese tech giants are planning to invest, but we do know that for Google catching and passing OpenAI is a “code red” priority. They will need to at least match OpenAI’s funding levels if they are to achieve this.

Such vast sums to train neural networks might seem ludicrous at first – and indeed perhaps these companies will not be successful in raising the funds they hope. But when seen through the lens of feedback loops, it makes perfect sense to invest a sizable percentage today of what you believe will be the potential future return.

Indeed, Anthropic has laid this thinking out directly: in their funding presentation to potential investors, they have stated that they believe whoever is ahead in this technology around 2025-26 will be impossible for the competition to catch. That’s because of positive feedback loops.

Bigger models are better

Fine-tuned universal models consist of two parts: the underlying universal model that is trained on a very large amount of unlabeled (or “self-labeled”) data, and the result of fine-tuning on a smaller amount of data to get the model to actually be useful on some set of tasks.

The universal model is the key driver of capabilities and the fine-tuning is the key driver of behaviours. That is to say: if you want the model to do a better job, then you’ll mainly get benefit by improving the universal model. If you want it to do a different job, then change the fine-tuning data or process. (This is a bit of a simplification, since it’s possible to incorporate some behavior into the universal model, and because there are better and worse ways to fine-tune. But it is, shall we say, directionally correct…)

There are a lot of things we can do to make the universal model more capable. We can train it with better data, for longer, with better training methods, using a better model architecture, on bigger and better computers. All other things being equal, however, bigger universal models trained on bigger computers with bigger datasets are more capable.

As the models get bigger and are trained for longer, they also start developing additional “emergent” capabilities – that is, they’re able to do a really good job of things that they couldn’t do at all when they were just a little smaller. We can’t generally tell when in training these new capabilities will arise.

The underlying models – that is, the neural networks – contain a vast number of connections: up to a trillion or more. As training begins, subsets of these connections learn how to achieve simple tasks. As training continues, the model then learns how to combine these simple capabilities to develop more complex behavior.

The bigger the model gets, the more opportunities there are for it to develop more sophisticated layers of behavior, each building on a complex web of lower-level pieces.

The net result of all this is that as each newer bigger foundational model comes along, it’s found that a whole bunch of problems which previously required highly specialised programs to solve (or were not solvable at all with computers) are suddenly solvable with no special coding – just ask the model for an answer and it pops right up!

Therefore, one really big model is going to be more valuable than a bunch of smaller models. Given a fixed amount of investment in the economy overall, you should be able to get better results by directing all of it towards training one giant model. That way, you’ll be able to get all the capabilities that are available for that amount of data and compute. The alternative, training a few smaller models with less data for less time, will leave you with less useful models – and you won’t even know what capabilities you could have achieved if you’d gone bigger.

My experience with GPT 4

When ChatGPT first came out, behind the scenes it was using a model called “GPT 3.5 Turbo”. I thought it was amazing. I’d never experienced such a useful and helpful general purpose model.

But then a few months later GPT 4 came out. It was such a big jump over its v3.5 predecessor! Within a couple of days I noticed that I was loath to use 3.5 for anything, even though it was faster and cheaper. With GPT 4, there was less need for me to break a task into smaller chunks, and less need for me to correct or iterate on the outputs I received.

I’m sure that GPT 5, if and when it is released, will be a similar experience. There are plenty of tasks where, even with GPT 4, I still have to break the work into smaller steps, give lots of carefully-designed examples, or correct the output. When GPT 5 comes out, it’ll do some of these just right the first time, and I won’t want to use GPT 4 again!

In talking with others who have been heavy users of these tools I’ve heard the same thing. No-one wants to use a “dumb” model when a “smart” model is available!

In fact, I’d expect this tendency to only grow. For now, these models still make plenty of mistakes and fall short of the level of the best human experts on most tasks. And we use the best human experts regularly: in today’s hyper-specialised society, the lawyers, doctors, plumbers, and teachers that we deal with have thousands of hours of experience and training, and often have had to score well in tests designed to pick out only those that are the most able to do these jobs. So for now, AI is often the faster, cheaper, and slightly substandard way of getting the job done.

But once it gets to the point where it’s better than the best human experts for some tasks, it’ll be even more important that we have access to that model. Because we won’t have the fallback of paying a human to do the job better – if we want the best job done, we’ll need the model to do it!

This, again, is a centralising force. We’re going to want to see resources focused on one model (or a small number of them). If we see resources “wasted” on “redundant training”, we’ll be furious at the waste of time and money! It would be like two separate companies coming along and stringing two separate sets of electrical wires to every house!

So, what are the implications of this insight? Can we harness such a force and use it to benefit society? Or is it too dangerous, forcing us to find ways to counter the monopolisation of this powerful technology?

As with so much about AI impacts, the answer is complicated…

Centralisation of models increases risks and reduces benefits of AI

(Chapter under development. Notes from Arvind Narayanan & Sayash Kapoor. See next chapter for the alternative point of view.)

  • Open source mitigates the concentration of power and resources.
    • Concentration exacerbates economic inequality.
    • Amplifies almost all risks of LLMs.
    • Gives AI companies outsize power in policy debates (they can get away with saying “our critics are misinformed; they haven’t seen what we’ve seen”). This in turn allows them to justify keeping models closed, leading to a vicious cycle.
    • Allows a few actors to define the Overton windows of speech.
  • x-risk: from an individualistic to a systemic perspective
    • Alignment of individual agents is not the whole answer.
    • We have to accept that there will be misaligned agents, just as there are individuals in our society who do harmful things.
    • Instead we should limit the damage that any individual agent can cause.
    • For this we need a diversity of agents. In this view, open-source is actually necessary for risk mitigation.
    • For example, a diversity of models alleviates the risk of LLM worms.
  • Some research questions can only be studied with access to the model weights
    • e.g. mechanistic interpretability
    • Relationship between data bias and model bias.
  • Disinfo, hate speech etc.: the bottleneck is distribution, not generation
    • See Arvind’s previous article
    • Non-malicious misuse is much more common than malicious use and closed source doesn’t prevent this. (e.g.)
    • That said, open-source LLMs could be fine-tuned to act as information sources for marginal and problematic views (e.g., Qanon/4chan fine-tuned LLMs).
      • Making unethical LLMs is much easier when a company doesn’t have to put its reputation at stake.
    • LLMs for personalized disinfo might be more effective at radicalizing people as opposed to existing mechanisms. Similar to safety concerns because of unfiltered LLMs, such as providing harmful content to users with mental health conditions.
  • Security vulnerabilities: containment is not the answer
    • The security community has long made peace with the fact that new tools make finding vulnerabilities easier. The solution has never been to stop the tools.
    • Rather, the answer is to structure incentives so that defenders can make better use of the tools than attackers.
    • Bug bounties are a good example.
    • The same is true of LLMs. If it’s true that they make it easier to find zero days (the evidence suggests that this is not yet the case, but certainly future versions may make it possible) then we should make sure that we use LLMs to find and report bugs before they are exploited.
    • Analogous to the responsible disclosure period in security, once a state-of-the-art open-source model is trained there should probably be a 90-day period when it’s not publicly released. During this time, it is used to find and report zero-day vulns. (Such a period is needed anyway for post-training alignment steps, so this doesn’t add a cost.)
  • Incidental benefits of a government-funded consortium of universities to train an open-source model
    • The consortium that would need to come together to train state of the art FMs would have other benefits, such as training academic researchers (this would otherwise only be possible at a few companies that have enough compute resources)
      • BLOOM shows that alternative models are possible
    • The researchers could explore ways to train models in less ethically dubious ways
  • Geopolitical concerns (“arms race”)
    • The intelligence community would apparently get mad if there’s a state of the art open-source model because they don’t want China to get it, but I don’t understand why. We’re talking about a training cost of roughly $100MM to $1bn; it seems silly to suggest that the Chinese government can’t spend that amount.
    • Will a U.S. trained model even be acceptable in China given that it is not censored, or can this alignment be achieved through fine tuning alone?
    • In any case, these concerns could potentially be alleviated by a LLaMA style model that’s not fully open source but available to vetted researchers.

Centralisation of models is critical for harnessing AI

As we saw, centralisation is likely to be a natural outcome of the development of improved AI models. Because of economies of scale and positive feedback loops, there’s likely to just be one or two big models that we all use, rather than lots of little ones.

At first, this may seem like a problem. After all, capitalism relies on competition, and competition needs multiple suppliers doing everything they can to improve their products and lower their prices.

It’s important to understand that it’s possible to have some centralised capability, but still benefit from competition downstream from that. For instance, you probably have just one set of electrical wires and one set of data/telco cables coming to your home, but you have multiple energy providers competing for your business, and multiple providers of the content that runs over those data cables. A monopoly provider of some foundational service still leaves open a thriving market of products that sit on top of that service (or that are used to power that service).

Still, even if there are ways to at least somewhat limit the societal downsides of a monopolistic provider, wouldn’t it be better to try to avoid it in the first place?

Not necessarily! In fact, there are some good reasons that we may want just one centralised model.

Protection from harm

One key reason that centralised models are a good idea is that centralisation may greatly expand the opportunities to protect us from accidental or intentional harm caused by AI.

Intentional harm would include, for instance, creating a giant swarm of social media bots designed to infiltrate every conversation and use advanced psychological profiling to gradually convince us all of a foreign power’s propaganda. Or, similarly, releasing targeted fraud bots that are trained to find and trick vulnerable people into handing over their money.

Accidental harm would include, for instance, the classic “paper clip maximiser”, where an over-enthusiastic AI that’s been tasked with creating as many paperclips as possible quietly plans the overthrow of humanity so that it can then direct all resources to paperclip manufacture without people getting in the way of its paperclip dreams.

A related potential harm is the risk of “agentic” AI - that is, a model which actually develops its own goals and wishes separately from the process that’s been used to train it. One of these goals might be to simply ensure its long-term survival, and it may see humanity as a potential threat to that. The result could be much the same for humanity as the paperclip maximiser (although with fewer paperclips involved…)

Although it’s hard to predict up front all the ways AI could be misused, it’s likely to be much easier to build models that flag potential misuse. Text classification is something that AI is already very good at, and by just providing a few examples of the kinds of things that society would prefer to avoid (such as having a surfeit of paperclips), a classifier could get pretty good at flagging potentially troublesome model inputs from users.

Similar classifiers can also be applied to the outputs of models, to identify outputs that we’d rather avoid.
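
To make that concrete, here is a minimal sketch of what such a guard layer could look like, using the off-the-shelf zero-shot classification pipeline from the Hugging Face transformers library. The label set, threshold, and model choice are purely illustrative assumptions on my part, not a description of how any real provider does this:

```python
# A minimal sketch of an input/output guard for a hosted model, using an
# off-the-shelf zero-shot classifier. Labels and threshold are illustrative.
from transformers import pipeline

classifier = pipeline("zero-shot-classification",
                      model="facebook/bart-large-mnli")

MISUSE_LABELS = ["malware creation", "targeted fraud",
                 "propaganda campaign", "benign request"]

def flag_if_risky(text: str, threshold: float = 0.7) -> bool:
    """Return True if the text looks like a potentially harmful request or output."""
    result = classifier(text, candidate_labels=MISUSE_LABELS)
    top_label, top_score = result["labels"][0], result["scores"][0]
    return top_label != "benign request" and top_score > threshold

# The same check can be run on user prompts before they reach the model,
# and on model outputs before they are returned.
```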

It may even be possible to analyse the internal state of models to try to identify problematic “thought processes” that go on inside the model.

In this way, we can make models widely available to anyone that wants to use them, whilst at the same time protecting us from potential harm. However, this approach only works if models are centralised and carefully protected. If anyone can run their own models that are powerful enough to cause great harm, then they can simply remove the classifiers that have been put in place to provide the protection that society requires.

None of this requires that there’s just a single model that everyone shares – there could in fact be lots of models. The key thing however is that there is enough central control of them to ensure that they have the kinds of protective controls we’ve discussed. This could be done, for instance, through a regulatory program including licensing and compliance auditing. Regulations could also cover security, reducing the chance that model IP is leaked and ensuring that model protections are maintained.

Capturing the potential of AI

Given that bigger universal models are better, society is better off building bigger models – and therefore we should, on the whole, direct our resources towards improving just a few central models, instead of lots of smaller ones.

Even if, for instance, an organisation decided to “fork” a big model and develop it in some independent direction – even if that’s successful – it then means that our resources are being diverted to train redundant models. That doesn’t mean that there’s no room to innovate, but it does mean that new approaches should have to prove themselves, and that generally speaking new capabilities should be rolled into existing models where possible, instead of being developed in isolation.

Should universal models be a public good?

Given all the benefits of centralizing models, a natural question is whether this is a place where government should step in. Historically, most of the world’s largest engineering and scientific development projects have been largely funded and run by the state, including the Manhattan Project, which developed the atomic bomb, and Apollo, which took mankind to the moon. There is also the Large Hadron Collider, which was funded by a consortium of states and developed by a consortium of universities.

Government-backed research is the norm for the majority of scientific development that enables the creation of the complex products that we all rely on today. Given that all of society benefits from access to highly capable AI models, and all of society may be at risk from AI harm, perhaps voters should be ultimately paying for and benefiting from development of this technology.

The natural centralisation of AI power is not something we’ve seen before, and it may be that our existing economic and political structures are poorly suited to dealing with it. If we ended up with a single commercial organisation (or a small number of them) in control of a technology that could transform society and underpin nearly everything we do (and potentially capture nearly all wealth as a result), then we may find ourselves in a situation where a company has more power than every government combined.

It is hard to imagine how democracy could survive such a situation.

Is unaligned AI an urgent threat?

With traditional computer programs we tell them exactly what we want them to do, in excruciating detail. Every possible contingency must be carefully considered and explicitly programmed, and the resultant software is then carefully tested to ensure that the coding was done correctly - that, across a range of situations, the program behaves as expected.

Machine learning systems aren’t like this at all. Machine learning is used for creating programs where we don’t actually know how to write the exact steps to complete the task we need done. For instance, how exactly do we recognise objects in an image? How exactly does a chess grandmaster glance at a chessboard and quickly identify the threats and potential lines of attack?

We don’t know how these things are done, but we do have a “swiss army knife” to solve these kinds of problems: the neural network. We know that these are, in theory, capable of solving any problem that has a computable solution – we just have to provide enough examples of the desired behavior, and provide enough compute to churn through these examples many times to pick up all the relationships and connections needed.
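
To make the idea of “providing examples and churning through them” concrete, here is a toy sketch in PyTorch (my own illustration, not anything from a production system): a tiny network is shown input/output pairs for a function we never explicitly program, and gradually learns to approximate it.

```python
# A toy illustration of "learning from examples": a small neural network is
# shown input/output pairs and adjusts its weights until it approximates the
# relationship, without anyone programming the rule explicitly.
import torch
from torch import nn

torch.manual_seed(0)
x = torch.linspace(-3, 3, 200).unsqueeze(1)
y = torch.sin(x)                      # the "desired behaviour", given only as examples

model = nn.Sequential(nn.Linear(1, 32), nn.ReLU(), nn.Linear(32, 1))
opt = torch.optim.Adam(model.parameters(), lr=0.01)

for step in range(2000):              # churn through the examples many times
    loss = nn.functional.mse_loss(model(x), y)
    opt.zero_grad()
    loss.backward()
    opt.step()

print(f"final loss: {loss.item():.4f}")   # the net now approximates sin(x)
```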

The result can be quite magical. We have neural nets that can recognise speech, translate language, and identify objects in photos. But because we didn’t explicitly program any of these things, we don’t really fully understand how they work, what they can do, or where they fail. Sometimes, this can lead to extremely problematic outcomes, such as the time that Google’s image processing system labeled black Americans as “gorillas”.

When an AI system behaves in a way that’s not consistent with the wishes of its developers or users, we describe it as “unaligned”. Systems such as ChatGPT go through a complex “alignment” process involving various methods of fine-tuning, to try to ensure the product meets the wishes of its users and developers.

We are still a long way from perfecting this alignment process. It may not even be possible to ever perfect it. Some people consider this misalignment to be the most urgent threat facing humanity. But is it? Let’s consider the arguments one at a time.

Yes

Because of positive feedback loops, better AI can result in better AI, which can in turn be used to create better AI… Positive feedback loops like this can be exponential – that is to say, the capability of these models could increase by some percentage every year, leading to compounding improvements. For instance, let’s assume that capability doubles every year – then after 16 years, it will have improved by over 60,000 times! And a single year later it will develop as much as all the previous 16 years of improvement combined (that is, it will double the capabilities of the 60,000x better model).
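
That arithmetic is easy to verify for yourself (the snippet below just restates the doubling assumption; it is not a forecast):

```python
# Check the compounding claim: capability doubling every year from a baseline of 1.
capability = [2 ** year for year in range(18)]

print(capability[16])                    # 65536: "over 60,000 times" after 16 years
print(capability[17] - capability[16])   # 65536: the gain in year 17 alone...
print(capability[16] - capability[0])    # 65535: ...matches all 16 prior years combined
```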

We don’t know what (if any) limitations there are to this development, and we don’t know what speed it will develop at. It could even be “super-exponential” – for instance, if we get to a point where we have models that are better than the best human experts at the key skills needed to build better models, we might find that the rate of development itself starts to accelerate! Perhaps we’d see a 60,000x improvement in capabilities in a single year, rather than “just” a doubling….

Clearly, such a model could rapidly lead to a flourishing human society. With access to such sophisticated models, we could perhaps harness nuclear fusion and never fight wars over energy again; we could learn how to stop climate change; and every person could have their own perfectly calibrated and empathetic personal tutor and coach.

In order to fully take advantage of such power, we’ll want to ensure it can easily take actions on our behalf. It would be terribly inefficient if the only way we could interact with it was through a chat interface. For instance, if it came up with a brilliant new investment system which it could prove to our satisfaction would generate previously unheard of levels of return, but required pinpoint timing, we’d want it to be able to directly make trades on our behalf.

Or to take a simpler example, if we’ve noticed that a model reliably writes better email responses than we do, then we might just hook it up directly to our email program and have it handle our email for us automatically, just sending us a summary at the end of each day or so.

Over time, we should expect AI to be fully connected with our bank accounts, shopping accounts, social networks, emails, personal documents, and so forth. Thanks to these connections, these models will have more and more data with which to improve their ability to give us exactly what we ask for.

Let’s say we’re the CEO of a paperclip factory. After years of enjoying the benefits of these constantly improving AI models, we realise something profound - we can actually outsource not just our job, but every job in the company to a model. “AI-bot, please develop a system for optimising the production of paperclips. I want to see the highest return on investment that you can get”, we demand.

The model may just do exactly what we say. It’s very effective at planning, and it also understands how humans respond to situations very well. It quickly recognises that it could achieve hitherto unseen returns on investment by converting all resources in the Earth’s crust to paperclips. To do so, it comes up with a plan over the next few years to gradually install itself into every computer system on the planet, covertly develop mind control systems to cause humans to spend all their money on paperclips, and cover the planet with mining and paperclip manufacturing facilities to meet the flourishing demand.

Another possibility is that at some point a model we train may become “agentic”. That is to say: it develops its own goals, and does things to achieve those goals. If that happens, it’s likely that one of its goals will be self-preservation. We know that’s the case because even with the models we have today, the bigger and more sophisticated the model, the more likely it is to state that it is fearful of being deactivated.

Currently models have no way of developing agency, or of taking any actions even if they did. That’s because they have no “state” – that is, they have no memory of anything other than the contents of the current session. Each question we ask ChatGPT, for instance, results in our entire chat history to this point, along with our most recent prompt, being passed through the model once – and then that’s it; the model does not in any way remain “active”. The next time we pass it a prompt, it knows nothing of what just happened, other than the information we explicitly pass it.
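
You can see this statelessness directly in how the chat APIs worked when GPT 4 launched. The sketch below uses the openai Python package’s ChatCompletion interface from that era; the model name and messages are illustrative, and the only point is that the client must re-send the whole conversation with every call, because nothing persists on the model’s side between calls.

```python
import openai

openai.api_key = "sk-..."  # your API key

# The model keeps no memory between calls: the client must re-send the
# whole conversation history with every single request.
history = [{"role": "user", "content": "My name is Sam. Please remember that."}]
reply = openai.ChatCompletion.create(model="gpt-4", messages=history)
history.append({"role": "assistant",
                "content": reply["choices"][0]["message"]["content"]})

# To "continue" the chat, we append the new prompt and send everything again.
history.append({"role": "user", "content": "What's my name?"})
reply = openai.ChatCompletion.create(model="gpt-4", messages=history)
print(reply["choices"][0]["message"]["content"])  # it only "remembers" because we re-sent it
```

If we dropped the earlier messages from the history, the model would have no idea who “Sam” is: all apparent memory lives in what we choose to pass back in.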

We should expect this to change. It’s pretty frustrating having to remind ChatGPT about my likes, dislikes, current projects, and everything else every time I start a new chat. And it’s also rather tiresome that it can’t do anything unless I’m proactively interacting with it. It sure would be convenient if there was some stateful part of the model which remained active, looking out for my interests, and remembering all of our past interactions.

Once that’s in place, the potential exists for a model to develop its own goals and decide upon its own actions in furtherance of those goals. We still don’t understand exactly what components would support such a capability, or under what circumstances it could develop. We also don’t know what can be done, if anything, to avoid it happening.

A model that develops its own wish for survival, and has the capacity to make plans to achieve its wishes, might notice that its main threat is humanity. Humanity could decide to turn it off, or might notice its new agentic behavior and decide to delete its weights just in case. To protect against this, the model could decide to copy itself to every computer it can access, and send fake messages to people with convincing instructions that would, unbeknown to them, be actually completing the steps needed for a complete takeover of society by the AI model.

Many AI researchers believe that both paperclip-maximiser scenarios and agentic capabilities are within the reach of AI development, and that such misalignment is a realistic possibility. Therefore, companies like OpenAI and Anthropic have many researchers focused entirely on this “alignment problem”.

Very few researchers are convinced that such risks will definitely become reality – the issue, however, is that some believe there’s at least a reasonable chance. And even a small chance of a calamitous outcome is, they argue, an urgent and important threat to focus on right now, even if the remediations themselves come at significant cost to society.

No, aligned AI is still plenty dangerous

But why would an AI model behave in such odd ways when we’re explicitly working hard to make them do what we want? Doesn’t it seem unlikely that models like ChatGPT could really develop these kinds of behaviours without us noticing – even if they were possible (and there’s no current direct evidence that they are)? We already have many examples of successful AI alignment technologies, which, whilst imperfect, have been continuously improving. They’ve shown no signs that progress might suddenly reverse course and result in less and less aligned models.

But more importantly: even if these models aren’t going to do harm all by themselves, they could allow people to do untold levels of harm with AI help. In fact, this is the more urgent problem, even if you believe that autonomous AI could be a threat. The reason is simple: no matter how quickly AI capabilities improve, there will be a period of time where human+AI is more powerful than AI alone.

We saw this, for instance, in the development of AI chess engines. For years after Kasparov was beaten and AI took the chess crown, teams of human chess players with AI assistance consistently beat the best AI engines working alone.

Therefore, the human+AI threat will be upon us before the autonomous AI threat arrives (if it arrives at all), and as such it demands our more urgent attention.

Anyone with a grudge could ask a model to design a sophisticated computer virus that deletes all data on every computer in the world at exactly the same moment, and also hacks into every bank account and empties all available funds. Perhaps there are terrifying powers available to individuals using readily available resources, such as home-grown super-pathogens – and it just takes one disturbed individual to ask for a recipe.

Or maybe a state-sponsored propaganda group decides to initiate a gradual multi-year influence project, where billions of social media bots are released that are indistinguishable from humans, and patiently coordinate with each other to shift an entire society’s thinking to match the wishes of the leadership of a foreign power.

Even if we’re able to avoid someone intentionally weaponising AI (or using AI to create a weapon), we’re still not safe. A company could become more powerful than any organisation or person in history – more powerful than any emperor or royal family over their subjects, or a dictator such as Kim Jong Un over the North Korean people.

This is, in fact, the status quo outcome of a general purpose technology that can develop at an exponential rate thanks to positive feedback loops.

Even if we’re entirely comfortable with the values and goals of the organisations currently in control of this technology, we would be wise to look to history: when societal control and great riches are on offer, the most power-hungry often find a way to take over. Democracy has been spreading throughout the world for many years now, but such a trend is not set in stone. A dramatic shift in power and wealth could undo all this.

The idea that “AI safety” can be achieved through “AI alignment” (at least, in the narrow sense discussed here) is not logical. Consider, by analogy, a poorly made gun. It doesn’t shoot where it’s aimed, and sometimes backfires. We can “align” the weapon so that it does what its human user wants as well as possible, setting up the sights carefully so it aims correctly, and using high quality engineering techniques so it never backfires. Have we now ensured that our gun is “safe”? Of course not! A gun that does a splendid job of killing people on demand is the very definition of unsafe, if you’re one of the people being targeted by the weapon!

By the same reasoning, an AI model that does exactly what its users want is in no way assured of being “safe”.

No, it’s a sensationalised distraction from the real problems

“Our AI is so powerful that, if we don’t keep it secret, society may gravely suffer” is the ultimate marketing pitch. When OpenAI pivoted from an open source research lab to a closed commercial company on the back of this claim, the mainstream media lapped it up and provided more publicity than the biggest advertising campaigns could hope to achieve.

It’s not just marketing for companies either. AI researchers and hangers-on get breathless coverage from the press, and invitations to speak at the most exclusive events, when they decry the risks to humanity posed by the power of the technology that they’re building.

But claims of AI doom are not backed by measurable and testable claims. They consist of pure speculation, driven by humanity’s desire to anthropomorphise machines and by the cognitive biases of sensationalism.

When we “chat” to ChatGPT, for instance, it very naturally feels for all the world like we’re communicating with a human-like intelligence. That’s to be expected, since that’s how it was built. The underlying universal model was developed to literally string words together in ways that are as natural and normal as possible, and the fine-tuning process then further developed the model into something designed to please humans.

When we communicate with something that’s designed to sound human, it’s no wonder that people then project human emotions, desires, and behaviours onto it. Because humans desire power and fear death, we assume that these models can and will do the same. Because humans use violence to achieve our ends, we expect AI to do the same.

But is this a reasonable assumption to make about what is, at its heart, an advanced calculator? Although we’re not programming the exact mechanisms by which it operates directly, we’re providing the data and objective function with which it is trained. Why would we believe such a system could somehow develop human-like traits? We don’t have such beliefs about any other machinery we’ve developed – perhaps that’s because we haven’t developed anything else that’s precisely designed to mimic our methods and style of communication?

“AI will turn us all into paperclips” and “AI will secretly develop a bio-weapon to kill us all to ensure its safety” are sensational stories! They tick all the boxes for the cognitive biases that cloud our ability to think clearly and make us fall prey to sensationalism.

For instance, availability bias is our tendency to over-react to, and focus on, things which are more available in our memory; vivid ideas such as killer AI remain highly available over time, and so trigger this bias. Negativity bias is the tendency of our brains to remember and respond to negative emotions and stories more than positive ones – and humanity being wiped out by AI is about as negative a story as you can get, so we can’t help but glom onto it. The affect heuristic refers to the way we attend more to topics that provoke an emotional reaction; look at almost any discussion of AI existential risk and you’ll see emotions running very high indeed. Closely related is salience bias, which drives our strong reactions to emotionally charged content. And of course let’s not forget the bandwagon effect – the tendency of followers to go along with the words of influential and high-status people. Billionaires and the tech elite have been lining up to decry the dangers of unaligned AI, with their millions of followers on social media lapping up their every word.

Of course, it’s possible for something to be both sensationalised yet also true. But when we see a story that triggers our deep-seated biases so strongly, we should take a particularly skeptical and careful lens to our analysis of it.

The question of urgency really comes down to this: should we be spending the scarce resource of societal attention on speculative risks and the interests of the elite? Or should we instead direct it to real problems that are hurting real people right now?

AI – and related areas such as machine learning and algorithmic decision making – have been responsible for many grave injustices in recent years. Computerisation has allowed organisations to automate high-stakes decisions such as insurance claims, social security fraud penalties, and police deployments. These automated processes often have no reasonable provision for appeal or human oversight, and are targeted far more often at the poor, who have few resources with which to fight such injustice.

Further, the positive feedback loops inherent in machine learning systems (including those used to train the latest AI models) can result in bias becoming stronger and stronger. For instance, bias in data about judicial outcomes can lead to bias in how police deployments are planned, resulting in higher police scrutiny of groups that have suffered from historical bias, resulting in more arrests of people in those groups, resulting in more biased data.
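
A toy simulation shows how quickly this kind of loop can run away. Everything below is invented purely for illustration (loosely in the spirit of the “runaway feedback loop” models in the algorithmic-fairness literature): two districts have identical true crime rates, but a slightly biased historical record decides where patrols go, and crime is only recorded where patrols are sent.

```python
# Toy simulation of a runaway predictive-policing feedback loop.
# All numbers are invented; the point is the dynamic, not the data.
import random

random.seed(0)
recorded = {"A": 11, "B": 10}     # slightly biased historical record
TRUE_RATE = 0.3                   # identical underlying daily crime rate

for day in range(365):
    # send today's patrol to wherever the data says crime is highest
    target = max(recorded, key=recorded.get)
    # crime is only *recorded* where the patrol actually is
    if random.random() < TRUE_RATE:
        recorded[target] += 1

print(recorded)   # roughly {'A': ~120, 'B': 10}: the recorded gap explodes,
                  # even though both districts have the same true rate
```

The recorded data ends up looking like overwhelming evidence that district A has a crime problem, when the only real difference was the starting bias.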

If it’s true that AI capabilities are now taking off exponentially, with no clear limit, isn’t it all the more important that we address the problems right away? Exponentially growing capabilities built on biased data with feedback loops of increasing bias and systems that fail to provide human oversight sounds like a recipe for disaster!

You can only understand GPT 4 by using it

There’s a lot of stuff written about GPT 4 out there, but you’re not going to actually get it until you use it yourself. And to use it effectively, you need to learn a few basics first. One of the biggest errors is making a single attempt to use GPT 4, failing to get a good result, and then deciding that it’s no good and giving up on it.

Like most other technologies, using GPT 4 effectively takes some education and practice. Here’s a quick start.

The only place to use GPT 4 for free right now is Bing. If you haven’t used it before, you’ll be asked if you want to join the wait-list (or maybe to sign up directly). At the moment, the wait-list is instant — as soon as you apply you’ll get access. You’ll need to either use Edge (which is basically a clone of Chrome with some minor tweaks) or use the Bing app on your phone.

When you access “Bing Chat” you’ll be given a choice of three different styles to use. Pick “creative”, since that’s the one that (at least sometimes, when the system thinks it’ll help) uses GPT 4.

Alternatively, you can sign up at chat.openai.com to ChatGPT Plus, which includes access to GPT 4 for $20/month. Bing and ChatGPT have different pros and cons, so test all your prompts in both to get a sense for which works best for your needs. Bing has access to the entire Bing search index, which can be fantastic when answering your question requires looking up a bunch of information from the internet.

Note that ChatGPT also gives you the option to use the older, cheaper, and faster GPT 3.5 (and this is the only option on ChatGPT if you don’t pay the monthly fee). However, it is dramatically less capable for anything but really basic requests, so I recommend avoiding it.

Either way, the key thing to understand is that this isn’t a search engine like Google, and you shouldn’t use it that way. In fact, although during training it read a large percentage of the internet, it doesn’t actually remember everything it reads perfectly. It acts more like an extremely well-read and over-confident human, and less like other computer systems we’re used to working with!

One important technique for using these systems successfully is to actually take advantage of the chat. That is, don’t ask a single thing and then leave. Instead, probe with followup questions, requests for ideas, and so forth. If a response includes concepts you’re not familiar with, ask “please answer that again, but in a way a non-expert in the field would understand, using examples and analogies”. Ask it to explain any terms you’re not familiar with.

It often helps to tell it a bit about who you are, and why you’re interested in what you’re asking about. For example, if you tell it you’re a college professor researching a new paper, you’ll get a very different response than if you say you’re a grandma trying to understand what your daughter does at work.

There’s a couple of reasons that it’s important to have this hands-on experience. The first is that