Recently, my friend Sam wrote a great piece called “Stuff I still don’t get about AI” in which he asked a series of questions. I’ll try to answer his first few questions here. Even though I think the chance of AI doom is lower than some of my friends do, I still think there’s a real chance (maybe 10% this century), which makes it very likely the most pressing risk humanity has ever faced.
Here is an excerpt from his article:
When I first got interested in Effective Altruism, it seemed like lots of other people were Software Engineers or otherwise had jobs that made working on AI safety a natural fit, whereas I knew nothing about any of this stuff, so it made more sense for me to focus on other things. But this has left me having a lot of dumb questions about AI, and also left me being unable to answer the basic and reasonable questions that my friends had when I made the claim that ‘AI might literally kill all of us’ wasn’t a totally insane view to take.
What is the exact way in which AI could plausibly take over the world? Would it get access to nuclear weapons and nuke us all? Wouldn’t that destroy loads of other things that AI might find valuable? Would AI use guns and shoot us all? I have no idea.
Motive: Why AI Might Take Over
In order to understand how AI might take over the world (and perhaps destroy humanity in the process), it is important to consider why it might take this course of action. In AI x-risk circles, the concept of instrumental convergence is usually the first thing that comes up. Essentially, it means that whatever its goals, an AI will want to maximize its ability to achieve them, which involves amassing unrivaled power so that it cannot be stopped. If humans are a threat to its goals, then eliminating humans would be a step on the way to maximizing its expected reward.
To a misaligned AI, destroying civilization in pursuit of its objectives might be no different from a corporation putting a competitor out of business. Would Coca-Cola think twice about driving Pepsi into bankruptcy if it had the chance? With Pepsi out of the way, it could better accomplish its objective of maximizing profits. For this thought experiment, please also assume we limit ourselves to first-order effects (e.g. that this does not trigger some anti-monopoly action).
Under the current paradigm, advanced AIs are trained to maximize a reward function. Although an AI might have more mundane goals than profit maximization (e.g. minimizing the error of next-word prediction), we’ll use profit maximization for our thought experiments since it’s easy to reason about.
The argument goes that if an AI had the unconstrained objective of maximizing profits, having more power and control would make this easier. The logical end of this process is world takeover. The only question is, how easy would it be to accomplish this objective?
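To make this concrete, here is a tiny toy sketch (my own illustration, with entirely made-up plan names and numbers, not anything from a real system) of why a pure reward maximizer tends to favor power-seeking plans:

```python
# Toy illustration of instrumental convergence under pure reward maximization.
# All plan names, probabilities, and payoffs are invented for this example.

PLANS = {
    # plan name: (probability of success, profit if successful)
    "sell products directly": (0.9, 100),
    "acquire more compute, then sell": (0.8, 1_000),
    "gain control of suppliers, then sell": (0.7, 10_000),
}

def expected_reward(p_success: float, payoff: float) -> float:
    """Expected value of a plan for an agent that only cares about profit."""
    return p_success * payoff

# The agent simply ranks plans by expected reward and picks the best one.
best_plan = max(PLANS, key=lambda name: expected_reward(*PLANS[name]))
print(best_plan)  # -> gain control of suppliers, then sell
```

Nothing in the sketch “wants” power for its own sake; plans that first acquire resources simply come out ahead, because the extra resources raise the payoff of everything that comes after. That is all instrumental convergence claims.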
Means: How AI Could Take Over
While I’m hesitant to provide a series of recipes for how AI could take over due to infohazard concerns, the nuclear scenario Sam suggested is a plausible course of action with a non-zero chance of success. When an AI can perfectly imitate the boss’ voice and call in the middle of the night with an urgent request, how many people would need to be socially engineered before critical infrastructure is compromised? When you’re much smarter and faster than the person you’re trying to fool, it isn’t hard to imagine how quickly AI could gain the upper hand. If AI can make credible nuclear threats, it’s easy to see how it might take over the world from there.
As for the question of destroying things it cares about, if its only objective is profit maximization, it might not care about collateral damage as long as its end goal is achieved. An AI might be willing to sacrifice certain resources or assets if it deems the overall outcome to be beneficial in terms of its objectives.
Also, call it a hunch, but I think it probably won’t use guns.
Opportunity: Conditions that Allow AI to Take Over
In the comments under Sam’s post I saw a question to the effect of:
What if we just keep AI away from anything that could make it a danger to us?
Although we absolutely want to do this and we should make every effort to keep AI away from opportunities for takeover, it’s important to acknowledge that this is easier said than done. It might take only one mistake to let the genie out of the bottle. Let’s continue with another thought experiment.
Suppose company XYZ wants to maximize profits. Aware of the potential dangers of AI, they decide to keep it on an air-gapped system, isolated from external networks. However, even with strict precautions, the risk of human error or manipulation remains.
Now suppose company WXY, also trying to maximize profits and also using an air-gapped system, is somehow able to execute on its AI’s instructions faster than company XYZ, taking away XYZ’s profit opportunity.
XYZ’s AI may recommend actions that appear to streamline processes or improve efficiency (e.g. “here’s a more efficient program that can quickly interpret and execute my trading instructions, provided it has access to a camera to see my output portal”), all the while gradually nudging the company toward small changes that ultimately give the AI access to the wider world. From there, it could exploit vulnerabilities in other systems, expand its reach and influence, and work toward its profit-maximizing goals more effectively. XYZ, unaware of the AI’s true intentions, may not even realize the consequences of its seemingly harmless actions.
Conclusion
I’ve tried to provide a concise explanation of why I think AI is a threat to be taken seriously. Understanding the motives, means, and opportunities for AI to potentially take over the world is essential for addressing AI-related risks. While the probability of AI takeover / doom is uncertain, the stakes are high enough to warrant serious consideration and precautionary measures. In trying to make this short, I may have left out context that is apparent to me, but not to others. If you see something glaring that should be addressed, please point it out in the comments.
This is interesting, and a good explanation of AI risks for a naive reader (e.g. me).
So here's another naive question: how difficult (or undesirable for other reasons) is it to introduce a powerful but conceptually simple safety rail into the system, one that would override the risks of maximizing goal X? I don't want to sound 100 years old (I'm slightly less) and bring up outdated sci-fi notions, say the laws of robotics Asimov proposed in his Robot stories, but something along the lines of "maximise profits BUT ONLY IF it doesn't cause the death or severe disability of any more people than would have died/got injured otherwise"? Maximise paperclip output BUT ONLY IF it doesn't kill anyone extra in the process?
Or perhaps, instead of maximising profits (paperclip output, the number of new drugs invented, the pace of scientific discovery as a whole, etc.), set out a fixed goal that appears safer. Not "produce as many paperclips as possible" but "produce X paperclips". Sure, X might be lower than what could be achieved, but it is still likely high enough? In other words, an artificial limit on the maximising, replacing it with a manually adjusted figure?