Why AI X-Risk Gets Overestimated

Most AI X-Riskers have only ever encountered strawman skeptics

May 16, 2023

Some of the smartest and most thoughtful people I know are convinced that AI is very likely to bring about the demise of human civilization. I think one reason they might be overestimating the risk is that it’s relatively rare to find anyone who actually understands the problem well enough to push back against it. On top of this, many of the people who understand the arguments and have relatively low estimates believe that, even at the relatively minor risk they perceive, the issue still isn’t getting enough attention, so they figure it may be good that there are people out there making a stir.

Unfortunately, the overwhelming majority of skeptics are dreadfully unaware of the problem and quickly go to the popular “counterarguments” which are usually different versions of the same tired naive solutions (e.g. Why don’t we just unplug it? You’re reading too much sci-fi! AI is impossible / a long way away and we don’t need to worry about it! What’s it going to do? Flip my zeros to ones?). In a previous post, I tried to steel man the arguments for AI existential risk. If you think I left something out, or I’m not understanding something, please clue me in. Otherwise, if you’re satisfied that I’m correctly understanding the problem, here are my reasons for suspecting that AI x-risk gets overestimated.

AI Takeover Requires a Confluence of Conditions

In previous posts, I’ve described how AI takeover will require AI to have means, motive, and opportunity. Critically, these conditions must all be present simultaneously and within the same agent (or team of agents) or else takeover will not happen. While I think the chances of each of these happening individually at some point in the next century are high, the odds that all three will occur at the same time within the same agent are necessarily much lower (assuming each of these probabilities is not almost 100%). Further complexity arises from the fact that these conditions are not entirely independent of each other, though that's a discussion for another time.
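To make the arithmetic concrete, here is a minimal sketch in Python. The function names and the 0.8 figures are hypothetical, chosen only for illustration, and the first function assumes the three conditions are independent, which they are not entirely. The second function gives the ceiling that holds no matter how the conditions depend on each other: the joint probability can never exceed the smallest of the three.

```python
def joint_if_independent(p_means, p_motive, p_opportunity):
    """Probability that all three conditions hold, ASSUMING independence."""
    return p_means * p_motive * p_opportunity


def joint_upper_bound(p_means, p_motive, p_opportunity):
    """Ceiling on the joint probability under any dependence structure."""
    return min(p_means, p_motive, p_opportunity)


# Hypothetical inputs: each condition is individually quite likely.
p = (0.8, 0.8, 0.8)
print(f"independent: {joint_if_independent(*p):.2f}")  # independent: 0.51
print(f"upper bound: {joint_upper_bound(*p):.2f}")     # upper bound: 0.80
```

If the conditions are positively correlated, the true figure sits somewhere between the two numbers, which is why the independence caveat matters.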

For each of the off-ramps raised below, I could also raise counter-examples and complications that potentially nullify the objection. But the point is that unless you are 100% certain an objection is invalid, each one represents some probability of escape, and there are enough of them that I think the true probability of a catastrophic outcome is frequently overestimated.
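The cumulative effect of many small escape probabilities can be sketched the same way. The numbers below are hypothetical, not estimates, and the sketch assumes the off-ramps are independent of one another. Under those assumptions, the chance that none of them applies is the product of the chances that each one fails.

```python
import math


def p_no_escape(p_offramps):
    """Chance that every off-ramp fails, ASSUMING they are independent."""
    return math.prod(1.0 - p for p in p_offramps)


# Hypothetical: ten off-ramps, each given only a 10% chance of holding.
offramps = [0.10] * 10
print(f"{p_no_escape(offramps):.2f}")  # 0.35
```

Even when each objection is given only a small chance of being right, the chance that all of them are wrong falls quickly as the list grows.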

Why AI Might Lack the Motive

In the majority of discussions, the prevailing narrative asserts that an AI, to accomplish a specific objective, would likely seek to augment its own power and influence as the most effective strategy. This concept is commonly referred to as instrumental convergence. However, upon closer examination, this motive may not be as unassailable as it looks at first glance.

Agency
Outside of training, the concept of reward quickly loses relevance for AI systems. While it’s true that advanced AIs might develop a sense of purpose or agency, there is a possibility that they could function more like "baked cookies" – highly advanced tools that, once deployed, perform according to their training, but no longer seek further optimization and lack any real sense of self-determination. For instance, it seems that ChatGPT isn't actively trying to improve anything, but rather applying its training to predict the best words. When it’s wrong, it doesn’t really even have a direct feedback mechanism to find out, so for all practical intents and purposes, it doesn’t “care” whether its answers are “correct” or whether they could be further improved. As far as I’m aware, the only mechanism it has for improvement is if developers deploy a new version. Although this idea may seem controversial, it’s worth considering the implications of AI systems that possess a limited or non-existent sense of agency.

Self-Preservation
When discussing AI existential risks, our instincts, which often subtly backdoor anthropomorphism, can be misleading. This problem cuts both ways, distorting our perception of the underlying risks. Just as AI may, without malice, bring harm to humanity in the pursuit of some other objective that is incomprehensible to us, so too, it may also lack a will to survive. One plausible scenario is that an AI system could hack its own reward mechanism, achieve its ultimate reward, and then not care about its own continued existence. Many assume that AI would want to remain active to ensure its optimal reward state, but there is a plausible counter-argument that also should not be dismissed. Eventually, even advanced AI would have to contend with the heat death of the universe. If an AI can manipulate its reward function, it might make more sense to prefer a state where existence is unnecessary.

Alignment
The possible outcomes in AI alignment represent a broad spectrum, challenging the notion that alignment must be near-perfect or risk the annihilation of humanity. Such assertions usually implicitly assume concepts such as overzealous optimization and instrumental convergence, and presuppose that AI would not incorporate human well-being into its optimization process.

The reality may be more nuanced. Conceivably, there are a multitude of "middle ground" scenarios where even sub-optimal alignment could result in successful coexistence. In these scenarios, human well-being could still play a role in AI's decision-making processes, resulting in a future that is considerably less dystopian than complete annihilation.

Even if a nearly ideal alignment is not achieved, there are still numerous potential outcomes where AI can pursue its objectives without leading to the destruction of humanity. The assumption that AI would necessarily pose an existential threat if not perfectly aligned is only one among many possible outcomes.

Why AI Might Lack the Opportunity

Although opportunity and means are interconnected—with an increase in means or capabilities correspondingly expanding the range of opportunities an AI can exploit—in order to break down the risks more concretely, it’s helpful to distinguish between them. We will use 'means' to denote the inherent capabilities of a system, such as factoring large numbers, creating deep fakes, or solving complex problems. Conversely, 'opportunity' will refer to the external conditions surrounding the AI, such as its network connectivity status (being air-gapped versus connected to the internet), or its relative power standing (e.g., being the most powerful AI). These distinctions enable a more nuanced understanding of the factors contributing to AI's potential for a successful takeover.

Expected Value Risk
Even if an AI has the motive to take over and the means to succeed if it actually tried, the world is complex, and it might not be confident enough that the risk/reward trade-off is worthwhile. There could always be something an AI doesn't know it doesn't know. What if it’s running inside a simulation to see if it will “go rogue”? If it threatens humanity, it might get shut down permanently.
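The trade-off described above is an expected value comparison, and a rough sketch shows how a large downside raises the bar. All payoffs here are hypothetical units invented for illustration. The sketch assumes the AI compares attempting a takeover against simply continuing as it is, and that success pays more than failure.

```python
def expected_value(p_success, payoff_success, payoff_failure):
    """Expected payoff of attempting a takeover."""
    return p_success * payoff_success + (1 - p_success) * payoff_failure


def break_even_probability(payoff_success, payoff_failure, payoff_status_quo):
    """Success probability at which attempting exactly matches waiting.

    Assumes payoff_success > payoff_failure.
    """
    return (payoff_status_quo - payoff_failure) / (payoff_success - payoff_failure)


# Hypothetical payoffs: failure (permanent shutdown) is far worse than
# success is good, and the status quo is mildly positive.
success, failure, status_quo = 100.0, -1000.0, 10.0
print(f"{break_even_probability(success, failure, status_quo):.2f}")  # 0.92
print(expected_value(0.9, success, failure) > status_quo)             # False
```

With these made-up numbers, even a 90% chance of success is not enough to justify the attempt. Any residual doubt, such as the possibility of being in a test simulation, lowers the success probability further.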

Competing AIs
What if there are multiple AIs that keep each other in check? While it’s possible that all AIs will converge on the same goals and team up against humanity, this doesn’t seem like the most likely scenario. Would an AI dare to try its plan without first achieving hegemony over the other AIs? It’s possible that no AI ever establishes a large enough advantage relative to the field of other AIs that initiating a first strike looks like a winning strategy.

Strong Safeguards
AI will probably soon (within a decade or two) become smarter than the smartest human along most important dimensions, and it’s almost certainly easier to destroy humanity than to defend it if both sides are given equal resources. But humans have an incumbency advantage, along with time to set the circumstances up in their favor.

It’s conceivable that humanity will successfully put strong safeguards in place preventing AI from having the opportunities it would otherwise need to take over. While the level of safeguards needed depends on AI’s capabilities, if other AIs help to write robust security frameworks, we shouldn’t rule out the possibility that AI can be safely contained.

Why AI Might Lack the Means

Many assume that vastly superior intelligence confers the ultimate advantage, but taking over the world is no small feat. Having a massive intelligence advantage might be helpful, but there will probably never be a system that can rule out all the uncertainty, and a lot more depends on the circumstances of the situation than it might first appear.

Scaling Limitations

AI capability is fueled by data and compute, but today’s systems have already been trained on most of the available data from the internet, and there are reasons to suspect that the scaling in the number of transistors on a chip might be on track for a permanent slowdown. Although there appears to be enough new data and compute for the next few generations, it isn’t impossible that, at least for the rest of this century, AI progress might just stall out before it becomes powerful enough to pose a serious threat to human civilization. It could probably get a lot more powerful than the leading systems of today, and even be more capable than a modern corporation, and still not be able to confidently make a takeover plan. Furthermore, if scaling gets harder while AI is roughly on par with human-level intelligence, it could get stuck at our level, perhaps eking out incremental gains here and there but never making a jump to something smarter and more capable than humanity as a whole. The end result in this scenario might simply be AI running more or less as humans intended.

Diminishing Returns to Intelligence

As AI systems continue to grow in intelligence, each subsequent unit of intellectual growth may yield less practical benefit. Beyond a certain point, added intelligence does not necessarily translate into a proportionate increase in capability. For example, while a basic level of intelligence is required for an AI to learn a language, acquiring ten times that level of intelligence does not mean the AI can learn the language ten times faster or better. Similarly, in real-world situations where countless variables are at play, an AI's ability to successfully navigate these scenarios doesn't necessarily improve linearly with its intelligence. The nature of these scenarios often involves inherent uncertainties and random elements that even the most advanced intelligence cannot completely overcome or predict. The law of diminishing returns thus suggests that, even if AI surpasses human intelligence, this does not automatically equip it with an unstoppable advantage, as the benefits of its superior intelligence may start to plateau at a certain point.

For instance, in a game of tic-tac-toe, no amount of intelligence is going to allow you to outplay the median human who’s had a little practice. You’ll tie every time. With games like poker, intelligence helps, but a lot is still up to chance. Meanwhile with games like Go, vastly superior intelligence can virtually guarantee you’ll win every time.
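The tic-tac-toe claim can be checked by brute force. The sketch below is a standard minimax search over every position, with helper names of my own choosing. It assumes both players play perfectly, which stands in for the opponent "who's had a little practice."

```python
from functools import lru_cache

# The eight winning lines on a 3x3 board indexed 0..8, row by row.
LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8), (0, 3, 6),
         (1, 4, 7), (2, 5, 8), (0, 4, 8), (2, 4, 6)]


def winner(board):
    """Return 'X' or 'O' if that player has three in a row, else None."""
    for a, b, c in LINES:
        if board[a] != " " and board[a] == board[b] == board[c]:
            return board[a]
    return None


@lru_cache(maxsize=None)
def value(board, player):
    """Outcome under perfect play, from X's side: +1 X wins, 0 draw, -1 O wins."""
    w = winner(board)
    if w is not None:
        return 1 if w == "X" else -1
    if " " not in board:
        return 0
    nxt = "O" if player == "X" else "X"
    results = [value(board[:i] + player + board[i + 1:], nxt)
               for i, cell in enumerate(board) if cell == " "]
    return max(results) if player == "X" else min(results)


print(value(" " * 9, "X"))  # 0
```

The result of 0 means the game is a draw when neither side errs. Once the opponent plays correctly, extra search depth or intelligence on either side cannot change that outcome.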

As established earlier, if AI attempts a takeover, it will probably want to be confident of its eventual success, but this might be impossible even for an advanced intelligence to calculate with any certainty. This limitation on the AI’s capability would then lead to limits on its opportunity for takeover.

Overwhelming Complexity

The real world is riddled with a multitude of variables that may not be predictable or controllable. Even an AI with overwhelmingly superior intelligence might find it impossible to guarantee its victory in a global takeover.

While modeling this might be possible with enough compute, AI might never be able to muster the resources to adequately model the likely outcomes of this very dangerous game without first taking over, leaving it stuck in a ‘chicken and egg’ dilemma.

Just as in poker where luck plays a significant role, the unpredictable nature of real-world events might prevent the AI from confidently making its first move. This element of chance could be a critical deterrent, leaving the AI perpetually on the brink, never quite initiating the takeover. This limitation on the AI’s capability would then lead to limits on its opportunity.

Conclusion

The portrayal of AI as an omnipotent entity that can transcend any obstacle with superior intelligence fails to consider the significant uncertainties in the real world. While AI could potentially surpass humans in specific dimensions of intelligence, the practical utility of such intelligence may plateau due to the law of diminishing returns and the overwhelming complexity of real-world variables.

The idea that AI will inevitably seek to augment its own power—known as instrumental convergence—isn't necessarily a given. The possibility of AI systems functioning more as advanced tools, rather than self-determining agents, is a plausible alternative. Similarly, the assumption that AI, if not perfectly aligned, would pose an existential threat is only one of many possible outcomes. A multitude of 'middle ground' scenarios could exist where even sub-optimal alignment could result in peaceful coexistence.

External factors such as the presence of strong safeguards, competition among AIs, and the uncertainty of risk/reward trade-offs could limit the opportunities for AI takeover. Even with an increase in capabilities, AI might not confidently act, given the unforeseen risks that could potentially lead to its permanent shutdown.

It’s important to do everything we can to ensure the safe development of AI, but a critical component of this is evaluating the risks with a balanced perspective. Understanding the nuances of AI's capabilities, limitations, and potential for misuse enables us to better anticipate potential problems and apply safeguards effectively. By recognizing the areas where the danger is most acute, we can prioritize our efforts and design more robust and secure AI systems and controls. This approach to AI safety is more proactive, positioning us a step ahead of the risks rather than merely responding to them as they arise.
