Imagine betting your life savings on a weather forecast, only to discover the meteorologist was relying on a groundhog's shadow instead of analyzing atmospheric data. This is essentially what we do when addressing global existential threats without scrutinizing our underlying world models.
From nuclear war to bioterrorism, from authoritarian regimes to pandemic risks, humanity faces numerous existential threats. Yet we struggle to agree not only on how to address these challenges, but even on which ones deserve priority. With our future at stake, we need better methods to distinguish genuine risks from distractions.
The solution may be simpler than we think: systematic forecasting that puts models to the test.
World Models Described
Everyone has a world model, whether they realize it or not. It's an internal representation of reality – the beliefs, assumptions, and mental shortcuts used to make sense of a complex world. Essentially it’s a framework for understanding how the world functions.
When you wake up in the morning and plan your day, you're using your world model. You assume the sun will rise, your lights will come on when you flip the switch, and the world outside your house will exist when you go out the door. These aren't just random guesses – they're based on your understanding of how the world operates.
The problem is, our world models can be flawed, particularly when we don’t get direct feedback on them. They can be based on incomplete information, biased by our personal experiences, or simply outdated. That’s not a big deal when the stakes are low – say, when we discover that while we had no trouble explaining how an airplane flies normally, we can’t explain how it flies upside-down. But when we're dealing with global challenges, the limitations of our world models become critically important.
Evaluating Competing World Models
The first challenge is that there are a lot of competing world models to consider, so many in fact that evaluating them all would be an intractable problem. In virtually any given field, even if there are many points of consensus, you'll still find areas where there is wild disagreement. This diversity of models reflects the complexity of the world we're trying to understand, but it also makes it difficult to determine which models are most accurate or useful.
On top of that, we face the problem that every now and again the prevailing paradigm is flawed, and the data needed to overturn it can take a long time to collect. Data collection is inherently difficult for complex global issues, and the feedback loops can be very long, making it hard to quickly validate or invalidate a model. Moreover, existing paradigms often become entrenched not just through scientific inertia, but also through ego investments, tribal affiliations within academic or professional communities, and power structures that benefit from the status quo. These social and institutional factors create significant resistance to new ideas or evidence that challenges established models, further complicating the process of correcting flawed paradigms.
Additionally, even if a model has very good explanatory power, it might still be spurious. Consider the classic example of the correlation between ice cream sales and drowning incidents. A model based on this correlation might appear to have strong predictive power, but it misses the true causal factor (warm weather) that influences both variables.
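To make this concrete, here is a toy simulation (with entirely made-up numbers) of how a confounder produces exactly this trap: warm weather drives both ice cream sales and drownings, so the two track each other even though neither causes the other.

```python
import random

random.seed(0)

# Toy simulation of a confounder: temperature drives both ice cream
# sales and drownings. All numbers are invented for illustration.
temps = [random.uniform(5, 35) for _ in range(1000)]              # daily temperature, °C
ice_cream = [50 + 10 * t + random.gauss(0, 30) for t in temps]    # daily sales
drownings = [0.1 * t + random.gauss(0, 0.5) for t in temps]       # daily incidents

def corr(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n
    sx = (sum((x - mx) ** 2 for x in xs) / n) ** 0.5
    sy = (sum((y - my) ** 2 for y in ys) / n) ** 0.5
    return cov / (sx * sy)

# Strong positive correlation (typically around 0.8 in this toy setup),
# even though neither variable causes the other:
print(corr(ice_cream, drownings))
```

A model built on the raw correlation would "predict" drownings reasonably well right up until the confounder shifts, which is why apparent explanatory power alone isn't enough to trust a model.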
Evaluating Models with Instrumentalism
It probably goes without saying that some models produce better results than others. Consider this passage from Yuval Noah Harari's Sapiens:
In 1744, two Presbyterian clergymen in Scotland, Alexander Webster and Robert Wallace, decided to set up a life-insurance fund that would provide pensions for the widows and orphans of dead clergymen. They proposed that each of their church’s ministers would pay a small portion of his income into the fund, which would invest the money. If a minister died, his widow would receive dividends on the fund’s profits. This would allow her to live comfortably for the rest of her life. But to determine how much the ministers had to pay in so that the fund would have enough money to live up to its obligations, Webster and Wallace had to be able to predict how many ministers would die each year, how many widows and orphans they would leave behind, and by how many years the widows would outlive their husbands.
Take note of what the two churchmen did not do. They did not pray to God to reveal the answer. Nor did they search for an answer in the Holy Scriptures or among the works of ancient theologians. Nor did they enter into an abstract philosophical disputation. Being Scots, they were practical types. So they contacted a professor of mathematics from the University of Edinburgh, Colin Maclaurin. The three of them collected data on the ages at which people died and used these to calculate how many ministers were likely to pass away in any given year.
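For a sense of the arithmetic involved, here is a back-of-the-envelope sketch of that kind of funding calculation. The figures are invented purely for illustration – they are not Webster and Wallace's actual numbers, and the real fund also relied on investment returns.

```python
# Toy version of a Webster/Wallace-style funding calculation.
# All figures are invented for illustration only.

ministers = 930                 # clergymen paying into the fund
annual_death_rate = 0.027       # estimated from mortality data
widow_fraction = 0.6            # fraction of deceased ministers leaving a widow
widow_years = 15                # average years a widow outlives her husband
annual_pension = 20             # pounds paid to each widow per year

deaths_per_year = ministers * annual_death_rate
new_widows_per_year = deaths_per_year * widow_fraction

# In steady state the fund supports roughly this many widows at once:
widows_supported = new_widows_per_year * widow_years
annual_obligation = widows_supported * annual_pension

# Contribution each minister must make just to cover payouts
# (ignoring the investment returns the real fund counted on):
required_contribution = annual_obligation / ministers

print(f"Expected deaths per year: {deaths_per_year:.1f}")
print(f"Widows supported in steady state: {widows_supported:.0f}")
print(f"Required annual contribution per minister: £{required_contribution:.2f}")
```

The point is not the particular numbers but the method: a prediction grounded in mortality data, which could be checked and corrected against reality year after year.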
How do we differentiate between competing world models? It depends on the goal, of course. For some, it's about forming a consistent paradigm, for others it's a grab bag of ad hoc rules that work well enough to get by without thinking too hard, and for others it may be more about finding a set of beliefs that allows for minimum conflict within their tribe. For most readers here, however, I suspect it is more about practical utility: using models that help us accurately predict and navigate reality.
This pragmatic approach, known as instrumentalism, judges models not by whether they're "true" in some abstract sense, but by whether they're useful tools for understanding and interacting with the world. Under instrumentalism, the best model is the one that most reliably generates accurate predictions and enables effective intervention.
Consider how we evaluate scientific theories: Newton's laws of motion aren't considered valuable because they represent some perfect, ultimate truth; in fact, we know they break down at relativistic speeds and quantum scales. They're valuable because they allow us to predict with remarkable accuracy how objects will behave under everyday conditions. Similarly, when evaluating world models for global challenges, we should prioritize predictive power over ideological purity or intuitive appeal.
This is where good forecasters enter the picture. These individuals and teams have empirical track records of successfully predicting future events across diverse domains. Their consistent success isn't theoretical, it's demonstrated repeatedly through verifiable results. While we'll explore the cognitive mechanisms behind their success in a future article, the important point is that their methods work. By identifying who consistently makes accurate predictions, we gain a shortcut to identifying which world models are most effective for understanding reality.
The instrumentalist view liberates us from endless philosophical debates about the "true nature" of reality and focuses our attention on what matters: which models help us anticipate and address real-world problems most effectively. When facing existential risks, this practical focus isn't just intellectually honest, it's essential.
The Cheat Code
One might reasonably ask: How do we know forecasters aren't just identifying spurious correlations themselves? After all, we've seen that correlations can appear predictive without revealing true causation.
What sets good forecasters apart is their consistent performance across diverse domains and time horizons. Unlike the ice cream-drowning correlation that breaks down when conditions change, skilled forecasters maintain accuracy through shifting circumstances. They don't simply identify statistical patterns, they build flexible models accounting for causal mechanisms and context-specific factors.
When forecasters maintain predictive accuracy whether analyzing geopolitics, economics, or public health, they demonstrate something beyond coincidental association. They actively combat the spurious correlation trap through techniques like base rate analysis, scenario testing, and continuous recalibration. Their success persists precisely when simple correlations fail, demonstrating a resilience that validates their methodology as capturing something fundamentally useful about reality.
Indeed, research shows that forecasters with good track records tend to maintain their high performance across various domains, making well-calibrated predictions even outside their areas of expertise. This consistency isn't merely theoretical, it's an empirically verified phenomenon that gives us a powerful tool for assessing world models.
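To make "well-calibrated" concrete: a forecaster is calibrated if, among the events they assign roughly 70% probability, about 70% actually happen. Here is a minimal sketch of that check, using a handful of invented resolved predictions.

```python
from collections import defaultdict

# Each record is (stated probability, outcome: 1 if the event happened).
# Toy data standing in for a forecaster's resolved predictions.
records = [(0.9, 1), (0.9, 1), (0.8, 1), (0.7, 0), (0.7, 1),
           (0.6, 1), (0.5, 0), (0.4, 0), (0.3, 1), (0.2, 0),
           (0.2, 0), (0.1, 0)]

# Calibration check: within each probability bucket, compare the stated
# probability with the observed frequency of the event.
buckets = defaultdict(list)
for prob, outcome in records:
    buckets[round(prob, 1)].append(outcome)

for prob in sorted(buckets):
    outcomes = buckets[prob]
    observed = sum(outcomes) / len(outcomes)
    print(f"said {prob:.0%}: happened {observed:.0%} of the time (n={len(outcomes)})")
```

Real track records run this check over hundreds of resolved questions per bucket; the point is simply that calibration is an empirical, checkable property rather than a vibe.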
Even more impressive, groups of forecasters consistently outperform individuals. These collaborative efforts combine diverse perspectives and analytical approaches to produce remarkably accurate predictions, often surpassing traditional domain experts. Their strength lies not in specialized knowledge, but in their ability to effectively synthesize expert findings and arguments into coherent, actionable forecasts.
Consider this passage from my fellow forecaster Molly Hickman’s recent article, written in conjunction with Rajashree Agrawal:
The foundational literature (e.g. Goldstein et al, Karvetski et al, Tetlock's Expert Political Judgment) in Tetlockian forecasting seems to point at the top forecasters having better understanding of how the world works than domain experts (if you're on board with the idea that proper scores, in this case Brier, are a reasonable proxy for "understanding of how the world works"). Despite lacking the years of knowledge, experience, and gut-feelings that experts have, forecasters are better at putting numbers to whatever knowledge, experience, and gut-feelings they do have—implying that high-quality expression of shallow information > low-quality expression of deep information. This makes some intuitive sense; for instance, despite farmers' having much deeper understanding of agriculture, it doesn't seem like they're better at crop futures trading than traders.
This observation captures an important insight: the quality of the forecasting process can outweigh the depth of subject-matter expertise. This helps explain why forecasters can succeed across diverse domains: they've optimized the methodology of prediction itself rather than focusing solely on accumulating domain knowledge. The agriculture/trading example illustrates how even in highly specialized fields, good forecasting techniques often triumph over experience-based intuition.
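The Brier score mentioned in the quoted passage is simply the mean squared error between stated probabilities and eventual outcomes, with lower scores being better. Here is a minimal sketch of how it rewards calibrated uncertainty over confident misses; the two sets of forecasts are invented purely for illustration.

```python
def brier(forecasts, outcomes):
    """Mean squared error between probabilities and 0/1 outcomes (lower is better)."""
    return sum((f - o) ** 2 for f, o in zip(forecasts, outcomes)) / len(outcomes)

# Five questions; 1 means the event occurred. Invented data for illustration.
outcomes = [1, 0, 1, 1, 0]

# A hypothetical domain expert: deep knowledge, but overconfident and
# badly surprised on one question.
expert = [0.95, 0.60, 0.90, 0.40, 0.30]

# A hypothetical generalist forecaster: shallower knowledge, but
# better-calibrated numbers.
forecaster = [0.80, 0.30, 0.70, 0.65, 0.20]

print(f"Expert Brier score:     {brier(expert, outcomes):.3f}")   # ~0.165
print(f"Forecaster Brier score: {brier(forecaster, outcomes):.3f}")  # ~0.077
```

Because the penalty grows with the square of the error, a single confident miss costs more than several cautious, roughly-right estimates – which is exactly the "high-quality expression of shallow information" effect described above.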
Importantly, these forecasters have the right incentives. They stake their reputations on their predictions, aligning their interests with the pursuit of truth rather than supporting particular agendas or maintaining the status quo.
By tapping into this collective wisdom, we can navigate complex issues more effectively, identify which world models are most accurate in practice, spot emerging trends or potential threats earlier, and make more informed decisions in the face of uncertainty.
As we confront the array of challenges facing civilization, from nuclear risks to climate change, from biosecurity to governance failures, this forecasting approach offers our best path forward. Rather than endless theoretical debates or waiting years for definitive evidence, we can look to these forecasting groups as our reality-checking mechanism and a practical means of determining which world models truly deserve our attention and resources.
In a world of competing narratives and complex threats, good forecasting may be our most valuable compass.