The Broken Leg Check
The ideal first step in the pipeline, for both humans and bots
This post is part of a series on building an AI forecasting bot. In my last post, I described getting the bot running and cutting costs by almost 30x. Now I want to start walking through the pipeline itself.
The first step mirrors what a skilled human forecaster would do: check whether you can skip the work entirely. For a human this saves time; for a bot that does it correctly, it saves computation and therefore money.
The Broken Leg Rule
The concept comes from psychologist Paul Meehl, who spent his career studying when statistical models outperform human judgment. His finding was consistent and humbling: simple actuarial formulas beat expert intuition in domain after domain. But Meehl identified one exception.
Suppose you’re predicting whether someone will go dancing on Friday night. Your statistical model, trained on their past behavior, says 99%. But you happen to know they broke their leg this morning. You should override the model.
This “broken leg” represents information so decisive that it renders the baseline analysis irrelevant. The model can’t account for it because it’s rare, specific, and outside the training distribution. But when you have it, you should use it.
Meehl’s point wasn’t that intuition beats statistics. It was the opposite: trust the model almost always, but recognize the rare cases where you have information the model doesn’t.
Why Check First
For a forecasting bot with a complicated pipeline, there’s a practical reason to check for broken legs before doing anything else.
If the question is “Will Nicolás Maduro cease to be president of Venezuela during 2026?” and the news says U.S. forces just captured him and flew him to New York, you don’t need base rates. You don’t need time series analysis. You don’t need to decompose the question into conditional chains. The answer is already overwhelmingly clear.
A human forecaster would recognize this immediately and move on. The bot should do the same. Spending tokens on a full analysis when the outcome is already determined is wasted computation.
The Broken Leg check is the first step in my pipeline because it’s the cheapest way to potentially avoid all subsequent steps. A quick news lookup costs a fraction of what the full pipeline costs. If it fires correctly, you save everything downstream.
The Danger
Here’s the problem: being confidently wrong is catastrophic in a forecasting tournament.
Proper scoring rules punish confident errors far more harshly than cautious ones. A Broken Leg check that fires incorrectly can sink your entire tournament: one blowout erases the gains from dozens of correct shortcuts.
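To put numbers on that asymmetry, here’s a small sketch using the log score, one common proper scoring rule; the exact rule and scaling a given tournament uses may differ.

```python
import math

def log_score(p: float, outcome: bool) -> float:
    """Log score for a binary forecast; closer to zero is better."""
    return math.log(p if outcome else 1.0 - p)

# A cautious forecast that turns out wrong loses a little...
print(round(log_score(0.60, outcome=False), 2))   # -0.92

# ...while a confident forecast that turns out wrong loses a lot.
print(round(log_score(0.99, outcome=False), 2))   # -4.61
```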
I’ve seen human forecasters make this mistake. A headline feels definitive. Congress passes a law. A major announcement drops. The forecaster updates hard, confident they’ve found a broken leg. But the resolution criteria are more nuanced than the headline suggests, and the “decisive” news turns out to be merely suggestive.
The instinct to update on breaking news is correct. The instinct to update all the way is dangerous.
The Trade-off
So why include the check at all?
Because when it works, it’s extremely valuable. Not just for accuracy but for efficiency. You’ve spent minimal resources to correctly skip an expensive pipeline. Those saved resources compound across questions.
The trade-off is straightforward: spend a little up front to potentially save a lot. But only if your broken leg detector is reliable enough that the expected savings exceed the expected cost of the occasional blowout.
This is an empirical question. You can’t know the answer in advance. You have to run the bot, track when the check fires, see how often it’s right, and calculate whether you’re coming out ahead.
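As a sketch of that bookkeeping, with entirely made-up numbers for the costs and for how much a blowout is worth (converting lost score into the same units as compute savings is itself a judgment call):

```python
# Hypothetical per-question costs and a log of every time the check fired.
FULL_PIPELINE_COST = 0.50   # assumed cost of running the whole pipeline
CHECK_COST = 0.02           # assumed cost of the quick news lookup
BLOWOUT_COST = 5.00         # assumed value of the score lost to one misfire

check_firings = [True, True, True, True, False]  # True = fired and was right

correct = sum(check_firings)
misfires = len(check_firings) - correct

savings = correct * (FULL_PIPELINE_COST - CHECK_COST)
losses = misfires * BLOWOUT_COST

print(f"net value of the check so far: {savings - losses:+.2f}")
# With these numbers: 4 * 0.48 - 1 * 5.00 = -3.08, i.e. not yet paying for itself.
```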
Right now, I’m still gathering data. The check is implemented and I’m monitoring its performance. If the blowouts start outweighing the savings, I’ll tune it by dampening its confidence or improving the news filtering, or I’ll disable it entirely until I can make it more robust.
Implementation
The actual implementation is simple. Before running the full pipeline, the bot queries a news API with the question text. If the returned summary contains information that overwhelmingly resolves the question in one direction, output a high-confidence probability and stop.
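Here’s a minimal sketch of that flow, under a few assumptions: news_search, llm_judge, and run_full_pipeline are placeholders for whatever news API and model calls the bot actually uses, and the probability cap is one simple form of the confidence dampening mentioned above.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class BrokenLegResult:
    fired: bool
    probability: Optional[float] = None   # set only when the check fires
    rationale: str = ""

def broken_leg_check(question: str, resolution_criteria: str) -> BrokenLegResult:
    """Cheap pre-check: look for news that already settles the question."""
    # 1. Quick news lookup keyed on the question text (placeholder call).
    summary = news_search(question)

    # 2. Ask a model whether the news overwhelmingly resolves the question,
    #    judged against the resolution criteria, not the headline (placeholder call).
    verdict = llm_judge(summary, question, resolution_criteria)

    if verdict.decisive:
        # Cap the confidence: even a "decisive" event is rarely a literal certainty.
        p = min(max(verdict.probability, 0.03), 0.97)
        return BrokenLegResult(fired=True, probability=p, rationale=verdict.reason)
    return BrokenLegResult(fired=False)

def forecast(question: str, resolution_criteria: str) -> float:
    result = broken_leg_check(question, resolution_criteria)
    if result.fired:
        return result.probability                            # skip the expensive pipeline
    return run_full_pipeline(question, resolution_criteria)  # placeholder call
```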
The hard part isn’t the code. It’s defining “overwhelmingly resolves.” That’s where the judgment lives, and where the misfires come from.
A president being captured and jailed? That overwhelmingly resolves “will they cease to be president,” though even a broken leg can have its own broken leg: there’s some chance of a release or reversal before the question resolves, so the check should still cap its confidence short of certainty. A law being passed requiring document release? Maybe. It depends on enforcement, interpretation, timing, and exactly what the resolution criteria specify.
The more I can teach the bot to distinguish between these cases, the more value the Broken Leg check provides. That’s an ongoing refinement.
What’s Next
The Broken Leg check is step zero—the escape hatch. Most questions won’t trigger it, and the bot will proceed to the main pipeline.
In the next post, I’ll cover what happens when you don’t have a broken leg: how the bot classifies questions and selects the appropriate forecasting method.


