Pre-Mortem and Sanity Check
the final gate in the forecast pipeline
This is the fifth post in a series on building an AI forecasting bot. In previous posts, I covered the Broken Leg check (short-circuiting on breaking news), classification and method selection (routing questions to the right forecasting approach), and decay functions (updating as time runs out).
Forecasting failures are often obvious in hindsight, and many of them could have been caught by a brief moment of structured skepticism. This final gate has two parts: a pre-mortem and a sanity check. They are related, but they catch different classes of mistakes. The pre-mortem asks “what could go wrong?” while the sanity check ensures the forecast doesn’t violate basic reasoning.
What Could Go Wrong?
The pre-mortem assumes the forecast is wrong and asks why.
What plausible unconsidered path could change the outcome?
Consider a question like:
Will Company X release Product Y by December 31?
Suppose the pipeline outputs 65%, based on past delivery timelines, current hiring signals, and executive statements. This may be directionally reasonable, but it embeds a quiet assumption: that the product is released at all, cleanly, under the same name, and under the definition the question implies.
In practice, questions like this commonly fail in predictable ways.
Maybe the product is canceled. Maybe the company is acquired, invalidating the “Will Company X” portion of the resolution criteria. Maybe the release happens, but in a form that doesn’t meet the specification of the question.
A forecaster might try to account for many of these implicitly, but being explicit makes it less likely that something important will be left out. These adjustments are often modest, but they materially improve the forecast.
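To make that concrete, here is a minimal sketch of what an explicit pre-mortem adjustment could look like. The function name, the failure modes, and the weights are illustrative placeholders, not the bot’s actual implementation.

```python
# A minimal sketch of an explicit pre-mortem adjustment, assuming the
# pipeline has already produced a point estimate. The function name,
# failure modes, and weights are illustrative, not the bot's real code.

def premortem_adjust(p_yes: float, failure_modes: dict[str, float]) -> float:
    """Shrink a YES probability by explicitly enumerated paths to failure.

    failure_modes maps each unconsidered path to the probability that it
    occurs AND flips the resolution to NO.
    """
    # Treat the paths as roughly disjoint; cap the total so the result
    # stays a valid probability.
    p_flip = min(sum(failure_modes.values()), 1.0)
    # The original forecast survives only if none of the missed paths fire.
    return p_yes * (1.0 - p_flip)

# The 65% "Product Y by December 31" forecast from above:
adjusted = premortem_adjust(
    0.65,
    {
        "product canceled before release": 0.05,
        "company acquired, breaking the question's wording": 0.02,
        "release doesn't meet the question's specification": 0.04,
    },
)
print(f"{adjusted:.2f}")  # 0.58 -- a modest but material adjustment
```

The exact weights matter less than the act of writing the paths down; once they are explicit, they are much harder to forget.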
Wait, What?
The sanity check asks a different question:
Does this number make sense given what we know about the world?
Even for a human, following a sound process can sometimes produce an obviously wrong result.
Consider the question:
Will it snow on July 4th, 2026 in New York City?
Suppose the pipeline outputs 3%.
It is easy to see how this could happen if the bot grabs the wrong base rate: NYC gets snow on roughly 12 days a year, and 12/365 ≈ 3.3%. But that annual base rate ignores seasonality entirely. Those snow days fall in winter, so to anyone with lived experience, a 3% chance of snow in July is obviously absurd.
If a forecast implies something wildly inconsistent with ordinary experience, one of its assumptions is probably wrong. The sanity check does its job by catching and correcting obvious problems.
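Here is a rough sketch of how the seasonality mistake happens and how a common-sense reference rate could catch it. The monthly snow-day counts are ballpark figures for illustration, not real climate data, and the check and its tolerance are hypothetical stand-ins.

```python
# A rough sketch of the seasonality mistake and a reference-rate check.
# Monthly snow-day counts are ballpark illustrative figures, not official
# climate data; the check and tolerance are hypothetical.

NYC_SNOW_DAYS_PER_MONTH = {
    "Jan": 4, "Feb": 3, "Mar": 2, "Apr": 0, "May": 0, "Jun": 0,
    "Jul": 0, "Aug": 0, "Sep": 0, "Oct": 0, "Nov": 1, "Dec": 2,
}

annual_days = sum(NYC_SNOW_DAYS_PER_MONTH.values())   # ~12 snow days a year
naive_rate = annual_days / 365                         # ~3.3% -- the bot's mistake
july_rate = NYC_SNOW_DAYS_PER_MONTH["Jul"] / 31        # 0% -- the rate that matters

def sanity_check(forecast: float, reference: float, tolerance: float = 0.02) -> bool:
    """Return True if the forecast sits near a common-sense reference rate."""
    return abs(forecast - reference) <= tolerance

print(f"naive base rate: {naive_rate:.1%}")            # 3.3%
print(sanity_check(0.03, reference=july_rate))         # False: the 3% forecast fails
```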
Completing the Pipeline
The pre-mortem and the sanity check serve different roles. The pre-mortem looks for missing paths to failure. The sanity check looks for structural mistakes that make the forecast incoherent even if no single step was obviously wrong.
If either of these fires, the forecast needs a second pass, and possibly a different approach altogether.
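In code, the final gate might look something like the sketch below, where the two checks are assumed to arrive as callables from earlier pipeline stages. The names and signatures are hypothetical, not the real pipeline’s API.

```python
from typing import Callable, Optional

# A minimal sketch of the final gate, assuming the earlier stages expose
# the two checks as callables. Names and signatures are hypothetical
# stand-ins, not the real pipeline's API.

def final_gate(
    forecast: float,
    premortem: Callable[[float], list[str]],   # returns unconsidered paths to failure
    sanity_check: Callable[[float], bool],     # returns True if the number is coherent
) -> Optional[float]:
    """Return a forecast ready to submit, or None if it needs a second pass."""
    issues = premortem(forecast)
    if issues or not sanity_check(forecast):
        # Either check firing sends the forecast back, possibly to a
        # different forecasting approach altogether.
        return None
    return forecast

# Toy usage with trivial stand-in checks:
result = final_gate(
    0.58,
    premortem=lambda p: [],                # no missed paths found
    sanity_check=lambda p: 0.0 < p < 1.0,  # toy coherence bound
)
print(result)  # 0.58 -- ready to submit
```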
Once these checks are complete, the forecast is ready to submit.


