The Illusion of Simulated Incubation, Part 3: Five Defensible Alternatives

  • May 11, 2026
  • 5 min read

In Part 1 I laid out why simulated incubation fails as a substitute for real incubation: it inverts the temporal asymmetry of knowledge that makes incubation informative. In Part 2 I quantified the bias with a Monte Carlo simulation.

If simulated incubation is statistically broken, what should you do instead? This part covers five alternatives, ranked roughly by rigor. None is free, but all are genuinely defensible to a sophisticated allocator or a risk committee that knows what to ask.

1. Pre-registered walk-forward

The trading equivalent of pre-registration in medicine. Before starting the backtest, write a document specifying:

  • hypotheses,
  • universe,
  • features,
  • hyperparameter ranges,
  • evaluation metric,
  • stopping criterion.

Generate a timestamped hash of the document and the codebase. The walk-forward then proceeds on data after that timestamp.
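To make the lock-in concrete, here is a minimal sketch of the hashing step in Python; the file names (protocol.md, src/) and the output file are placeholders for your own layout, not a prescribed structure:

```python
import hashlib
import json
import time
from pathlib import Path

def fingerprint(paths):
    """SHA-256 over the sorted contents of every file under the given paths."""
    h = hashlib.sha256()
    for root in paths:
        p = Path(root)
        files = sorted(p.rglob("*")) if p.is_dir() else [p]
        for f in files:
            if f.is_file():
                h.update(str(f).encode())   # path, so renames change the hash
                h.update(f.read_bytes())    # contents
    return h.hexdigest()

# Hash the protocol document and the codebase together, stamp it, store it.
# "protocol.md" and "src/" are hypothetical names for your own files.
record = {
    "timestamp_utc": time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime()),
    "sha256": fingerprint(["protocol.md", "src/"]),
}
Path("preregistration.json").write_text(json.dumps(record, indent=2))
print(record)
```

Publishing the hash somewhere you cannot quietly edit (a commit, an email to a colleague, a timestamping service) is what turns it from a private note into enforcement.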

Even without external enforcement, the mere existence of the protocol drastically reduces implicit look-ahead. The act of writing down “I will use winsorization at 2.5%” before seeing the result forces you to confront whether that was an evidence-based choice or a regime-conditioned default.

This is the cheapest of the five options and one of the most underused.

2. Combinatorial Purged Cross-Validation (CPCV)

When data is scarce and real incubation is not feasible, CPCV (López de Prado 2018, Ch. 12) lets you use the full dataset while producing multiple estimates of OOS Sharpe instead of a single point estimate.

The setup: divide the dataset into N folds, then take all combinations of N − k folds as training and the remaining k as testing, applying purging and embargo around each test fold to prevent leakage from autocorrelation.

The number of OOS paths grows combinatorially with N, and the distribution of OOS Sharpes lets you compute the Deflated Sharpe Ratio and the Probability of Backtest Overfitting rigorously. CPCV does not replace real incubation, but if it replaces simulated incubation, it is strictly superior.
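A sketch of the split generation, following the N-groups/k-test-groups convention of AFML Ch. 12; the contiguous purging and the fixed embargo width here are simplifying assumptions (full purging needs the label end times):

```python
from itertools import combinations
from math import comb

import numpy as np

def cpcv_splits(n_obs, n_groups=6, k_test=2, embargo=5):
    """Yield (train_idx, test_idx) for every combination of k_test test
    groups out of n_groups. An embargo of `embargo` observations after
    each test group is dropped from training to limit leakage from
    serial correlation. Purging here is the simple contiguous case;
    overlapping labels require purging by label end time."""
    groups = np.array_split(np.arange(n_obs), n_groups)
    for test_ids in combinations(range(n_groups), k_test):
        test_idx = np.concatenate([groups[i] for i in test_ids])
        banned = set(test_idx)
        for i in test_ids:  # embargo the observations right after each test group
            last = groups[i][-1]
            banned.update(range(last + 1, last + 1 + embargo))
        train_idx = np.array([t for t in range(n_obs) if t not in banned])
        yield train_idx, test_idx

# Number of splits and of full OOS backtest paths, phi = (k/N) * C(N, k):
n_groups, k_test = 6, 2
n_splits = comb(n_groups, k_test)         # C(6, 2) = 15 train/test splits
n_paths = k_test * n_splits // n_groups   # phi = 5 complete OOS paths
print(n_splits, n_paths)
```

Each of the 5 paths is a full-length OOS equity curve stitched from test folds, which is what gives you a distribution of Sharpes rather than one number.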

3. Deflated Sharpe Ratio

If you are about to present simulated-incubation results despite my objections, at minimum attach the Deflated Sharpe computed on the honest number of trials.

The honest number is not just the trials you tracked. It includes:

  • hyperparameter combinations explored even briefly;
  • variants you discarded after a quick look at the curve;
  • “mental” trials, hypotheses you considered and dismissed without ever writing the code.

Estimating this honestly is a discipline of its own. A reasonable rule of thumb: multiply your tracked trial count by 3 to 10, then plug the result into the DSR formula of Bailey & López de Prado (2014). The exercise is soberingly educational: it typically kills the significance of results that “looked good.”
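If you want to run the numbers yourself, here is a sketch of the computation from Bailey & López de Prado (2014); the example inputs are made up, and `n_trials` is where the 3-to-10 multiplier goes:

```python
import numpy as np
from scipy.stats import norm

def deflated_sharpe(sr, n_trials, var_sr, t_obs, skew=0.0, kurt=3.0):
    """DSR per Bailey & Lopez de Prado (2014): the probability that the
    observed Sharpe `sr` (per observation, not annualized) exceeds the
    expected maximum Sharpe among `n_trials` zero-skill trials.
    var_sr: variance of the Sharpe estimates across trials.
    t_obs:  number of return observations behind `sr`.
    kurt:   raw (non-excess) kurtosis of the returns."""
    gamma = 0.5772156649015329  # Euler-Mascheroni constant
    # Expected maximum Sharpe under n_trials trials with no true skill:
    sr0 = np.sqrt(var_sr) * ((1 - gamma) * norm.ppf(1 - 1 / n_trials)
                             + gamma * norm.ppf(1 - 1 / (n_trials * np.e)))
    # Probabilistic Sharpe Ratio evaluated at the deflation threshold sr0:
    denom = np.sqrt(1 - skew * sr + (kurt - 1) / 4 * sr ** 2)
    return norm.cdf((sr - sr0) * np.sqrt(t_obs - 1) / denom)

# Tracked 40 trials? The honest count is more like 120-400 (3-10x rule).
# ~0.0: a daily Sharpe of 0.1 does not survive deflation at 400 trials.
print(deflated_sharpe(sr=0.1, n_trials=400, var_sr=0.02, t_obs=1250))
```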

4. Probability of Backtest Overfitting (PBO)

The natural complement to DSR. Compute PBO via the combinatorial bootstrap of Bailey, Borwein, López de Prado & Zhu (2017): the probability that the rank-1 in-sample strategy falls below the median out of sample.

A simulated-incubation procedure that produces PBO > 50% is signaling that your top in-sample strategy is more likely than not to underperform the median in true OOS. That’s a strong negative signal even before you’ve run real money.
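A compact sketch of that combinatorial procedure (CSCV), run on a matrix of per-period returns with one column per trial; the block count and the pure-noise example are arbitrary choices:

```python
from itertools import combinations

import numpy as np

def pbo_cscv(returns, n_blocks=8):
    """PBO via combinatorially symmetric cross-validation: split the
    T x N matrix of trial returns into n_blocks row-blocks, take every
    half/half split into IS and OOS, pick the best IS trial by Sharpe,
    and record its relative OOS rank. PBO is the share of splits where
    the IS winner lands in the bottom half OOS (logit <= 0)."""
    blocks = np.array_split(returns, n_blocks, axis=0)
    sharpe = lambda r: r.mean(axis=0) / r.std(axis=0)
    logits = []
    for is_ids in combinations(range(n_blocks), n_blocks // 2):
        oos_ids = [i for i in range(n_blocks) if i not in is_ids]
        is_ret = np.vstack([blocks[i] for i in is_ids])
        oos_ret = np.vstack([blocks[i] for i in oos_ids])
        best = np.argmax(sharpe(is_ret))             # rank-1 strategy in-sample
        ranks = sharpe(oos_ret).argsort().argsort()  # OOS ranks, 0 = worst
        w = (ranks[best] + 1) / (returns.shape[1] + 1)  # relative OOS rank
        logits.append(np.log(w / (1 - w)))
    return np.mean(np.array(logits) <= 0)

# Example: 50 pure-noise trials over 1000 periods -> PBO near 0.5,
# i.e. the in-sample winner is a coin flip out of sample.
rng = np.random.default_rng(0)
print(pbo_cscv(rng.normal(0.0, 0.01, size=(1000, 50))))
```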

5. Short but real incubation

Politically often the smartest option. Three months of real paper trading is qualitatively superior to twelve months of simulated incubation, because it satisfies the three properties from Part 1:

  1. they lock in the researcher (the data did not exist when you designed the strategy),
  2. they generate a genuine forward realization,
  3. they sample an unchosen regime.

Yes, three months has higher variance than twelve. But it has the right structure, and you can pair it with CPCV on the historical data for a complete, rigorous evaluation.

A sophisticated allocator prefers 3 real months to 12 fake ones. An allocator who doesn’t is a contrary selection signal in its own right: it tells you something about the quality of their due diligence.

When is simulated incubation acceptable?

For completeness: there are narrow conditions where the critique weakens.

  1. Strategy fully pre-specified by independent literature. If you are rigorously replicating a paper published before the start of the “incubation” window, including universe, hyperparameters, and cost model, and you can prove it with a hash, then simulated incubation approximates a genuine OOS for that specific strategy. Not for any variant.
  2. Fully non-parametric model on fixed features. Rare in finance.
  3. As a secondary stress test, not as primary evidence. Presented as a “robustness check” alongside real incubation or CPCV.

Outside these cases, simulated incubation should be considered Grade C evidence: useful as a sanity check, not as evidence of alpha.

Conclusion of the series

Incubation is a powerful instrument because it exploits an informational asymmetry that time imposes for free: the future hasn’t happened yet. Simulated incubation gives up that asymmetry and tries to substitute a declaration of discipline for it. Statistically, the substitution does not work, as Part 2’s Monte Carlo shows even under very favorable assumptions.

This is not a moral critique. It’s a structural one. Even the most honest researcher cannot unknow what they know. The only rigorous defense is procedural: pre-registration, CPCV, Deflated Sharpe, PBO, and — when possible — some real incubation, however brief.

The good news: adopting even one of these practices improves the quality of your process more than another hundred hours of feature engineering. The bad news: none of them is free.


References

Bailey, D. H., & López de Prado, M. (2014). The Deflated Sharpe Ratio: Correcting for Selection Bias, Backtest Overfitting, and Non-Normality. Journal of Portfolio Management, 40(5).

Bailey, D. H., Borwein, J., López de Prado, M., & Zhu, Q. J. (2017). The Probability of Backtest Overfitting. Journal of Computational Finance, 20(4).

Harvey, C. R., Liu, Y., & Zhu, H. (2016). …and the Cross-Section of Expected Returns. Review of Financial Studies, 29(1).

Harvey, C. R., & Liu, Y. (2020). False (and Missed) Discoveries in Financial Economics. Journal of Finance, 75(5).

Leinweber, D. J. (2007). Stupid Data Miner Tricks: Overfitting the S&P 500. Journal of Investing, 16(1).

López de Prado, M. (2018). Advances in Financial Machine Learning. Wiley.

López de Prado, M. (2020). Machine Learning for Asset Managers. Cambridge University Press.