From Composite Scores to a Live Portfolio

May 26, 2026
11 min read

Allocation under real constraints, multi-scenario evaluation, and the monitoring discipline that keeps the portfolio honest after launch.

The gap between a score and a portfolio

The composite score from the previous post tells you, for each strategy in your universe, the probability that its edge is real. It does not tell you how many contracts to trade, in what combinations, under what risk limits, or when to change the allocation. Those are the operational decisions that turn statistical analysis into a tradeable portfolio.

The gap is wider than it looks. A composite score is a continuous number in [0,1]. A portfolio is a vector of integers — the number of contracts of each strategy — that must satisfy hard constraints on margin, volatility, and concentration. Mapping the first onto the second is an optimization problem, and the choice of objective and constraints matters a great deal.

In this post I’ll describe a pragmatic approach: how to formulate the optimization, how to choose between alternative portfolios, and how to monitor the portfolio once it is live so that you can react to evidence that the edges are decaying.

Objective: what are we optimizing?

The starting question for any portfolio optimization is: maximize what, subject to what?

The classical answer is Markowitz mean-variance: maximize expected return for a given variance, or equivalently maximize a quadratic utility function. Markowitz is elegant on paper and unstable in practice. It is notoriously sensitive to estimation error in expected returns, it ignores higher moments, and its solutions amplify the very noise that the previous posts in this series tried to control.

A more robust modern objective for futures portfolios with discrete sizing is some flavor of Mean-CVaR (Conditional Value-at-Risk, also called Expected Shortfall). At confidence level α (for example 95%), CVaR is the expected loss conditional on being in the worst (1 − α) fraction of outcomes, i.e. the worst 5% at α = 95%. Minimizing CVaR penalizes left-tail risk, which is what investors actually care about, far more directly than minimizing variance does. The Rockafellar-Uryasev (2000) formulation of CVaR optimization is convex and tractable.
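The minimal Python sketch below makes the definition concrete. It computes the empirical CVaR of a P&L sample in two ways. The first is direct: the mean of the worst (1 − α) fraction of losses. The second goes through the Rockafellar-Uryasev auxiliary function F(ζ) = ζ + E[(L − ζ)⁺] / (1 − α), whose minimum over ζ equals CVaR. The function names and the fat-tailed toy data are illustrative assumptions, not part of the pipeline described in this series.

```python
import numpy as np


def cvar_direct(pnl: np.ndarray, alpha: float = 0.95) -> float:
    """Mean loss over the worst (1 - alpha) fraction of outcomes."""
    losses = np.sort(-pnl)[::-1]                   # losses, largest first
    k = int(round((1 - alpha) * len(losses)))      # number of tail observations
    return float(losses[:k].mean())


def cvar_rockafellar_uryasev(pnl: np.ndarray, alpha: float = 0.95) -> float:
    """Minimize F(zeta) = zeta + E[(L - zeta)^+] / (1 - alpha) over zeta.

    F is convex and piecewise linear in zeta, so on a finite sample its
    minimum is attained at one of the sample losses; we simply scan them.
    A minimizing zeta is the VaR, and the minimum value is the CVaR.
    """
    losses = -pnl

    def F(zeta: float) -> float:
        return zeta + np.maximum(losses - zeta, 0.0).mean() / (1 - alpha)

    return float(min(F(z) for z in losses))


rng = np.random.default_rng(0)
daily_pnl = rng.standard_t(df=4, size=1000) * 500.0   # fat-tailed toy daily P&L ($)
print(cvar_direct(daily_pnl), cvar_rockafellar_uryasev(daily_pnl))
```

The two numbers agree when (1 − α)·N is a whole number, as it is here (50 of 1,000 days). In a real Mean-CVaR optimizer, ζ becomes one more decision variable and the max(·, 0) terms become auxiliary variables. That substitution is what keeps the problem convex.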

In practice, I find a composite objective works well:

objective = (Return / |MaxDD|) × √Sharpe × (1 − concentration_penalty)

This is not Mean-CVaR. It is a hybrid score that prioritizes the return-to-drawdown ratio (which traders understand intuitively), rewards Sharpe (statistical robustness), and penalizes concentration via the Herfindahl-Hirschman index (HHI). The function is not convex, but it is cheap to evaluate, so we can use stochastic search rather than gradient methods to find good solutions.
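Here is a minimal sketch of this objective. Several assumptions in it are mine rather than the post's:

- daily P&L per contract is available as a matrix;
- Sharpe is annualized with 252 days;
- the concentration penalty is a hypothetical `penalty_weight` times the HHI of margin shares;
- candidates with non-positive Sharpe score minus infinity, because √Sharpe is undefined for them.

```python
import numpy as np

TRADING_DAYS = 252  # assumption: daily P&L series, 252 trading days per year


def max_drawdown(equity: np.ndarray) -> float:
    """Largest peak-to-trough decline of a cumulative P&L curve (a value <= 0)."""
    running_peak = np.maximum.accumulate(equity)
    return float((equity - running_peak).min())


def hhi(weights: np.ndarray) -> float:
    """Herfindahl-Hirschman index: sum of squared shares, from 1/n (even) to 1."""
    shares = weights / weights.sum()
    return float(np.sum(shares ** 2))


def composite_objective(contracts, pnl_per_contract, margin_per_contract,
                        penalty_weight=0.5):
    """(Return / |MaxDD|) * sqrt(Sharpe) * (1 - penalty_weight * HHI).

    contracts:           (n_strategies,) integer contract counts
    pnl_per_contract:    (n_days, n_strategies) daily P&L of one contract
    margin_per_contract: (n_strategies,) margin per contract
    penalty_weight:      hypothetical scaling of the HHI penalty
    """
    x = np.asarray(contracts, dtype=float)
    if x.sum() == 0:
        return -np.inf                                  # empty portfolio
    daily = pnl_per_contract @ x                        # aggregated daily P&L
    equity = np.concatenate([[0.0], np.cumsum(daily)])  # start from flat equity
    mdd = abs(max_drawdown(equity))
    sharpe = daily.mean() / daily.std(ddof=1) * np.sqrt(TRADING_DAYS)
    if sharpe <= 0 or mdd == 0:
        return -np.inf      # sqrt(Sharpe) undefined, or too little data to judge
    concentration = hhi(x * margin_per_contract)        # HHI of margin shares
    return (equity[-1] / mdd) * np.sqrt(sharpe) * (1 - penalty_weight * concentration)


rng = np.random.default_rng(1)
pnl = rng.normal(50.0, 300.0, size=(1000, 3))           # toy daily P&L per contract
margin = np.array([5000.0, 8000.0, 3000.0])
print(composite_objective([2, 1, 3], pnl, margin))
```

Scoring invalid candidates as minus infinity is harmless in a stochastic search: they are simply never selected.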

The constraints are typically:

— Total margin used ≤ cap (e.g., $50K, $100K, $250K).
— Volatility of aggregated daily P&L ≤ target (e.g., 10% or 20% annualized).
— Integer, non-negative contract count per strategy.
— Maximum margin share per strategy ≤ some fraction (typically 30 to 40 percent).
— Maximum margin share per market family ≤ some fraction (to prevent cluster concentration).
— Minimum utilization (e.g., ≥ 50 percent of cap) to avoid trivial solutions that leave capital idle.
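These constraints translate directly into a feasibility check. The sketch below rests on a few assumptions:

- margin is the simple sum of per-contract margins (see the SPAN caveat later in the post);
- the volatility target is expressed as annualized P&L volatility divided by the margin cap;
- `family` labels each strategy's market family.

All default thresholds are hypothetical picks from the ranges above.

```python
import numpy as np


def is_feasible(contracts, margin, pnl_per_contract, family, cap,
                vol_target=0.10, max_strategy_share=0.35,
                max_family_share=0.50, min_utilization=0.50,
                trading_days=252):
    """Check one allocation against the hard constraints.

    Assumptions: margin is the simple sum of per-contract margins, and the
    annualized volatility of aggregated daily P&L is measured relative to the cap.
    """
    x = np.asarray(contracts)
    if not np.issubdtype(x.dtype, np.integer) or np.any(x < 0):
        return False                                    # integer, non-negative
    used = x * margin                                   # margin per strategy
    total = used.sum()
    if total == 0 or total > cap or total < min_utilization * cap:
        return False                                    # cap and minimum utilization
    if (used / total).max() > max_strategy_share:
        return False                                    # per-strategy share
    family = np.asarray(family)
    for fam in np.unique(family):
        if used[family == fam].sum() / total > max_family_share:
            return False                                # per-family share
    daily = pnl_per_contract @ x                        # aggregated daily P&L
    ann_vol = daily.std(ddof=1) * np.sqrt(trading_days) / cap
    return bool(ann_vol <= vol_target)                  # volatility target


rng = np.random.default_rng(2)
margin = np.array([5000.0, 5000.0, 5000.0, 5000.0])
family = ["equity", "equity", "rates", "energy"]
pnl = rng.normal(0.0, 50.0, size=(500, 4))              # toy daily P&L per contract
print(is_feasible([2, 2, 2, 2], margin, pnl, family, cap=50_000))
```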

The composite scores enter through the random sampling: strategies with higher composite are sampled more frequently into candidate allocations. This biases the search toward credible strategies without hard-excluding any.
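One way to bias the search without hard-excluding anything is to build each candidate one contract at a time. The next strategy is drawn with probability proportional to its composite score, until a randomly chosen margin budget is reached. This is a sketch of that idea, not necessarily the exact sampler behind this series. It enforces only the margin cap, on the assumption that the remaining constraints are checked afterwards and infeasible draws discarded.

```python
import numpy as np


def sample_allocations(composite, margin, cap, n_samples=50_000, seed=0):
    """Random integer allocations, biased toward high-composite strategies.

    Each candidate adds one contract at a time to a strategy drawn with
    probability proportional to its composite score, and stops the first
    time the drawn contract would exceed a random margin budget.
    """
    rng = np.random.default_rng(seed)
    p = np.asarray(composite, dtype=float)
    p = p / p.sum()                                # sampling probabilities
    n = len(p)
    out = np.zeros((n_samples, n), dtype=int)
    for s in range(n_samples):
        budget = rng.uniform(0.5, 1.0) * cap       # random target utilization
        used = 0.0
        while True:
            k = rng.choice(n, p=p)
            if used + margin[k] > budget:
                break
            out[s, k] += 1
            used += margin[k]
    return out


composite = np.array([0.9, 0.6, 0.3, 0.1])         # toy composite scores
margin = np.array([5000.0, 4000.0, 6000.0, 3000.0])
allocs = sample_allocations(composite, margin, cap=50_000, n_samples=2000)
print(allocs[:5])
```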

Solving the integer optimization

Mixed Integer Linear Programming (MILP) via cvxpy or PuLP can solve this exactly if the objective can be written as linear. A composite objective built from a return-to-drawdown ratio, √Sharpe, and an HHI penalty is not linear, so a direct MILP doesn't apply.

A practical alternative is a two-stage stochastic search:

Random sampling. Generate 50,000 to 100,000 random integer allocations that satisfy the hard constraints, with sampling probabilities biased by composite scores. Evaluate each on the objective.
Greedy local refinement. Take the top 50 solutions from random sampling. For each, try ±1 contract perturbations on every strategy, keeping any improvement. Iterate until no perturbation improves the objective. This converges to local optima typically within 10-30 iterations.
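The refinement stage is plain hill climbing on the integer lattice. In this minimal sketch, `objective` and `feasible` can be any callables on an integer vector. A toy quadratic stands in for the composite objective so the example runs on its own; the function name is hypothetical.

```python
import numpy as np


def local_refine(x0, objective, feasible, max_sweeps=100):
    """Greedy +/-1 contract refinement from a starting allocation.

    Each sweep tries +1 and -1 on every strategy and keeps any move that
    improves the objective; it stops after a sweep with no improvement.
    Returns (allocation, objective value, number of sweeps).
    """
    x = np.asarray(x0, dtype=int).copy()
    best = objective(x)
    sweep = 0
    for sweep in range(1, max_sweeps + 1):
        improved = False
        for i in range(len(x)):
            for step in (1, -1):
                y = x.copy()
                y[i] += step
                if y[i] < 0 or not feasible(y):
                    continue
                val = objective(y)
                if val > best:              # keep any improvement immediately
                    x, best, improved = y, val, True
        if not improved:
            break
    return x, best, sweep


target = np.array([3, 2, 4])                          # toy optimum
toy_objective = lambda v: -np.sum((v - target) ** 2)
toy_feasible = lambda v: v.sum() <= 10                 # toy "margin cap"
x_best, f_best, n_sweeps = local_refine([0, 0, 0], toy_objective, toy_feasible)
print(x_best, f_best, n_sweeps)
```

Running this from each of the top 50 sampled allocations, and keeping the best result, gives the second stage of the search.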

This approach has three properties that matter:

— It is fast (minutes on a laptop for an 18-strategy universe).
— It explores broadly, because the random initialization covers much of the feasible region, so it is less likely to stall in a poor local optimum.
— It is reproducible: different random seeds typically converge to nearly identical top solutions, which is a useful sanity check.

Within a single run, the top three solutions are also usually identical or very close. This gives confidence that the optimum reflects the objective rather than a quirk of the sampler.

Evaluating against baselines

A single optimal portfolio is not enough. You should always compare against three baselines:

— Equal-weight on the shortlist. One contract of each strategy that passed composite filtering; the simplest possible portfolio.
— Concentrated on the highest-composite strategy. As many contracts as the margin cap allows of the single strongest-evidence strategy.
— Inverse-volatility weighted. A classical, risk-parity-like allocation.
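The three baselines are simple enough to write down exactly. In this sketch, the inverse-volatility baseline sizes contracts in proportion to 1 / (per-contract daily P&L volatility). Each strategy therefore contributes roughly the same standalone dollar volatility. All sizes are then scaled up to the margin cap and rounded down. That reading of "inverse-vol" in contract space, the function names, and all the numbers are assumptions.

```python
import numpy as np


def equal_weight(n_strategies, shortlist):
    """One contract of each shortlisted strategy."""
    x = np.zeros(n_strategies, dtype=int)
    x[list(shortlist)] = 1
    return x


def concentrated(margin, composite, cap):
    """As many contracts of the top-composite strategy as the cap allows."""
    x = np.zeros(len(margin), dtype=int)
    top = int(np.argmax(composite))
    x[top] = int(cap // margin[top])
    return x


def inverse_vol(margin, pnl_vol, cap):
    """Contracts proportional to 1 / per-contract vol, scaled to the cap, floored."""
    inv = 1.0 / np.asarray(pnl_vol, dtype=float)
    scale = cap / np.sum(np.asarray(margin, dtype=float) * inv)  # largest scale within cap
    return np.floor(scale * inv).astype(int)


margin = np.array([5000.0, 4000.0, 6000.0, 3000.0])
composite = np.array([0.8, 0.7, 0.4, 0.2])
pnl_vol = np.array([400.0, 200.0, 800.0, 100.0])    # toy daily P&L vol per contract ($)
print(equal_weight(4, [0, 1, 2]), concentrated(margin, composite, 50_000),
      inverse_vol(margin, pnl_vol, 50_000))
```

Rounding down can leave a strategy with zero contracts, as happens to the third one here. This is common at small capital levels.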

The optimized portfolio should outperform these baselines by a meaningful margin on the chosen metric. The threshold I use is roughly Sharpe differential > 0.3 with reasonable confidence; below that, the optimization is probably overfit to the specific data used in calibration.
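"Reasonable confidence" can be made operational in several ways. One option is a paired moving-block bootstrap of the annualized Sharpe difference on the same days, which requires the lower end of a 90% interval to clear 0.3. It is sketched here under stated assumptions rather than as the method behind this series. Resampling identical blocks for both series preserves their correlation and short-range autocorrelation. The block length, number of resamples, and confidence level are hypothetical defaults.

```python
import numpy as np


def sharpe(daily, days=252):
    """Annualized Sharpe ratio of a daily P&L series."""
    return daily.mean() / daily.std(ddof=1) * np.sqrt(days)


def sharpe_diff_ci(pnl_a, pnl_b, n_boot=2000, block=20, level=0.90, seed=0):
    """Paired moving-block bootstrap interval for Sharpe(A) - Sharpe(B).

    pnl_a and pnl_b must be aligned on the same days; each resample draws
    the same blocks of days for both series.
    """
    pnl_a, pnl_b = np.asarray(pnl_a, float), np.asarray(pnl_b, float)
    rng = np.random.default_rng(seed)
    n = len(pnl_a)
    n_blocks = int(np.ceil(n / block))
    diffs = np.empty(n_boot)
    for b in range(n_boot):
        starts = rng.integers(0, n - block + 1, size=n_blocks)
        idx = (starts[:, None] + np.arange(block)).ravel()[:n]
        diffs[b] = sharpe(pnl_a[idx]) - sharpe(pnl_b[idx])
    lo, hi = np.quantile(diffs, [(1 - level) / 2, (1 + level) / 2])
    return sharpe(pnl_a) - sharpe(pnl_b), lo, hi


rng = np.random.default_rng(7)
base = rng.normal(20.0, 1000.0, 750)                 # toy baseline daily P&L
opt = 0.8 * base + rng.normal(60.0, 600.0, 750)      # toy optimized daily P&L
diff, lo, hi = sharpe_diff_ci(opt, base)
print(f"Sharpe diff {diff:.2f}, 90% CI [{lo:.2f}, {hi:.2f}], clears 0.3: {lo > 0.3}")
```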

This step is uncomfortable for many practitioners because the optimized portfolio often doesn’t significantly beat equal-weight on small universes. The honest response is to use equal-weight rather than over-engineering the optimization. The discipline of comparing forces you to confront when fine-tuning is unjustified.

Multi-scenario evaluation

The choice of evaluation window matters enormously. A portfolio optimized only on the incubation period will pick strategies whose edge is strongest in the most recent regime. A portfolio optimized on a longer window (OOS + incubation) will pick strategies with greater historical consistency, which may or may not include the most recent winners.

It is useful to produce both:

— Incubation-window optimization captures recent improvement; it is the right choice if you believe recent edges will persist forward.
— Longer-window optimization captures consistency; it is the right choice if you are skeptical of recent metric improvements.

Looking at both side-by-side reveals interesting structure. Strategies that are top picks in both windows are robust across regimes. Strategies that are top picks in the recent window but not the longer one have ambiguous status: they may have genuine recent improvement, or they may be overfit to recent data. The composite score from the previous post should already have downgraded the latter, but the dual-window comparison is an additional check.

For each capital level ($50K, $100K, $250K), running both evaluation windows produces two portfolios, six in total. They share strategies but differ in weights, and the operational tradeoffs (concentration, complexity, expected return, maximum drawdown) vary meaningfully across them.

Translating to operational reality

A few practical considerations that often get glossed over:

Margin is not contract margin × number of contracts. Brokers apply SPAN margin (or portfolio margin) with offsets across positions. Two strategies that are mutually hedging may have lower combined margin than the sum of their individual requirements. For initial planning, use the simple sum; for final sizing, run the actual margin calculator your broker provides.

Volatility target is an annual number, but realized volatility over short windows can deviate substantially. A portfolio with a 10% annualized vol target will routinely have weeks with 12-15% realized vol and weeks with 6-8% (both annualized). The target is a long-run anchor, not a daily cap.
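A quick simulation shows why. The sketch below assumes independent, normally distributed daily returns whose true volatility is exactly 10% annualized. It then annualizes the volatility measured over each non-overlapping 5-day week. Real P&L has volatility clustering and fat tails, so the dispersion in practice is, if anything, wider.

```python
import numpy as np

TRADING_DAYS = 252
rng = np.random.default_rng(3)
target = 0.10                                          # 10% annualized vol target
daily_ret = rng.normal(0.0, target / np.sqrt(TRADING_DAYS), size=10 * TRADING_DAYS)

# Long-run estimate over ten simulated years
full_sample_vol = daily_ret.std(ddof=1) * np.sqrt(TRADING_DAYS)

# Non-overlapping 5-day "weeks", each annualized the same way
weeks = daily_ret[: len(daily_ret) // 5 * 5].reshape(-1, 5)
weekly_vol = weeks.std(axis=1, ddof=1) * np.sqrt(TRADING_DAYS)

print(f"full sample: {full_sample_vol:.3f}")
print(f"weekly 10th-90th percentile: {np.percentile(weekly_vol, 10):.3f}"
      f" to {np.percentile(weekly_vol, 90):.3f}")
```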

Concentration constraints are not just risk management; they are protection against single-strategy breakdown. If your top strategy fails operationally (a data feed error, a parameter drift, an unanticipated execution behavior), you want the rest of the portfolio to carry real weight rather than be a footnote to the strategy that broke. A 30 to 40 percent cap per strategy is a sensible hard limit.

Capital utilization below 100% is a feature. Most portfolios should run at 70 to 90 percent utilization of nominal margin, leaving cushion for overnight SPAN requirements, intraday equity fluctuations, and the occasional margin call from unexpected position moves. A 100% utilized portfolio is brittle.

Live monitoring

Once the portfolio is allocated and trading, the work shifts to monitoring. Three orthogonal signals work well in tandem.

Rolling DSR with confidence band. Recompute the DSR for each strategy over a rolling window (typically six months, about 126 trading days). Plot the lower bound of its 95% confidence interval. If it stays below zero for four or more consecutive weeks, the strategy enters a “warning” state.
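A full rolling DSR also needs the number of trials and the variance of Sharpe ratios across them, which depend on the development history. The sketch below is a simplified stand-in with these pieces:

- a rolling per-period Sharpe ratio, using the skewness- and kurtosis-adjusted standard error that underlies the probabilistic Sharpe ratio;
- a check of the lower 95% bound at weekly (5-day) steps over a 126-day window;
- a warning once the bound has been below zero at four consecutive checks.

Window, step, and function names are assumptions.

```python
import numpy as np


def sharpe_lower_bound(x, z=1.96):
    """Per-period Sharpe ratio and its approximate lower confidence bound.

    Standard error adjusted for skewness and raw (non-excess) kurtosis:
    var(SR) ~ (1 - skew * SR + (kurt - 1) / 4 * SR**2) / (n - 1)
    """
    x = np.asarray(x, dtype=float)
    n = len(x)
    mu, sd = x.mean(), x.std(ddof=1)
    sr = mu / sd
    zs = (x - mu) / x.std(ddof=0)
    skew, kurt = np.mean(zs ** 3), np.mean(zs ** 4)
    se = np.sqrt((1 - skew * sr + (kurt - 1) / 4 * sr ** 2) / (n - 1))
    return sr, sr - z * se


def weekly_warning(daily_pnl, window=126, step=5, weeks_needed=4):
    """True if the lower bound was below zero at the last `weeks_needed`
    weekly checkpoints (window=126 trading days is about six months)."""
    streak = 0
    for end in range(window, len(daily_pnl) + 1, step):
        _, lower = sharpe_lower_bound(daily_pnl[end - window:end])
        streak = streak + 1 if lower < 0 else 0
    return streak >= weeks_needed


rng = np.random.default_rng(11)
healthy = rng.normal(250.0, 500.0, 500)    # toy P&L with a strong edge
decayed = rng.normal(-250.0, 500.0, 500)   # toy P&L after the edge is gone
print(weekly_warning(healthy), weekly_warning(decayed))
```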

CUSUM and Page-Hinkley tests on residuals. Define a residual per trade as the difference between observed P&L and the P&L expected under the strategy’s incubation distribution. If the strategy is performing as expected, the cumulative sum of these residuals should show no systematic drift. CUSUM and Page-Hinkley are sequential tests that flag a persistent shift in the mean of the residuals, i.e., a regime break. Each has its own false-alarm characteristics; running both provides redundancy.
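Both tests take only a few lines each. This sketch assumes residuals are standardized by the incubation mean and standard deviation. It monitors only downward shifts (edge decay). The CUSUM allowance k = 0.5 and threshold h = 5 are common textbook choices in standardized units. The Page-Hinkley δ and λ are hypothetical, and all four should be tuned to an acceptable false-alarm rate.

```python
import numpy as np


def standardized_residuals(trade_pnl, incubation_pnl):
    """Residual per trade: (observed - incubation mean) / incubation std."""
    mu = np.mean(incubation_pnl)
    sd = np.std(incubation_pnl, ddof=1)
    return (np.asarray(trade_pnl, dtype=float) - mu) / sd


def cusum_down(residuals, k=0.5, h=5.0):
    """One-sided lower CUSUM. Returns the 1-based index of the first alarm, or None.

    k is the allowance (shifts smaller than k are ignored); h is the alarm threshold.
    """
    s = 0.0
    for t, r in enumerate(residuals, start=1):
        s = min(0.0, s + r + k)
        if s < -h:
            return t
    return None


def page_hinkley_down(residuals, delta=0.5, lam=8.0):
    """Page-Hinkley test for a decrease in the mean, using the running mean.

    Tracks m_t = sum(r_i - mean_i + delta) and alarms when it falls more
    than lam below its running maximum.
    """
    mean, m, m_max = 0.0, 0.0, 0.0
    for t, r in enumerate(residuals, start=1):
        mean += (r - mean) / t          # running mean of residuals so far
        m += r - mean + delta
        m_max = max(m_max, m)
        if m_max - m > lam:
            return t
    return None


rng = np.random.default_rng(5)
incubation = rng.normal(40.0, 400.0, 300)                 # toy incubation trades
live = np.concatenate([rng.normal(40.0, 400.0, 60),       # edge intact
                       rng.normal(-360.0, 400.0, 40)])    # edge gone after trade 60
res = standardized_residuals(live, incubation)
print(cusum_down(res), page_hinkley_down(res))
```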

Bayesian Online Change Point Detection (Adams & MacKay 2007). This maintains a posterior distribution over the run length, the time since the most recent structural break, updated with each new observation. It is smoother than CUSUM and gives a “probability of breakdown” that can be used for graduated weight reduction rather than binary on/off decisions.
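Below is a compact version of the Adams & MacKay recursion for the simplest conjugate model. It assumes Gaussian residuals with known variance, a Normal prior on the segment mean, and a constant hazard rate.

One practical detail matters here. With a constant hazard, the posterior probability that the run length is exactly zero always equals the hazard itself. A usable "probability of breakdown" therefore has to look a little further back. Here it is taken, as an assumption, to be the posterior probability that a break occurred within the last `recent` observations. The function name and parameters are hypothetical.

```python
import numpy as np


def bocpd_gaussian(x, hazard=1 / 250, mu0=0.0, var0=1.0, noise_var=1.0, recent=10):
    """Bayesian online change point detection (Adams & MacKay, 2007).

    Model: Gaussian observations with known noise variance, a Normal prior
    N(mu0, var0) on each segment's mean, constant hazard. Returns, for each t,
    P(run length < recent | x_1..x_t).
    """
    R = np.array([1.0])            # run-length posterior, starts at r = 0
    mu = np.array([mu0])           # posterior mean of the segment mean, per run length
    var = np.array([var0])         # posterior variance of the segment mean, per run length
    out = np.empty(len(x))
    for t, xt in enumerate(x):
        pred_var = var + noise_var                         # predictive variance
        pred = np.exp(-0.5 * (xt - mu) ** 2 / pred_var) / np.sqrt(2 * np.pi * pred_var)
        growth = R * pred * (1 - hazard)                   # run continues: r -> r + 1
        change = np.sum(R * pred * hazard)                 # run resets: r -> 0
        R = np.append(change, growth)
        R /= R.sum()
        post_var = 1.0 / (1.0 / var + 1.0 / noise_var)     # conjugate update with x_t
        post_mu = post_var * (mu / var + xt / noise_var)
        mu = np.append(mu0, post_mu)                       # r = 0 restarts from the prior
        var = np.append(var0, post_var)
        out[t] = R[:recent].sum()
    return out


x = np.concatenate([np.zeros(100), np.full(50, -3.0)])    # toy residuals, break at t = 100
p_break = bocpd_gaussian(x)
print(np.round(p_break[95:110], 3))
```

Scaling position size by something like (1 − p_break) is one way to turn this into graduated weight reduction. That mapping is also an assumption, not a rule from this series.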

A strategy that triggers any one of these signals enters “warning”; two triggers means “severe”; three means “kill” (close positions immediately and exclude from rebalancing until further analysis).

Rebalancing discipline

The rebalancing schedule should be predominantly time-based with trigger-based overrides:

— Quarterly scheduled rebalance. End of March, June, September, December. Re-run the composite scoring on updated data, re-run the optimization, transition to the new allocation. This is the “default” rhythm.
— Trigger-based intra-quarter rebalance. When a strategy enters “severe” or “kill” status, when the portfolio’s realized volatility deviates from target by more than 25-30% for two consecutive months, or when the average intra-cluster correlation spikes above a threshold.
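The intra-quarter overrides reduce to a small predicate. In this sketch:

- the volatility trigger uses relative deviation from target, in either direction, in each of the last two months;
- the statuses come from the monitoring states, plus a hypothetical "ok";
- the correlation threshold of 0.6 is an arbitrary placeholder, since the text leaves it unspecified.

```python
import numpy as np


def vol_trigger(monthly_realized_vol, target, tolerance=0.25, months=2):
    """True if realized vol missed the target by more than `tolerance`
    (relative, in either direction) in each of the last `months` months."""
    recent = np.asarray(monthly_realized_vol[-months:], dtype=float)
    if len(recent) < months:
        return False
    return bool(np.all(np.abs(recent / target - 1.0) > tolerance))


def intra_quarter_rebalance(statuses, monthly_realized_vol, target,
                            cluster_corr, corr_threshold=0.6):
    """Any one of the three override conditions forces a rebalance."""
    escalated = any(s in ("severe", "kill") for s in statuses.values())
    return (escalated
            or vol_trigger(monthly_realized_vol, target)
            or cluster_corr > corr_threshold)


print(intra_quarter_rebalance({"A": "ok", "B": "warning"}, [0.10, 0.13, 0.14], 0.10, 0.35))
```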

The full selection pipeline (composite scoring on all candidate strategies) should be re-run every six months. Composites drift with new data; what looked like a top-3 strategy a year ago may be only borderline today. The framework is intentionally re-runnable so that periodic re-evaluation is cheap.

Capital decay protection

A subtle but important feature of long-term portfolio management is the gradual decay of edge for strategies that have been live for many years. Even strategies with no detectable breakdown signal can suffer slow erosion as markets adapt. A simple protection is a capital decay schedule: for strategies that have been live for more than four years, gradually shift their weight toward the equal-weight baseline of the shortlist. The logic is that the longer a strategy has been live, the more likely it is that part of its measured edge was specific to its development period. That part was either overfit to the period or has since been eroded as markets adapted, and only the structural part persists. The equal-weight baseline is a conservative prior that the decay schedule pulls toward.

This is a slow background adjustment, not an emergency action. Each year, perhaps 5 to 10 percent of a long-running strategy’s weight shifts toward the baseline. The effect is small in any single year but compounds usefully over time.
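The decay schedule is a one-line update applied once a year. The sketch below moves each strategy that has been live longer than four years a fraction `rate` of the way toward its equal-weight baseline, then renormalizes. The 7.5% default sits in the middle of the 5 to 10 percent range above, and the function name is hypothetical.

```python
import numpy as np


def decay_toward_baseline(weights, baseline, years_live, start_after=4.0, rate=0.075):
    """One annual step of the capital decay schedule.

    Strategies live for more than `start_after` years move a fraction `rate`
    of the way toward their equal-weight baseline; weights are then
    renormalized to sum to one.
    """
    w = np.asarray(weights, dtype=float).copy()
    b = np.asarray(baseline, dtype=float)
    old = np.asarray(years_live) > start_after
    w[old] += rate * (b[old] - w[old])
    return w / w.sum()


weights = np.array([0.5, 0.3, 0.2])
baseline = np.full(3, 1 / 3)                 # equal weight on a 3-strategy shortlist
print(decay_toward_baseline(weights, baseline, years_live=[6, 2, 5]))
```

The gap to the baseline shrinks by a factor of (1 − rate) each year, ignoring the renormalization. So after ten years, roughly 35% (at 10% a year) to 60% (at 5% a year) of the original gap remains. That is the slow compounding described above.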

What this looks like end-to-end

The complete workflow:

Develop strategies via the usual pipeline (in-sample, OOS, Monte Carlo, incubation).
Apply the four-test framework (DSR, Haircut, PBO, Stability) to all incubation survivors.
Compute composite scores; stratify into Core, Satellite, Marginal, Avoid tiers.
Run the portfolio optimization at each relevant capital level, against multiple evaluation windows.
Compare optimized portfolios against equal-weight, max-composite-concentrated, and inverse-vol baselines.
Select the operational portfolio based on objectives and constraints; commit to a specific allocation.
Begin live trading; instrument the three monitoring signals from day one.
Rebalance quarterly on schedule, intra-quarter on trigger, fully re-evaluate every six months.

The discipline of this workflow matters more than the precision of any single step. The framework is robust because it explicitly addresses survival bias at every stage and because it includes feedback loops for live performance evaluation. Strategies enter the portfolio with statistical evidence; they leave when the evidence reverses.

What is the expected outcome? Honestly, modest. A portfolio constructed this way will probably not have the Sharpe ratios that show up in the naive backtest of the surviving strategies — because most of those Sharpe ratios were inflated by selection. It will probably have Sharpe ratios somewhere between half and three-quarters of the naive estimate, with corresponding reductions in drawdown and improvements in robustness across regime changes. That gap between “naive” and “honest” expected performance is exactly the gap that the four-test framework, the composite score, and the constrained optimization are designed to surface.

The goal is not to maximize a backtested number. The goal is to deploy a portfolio that performs in production roughly as expected, with the kind of statistical defensibility that survives independent review and outlives the developer who built it.

This concludes the three-part blog series on quantitative portfolio construction for systematic strategies. The companion LinkedIn series covers the same material in shorter form.

References

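Bailey, D.H., López de Prado, M. (2014). The Deflated Sharpe Ratio: Correcting for Selection Bias, Backtest Overfitting, and Non-Normality. Journal of Portfolio Management.
Bailey, D.H., Borwein, J.M., López de Prado, M., Zhu, Q.J. (2017). The Probability of Backtest Overfitting. Journal of Computational Finance.
Rockafellar, R.T., Uryasev, S. (2000). Optimization of Conditional Value-at-Risk. Journal of Risk.
López de Prado, M. (2016). Building Diversified Portfolios that Outperform Out-of-Sample. Journal of Portfolio Management.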
López de Prado, M. (2018). Advances in Financial Machine Learning. Wiley.
Adams, R.P., MacKay, D.J.C. (2007). Bayesian Online Changepoint Detection. arXiv:0710.3742.
Michaud, R.O. (1998). Efficient Asset Management. Harvard Business School Press.
Page, E.S. (1954). Continuous Inspection Schemes. Biometrika.