
There is a particular kind of confidence that comes from understanding the mathematics behind a trading strategy. We had it. We came from data engineering and ML, and the first time we built a backtested strategy that returned a Sharpe of 2.4 across five years of historical data, the feeling was something close to certainty. The model was right. The market would eventually agree.
It didn't.
What the backtest never told us was that the regime in which the strategy would trade was not the regime in which it was designed. That is the central contradiction of systematic trading: the more precisely you model the past, the more brittle your model becomes in the future. Intelligence, in this context, is not a defence. It is often the exact mechanism of failure.
We are writing this not as a theoretical overview but as a record of what we observed: where the edge was real, where we invented it, and what we wish we had understood earlier.
The quant community is not short of intelligence. It is short of epistemic humility. The culture rewards complexity: the more elaborate the model, the more defensible it seems in a presentation. But markets are adversarial, adaptive environments. A clever model that worked for three years may have worked because it was unobserved, not because it was correct. The moment it becomes visible to other participants, the edge compresses.
Here is the thing nobody puts in a research paper: the strategies with the longest survival rates are not the most sophisticated. They are the most structurally grounded. Trend following, which is conceptually no more complex than "buy what is going up", has outlasted dozens of ML-driven signal regimes. Not because the idea is superior, but because it earns a genuine economic premium: compensation for absorbing short-term volatility and holding through drawdown. That premium does not disappear when everyone knows about it. Most edges do.
Statistical arbitrage (stat arb) is built on a single, powerful idea: that certain pairs or baskets of instruments are economically linked, and when they diverge, they will revert. The job of the model is to measure that divergence, estimate the speed of reversion, and size the trade accordingly.
The foundation is co-integration, not correlation. Two stocks can be highly correlated and still drift permanently apart. Co-integrated instruments share a common stochastic trend: the spread between them is stationary, meaning it has a well-defined mean it tends to return to. The Ornstein-Uhlenbeck process gives us the mathematical framework: the speed parameter theta tells us how quickly the spread reverts, and that speed determines how long we hold the position and how aggressively we size it.
In practice, we look for pairs with a stable co-integrating relationship, estimate the half-life of reversion (typically 5 to 15 trading days for equity pairs), and enter when the spread crosses a threshold, usually 1.5 to 2 standard deviations from the rolling mean.
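A minimal sketch of that pipeline, assuming a hedge-ratio-adjusted spread series already exists (pair selection and the co-integration test itself are omitted); the window and threshold values are the illustrative ones above, not recommendations:

```python
import numpy as np
import pandas as pd

def ou_half_life(spread: pd.Series) -> float:
    """Estimate the mean-reversion half-life of a spread.

    Fits the discretised Ornstein-Uhlenbeck form dS = theta * (mu - S) dt
    by regressing daily changes on the lagged level; the negated slope is
    the reversion speed theta, and half-life = ln(2) / theta.
    """
    ds = spread.diff().dropna()
    lag = spread.shift(1).loc[ds.index]
    slope, _ = np.polyfit(lag.values, ds.values, 1)
    theta = -slope
    return np.log(2) / theta if theta > 0 else np.inf

def spread_signal(spread: pd.Series, window: int = 60,
                  entry: float = 2.0) -> pd.Series:
    """+1 = long the spread (cheap), -1 = short (rich), 0 = flat."""
    z = (spread - spread.rolling(window).mean()) / spread.rolling(window).std()
    return pd.Series(np.where(z < -entry, 1, np.where(z > entry, -1, 0)),
                     index=spread.index)
```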
The edge in stat arb was sharpest in the early 2000s, when information diffusion was slow and the number of participants running systematic strategies was small. A pair like a major bank and its closest competitor could drift for days before reverting. Today, the same trade might revert in hours, or not at all, because ten other algorithms entered the same position and crowded the spread.
What remains is the edge in less-observed pairs: cross-asset relationships, regional indices, commodity-equity links. Anywhere the signal is real but the infrastructure required to exploit it creates a natural barrier to entry.
Stat arb fails in a specific and painful way. When a co-integrated relationship breaks (because of a merger, a regulatory shift, or a fundamental repricing), you are holding a long and a short in instruments that are now diverging permanently. Both legs move against you. The model has no mechanism to distinguish temporary divergence from structural rupture. That distinction requires judgement the model does not have.
The most counterintuitive result in empirical finance is that momentum works. In an efficient market, past price performance should contain no information about future performance. And yet, across every major asset class and time horizon studied, assets that have outperformed over the past three to twelve months continue to outperform over the next one to three months. The effect is persistent, robust, and widely documented, which means it should have been arbitraged away decades ago.
It hasn't been.
The reason momentum persists is not statistical; it is behavioural and institutional. Investors underreact to new information, updating their views gradually rather than instantaneously. Institutional mandates create forced buying and selling that extends trends beyond what fundamentals would justify. Central bank policy cycles create multi-year macro trends in rates, currencies, and commodities that no single participant has the mandate or the risk appetite to fade early.
Trend following earns its return by being willing to sit through volatility that forces other participants out. The premium is real, structural, and consistent with a rational market that just happens to contain participants with horizons and mandates that create predictable, exploitable flows.
A basic trend signal measures the difference between a short-term and a long-term moving average, normalised by volatility. The volatility normalisation is critical: a 2% move in a low-vol environment is a stronger signal than a 2% move in a high-vol environment. Risk-adjusted signal strength is what we're measuring, not raw price change.
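As a sketch, that signal might look like the following; the window lengths and the cap at three units are illustrative assumptions, not fitted values:

```python
import numpy as np
import pandas as pd

def trend_signal(price: pd.Series, fast: int = 20, slow: int = 200,
                 vol_window: int = 60) -> pd.Series:
    """Volatility-normalised moving-average crossover.

    The raw signal is the gap between a fast and a slow moving average.
    Dividing by recent return volatility (converted to price units)
    expresses that gap in risk-adjusted terms, so the same 2% gap scores
    higher in a quiet market than in a turbulent one.
    """
    ma_gap = price.rolling(fast).mean() - price.rolling(slow).mean()
    vol = price.pct_change().rolling(vol_window).std() * price
    return (ma_gap / vol).clip(-3, 3)  # cap extreme readings
```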
More sophisticated implementations use time-series momentum: the autocorrelation of past returns at different frequencies, aggregated into a composite signal that is weighted by how stable each frequency channel has been historically.
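One way such a composite could be assembled; the trailing hit-rate weighting below is an assumption standing in for whatever stability measure a desk actually uses:

```python
import numpy as np
import pandas as pd

def tsmom_composite(returns: pd.Series,
                    lookbacks: tuple = (21, 63, 126, 252),
                    stability_window: int = 756) -> pd.Series:
    """Time-series momentum across several horizons, each channel
    weighted by its trailing hit rate against next-day returns."""
    signals, weights = [], []
    for lb in lookbacks:
        sig = np.sign(returns.rolling(lb).sum())
        # Stability proxy: how often this channel agreed with the next move
        hit = (sig.shift(1) * returns > 0).rolling(stability_window).mean()
        signals.append(sig)
        weights.append(hit)
    composite = sum(s * w for s, w in zip(signals, weights)) / sum(weights)
    return composite.clip(-1, 1)
```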
Trend following has one failure mode: choppy, mean-reverting markets with no directional persistence. In these regimes, the strategy enters long as the market rises, gets stopped out on a reversal, re-enters short on the dip, gets stopped out again. Transaction costs and slippage accumulate. The equity curve looks like a slow grind downward. The cure is not to improve the model; it is to detect the regime and reduce position size until directional persistence returns.
Pure mean reversion strategies (distinct from pair-based stat arb) operate on the assumption that individual instruments overshoot in the short term. Sharp moves driven by liquidity imbalance, stop cascades, or sentiment tend to partially reverse within one to five days.
The Ornstein-Uhlenbeck process we discussed in stat arb applies here too. But for single-instrument mean reversion, we are estimating reversion to a conditional mean: not a fixed historical average, but a dynamically estimated fair value based on recent volatility regime, order flow imbalance, and relative positioning.
The key input is the half-life of reversion. If a stock historically reverts to its five-day mean with a half-life of two days, we size the position accordingly and set our expected holding period. A longer half-life requires wider stops, larger risk per trade, and a smaller position size for a given volatility budget.
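A toy expression of that sizing rule, under the assumption that adverse moves scale roughly with the square root of time; the 1% risk budget is a placeholder, not a recommendation:

```python
import numpy as np

def mr_position_size(half_life_days: float, daily_vol: float,
                     risk_budget: float = 0.01) -> float:
    """Position size (fraction of equity) such that an adverse move
    lasting about one half-life stays within the per-trade risk budget.
    Longer half-life -> wider stop -> smaller position."""
    stop_distance = daily_vol * np.sqrt(half_life_days)
    return risk_budget / stop_distance
```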
The cleanest mean reversion edge lives in microstructure: order book imbalance, bid-ask bounce, short-term over-extension in illiquid names. These are not strategies available at scale. But for smaller books, the edge is real and consistent precisely because it is too small to attract capital that would compress it.
The single most dangerous trade in systematic mean reversion is fading a genuine trend. The instrument looks extended: three standard deviations above its rolling mean. The model says sell. But the model doesn't know that the company just reported a transformative earnings beat, or that the sector is being re-rated by the market for structural reasons. Reversion strategies without a trend filter will systematically fade the strongest directional moves, which is where the largest losses concentrate.
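A minimal illustration of such a filter, with hypothetical thresholds; the point is only that a strong same-direction trend vetoes the fade:

```python
def gated_fade(z: float, trend: float,
               z_entry: float = 3.0, trend_veto: float = 1.0) -> int:
    """Fade an over-extension only when the trend signal does not
    strongly confirm the move. Returns -1 (short), +1 (long) or 0."""
    if z > z_entry and trend < trend_veto:
        return -1  # extended upside, trend not confirming: fade it
    if z < -z_entry and trend > -trend_veto:
        return 1   # extended downside, trend not confirming: buy it
    return 0       # no extension, or the trend vetoes the trade
```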
Market making is not, strictly speaking, an alpha strategy. It is a service: providing liquidity to the market in exchange for the bid-ask spread. The edge is structural: market makers buy at the bid and sell at the offer, earning the spread on each completed round trip.
A market maker posts simultaneous bids and offers around the estimated fair value of an instrument. When both quotes are filled, the market maker earns the difference. The risk is inventory: if the market maker buys and the price continues falling, they accumulate a losing long position that the spread income cannot offset.
Professional market making is therefore a continuous optimisation problem: how wide should the spread be (to cover adverse selection and earn a return), and how aggressively should inventory be managed when it builds in one direction?
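A deliberately simplified sketch of that trade-off; production systems derive the inventory lean from a risk model in the Avellaneda-Stoikov family rather than from a fixed coefficient, so skew_strength here is purely illustrative:

```python
def make_quotes(fair_value: float, half_spread: float,
                inventory: float, max_inventory: float,
                skew_strength: float = 0.5) -> tuple[float, float]:
    """Post a bid and an ask around fair value, leaning against inventory.

    A long book shifts both quotes down (more eager to sell, less eager
    to buy more); a short book shifts them up. At the inventory limit the
    lean equals skew_strength half-spreads.
    """
    lean = skew_strength * half_spread * (inventory / max_inventory)
    mid = fair_value - lean
    return mid - half_spread, mid + half_spread  # (bid, ask)
```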
The fundamental threat to a market maker is informed trading. When a large participant with genuine information buys from the market maker, the price will move against the market maker's resulting short inventory. The market maker has no way, in real time, to distinguish informed flow from noise flow. They can only estimate the probability of toxicity and widen their spread accordingly.
High-frequency market making mitigates this through speed: the ability to update quotes faster than adverse selection can accumulate. But as technology has democratised low-latency infrastructure, this edge has compressed. What remains is a thinner but still real premium for those willing to provide liquidity in less liquid instruments where others will not.
Market making strategies are implicitly short volatility. In calm markets, spreads are narrow, order flow is two-sided, and inventory manages itself. In volatile markets, spreads widen but so does adverse selection. Flash crashes and macro shocks can wipe out months of accumulated spread income in minutes. The strategy is not broken, but it requires hard position limits and the willingness to step back from the market entirely when volatility becomes disorderly.
Machine learning has changed quantitative finance. But not in the direction most people expected. The dominant prediction, that sufficiently large models trained on sufficiently large datasets would find hidden alpha, has largely not materialised. What has worked is more targeted: ML as a feature engineering and signal combination tool, not as an autonomous alpha generator.
Financial time series are non-stationary, low signal-to-noise, and subject to structural breaks. The conditions under which a model is trained are rarely the conditions under which it will operate. A model that learns the relationship between sentiment data and short-term returns during a bull market may learn nothing transferable about that relationship during a credit crisis.
The deeper problem is that markets are adaptive. When enough participants use the same ML features on the same datasets, the relationship between those features and returns is arbitraged away. The signal half-life of ML-derived features from commonly available alternative data (credit card transactions, satellite imagery, web scraping) has compressed from years to months.
The legitimate use cases are narrower but real. Regime classification: using ML to identify which of several historical market states the current environment most resembles, and adjusting strategy allocations accordingly. Natural language processing on earnings call transcripts and regulatory filings to extract forward guidance signals that are too embedded in context for keyword-based approaches to capture. Execution optimisation: learning the patterns in order book dynamics to minimise market impact on large trades.
These applications work because they use ML to solve well-defined problems with stable structural relationships β not to discover arbitrary patterns in noisy price data.
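To make the first of those concrete, a sketch of regime classification on two simple features; the feature set, window lengths, and the choice of a Gaussian mixture are all assumptions, and fitting on the full sample (as here, for brevity) would leak look-ahead in production:

```python
import pandas as pd
from sklearn.mixture import GaussianMixture

def regime_labels(returns: pd.Series, window: int = 21,
                  n_regimes: int = 3) -> pd.Series:
    """Label each day with the historical market state it most resembles,
    using trailing volatility and autocorrelation as features."""
    vol = returns.rolling(window).std()
    ac = returns.rolling(window * 3).apply(lambda x: x.autocorr(), raw=False)
    features = pd.concat([vol, ac], axis=1).dropna()
    model = GaussianMixture(n_components=n_regimes, random_state=0)
    return pd.Series(model.fit_predict(features.values), index=features.index)
```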
The one genuinely durable ML edge is proprietary data. A dataset that cannot be purchased from a vendor, derived from public sources, or reverse-engineered by a competitor is an actual moat. Firms with scale β the ability to generate their own data through trading activity, customer relationships, or physical infrastructure β can build ML models on signal sources that are permanently unobservable to others. For everyone else, the alpha clock starts ticking the moment the data vendor signs their second institutional client.
We have made all three of the mistakes that follow. We are not describing them from the outside.
The most seductive failure in quantitative research is the backtest that tells a coherent story. The model found a pattern, the pattern has a plausible explanation, and the out-of-sample performance confirms it, because the "out-of-sample" period was selected after the researcher already knew what the in-sample pattern looked like.
The discipline required to prevent this is not algorithmic; it is procedural. Lock down the test set before touching the data. Enforce a strict separation between the period used to generate hypotheses and the period used to test them. And then apply a Bonferroni correction or equivalent adjustment for the number of strategies tested, because if we test a hundred strategies on the same data and pick the five that look best, we have not found five strategies with edge; we have found five random sequences that happened to produce positive returns on that particular dataset.
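A quick demonstration of why the correction matters: pure noise, tested enough times, produces superficially attractive backtests. Nothing below has any edge by construction:

```python
import numpy as np

rng = np.random.default_rng(0)

# 100 "strategies": five years of daily returns, zero mean by construction.
noise = rng.normal(loc=0.0, scale=0.01, size=(100, 1260))
sharpes = noise.mean(axis=1) / noise.std(axis=1) * np.sqrt(252)

print(f"best annualised Sharpe among 100 random strategies: {sharpes.max():.2f}")

# Bonferroni: at a nominal 5% level across 100 tests, each strategy must
# individually clear p < 0.05 / 100 = 0.0005 before it counts as evidence.
```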
Every quant knows that regime change is a risk. Almost none of them build strategies that respond to it in real time. The reason is psychological: acknowledging that the current regime may have changed means acknowledging that the current position may be wrong, and reducing exposure based on a model that has just begun to underperform requires a level of process discipline that is very difficult to maintain when the drawdown is live and the pressure to hold is high.
The practical solution is to build regime detection into the strategy from the beginning: a separate layer that monitors the statistical properties of the recent environment (autocorrelation structure, volatility regime, cross-asset correlations) and scales down exposure when the regime no longer matches the strategy's design conditions. This is not a perfect solution. It adds lag and will cause the strategy to miss some recoveries. But it limits the catastrophic drawdowns that occur when a strategy designed for one regime runs at full size through another.
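A crude sketch of what that layer might compute; the design-condition constants and the distance normalisation are placeholders for whatever the strategy's actual specification defines:

```python
import numpy as np
import pandas as pd

def regime_exposure_scalar(returns: pd.Series, design_autocorr: float = 0.05,
                           design_vol: float = 0.01, window: int = 63) -> float:
    """Multiplier in [0, 1]: full size when the trailing environment
    matches the strategy's design conditions, shrinking toward zero as
    autocorrelation and volatility drift away from them."""
    recent = returns.tail(window)
    ac_gap = abs(recent.autocorr() - design_autocorr) / 0.10  # crude scale
    vol_gap = abs(recent.std() - design_vol) / design_vol
    return float(np.clip(1.0 - (ac_gap + vol_gap), 0.0, 1.0))
```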
The third mistake is the one that affects the most intelligent researchers. There is a constant temptation to add complexity: more features, more layers, more parameters, more constraints. The model becomes elaborate. The backtested performance improves. The researcher feels, correctly, that the model now captures more of the reality of the market.
What the model has actually done is memorise the training data more completely. In an environment with low signal-to-noise ratios, which is every financial market, a simpler model with fewer parameters will almost always generalise better than a complex one. Occam's razor holds in finance as clearly as it holds anywhere: when two models explain the same empirical phenomenon, the simpler one is more likely to be capturing a genuine relationship.
The quants who have survived the longest are, without exception, the ones who can explain their entire strategy in two sentences and defend why the underlying logic will hold in a market that does not look like the backtest period.
Edge in markets is not a discovery. It is a temporary condition: a window in which a structural relationship, a behavioural bias, or an institutional constraint produces a systematic, exploitable pattern. That window closes. Sometimes slowly, through competitive pressure and capital replication. Sometimes overnight, through a regime shift that invalidates the model's core assumptions.
The hardest skill in systematic trading is not finding signals. It is building the discipline to reduce or eliminate exposure when the conditions that justify the signal are no longer present. This requires the researcher to be able to say, credibly, "the strategy was designed for conditions that no longer hold", and act on that judgement before the strategy's equity curve confirms it.
The frameworks that age best are not the ones optimised for maximum historical performance. They are the ones designed with an explicit assumption of decay: strategies where the economic rationale is clear, the conditions under which the strategy should work are defined in advance, and the monitoring infrastructure is built to detect departure from those conditions in near real time.
We spent years building models that were sophisticated. The ones that are still running are not the sophisticated ones.
Markets are not puzzles with solutions. They are adaptive systems with temporary regularities. Every durable edge we have observed, across stat arb, trend, execution, and mean reversion, has one thing in common: a genuine economic rationale that connects the signal to the return, independent of any particular historical period. When we could articulate that rationale clearly, the strategy had a chance of surviving regime change. When we could not, when the edge was in the pattern rather than the economics, it was only a matter of time.
The lesson is not that systematic trading is futile. It is that the models earn returns by being right about something structural, not by being clever about something historical. Getting that distinction clear, and building a research culture around it, is the work that separates the quants still running strategies from the ones still talking about the one that almost worked.