One of the most common mistakes in systematic trading research is pursuing too few ideas in too much depth. Researchers pick a strategy that seems promising, spend three months building it, and then discover it does not work. By that point, they have invested too much to abandon it easily, so they over-optimize until the backtest looks acceptable and deploy something that was never viable.
The correct approach is the opposite: test many ideas at minimal depth until one passes basic viability filters, then invest the depth in that one.
We maintain a library of 80 trading theses. Over time, these have produced 7 deployed strategies. Here is the pipeline that turned 80 ideas into 7 viable systems — and specifically, the process we use to kill ideas quickly and cheaply.
The Thesis Library
A trading thesis is a one-paragraph statement of a belief about market behavior that should be exploitable if true. It is not a strategy — it is a hypothesis about a market inefficiency or structural feature.
Some examples from our library:
Thesis #3 (Funding Rate Mean Reversion): When perpetual futures funding rates spike above 0.05% per 8-hour period, the market is overleveraged. Longs are paying an extreme fee to maintain positions. This creates mechanical pressure to unwind that tends to resolve within 24 hours, often violently. A strategy that shorts this pressure is systematically on the right side of the liquidation cascade.
Thesis #1 (Cross-Asset Dislocation): When BTC moves dramatically in a short window (>5% in 1 hour), altcoins temporarily decouple from their normal BTC correlation. The decoupling reflects the time it takes for information to propagate through the altcoin order books. A strategy that trades the reversion to normal correlation within 6–24 hours captures this temporary mispricing.
Thesis #43 (Volatility Compression Spring): When Bollinger Band width reaches a 3-month low while On-Balance Volume is rising, institutional accumulation is occurring during a flat consolidation period. The upcoming breakout releases stored energy. This is a tradeable signal for the direction and magnitude of the next move.
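To show how a thesis maps to observable data, here is a minimal sketch of the trigger condition in the funding rate thesis (#3). The function name and the input format (a plain list of funding rates expressed as fractions, one per 8-hour period) are assumptions made for illustration; only the 0.05% threshold comes from the thesis itself.

```python
FUNDING_SPIKE = 0.0005  # 0.05% per 8-hour funding period, as a fraction


def funding_spike_periods(funding_rates: list[float],
                          threshold: float = FUNDING_SPIKE) -> list[int]:
    """Return indices of 8-hour periods whose funding rate exceeds the threshold."""
    return [i for i, rate in enumerate(funding_rates) if rate > threshold]
```

This only flags the condition. Entry, exit, and sizing rules are deliberately left out, because the thesis is a hypothesis and not yet a strategy.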
Each thesis in our library is evaluated by the same structured pipeline, in the same order, with the same binary go/no-go criteria at each gate. No thesis receives special treatment based on how exciting the idea sounds.
The 6-Stage Pipeline
Stage 1: Thesis Clarity Check (15 minutes)
Before any code is written, the thesis must answer four questions clearly:
- What is the specific market inefficiency being exploited?
- What observable data makes this inefficiency detectable?
- What would falsify this thesis? (What would prove it does not work?)
- Is there a plausible mechanism for why this inefficiency exists and persists?
If any of these four questions cannot be answered concisely, the thesis is not ready for testing. Vague theses produce spurious results — you cannot tell if the strategy works because the thesis is right or because the parameters were overfit to historical noise.
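One way to enforce the four questions is to store each thesis as a record whose fields mirror them. The sketch below is illustrative: the class, the field names, and the 60-word cutoff standing in for "concisely" are all assumptions, not part of the pipeline as described.

```python
from dataclasses import dataclass, fields


@dataclass
class Thesis:
    """One library entry; the four text fields mirror the Stage 1 questions."""
    name: str
    inefficiency: str = ""     # what specific inefficiency is exploited?
    observable_data: str = ""  # what data makes it detectable?
    falsifier: str = ""        # what result would prove it wrong?
    mechanism: str = ""        # why does it exist and persist?


def unanswered_questions(thesis: Thesis, max_words: int = 60) -> list[str]:
    """Return the Stage 1 fields that are blank or longer than max_words.

    max_words is a hypothetical proxy for "answered concisely".
    """
    problems = []
    for f in fields(thesis):
        if f.name == "name":
            continue
        answer = getattr(thesis, f.name).strip()
        if not answer or len(answer.split()) > max_words:
            problems.append(f.name)
    return problems


def passes_stage1(thesis: Thesis) -> bool:
    """A thesis is ready for testing only when no question is left open."""
    return not unanswered_questions(thesis)
```

A word count cannot judge whether an answer is actually specific, so a check like this only catches missing or rambling answers. The real gate is still a human reading the paragraph.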
Kill rate at Stage 1: ~25% of theses. Ideas that sound compelling in a headline ("altcoins always moon after ETF approvals") often collapse when pressed on specifics.
Stage 2: Quick Feasibility Backtest (2–4 hours)
A minimal implementation of the thesis — often 50–100 lines of code — is run against 12 months of historical data on a small subset of symbols. No parameter optimization. No transaction cost modeling. No risk management.
The question at this stage is: does the underlying effect exist at all?
Success criterion: average trade return above zero with at least 30 trades in the sample. This is a deliberately low bar. The goal is to confirm the signal exists before investing more time.
If the quick backtest shows negative average trade returns, the thesis is killed. If it shows zero (trades are random), the thesis is killed. If it shows positive average trade returns, even small ones, proceed.
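The go/no-go rule is simple enough to write down directly. This is a minimal sketch, assuming the quick backtest produces a list of per-trade returns as fractions; the function name is hypothetical, and treating a sample of fewer than 30 trades as a kill (instead of "collect more data") is an assumption.

```python
from statistics import mean

MIN_TRADES = 30  # minimum sample size from the Stage 2 success criterion


def stage2_verdict(trade_returns: list[float]) -> str:
    """Return "proceed" or "kill" for a list of per-trade returns."""
    if len(trade_returns) < MIN_TRADES:
        return "kill"  # assumption: too few trades counts as not passing
    # Zero or negative average return means the effect is not visible.
    return "proceed" if mean(trade_returns) > 0 else "kill"
```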
Kill rate at Stage 2: ~40% of surviving theses. Many ideas that sound mechanistically plausible simply do not show up in the data. Funding rate mean reversion passes this test easily; whale wallet accumulation as a leading indicator fails consistently. The data show that whales often accumulate during downtrends, and the 3–30 day price follow-through that the thesis predicts is not observable.
Stage 3: Structured Parameter Exploration (1–2 days)
If the quick backtest finds the effect, Stage 3 explores whether any parameterization of the strategy achieves our baseline viability threshold: Sharpe ratio above 1.5 on out-of-sample data.
We test 10 to 20 meaningful parameter combinations, chosen based on the thesis rather than grid-searched randomly. For a mean reversion strategy, meaningful parameters are the z-score entry threshold, the exit criterion, and the holding period. We do not vary the moving average length from 10 to 200 in increments of 5 — that is not parameter exploration, it is curve-fitting.
Each parameter set is run on four distinct market regime periods (a bull period, a bear period, a choppy/ranging period, and the most recent 3-month period). A viable strategy must show Sharpe above 1.0 in at least three of the four periods, not just in the best one.
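The two Stage 3 gates can be sketched as a single check per parameter set. The code below assumes daily returns, 365 trading days per year (crypto markets do not close), and a zero risk-free rate; it also assumes that a parameter set must clear both the 1.5 out-of-sample threshold and the three-of-four regime requirement. The function names are hypothetical.

```python
import math
from statistics import mean, stdev


def sharpe(returns: list[float], periods_per_year: int = 365) -> float:
    """Annualized Sharpe ratio of per-period returns, risk-free rate assumed zero."""
    sd = stdev(returns)
    if sd == 0:
        return 0.0  # assumption: a flat return series carries no usable signal
    return mean(returns) / sd * math.sqrt(periods_per_year)


def passes_stage3(regime_returns: dict[str, list[float]],
                  oos_returns: list[float]) -> bool:
    """Apply both Stage 3 gates to one parameter set.

    regime_returns maps a regime label (bull, bear, choppy, recent) to the
    returns the parameter set produced in that period.
    """
    regimes_ok = sum(sharpe(r) > 1.0 for r in regime_returns.values())
    return regimes_ok >= 3 and sharpe(oos_returns) > 1.5
```

Counting regimes instead of averaging across them is the point of the design: one spectacular period cannot compensate for two poor ones.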
Kill rate at Stage 3: ~35% of surviving theses. The combination of the Sharpe 1.5 threshold and the multi-regime consistency requirement eliminates most strategies that only work in specific market conditions. The Percentile Rank Reversal thesis (assets at the bottom 10% by 7-day return tend to revert) passes the quick backtest but consistently fails to achieve Sharpe 1.5 in bull regimes — it is regime-conditional rather than regime-robust.
Stage 4: Walk-Forward Validation (1 week)
The strategy that passed Stage 3 is now run through a rigorous walk-forward validation, matching the methodology described in our feature engineering post. This is the most expensive stage in terms of compute time but also the most important.
Walk-forward validation with monthly retraining windows tests whether the strategy's edge is persistent across different market environments or whether it was concentrated in the specific historical windows used in Stage 3.
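As a rough illustration of monthly retraining, the sketch below generates rolling train/test windows. The 12-month training length and the choice of a rolling (not expanding) window are assumptions; the text above specifies only that retraining is monthly. Both function names are hypothetical.

```python
from datetime import date


def add_months(d: date, n: int) -> date:
    """Shift a first-of-month date by n months (n may be negative)."""
    months = d.year * 12 + (d.month - 1) + n
    return date(months // 12, months % 12 + 1, 1)


def walk_forward_windows(start: date, end: date, train_months: int = 12):
    """Yield (train_start, train_end, test_start, test_end) tuples.

    Every interval is half-open: [start, end). The strategy is refit on the
    trailing train_months, then evaluated on the following month only, so no
    test month is ever seen during fitting.
    """
    test_start = add_months(start, train_months)
    while add_months(test_start, 1) <= end:
        yield (add_months(test_start, -train_months), test_start,
               test_start, add_months(test_start, 1))
        test_start = add_months(test_start, 1)
```

The reported performance is then the concatenation of the test-month results, which is what makes the estimate out-of-sample throughout.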
We also add proper transaction cost modeling at this stage: 10 bps per trade baseline, plus a 2x stress test at 20 bps, plus a 3x stress test at 30 bps. A strategy that only works at near-zero transaction costs is not deployable.
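The cost stress test amounts to re-scoring the same trades at three cost levels. This sketch assumes that the 10 bps figure is a flat charge deducted once from each trade's gross return; if costs are charged per side, the deduction would double. The function names are hypothetical.

```python
BASELINE_BPS = 10  # per-trade cost from the Stage 4 baseline


def net_returns(gross_returns: list[float], cost_bps: float) -> list[float]:
    """Subtract a flat per-trade cost, given in basis points, from each return."""
    cost = cost_bps / 10_000
    return [r - cost for r in gross_returns]


def cost_stress_table(gross_returns: list[float]) -> dict[int, float]:
    """Mean net return per trade at 1x, 2x, and 3x the baseline cost."""
    table = {}
    for multiple in (1, 2, 3):
        bps = BASELINE_BPS * multiple
        net = net_returns(gross_returns, bps)
        table[bps] = sum(net) / len(net)
    return table
```

A strategy whose mean net return turns negative at 20 or 30 bps has an edge smaller than plausible execution friction, which is the fragility this stage is meant to expose.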
Kill rate at Stage 4: ~30% of surviving theses. Strategies that appeared robust in Stage 3's exploratory backtests often show significant degradation in walk-forward mode. The degradation is usually the result of hidden regime dependency — the Stage 3 sample coincided with favorable conditions that are not representative of the full historical distribution.
Stage 5: Paper Trading Validation (3 months minimum)
The strategy that survives walk-forward validation enters paper trading: it generates live signals against real market data, but positions are not executed with real capital. The paper trading system records every signal, tracks what would have happened, and computes live-trading performance metrics.
Paper trading validates assumptions that backtests cannot fully capture:
- Execution assumptions: Does the signal actually fire when expected? Do theoretical entry prices match achievable prices?
- Data quality: Do the live data feeds produce the same feature values as the historical data?
- Latency effects: Does the timing of signal generation create any implementation gap that affects performance?
- Regime stability: Does the edge persist in the current market regime, or was the walk-forward validation anchored on a historical period that is no longer representative?
We require at least 3 months of paper trading with Sharpe above 1.0 and performance broadly consistent with the walk-forward results before considering live deployment.
Kill rate at Stage 5: ~20% of surviving theses. Paper trading failures are usually either execution-quality issues (the strategy works theoretically but cannot be executed at the assumed prices) or regime breaks (the strategy worked historically but the current market is structurally different).
Stage 6: Capital Deployment
A strategy that survives paper trading is deployed with real capital. Initial allocation is conservative — typically 5–10% of the portfolio — with the intention to scale based on live performance.
Live performance is monitored monthly against the walk-forward and paper trading benchmarks. Performance significantly below those benchmarks triggers a review; a live Sharpe more than 2 points below the paper trading Sharpe triggers a pause and re-audit.
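The monthly check can be reduced to a small classification rule. In this sketch the 2-point pause threshold comes from the text, while the 0.5-point review threshold is a hypothetical stand-in for "significantly below", which the text does not quantify. For simplicity it compares against the paper trading Sharpe only.

```python
def live_status(live_sharpe: float, paper_sharpe: float,
                review_gap: float = 0.5, pause_gap: float = 2.0) -> str:
    """Classify a monthly check as "ok", "review", or "pause".

    review_gap is an assumed value; pause_gap is the 2 Sharpe points
    stated for the pause-and-re-audit trigger.
    """
    shortfall = paper_sharpe - live_sharpe
    if shortfall > pause_gap:
        return "pause"
    if shortfall > review_gap:
        return "review"
    return "ok"
```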
The 80-to-7 Funnel
| Stage | Strategies Entering | Kill Rate | Strategies Exiting |
|---|---|---|---|
| Thesis library | 80 | — | 80 |
| Stage 1: Clarity check | 80 | 25% | 60 |
| Stage 2: Quick backtest | 60 | 40% | 36 |
| Stage 3: Parameter exploration | 36 | 35% | 23 |
| Stage 4: Walk-forward | 23 | 30% | 16 |
| Stage 5: Paper trading | 16 | 20% | ~13 |
| Stage 6: Deployed | ~13 | Additional culls | **7 active** |
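The table's arithmetic can be reproduced by applying each approximate kill rate in sequence and rounding to whole theses. This is a small sketch for checking the numbers; the function names are hypothetical.

```python
KILL_RATES = [
    ("Stage 1: Clarity check", 0.25),
    ("Stage 2: Quick backtest", 0.40),
    ("Stage 3: Parameter exploration", 0.35),
    ("Stage 4: Walk-forward", 0.30),
    ("Stage 5: Paper trading", 0.20),
]


def funnel(initial: int, kill_rates=KILL_RATES):
    """Return (stage, entering, exiting) rows, rounding survivors to whole theses."""
    rows = []
    entering = initial
    for stage, rate in kill_rates:
        exiting = round(entering * (1 - rate))
        rows.append((stage, entering, exiting))
        entering = exiting
    return rows


def throughput(deployed: int, initial: int) -> float:
    """Share of the original library that ends up deployed, in percent."""
    return 100 * deployed / initial


for stage, entering, exiting in funnel(80):
    print(f"{stage}: {entering} -> {exiting}")
```

With these rounded rates, about 13 theses reach Stage 6. The gap between 13 and the 7 active strategies is the "additional culls" row, which the stage kill rates alone do not explain.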
The roughly 9% throughput rate (7 of 80 ideas become deployed strategies) is about right. A higher rate would suggest the filters are too lenient. A lower rate would suggest the initial thesis generation is poor quality.
What Gets Killed Most Often
Stage 2 kills: Theses that do not show up in the data. "On-chain metrics predict price" is an entire category of popular ideas that consistently fail Stage 2 testing. Whale wallet movements, exchange inflows/outflows, and holder distribution metrics have weak predictive power at the timeframes we trade (12–72 hours).
Stage 3 kills: Strategies that only work in one regime. The single most common failure mode is a mean reversion strategy that works beautifully in ranging/choppy markets and loses badly in trending markets. Without regime conditioning that can reliably distinguish the two environments, these strategies are not deployable.
Stage 4 kills: Strategies whose in-sample performance was driven by a concentrated period. Several strategies showed excellent 2021–2022 performance in Stage 3 that evaporated in walk-forward mode because the 2021 bull market was not representative of anything beyond itself.
The Thesis Library Is Always Growing
We add theses to the library continuously — from our own research, from academic papers, from observing market microstructure patterns we cannot explain, and from combining elements of theses that individually failed but might work in combination.
Most theses in the library will never be tested beyond Stage 1. The pipeline's purpose is not to test every idea — it is to prevent any idea from consuming disproportionate research time before proving it deserves it.
Takeaways
- A roughly 9% throughput rate from thesis to deployment (7 of 80) is healthy: expect most ideas to fail, and design your research process to kill them cheaply
- Stage 2 quick backtests (2–4 hours, no optimization) are the highest-ROI research activity: they confirm or deny the existence of an effect before any further investment
- Multi-regime consistency is more important than average performance — a strategy that only works in bull markets is not a strategy, it is a beta position with extra steps
- Walk-forward validation with monthly retraining is non-negotiable; in-sample optimization produces results that do not survive live deployment
- Paper trading catches execution and data issues that backtests cannot — minimum 3 months before capital commitment
- Transaction cost stress testing at 2x and 3x baseline should be standard practice; strategies that only work at near-zero costs are fragile