Research Sources

Academic papers and foundational research underpinning the Stock Wizard, Sports Wizard, and Hive systems.

Stock Wizard — Insider Trading Signal

Lakonishok & Lee (2001) — Are Insider Trades Informative?

Review of Financial Studies, 14(1), 79–111

Core empirical evidence that insider open-market purchases predict positive abnormal returns. Motivates treating Form 4 transaction code P (purchase) as the primary signal and documents officer vs. director differences.

Cohen, Malloy & Pomorski (2012) — Decoding Inside Information

Journal of Finance, 67(3), 1009–1043

Distinguishes routine from opportunistic insider trades. Shows that cluster buying — multiple insiders at the same company buying within a short window — generates the strongest forward alpha. Basis for the cluster_buys_30d feature.
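
A minimal sketch of how a feature like cluster_buys_30d could be computed, assuming it counts distinct insiders with a code-P purchase at one company in the trailing 30 calendar days. The record layout (ticker, insider, code, date), the function name, and the exact window boundaries are illustrative assumptions, not the production definition.

```python
from datetime import date, timedelta


def cluster_buys(trades, ticker, as_of, window_days=30):
    """Count distinct insiders with a purchase (code "P") at `ticker`
    in the `window_days` ending on `as_of` (inclusive).

    `trades` is an iterable of dicts with hypothetical keys:
    "ticker", "insider", "code", "date".
    """
    start = as_of - timedelta(days=window_days)
    buyers = {
        t["insider"]
        for t in trades
        if t["ticker"] == ticker
        and t["code"] == "P"
        and start < t["date"] <= as_of
    }
    return len(buyers)


# Toy filings (made up for illustration only).
trades = [
    {"ticker": "ABC", "insider": "ceo", "code": "P", "date": date(2024, 3, 1)},
    {"ticker": "ABC", "insider": "cfo", "code": "P", "date": date(2024, 3, 10)},
    {"ticker": "ABC", "insider": "cfo", "code": "P", "date": date(2024, 3, 12)},
    {"ticker": "ABC", "insider": "dir", "code": "S", "date": date(2024, 3, 11)},
    {"ticker": "ABC", "insider": "dir2", "code": "P", "date": date(2024, 1, 5)},
    {"ticker": "XYZ", "insider": "ceo2", "code": "P", "date": date(2024, 3, 9)},
]

abc_cluster = cluster_buys(trades, "ABC", date(2024, 3, 15))
```

Repeat purchases by the same insider count once here, so the feature measures breadth of buying rather than trade count; sales and trades outside the window are ignored.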

Jeng, Metrick & Zeckhauser (2003) — Estimating the Returns to Insider Trading

Review of Economics and Statistics, 85(2), 453–471

Quantifies returns across different holding periods. Finds the bulk of alpha is captured within 6 months, with meaningful signal at shorter windows — motivating the 5–20 trading day holding period optimization.

Seyhun (1986) — Insiders' Profits, Costs of Trading, and Market Efficiency

Journal of Financial Economics, 16(2), 189–212

Foundational paper establishing the informativeness of Form 4 filings. Documents that senior officers (CEO, CFO) have the highest information advantage — basis for the is_ceo and is_cfo features.

Kelly (1956) — A New Interpretation of Information Rate

Bell System Technical Journal, 35(4), 917–926

Mathematical framework for optimal position sizing under uncertainty. The allocation formula weights positions proportionally to model confidence, consistent with a Kelly-derived approach.

Bailey & López de Prado (2014) — The Deflated Sharpe Ratio

Journal of Portfolio Management, 40(5), 94–107 · SSRN

Subtitle: "Correcting for Selection Bias, Backtest Overfitting, and Non-Normality." When a strategy is chosen as the best of many backtested configurations, its Sharpe ratio is inflated even if all candidates are noise. The DSR deflates the observed Sharpe for the number of trials, sample length, and return skew/kurtosis. Basis for the Stock Wizard champion/challenger gate: a re-optimized policy only replaces the live one if its deflated Sharpe clears a high confidence bar.
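
The deflation step can be sketched as follows, using the closed-form expressions from the paper: the expected maximum Sharpe among zero-skill trials, and the probabilistic Sharpe ratio evaluated against that benchmark. The function and parameter names are illustrative assumptions; the Sharpe ratio is per period (not annualized), and the variance of Sharpe estimates across trials is supplied by the caller.

```python
import math
from statistics import NormalDist

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant
_STD_NORMAL = NormalDist()


def expected_max_sharpe(sr_variance, n_trials):
    """Approximate expected maximum Sharpe ratio among `n_trials`
    strategies with no skill, given the variance of their Sharpe estimates."""
    if n_trials < 2:
        return 0.0  # a single trial has no selection bias to correct
    a = _STD_NORMAL.inv_cdf(1.0 - 1.0 / n_trials)
    b = _STD_NORMAL.inv_cdf(1.0 - 1.0 / (n_trials * math.e))
    return math.sqrt(sr_variance) * ((1.0 - EULER_GAMMA) * a + EULER_GAMMA * b)


def deflated_sharpe(sr, n_obs, skew, kurt, sr_variance, n_trials):
    """Probability that the true Sharpe exceeds the selection-bias benchmark.

    sr     : observed per-period Sharpe ratio of the selected strategy
    n_obs  : number of return observations
    skew   : skewness of returns
    kurt   : kurtosis of returns (3.0 for a normal distribution)
    """
    sr0 = expected_max_sharpe(sr_variance, n_trials)
    denom = math.sqrt(1.0 - skew * sr + (kurt - 1.0) / 4.0 * sr * sr)
    z = (sr - sr0) * math.sqrt(n_obs - 1) / denom
    return _STD_NORMAL.cdf(z)


# Same observed Sharpe, judged as a single trial vs. the best of 100 trials.
dsr_single = deflated_sharpe(0.1, 1000, 0.0, 3.0, 0.001, 1)
dsr_best_of_100 = deflated_sharpe(0.1, 1000, 0.0, 3.0, 0.001, 100)
```

The same observed Sharpe scores lower once it is treated as the winner of 100 trials, which is the effect a champion/challenger gate relies on. The threshold a challenger must clear is a policy choice and is left out of this sketch.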

Sports Wizard — Prop Betting Edge

Kelly (1956) — A New Interpretation of Information Rate

Bell System Technical Journal, 35(4), 917–926

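Defines the Kelly Criterion for maximizing long-run log-wealth. All Sports Wizard allocations use a fractional-Kelly approach: f* = (b·p − q) / b, where p is the model's win probability, q = 1 − p, and b is the net odds (profit per unit staked). The result is scaled down to reduce variance from model probability estimation error.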
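
As a worked example of the formula above, the sketch below computes the full Kelly fraction and a scaled-down version. The scale of 0.25 and the floor at zero (no bet when the edge is negative) are illustrative assumptions, not the values Sports Wizard uses.

```python
def kelly_fraction(p, b):
    """Full Kelly stake as a fraction of bankroll.

    p : probability of winning
    b : net odds, i.e. profit per unit staked (decimal odds minus 1)
    """
    q = 1.0 - p
    return (b * p - q) / b


def fractional_kelly(p, b, scale=0.25):
    """Kelly stake multiplied by `scale` (assumed value) and floored at zero."""
    return max(0.0, scale * kelly_fraction(p, b))


# Even-money bet (b = 1) with a 55% win probability.
full = kelly_fraction(0.55, 1.0)       # about 0.10 of bankroll
quarter = fractional_kelly(0.55, 1.0)  # about 0.025 of bankroll

# A bet with negative expected value gets no allocation.
no_bet = fractional_kelly(0.45, 1.0)
```

Scaling the stake trades some expected growth for lower variance, which matters when p is itself an estimate.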

Guo, Pleiss, Sun & Weinberger (2017) — On Calibration of Modern Neural Networks

ICML 2017

Shows that modern neural networks are often poorly calibrated, typically overconfident, and proposes temperature scaling as a simple post-hoc calibration fix. Directly motivates our calibration analysis: without calibration, edge calculations overstate true advantage.
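
Temperature scaling divides a model's logits by a single scalar T fitted on held-out data to minimize negative log-likelihood. The sketch below is a minimal illustration under stated assumptions: it fits T by grid search rather than the gradient-based optimizer used in the paper, and the function names, grid, and toy data are hypothetical.

```python
import math


def softmax(logits, temperature=1.0):
    """Softmax of logits / temperature, computed stably."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def nll(logit_rows, labels, temperature):
    """Mean negative log-likelihood of the true labels at a given temperature."""
    losses = [
        -math.log(softmax(row, temperature)[y])
        for row, y in zip(logit_rows, labels)
    ]
    return sum(losses) / len(losses)


def fit_temperature(logit_rows, labels, grid=None):
    """Pick the temperature on a grid (assumed range 0.5 to 5.0) with lowest NLL."""
    if grid is None:
        grid = [0.5 + 0.05 * i for i in range(91)]
    return min(grid, key=lambda t: nll(logit_rows, labels, t))


# Toy validation set: the model is about 98% confident in class 0 every time,
# but class 0 is correct in only 3 of 4 cases.
val_logits = [[4.0, 0.0]] * 4
val_labels = [0, 0, 0, 1]

best_t = fit_temperature(val_logits, val_labels)
calibrated = softmax([4.0, 0.0], best_t)
```

A fitted temperature above 1 softens the probabilities; here the calibrated confidence lands near the observed 75% hit rate. Temperature scaling does not change which class is ranked first, only how confident the model claims to be.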

Shin (1993) — Measuring the Incidence of Insider Trading in a Market for State-Contingent Claims

Economic Journal, 103(420), 1141–1153

Establishes that bookmaker-implied probabilities reflect market consensus and information aggregation. Provides theoretical grounding for using book odds as the benchmark probability against which model edge is measured.
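
To make the benchmark concrete, the sketch below converts decimal odds into implied probabilities and measures model edge against them. It removes the bookmaker margin by proportional normalization, which is an assumed simplification; Shin's own method adjusts for an estimated share of informed bettors instead. Function names are illustrative.

```python
def implied_probabilities(decimal_odds):
    """Convert decimal odds for all outcomes of one market into
    probabilities that sum to 1 (proportional margin removal)."""
    raw = [1.0 / o for o in decimal_odds]
    overround = sum(raw)  # exceeds 1 by the bookmaker's margin
    return [r / overround for r in raw]


def edge(model_prob, book_prob):
    """Model probability minus the margin-free book probability."""
    return model_prob - book_prob


# Two-way prop priced at 1.91 on each side (roughly a 4.7% margin).
book = implied_probabilities([1.91, 1.91])
over_edge = edge(0.56, book[0])
```

With symmetric prices the margin-free probability is 0.5 per side, so a model estimate of 0.56 implies an edge of about six percentage points before any calibration adjustment.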

Levitt (2004) — Why Are Gambling Markets Organised So Differently from Financial Markets?

Economic Journal, 114(495), 223–246

Documents structural inefficiencies in sports betting markets, particularly that bookmakers shade prices toward popular outcomes rather than true probabilities — creating systematic edges for data-driven approaches.

Franck, Verbeek & Nüesch (2010) — Prediction Accuracy of Different Market Structures

International Journal of Forecasting, 26(3), 448–459

Compares the prediction accuracy of bookmaker odds with that of a betting exchange. Finds systematic biases in bookmaker pricing that persist over time, supporting the use of cross-book implied probability as an edge-detection signal.

Hive — Prediction Markets

Wolfers & Zitzewitz (2004) — Prediction Markets

Journal of Economic Perspectives, 18(2), 107–126

The foundational survey of when market prices aggregate information accurately — and when they don't. Thin participation and low liquidity degrade accuracy: the "swarm-failure" condition Hive scans the entire Kalshi exchange to find.

Manski (2006) — Interpreting the Predictions of Prediction Markets

Economics Letters, 91(3), 425–429

Shows the market price is a biased estimate of the true probability, with the gap largest in the mid-range. Theoretical basis for Hive's nightly calibration map — re-deriving the price→true-rate curve and fading the systematically over-priced bands.
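
A calibration map of this kind can be sketched as a binned comparison of contract price against realized outcome rate. The bin count, function name, and toy data below are assumptions for illustration and do not describe Hive's actual procedure.

```python
def calibration_map(prices, outcomes, n_bins=10):
    """Group settled contracts by price and compare price with outcome rate.

    prices   : contract prices in [0, 1]
    outcomes : 1 if the contract settled YES, else 0
    Returns {bin_index: (mean_price, realized_rate, count)} for non-empty bins.
    """
    bins = [[0.0, 0, 0] for _ in range(n_bins)]  # [price_sum, wins, count]
    for price, won in zip(prices, outcomes):
        i = min(int(price * n_bins), n_bins - 1)  # price 1.0 goes in last bin
        bins[i][0] += price
        bins[i][1] += int(won)
        bins[i][2] += 1
    return {
        i: (b[0] / b[2], b[1] / b[2], b[2])
        for i, b in enumerate(bins)
        if b[2] > 0
    }


# Toy history (made up): mid-range contracts win less often than priced.
prices = [0.05, 0.08, 0.52, 0.55, 0.58, 0.95]
outcomes = [0, 0, 1, 0, 0, 1]

cmap = calibration_map(prices, outcomes)
overpriced = [i for i, (p, rate, _) in cmap.items() if p > rate]
```

Bins where the mean price exceeds the realized rate are candidates for fading. With real data each bin needs enough settled contracts for the rate to be meaningful, so a minimum count per bin is a sensible extra guard.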

Kyle (1985) — Continuous Auctions and Insider Trading

Econometrica, 53(6), 1315–1335

The canonical model of how informed traders move prices through order flow. Grounds Hive's whale-reaction and information-gap signals: large aggressive trades reveal information the resting book hasn't yet priced.

Glosten & Milgrom (1985) — Bid, Ask and Transaction Prices in a Specialist Market

Journal of Financial Economics, 14(1), 71–100

Derives the bid-ask spread as compensation for adverse selection. Directly explains Hive's maker-vs-taker findings — why crossing the spread is structurally costly, and why resting depth (the markov execution) must price in being picked off by better-informed flow.
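
The adverse-selection logic can be illustrated with a one-period version of the model for a binary contract. Assumptions in this sketch: the contract pays v_high or v_low, informed traders always trade in the direction of the true value, uninformed traders buy or sell with equal probability, and the market maker quotes at zero expected profit. The function and parameter names are hypothetical.

```python
def glosten_milgrom_quotes(v_low, v_high, prob_high, informed_share):
    """Zero-profit bid and ask in a one-period Glosten-Milgrom setting.

    prob_high      : prior probability that the value is v_high
    informed_share : fraction of incoming orders from informed traders
    """
    mu, theta = informed_share, prob_high

    # Likelihood of a buy order under each true value.
    buy_if_high = mu + (1.0 - mu) / 2.0
    buy_if_low = (1.0 - mu) / 2.0
    p_buy = theta * buy_if_high + (1.0 - theta) * buy_if_low
    # Ask = expected value conditional on seeing a buy.
    ask = (theta * buy_if_high * v_high
           + (1.0 - theta) * buy_if_low * v_low) / p_buy

    # Likelihood of a sell order under each true value.
    sell_if_high = (1.0 - mu) / 2.0
    sell_if_low = mu + (1.0 - mu) / 2.0
    p_sell = theta * sell_if_high + (1.0 - theta) * sell_if_low
    # Bid = expected value conditional on seeing a sell.
    bid = (theta * sell_if_high * v_high
           + (1.0 - theta) * sell_if_low * v_low) / p_sell

    return bid, ask


# Binary contract paying 0 or 1, 50/50 prior, 20% informed flow.
bid, ask = glosten_milgrom_quotes(0.0, 1.0, 0.5, 0.2)

# With no informed flow the spread collapses to zero.
bid0, ask0 = glosten_milgrom_quotes(0.0, 1.0, 0.5, 0.0)
```

In this symmetric case the spread equals the informed share, so a resting order that quotes tighter than the true informed share loses money on average to better-informed flow.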

Snowberg & Wolfers (2010) — Explaining the Favorite-Longshot Bias

Journal of Political Economy, 118(4), 723–746

Establishes that longshots are systematically over-priced and favorites under-priced across betting and prediction markets. Basis for Hive's favorite–longshot family (buy_favorite, longshot_fade, the taker-favorite band).