Research Sources

Academic papers and foundational research underpinning the Stock Wizard, Sports Wizard, and Hive systems.
Stock Wizard
1
Review of Financial Studies, 14(1), 79–111
Core empirical evidence that insiders' open-market purchases predict positive abnormal returns. Basis for treating Form 4 transaction code P (open-market purchase) as the primary signal; also documents differences between officers and directors.
2
Journal of Finance, 67(3), 1009–1043
Distinguishes routine from opportunistic insider trades. Shows that cluster buying — multiple insiders at the same company buying within a short window — generates the strongest forward alpha. Basis for the cluster_buys_30d feature.
3
Review of Economics and Statistics, 85(2), 453–471
Quantifies returns across different holding periods. Finds the bulk of alpha is captured within 6 months, with meaningful signal at shorter windows — motivating the 5–20 trading day holding period optimization.
4
Journal of Financial Economics, 16(2), 189–212
Foundational paper establishing the informativeness of Form 4 filings. Documents that senior officers (CEO, CFO) have the highest information advantage — basis for the is_ceo and is_cfo features.
5
Bell System Technical Journal, 35(4), 917–926
Mathematical framework for optimal position sizing under uncertainty. The Stock Wizard allocation formula weights positions in proportion to model confidence, consistent with a Kelly-derived approach.
6
Journal of Portfolio Management, 40(5), 94–107 · SSRN
The Deflated Sharpe Ratio: Correcting for Selection Bias, Backtest Overfitting, and Non-Normality. When a strategy is chosen as the best of many backtested configurations, its observed Sharpe ratio is inflated even if every candidate is pure noise. The Deflated Sharpe Ratio (DSR) deflates the observed Sharpe for the number of trials, the sample length, and the skewness and kurtosis of returns. Basis for the Stock Wizard champion/challenger gate: a re-optimized policy replaces the live one only if its deflated Sharpe clears a high confidence bar.
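The deflation step in the Deflated Sharpe Ratio entry can be sketched as follows. This is a minimal Python sketch of the published DSR formula, assuming per-period (non-annualized) Sharpe ratios, raw (non-excess) kurtosis, and that the variance of Sharpe ratios across all backtested trials is available. The function names and the 0.95 threshold in `challenger_wins` are hypothetical, not Stock Wizard's actual gate.

```python
from math import e, sqrt
from statistics import NormalDist

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant
_STD_NORMAL = NormalDist()  # standard normal, provides cdf and inv_cdf


def expected_max_sharpe(sr_variance: float, n_trials: int) -> float:
    """SR0: approximate expected maximum Sharpe ratio across n_trials
    independent trials whose true Sharpe is zero."""
    if n_trials < 2:
        raise ValueError("need at least two trials to deflate for selection")
    z = _STD_NORMAL.inv_cdf
    return sqrt(sr_variance) * (
        (1 - EULER_GAMMA) * z(1 - 1 / n_trials)
        + EULER_GAMMA * z(1 - 1 / (n_trials * e))
    )


def deflated_sharpe(sr: float, n_obs: int, skew: float, kurt: float,
                    sr_variance: float, n_trials: int) -> float:
    """Probability that the true Sharpe exceeds SR0, given the observed
    per-period Sharpe `sr` estimated from `n_obs` returns with the given
    skewness and raw kurtosis (3.0 for normal returns)."""
    sr0 = expected_max_sharpe(sr_variance, n_trials)
    # Non-normality adjustment: for a positive Sharpe, negative skew and
    # fat tails enlarge the standard error of the Sharpe estimate.
    denom = sqrt(1 - skew * sr + (kurt - 1) / 4 * sr ** 2)
    return _STD_NORMAL.cdf((sr - sr0) * sqrt(n_obs - 1) / denom)


def challenger_wins(dsr: float, confidence: float = 0.95) -> bool:
    """Hypothetical champion/challenger gate: promote only above `confidence`."""
    return dsr >= confidence


dsr = deflated_sharpe(sr=0.2, n_obs=1000, skew=0.0, kurt=3.0,
                      sr_variance=0.001, n_trials=10)
print(challenger_wins(dsr))
```

Because SR0 grows with the number of trials, the same observed Sharpe earns a lower DSR after 100 backtested configurations than after 10.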
Sports Wizard
1
Bell System Technical Journal, 35(4), 917–926
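Defines the Kelly criterion, which maximizes the long-run growth rate of wealth (equivalently, expected log-wealth). All Sports Wizard allocations use fractional Kelly: the full-Kelly stake f* = (b·p − q) / b, where b is the net odds (profit per unit staked), p is the model's win probability, and q = 1 − p, is scaled down to reduce variance from errors in the model's probability estimates.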
2
ICML 2017
Shows that modern neural networks are systematically overconfident and proposes temperature scaling as a post-hoc calibration fix. Directly motivates our calibration analysis: without calibration, edge calculations overstate the true advantage.
3
Economic Journal, 103(420), 1141–1153
Establishes that bookmaker-implied probabilities reflect market consensus and information aggregation. Provides theoretical grounding for using book odds as the benchmark probability against which model edge is measured.
4
Economic Journal, 114(495), 223–246
Documents structural inefficiencies in sports betting markets, particularly that bookmakers shade prices toward popular outcomes rather than true probabilities — creating systematic edges for data-driven approaches.
5
Scandinavian Journal of Economics, 112(4), 802–826
Compares prediction accuracy across bookmaker formats and prediction exchanges. Finds systematic biases in bookmaker pricing that persist over time, validating the use of cross-book implied probabilities as an edge-detection signal.
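The edge and staking steps described in the Kelly and bookmaker entries can be sketched together. This minimal Python sketch assumes decimal odds, removes the bookmaker margin by simple proportional normalization (one of several de-vigging methods, chosen here for brevity), and uses a hypothetical quarter-Kelly multiplier; the function names are illustrative, not the Sports Wizard code.

```python
def implied_probabilities(decimal_odds: list[float]) -> list[float]:
    """Book-implied probabilities for mutually exclusive outcomes, with the
    bookmaker margin (overround) removed by proportional normalization."""
    raw = [1.0 / odds for odds in decimal_odds]
    overround = sum(raw)  # exceeds 1.0 when the book takes a margin
    return [r / overround for r in raw]


def kelly_fraction(p: float, decimal_odds: float) -> float:
    """Full-Kelly fraction f* = (b*p - q) / b, with b = decimal_odds - 1."""
    b = decimal_odds - 1.0  # net odds: profit per unit staked
    q = 1.0 - p
    return (b * p - q) / b


def fractional_kelly_stake(p: float, decimal_odds: float, bankroll: float,
                           fraction: float = 0.25) -> float:
    """Stake after scaling full Kelly by a hypothetical `fraction`;
    no bet when the model shows no edge (negative Kelly fraction)."""
    return max(0.0, fraction * kelly_fraction(p, decimal_odds)) * bankroll


p_book = implied_probabilities([2.10, 1.80])[0]
p_model = 0.52
print(f"edge = {p_model - p_book:.3f}")
print(f"stake = {fractional_kelly_stake(p_model, 2.10, bankroll=1000.0):.2f}")
```

For odds of 2.10 and 1.80, the normalized implied probability of the first outcome is 1.8 / 3.9 ≈ 0.462. A model probability of 0.52 gives an edge of about 5.8 percentage points and a full-Kelly fraction of 0.092 / 1.1 ≈ 0.084 of bankroll, scaled to about 2.1% at quarter Kelly.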
Hive
1
Journal of Economic Perspectives, 18(2), 107–126
The foundational survey of when market prices aggregate information accurately — and when they don't. Thin participation and low liquidity degrade accuracy: the "swarm-failure" condition Hive scans the entire Kalshi exchange to find.
2
Economics Letters, 91(3), 425–429
Shows the market price is a biased estimate of the true probability, with the gap largest in the mid-range. Theoretical basis for Hive's nightly calibration map — re-deriving the price→true-rate curve and fading the systematically over-priced bands.
3
Econometrica, 53(6), 1315–1335
The canonical model of how informed traders move prices through order flow. Grounds Hive's whale-reaction and information-gap signals: large aggressive trades reveal information the resting book hasn't yet priced.
4
Journal of Financial Economics, 14(1), 71–100
Derives the bid-ask spread as compensation for adverse selection. Directly explains Hive's maker-vs-taker findings — why crossing the spread is structurally costly, and why resting depth (the markov execution) must price in being picked off by better-informed flow.
5
Journal of Political Economy, 118(4), 723–746
Establishes that longshots are systematically over-priced and favorites under-priced across betting and prediction markets. Basis for Hive's favorite–longshot family (buy_favorite, longshot_fade, the taker-favorite band).
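The nightly calibration map and the favorite–longshot fade described above can be sketched as a binned reliability table. This minimal Python sketch assumes resolved contracts are available as (price, outcome) pairs, with prices expressed as probabilities in [0, 1] and outcomes as 0/1. The bin count and the `min_gap` and `min_count` thresholds are hypothetical, not Hive's settings, and the example data are synthetic.

```python
from collections.abc import Sequence
from dataclasses import dataclass


@dataclass
class Band:
    lo: float             # lower price edge of the bin
    hi: float             # upper price edge of the bin
    n: int                # resolved contracts in the bin
    mean_price: float     # average traded price (implied probability)
    realized_rate: float  # fraction that resolved YES

    @property
    def gap(self) -> float:
        """Positive when the band resolved YES less often than priced."""
        return self.mean_price - self.realized_rate


def calibration_map(prices: Sequence[float], outcomes: Sequence[int],
                    n_bins: int = 10) -> list[Band]:
    """Bin resolved contracts by price and compare price to realized rate."""
    counts = [0] * n_bins
    price_sums = [0.0] * n_bins
    yes_counts = [0] * n_bins
    for price, outcome in zip(prices, outcomes):
        i = min(int(price * n_bins), n_bins - 1)  # price 1.0 goes in the top bin
        counts[i] += 1
        price_sums[i] += price
        yes_counts[i] += outcome
    return [
        Band(i / n_bins, (i + 1) / n_bins, counts[i],
             price_sums[i] / counts[i], yes_counts[i] / counts[i])
        for i in range(n_bins) if counts[i] > 0
    ]


def overpriced_bands(bands: list[Band], min_gap: float = 0.02,
                     min_count: int = 50) -> list[Band]:
    """Bands with enough data whose price exceeds the realized YES rate."""
    return [b for b in bands if b.n >= min_count and b.gap >= min_gap]


# Synthetic example: longshots priced 0.15 win 5% of the time,
# favorites priced 0.85 win 90% of the time.
prices = [0.15] * 100 + [0.85] * 100
outcomes = [1] * 5 + [0] * 95 + [1] * 90 + [0] * 10
for band in overpriced_bands(calibration_map(prices, outcomes)):
    print(f"fade {band.lo:.1f}-{band.hi:.1f}: priced {band.mean_price:.2f}, "
          f"realized {band.realized_rate:.2f}")
```

A band with a positive gap resolved YES less often than its average price implied, so it is a candidate to fade; under the favorite–longshot bias, such bands would be expected at low prices, while high-price bands show a negative gap.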