Rohan Paul @rohanpaul_ai

2026-04-11 08:42

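AI summary

The KellyBench benchmark tests how well mainstream LLMs handle long-horizon prediction and risk management when betting on a Premier League season. Every tested model lost money, and some lost their entire bankroll. Claude Opus 4.6 performed best at -11% ROI, with GPT-5.4 at -13.6%. Through dynamic season simulations of 100-150 matches, the test exposes significant weaknesses in current AI's coherence, adaptation to new data, and risk control during sustained decision-making.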

People using AI for Premier League bets are losing badly.

A new betting benchmark suggests today's best AI models still unravel when prediction has to survive a whole season.

In KellyBench, every tested model lost money, and some went completely bust.

KellyBench forced agents through a changing 100-150 matchday season where they had to predict outcomes, size bets, and protect a £100,000 bankroll.
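The post doesn't say how KellyBench expects agents to size their bets. Its name suggests the Kelly criterion, the classic rule for staking a fraction of a bankroll in proportion to one's edge, but treating that as the relevant rule here is an assumption. Below is a minimal Python sketch of Kelly sizing. The helper names `kelly_fraction` and `stake`, and the match probability and odds, are hypothetical and for illustration only.

```python
def kelly_fraction(p_win: float, decimal_odds: float) -> float:
    """Kelly stake as a fraction of bankroll for a single win/lose bet.

    p_win: the bettor's own estimate of the probability the bet wins.
    decimal_odds: bookmaker decimal odds (a winning 1-unit stake returns decimal_odds units).
    Returns 0.0 when there is no positive edge, i.e. Kelly says not to bet.
    """
    if not 0.0 <= p_win <= 1.0:
        raise ValueError("p_win must be in [0, 1]")
    if decimal_odds <= 1.0:
        raise ValueError("decimal_odds must be greater than 1")
    b = decimal_odds - 1.0   # net odds: profit per unit staked if the bet wins
    q = 1.0 - p_win          # probability the bet loses
    f = (b * p_win - q) / b  # Kelly formula: f* = (b*p - q) / b
    return max(0.0, f)       # never stake a negative amount


def stake(bankroll: float, p_win: float, decimal_odds: float, fraction: float = 1.0) -> float:
    """Stake in currency units; fraction < 1 gives 'fractional Kelly' (0.5 = half Kelly)."""
    return bankroll * fraction * kelly_fraction(p_win, decimal_odds)


if __name__ == "__main__":
    bankroll = 100_000.0  # starting bankroll mentioned in the post (£100,000)
    # Hypothetical match: the model gives the home side a 50% chance; the bookmaker offers 2.2.
    print(round(kelly_fraction(0.50, 2.2), 4))               # prints 0.0833
    print(round(stake(bankroll, 0.50, 2.2, fraction=0.5), 2))  # prints 4166.67
    # Same odds, but the model only gives 40%: no edge, so no bet.
    print(kelly_fraction(0.40, 2.2))                          # prints 0.0
```

Fractional Kelly (the `fraction=0.5` call above) is a common hedge against overconfident probability estimates, because staking more than the true edge justifies reduces long-run bankroll growth and deepens drawdowns.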

That setup tests something normal benchmarks miss: whether an LLM can stay coherent, adapt to new data, and manage risk over time.

Claude Opus 4.6 was best at -11% ROI, GPT-5.4 came next at -13.6%, and several models hit -100%.

