Noam Brown (@polynoamial) · X

2026-04-11 03:51 · 83 days ago

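AI Summary

A GTOWizard test found that mainstream models, including GPT-5.4, Claude Opus 4.6, Gemini 3.1 Pro, and Grok 4, all lost 5,000-hand heads-up no-limit Texas hold'em matches against a professional poker AI. The poster quips that, since the models cannot play poker directly, it would be better to test how well AI can build AI that plays poker.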

What we really need is a benchmark where AI models make AI models that play poker.

GTOWizard: We benchmarked every major AI model at poker. GPT-5.4, Claude Opus 4.6, Gemini 3.1 Pro, Grok 4 and more. All played 5,000 hands of heads-up no-limit against our...

Agents · Meta · Reasoning · Evaluation/Benchmarks

