Greg Brockman@gdb

2026-07-01 13:33 · 1 day ago

AI summary

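OpenAI has introduced GeneBench-Pro, a research-level benchmark that tests how well AI agents handle complex, judgment-heavy analysis in real-world computational biology. Each problem would take a human expert about 20-40 hours to complete. Greg Brockman says GPT-5.6 Sol makes major progress on the benchmark.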

Introducing GeneBench-Pro - testing whether models can handle the kind of judgment-heavy analysis that real-world computational biology requires.

Problems would take a human expert around 20-40 hours to complete.

GPT-5.6 Sol is a big step forward.

OpenAI: We're introducing GeneBench-Pro, a research-level benchmark for a harder kind of AI progress: how well agents can navigate messy biological data, choose the rig...

Tags: Agents, OpenAI, Papers/Research
