GENEB: Why Genomic Models Are Hard to Compare

2026-06-03 08:00 · 30 days ago

AI Summary

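Progress in genomic foundation models is hard to assess because benchmarks are fragmented and evaluation protocols are incompatible. GENEB is a large-scale diagnostic benchmark that evaluates frozen representations from 40 models on 100 tasks (13 functional categories) under a unified probing protocol, including few-shot settings. The analysis shows that aggregate leaderboards are unstable: model rankings differ markedly across task categories, gains from scale are limited and inconsistent, and the effects of architecture and pretraining alignment often outweigh parameter count. GENEB offers a reference framework for principled comparison and category-aware model selection in genomic machine learning.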

Original abstract

Progress in genomic foundation models is difficult to assess due to fragmented benchmarks, incompatible evaluation protocols, and task-specific reporting. As a result, claims of superiority or generality across models are often not directly comparable. We introduce GENEB, a large-scale diagnostic benchmark that evaluates frozen representations from 40 genomic foundation models across 100 tasks spanning 13 functional categories under a unified probing-based protocol, including few-shot regimes. GENEB enables controlled comparison across model scale, architecture, tokenization, and pretraining data while explicitly exposing task-level trade-offs. Our analysis shows that aggregate leaderboards are unstable: model rankings vary sharply across task categories, scale provides only modest and inconsistent gains, and architectural and pretraining alignment frequently outweigh parameter count. These results highlight limitations of current evaluation practices and position GENEB as a reference framework for principled comparison and category-aware model selection in genomic machine learning.
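The claim that aggregate leaderboards are unstable can be made concrete with a small rank-correlation check. The sketch below is only an illustration: the model names, category names, and scores are hypothetical toy values, not results from GENEB, and comparing rankings with Spearman's rho is an assumption about how one might quantify the disagreement, not a description of the paper's analysis.

```python
from statistics import mean

# Hypothetical toy scores (NOT taken from the paper): model -> category -> score.
scores = {
    "model_a": {"promoter": 0.90, "splicing": 0.60, "enhancer": 0.70},
    "model_b": {"promoter": 0.70, "splicing": 0.85, "enhancer": 0.60},
    "model_c": {"promoter": 0.60, "splicing": 0.70, "enhancer": 0.80},
}


def rank_models(score_by_model):
    """Return model names sorted best-first by score."""
    return sorted(score_by_model, key=score_by_model.get, reverse=True)


def spearman(rank_x, rank_y):
    """Spearman's rho between two tie-free rankings of the same items."""
    n = len(rank_x)
    pos_y = {model: i for i, model in enumerate(rank_y)}
    d2 = sum((i - pos_y[model]) ** 2 for i, model in enumerate(rank_x))
    return 1 - 6 * d2 / (n * (n**2 - 1))


categories = sorted(next(iter(scores.values())))

# Aggregate leaderboard: rank by the mean score over all categories.
aggregate = rank_models({m: mean(s.values()) for m, s in scores.items()})

# Per-category leaderboards and their agreement with the aggregate one.
per_category = {
    c: rank_models({m: s[c] for m, s in scores.items()}) for c in categories
}
rho = {c: spearman(aggregate, per_category[c]) for c in categories}

print("aggregate:", aggregate)
for c in categories:
    print(c, per_category[c], "rho vs aggregate:", rho[c])
```

In this toy table the aggregate ranking matches only one category and is negatively correlated with the other two, which is the kind of disagreement that makes a single averaged leaderboard a poor guide for category-specific model selection.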

HuggingFace Daily Papers (trending community papers)

Read the original · arxiv.org
