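AI summary
Dynamic workflows apply to only a small set of use cases and can be viewed as a new paradigm of test-time compute (TTC); they are effective for hill-climbing research experiments. Careful planning and raising the reasoning level both improve results. /goal + /loop is a subset of dynamic workflows, and verifiers/judges are crucial. Combining different coding agents gives better results, which suits LLM-council scenarios that need perspectives from multiple agents. Frontier models are not good at generating harnesses on the fly, but newer models such as Mythos may handle agent orchestration better. TTC benchmarks are still lacking and need to be built. Meta-prompting dynamic workflows is interesting, and Opus 4.8 may also bring surprises. Dynamic workflows can be packaged as skills for further optimization.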
Just had a great discussion on dynamic workflows.
Rough notes:
- applies to a very small set of use cases
- think of it as a new paradigm of test-time compute (TTC)
- strong for hill-climbing research experiments
- careful planning leads to better results
- you can often get better results by just increasing the reasoning level
- /goal + /loop is a subset of dynamic workflows
- verifiers/judges are crucial to get good results
- combine/fuse different coding agents for even better results
- great for when you need different perspectives from agents (LLM council)
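
To make the notes above concrete, here is a minimal toy sketch of a goal + loop workflow with a verifier and several "agents" proposing in parallel. Everything in it is an assumption for illustration: `verifier`, `make_agent`, and `hill_climb` are hypothetical names, the agents are random perturbation functions standing in for real coding agents, and the judge is a simple numeric score rather than an LLM. It is not how any particular tool implements /goal or /loop.

```python
import random
from typing import Callable, List, Sequence, Tuple

Candidate = List[float]
Agent = Callable[[Candidate], Candidate]


def verifier(candidate: Candidate) -> float:
    """Toy judge: higher is better; the score peaks at 0.0 when every value is 3."""
    return -sum((x - 3.0) ** 2 for x in candidate)


def make_agent(step: float, seed: int) -> Agent:
    """Hypothetical stand-in for a coding agent: perturbs the current best candidate."""
    rng = random.Random(seed)  # seeded so runs are reproducible

    def propose(current: Candidate) -> Candidate:
        return [x + rng.uniform(-step, step) for x in current]

    return propose


def hill_climb(
    start: Candidate,
    agents: Sequence[Agent],
    judge: Callable[[Candidate], float],
    goal_score: float,
    max_rounds: int,
) -> Tuple[Candidate, float, int]:
    """Loop until the judge says the goal is met or the round budget runs out."""
    best, best_score = list(start), judge(start)
    for round_no in range(1, max_rounds + 1):
        if best_score >= goal_score:  # the "/goal" part: stop once the goal is met
            return best, best_score, round_no - 1
        # "Council" step: every agent proposes, and the judge picks the top proposal.
        proposals = [agent(best) for agent in agents]
        top = max(proposals, key=judge)
        top_score = judge(top)
        if top_score > best_score:  # hill climbing: keep only strict improvements
            best, best_score = top, top_score
    return best, best_score, max_rounds


if __name__ == "__main__":
    agents = [make_agent(step=1.0, seed=1), make_agent(step=0.1, seed=2)]
    best, score, rounds = hill_climb([0.0, 0.0], agents, verifier, -0.01, 200)
    print(f"score={score:.4f} after {rounds} rounds")
```

More rounds and more agents mean more compute spent at inference time, which is the sense in which this resembles test-time compute. The loop also only ever improves as far as the judge can tell good from bad, which is one way to read the point about verifiers being crucial.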