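Elvis Saravia (DAIR.AI) notes that tuning and building LLM verifiers and judges is becoming a high-demand skill. He uses these components on top of his own harness, which he says has unlocked agentic coding workflows well beyond anything currently on the market. Separately, a cited case shows Bridgewater applying its financial expertise and working with the Tinker API to fine-tune models that help analysts focus on critical tasks, illustrating a closed loop of "experts improve AI, AI empowers experts."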
So much alpha in tuning/building LLM verifiers and judges.
I use them on top of my harness, and it has unlocked agentic coding workflows that are beyond anything that exists in the market today.
Building verifiers and LLM judges is starting to become a skill in high demand.