Adding an on-policy distillation section to the RLHF book, and it's remarkable how bad LLMs / coding agents are at it, despite me giving them the core papers and 250 pages of context on how I present ideas.