Nathan Lambert@natolambert

2026-06-27 06:44

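AI Summary

Nathan Lambert responds to outside suggestions that his book "RLHF: Reinforcement Learning from Human Feedback" would sell better if retitled as a "post-training" book. Lambert acknowledges that the content is, in essence, post-training, but says a retitle would have required a refactor costing 3 to 15 months, which he did not have the bandwidth for. He argues that RLHF is far from solved and deserves a dedicated text; the book leans toward math and intuition, whereas post-training leans toward data and systems. He kept the original title to avoid being dishonest, and announces that the RLHF "post-training" book is coming soon.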

I get feedback a lot that is like "your book should be the RL for LLMs book" or "the post-training book" and it's definitely true those would sell more copies.

The reality is that this book was in many ways a side project, and by the time I realized I agreed with a bit of this I didn't have the time for *another* refactor.

At the end of the day, I still dumped as much knowledge as I could from what I was doing into the book, and now the course and the code. In its spirit the book is totally a post-training book.

The process to change this would've delayed the book by anywhere from 3 to 15 months. It is simply an amount of time I didn't have with Interconnects, Olmo, and other life necessities.

So this isn't to say that I'll never do it. Re-prints and new versions are a common thing. It's doable for me to refactor most of the chapters, re-write the introduction, and make it a post-training centric book.

Still, RLHF as a topic deserves a dedicated text and is far from solved. It's a technology that skyrocketed language models to prominence and points to a lot of fundamental problems interfacing the user and the AI.

Much of what got me to where I am today in my career came from diving into and caring about this interface, so I'm happy for it to have the space to live, breathe and thrive.

So in reality, I probably could've hot-swapped the title to sell more copies, but it would have made me feel dishonest to do so. For anyone wanting to learn post-training, there's nothing in this book that doesn't apply to you -- post-training is just constantly evolving and growing in complexity.

A final nitpick is that RLHF actually matches my more conceptual, intuitive vibe a good amount. Post-training is far more practical, in a data and systems sense, whereas this is more of a math & intuition book.

Anyways, the RLHF "post-training" Book is coming soon and thank you for trusting me with your attention. 🩵

