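Nathan Lambert responds to suggestions that his book, "RLHF: Reinforcement Learning from Human Feedback", would sell better if it were retitled as a "post-training" book. Lambert acknowledges that the content is, in essence, post-training, but says renaming it would have required a refactor taking 3 to 15 months, which he did not do because of limited time and energy. He argues that RLHF is far from solved and deserves a standalone treatment; the book focuses on math and intuition, while post-training leans more toward data and systems. He is keeping the original title to avoid being dishonest, and announces that the "RLHF post-training book" is coming out soon.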
I get feedback a lot that is like "your book should be the RL for LLMs book" or "the post-training book" and it's definitely true those would sell more copies.
The reality is that this book was in many ways a side project, and by the time I realized I agreed with a bit of this I didn't have the time for *another* refactor.
At the end of the day, I still dumped as much knowledge as I could from what I was doing into the book, and now the course and the code. In its spirit the book is totally a post-training book.
The process to change this would've delayed the book by anywhere from 3 to 15 months. It is simply an amount of time I didn't have with Interconnects, Olmo, and other life necessities.