Reinforcement Learning from Human Feedback

Related. Others?

The last time I saw Nathan say something about the book, he was actively working on the next version and looking for feedback. Check his socials.

You could say he's also learning from human feedback

Web version with links, etc:

Thanks! We've switched to that above from https://arxiv.org/abs/2504.12501, and put the latter in the toptext.