Reinforcement Learning from Human Feedback
https://arxiv.org/abs/2504.12501
Related. Others?
RLHF Book - https://news.ycombinator.com/item?id=42902936 - Feb 2025 (37 comments)
Last time I saw Nathan say something about the book, he was actively working on the next version and looking for feedback. Check his socials.
You could say he's also learning from human feedback
Web version with links, etc:
https://rlhfbook.com/
Thanks! We've switched to that above from https://arxiv.org/abs/2504.12501, and put the latter in the toptext.