
Reinforcement Learning from Human Feedback

https://arxiv.org/abs/2504.12501

Last time I saw Nathan say something about the book, he was actively working on the next version and looking for feedback. Check his socials.

8 hours ago verdverm

You could say he's also learning from human feedback

6 hours ago leggerss
