The State of Reinforcement Learning for LLM Reasoning / Hacker News