
Show HN: LeanRL: Fast PyTorch RL with Torch.compile and CUDA Graphs

We're excited to announce that we've open-sourced LeanRL, a lightweight PyTorch reinforcement learning library that provides recipes for fast RL training using torch.compile and CUDA graphs. By leveraging these tools, we've achieved significant speed-ups compared to the original CleanRL implementations - up to 6x faster!

Reinforcement learning is notoriously CPU-bound due to the high frequency of small CPU operations. PyTorch's powerful compiler can help alleviate these issues, but comes with its own costs. LeanRL addresses this challenge by providing simple recipes to accelerate your training loop and better utilize your GPU.
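
To illustrate the general idea, here is a hand-written sketch (not code taken from LeanRL): wrapping the gradient-update step in torch.compile with mode="reduce-overhead" asks PyTorch to capture it in a CUDA graph, so each iteration launches the whole kernel sequence with a single CPU call instead of many small ones. The toy policy, loss, sizes, and names below are all illustrative.

    import torch
    import torch.nn as nn

    # Toy discrete policy; sizes are arbitrary placeholders.
    policy = nn.Sequential(nn.Linear(4, 64), nn.Tanh(), nn.Linear(64, 2)).cuda()
    optimizer = torch.optim.Adam(policy.parameters(), lr=3e-4)

    def update(obs, actions, returns):
        # Simple policy-gradient loss, standing in for a real PPO/SAC/TD3 update.
        logits = policy(obs)
        log_probs = torch.distributions.Categorical(logits=logits).log_prob(actions)
        loss = -(log_probs * returns).mean()
        optimizer.zero_grad(set_to_none=True)
        loss.backward()
        optimizer.step()
        return loss.detach()  # return a tensor; calling .item() here would force a CPU sync

    # "reduce-overhead" enables CUDA-graph capture of the compiled region.
    update = torch.compile(update, mode="reduce-overhead")

    obs = torch.randn(256, 4, device="cuda")
    actions = torch.randint(0, 2, (256,), device="cuda")
    returns = torch.randn(256, device="cuda")
    for _ in range(10):  # early calls compile and warm up; later calls replay the graph
        loss = update(obs, actions, returns)

The other half of the recipe is keeping rollout data on the GPU and avoiding implicit syncs (e.g. calling .item() or printing a tensor) inside the hot loop; the README explains these tricks per algorithm.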

Key results:

- 6.8x speed-up with PPO (Atari)
- 5.7x speed-up with SAC
- 3.4x speed-up with TD3
- 2.7x speed-up with PPO (continuous actions)

Why LeanRL?

- Single-file implementations of RL algorithms with minimal dependencies, in the spirit of gpt-fast
- All optimization tricks are explained in the README: no heavy docs, just simple tricks (see the CUDA-graph sketch below for the general pattern)
- Forked from the popular CleanRL library
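
For readers curious what "reduce-overhead" does under the hood, here is a hand-rolled CUDA-graph capture sketch (again an illustration, not LeanRL's code); the model, shapes, and names are placeholders:

    import torch
    import torch.nn as nn

    model = nn.Linear(64, 64).cuda()
    static_input = torch.randn(256, 64, device="cuda")  # graph replay needs fixed buffers

    # Warm up on a side stream before capture, as the CUDA-graphs docs require.
    s = torch.cuda.Stream()
    s.wait_stream(torch.cuda.current_stream())
    with torch.cuda.stream(s):
        for _ in range(3):
            model(static_input)
    torch.cuda.current_stream().wait_stream(s)

    graph = torch.cuda.CUDAGraph()
    with torch.cuda.graph(graph):
        static_output = model(static_input)

    # Replay: copy fresh data into the static buffer, then launch the entire
    # recorded kernel sequence with one CPU-side call.
    static_input.copy_(torch.randn(256, 64, device="cuda"))
    graph.replay()

torch.compile's "reduce-overhead" mode automates the warm-up and static-buffer bookkeeping shown here.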

Check out LeanRL at https://github.com/pytorch-labs/leanrl now!

CleanRL is a great library if you're looking to get started with deep reinforcement learning! That plus Gymnasium is a pretty standard stack.

It's good for the world if we keep publishing improvements and optimizations to understandable primitives.

I am curious why not contribute back upstream, though.

wrsh07, 5 hours ago

Wow this looks clean (no pun intended). Great speedups as well!

Sai_Praneeth, 2 hours ago

This looks awesome. CleanRL has been incredibly useful for some of my students starting out in RL. Adding PyTorch's compilation capabilities is a fantastic improvement.

ubj, 5 hours ago

Very cool! How does the optimized PyTorch code compare to the JAX implementation?