So for people wondering if it can be used to accelerate LLM inference, sadly not.
I've been trying to hit 100,000 tokens/s with a dumb 3.28M model, and even this is an order of magnitude too large to benefit.
It appears to be focused more on latency than throughput. Happy to be corrected.
You're correct that this work is not very applicable for LLMs and that the focus here is primarily on latency.
Right. But... this would limit you to either extremely small models or extremely large FPGAs, yes? If there's a simple machine learning task that requires sub-microsecond latency I can see the point, but otherwise?
Yes, this work is focused on accelerating very small models, typically for real-time systems that require extremely low power or low latency.
One primary application of this work is in high-energy physics (https://home.cern/smarter-decisions-at-the-speed-of-collisio...). Ultrafast and real-time learning is also very applicable for problems in quantum computing, plasma control, etc. (https://arxiv.org/pdf/2602.02005).
I'm not in HFT, but I assume this is also an interesting applicable domain?
The author actually works at Jane Street.
Yes, definitely: this type of work is applicable in domains where software running on general-purpose processors cannot meet latency or power requirements.
Happy to hear that KANs continue to find solid footing.
This guy will be hired by a high-frequency trading firm, and the next time we hear about him, he will have a net worth in 9 figures.
He is already at Jane Street.
Of course.
Archive link, as it looks like the original post was taken down: https://web.archive.org/web/20260609200156/https://aarushgup...
Hmm the post is still up for me?
For us too, but we'll put the archive link in the toptext since these things seem to vary a lot by region.
p.s. Thanks for posting this and welcome to HN!