
Implementing LLaMA3 in 100 Lines of Pure Jax

It is a very bad idea to handle the KV cache in Jax naively like that. Jax requires static shapes. You're creating dynamic shapes there, causing a ton of recompilation.
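A minimal sketch of what a static-shape cache could look like, for readers unfamiliar with the issue. This is not the blog's code: the sizes `MAX_SEQ_LEN`, `N_HEADS`, `HEAD_DIM` and the helper names `init_cache`, `update_cache`, `attend` are hypothetical. The idea is to preallocate the cache at a fixed maximum length, write new entries in place with `lax.dynamic_update_slice`, and mask out the unused positions, so the array shapes never change between decoding steps.

```python
import jax
import jax.numpy as jnp
from jax import lax

# Hypothetical sizes, chosen only for illustration.
MAX_SEQ_LEN = 16
N_HEADS = 2
HEAD_DIM = 4

def init_cache():
    # Preallocate K and V at the maximum length so shapes stay fixed.
    shape = (MAX_SEQ_LEN, N_HEADS, HEAD_DIM)
    return jnp.zeros(shape), jnp.zeros(shape)

@jax.jit
def update_cache(k_cache, v_cache, k_new, v_new, pos):
    # k_new, v_new have shape (1, N_HEADS, HEAD_DIM).
    # pos is a traced scalar, so a new position does not change any shape.
    k_cache = lax.dynamic_update_slice(k_cache, k_new, (pos, 0, 0))
    v_cache = lax.dynamic_update_slice(v_cache, v_new, (pos, 0, 0))
    return k_cache, v_cache

@jax.jit
def attend(q, k_cache, v_cache, pos):
    # q has shape (N_HEADS, HEAD_DIM); attend over positions 0..pos only.
    scores = jnp.einsum("hd,thd->ht", q, k_cache) / jnp.sqrt(HEAD_DIM)
    valid = jnp.arange(MAX_SEQ_LEN) <= pos
    scores = jnp.where(valid[None, :], scores, -jnp.inf)
    weights = jax.nn.softmax(scores, axis=-1)
    return jnp.einsum("ht,thd->hd", weights, v_cache)

# Usage: three decoding steps reuse the same fixed-shape buffers.
k_cache, v_cache = init_cache()
for pos in range(3):
    kv_new = jnp.full((1, N_HEADS, HEAD_DIM), float(pos + 1))
    k_cache, v_cache = update_cache(k_cache, v_cache, kv_new, kv_new, pos)
out = attend(jnp.ones((N_HEADS, HEAD_DIM)), k_cache, v_cache, 2)
```

The trade-off is that attention always runs over `MAX_SEQ_LEN` positions, even early in generation, in exchange for avoiding a recompile at every step.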

3 days ago ein0p

The blog mentions it's not for production use. This sounds like one thing you'd want to change.

I was curious what else made it not fit for production. Anything fundamental or just minor issues like this?

3 days ago magicalhippo

Is there any automatic way to get warned against these antipatterns?

3 days ago bravura

You can see each compilation if you set the JAX_LOG_COMPILES environment variable, or if you use a low enough logging level.
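For reference, a small sketch of turning this on from inside a script through the `jax_log_compiles` config option, which is the in-code counterpart of the environment variable. The function `double` is a made-up example; the exact wording of the log messages depends on the JAX version, so none is shown here.

```python
import jax
import jax.numpy as jnp

# Equivalent in spirit to running with JAX_LOG_COMPILES=1 in the environment.
jax.config.update("jax_log_compiles", True)

@jax.jit
def double(x):
    # Hypothetical toy function, only here to trigger compilation.
    return x * 2

a = double(jnp.ones(3))  # first call with shape (3,): a compile is logged
b = double(jnp.ones(3))  # same shape and dtype: served from the cache
c = double(jnp.ones(4))  # new shape (4,): compiled and logged again
```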

3 days ago sega_sai

Sorry, not to belabor this point.

Would that suggest to you what you did wrong? Or purely show you what you got right? How chatty is this variable?

3 days ago bravura

I used this to see whether something is compiled repeatedly. I.e., if I have code that runs in a loop, I can immediately see whether something is compiled only once or on every iteration (and it produces a lot of output). I'm not saying this is the best way to do it though, it just worked for me.
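A rough sketch of the same check without reading logs, assuming a Python-side counter is acceptable: code inside a jitted function body only runs while JAX traces it, so incrementing a counter there counts traces rather than calls. The helper `make_counted_sum` and the buffer size 8 are hypothetical.

```python
import jax
import jax.numpy as jnp

def make_counted_sum():
    # Returns a jitted function plus a counter of how often it was traced.
    counter = {"traces": 0}

    @jax.jit
    def f(x):
        counter["traces"] += 1  # Python side effect: runs at trace time only
        return x.sum()

    return f, counter

growing_fn, growing = make_counted_sum()
fixed_fn, fixed = make_counted_sum()

for n in range(1, 6):
    # Shape changes every iteration, so each call needs a fresh trace.
    growing_fn(jnp.ones(n))
    # Shape is always (8,); only the contents change.
    fixed_fn(jnp.zeros(8).at[:n].set(1.0))

print("growing shapes:", growing["traces"], "traces")
print("fixed shape:", fixed["traces"], "traces")
```

In this loop the growing-shape function is traced on every iteration, while the fixed-shape one is traced once and then reused.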

3 days ago sega_sai

Just don't use jit in generation and it would be fine. Of course there is some performance penalty but in my experience jit is oversold and the difference is something like ~10-30%.

Also in any case to get optimized code you need flash attention and many other tricks.

3 days ago YetAnotherNick

Unreadable in portrait mode on mobile. The text column is way too narrow, should be an easy fix!

3 days ago heyitsguay

People have long forgotten that mobile browsers handle wide content by zooming. If you are making a website but don't bother optimizing it for mobile, leave off the viewport <meta> element.

3 days ago kccqzy

It's not just the width of the column - there are annotations on certain lines (that appear on a right "margin") that don't show up on mobile. I think that makes it not an easy fix, but to your larger point, this is not very mobile friendly. It looks quite good on a desktop though.

3 days ago abhgh

"focuses on the soul of pure functional programming which makes it more cool"

This is tangential to this post's main point, but if you're trying for mass adoption this can go badly. Case in point: a hardware company I backed decided to write their code in Haskell (why? "because it's cool"), and now the people who are trying to modify or work with it have to deal with Haskell vs. a general purpose language like C++, idk...

edit: I also realize most of this code is python but yeah

3 days ago ge96

> deal with Haskell vs. a general purpose language like C++

What's the actual problem? Company decided to use Haskell (which is also a general-purpose language) then hired people who don't know it?

If so, hire a bunch of Pythonistas to work on a Rails project and you'll have a similar kind of struggle (and it won't mean that Python or Ruby are somehow bad, it'll be an almost entirely non-technical issue).

3 days ago drdaeman

the problem is it's intended to be an open source device, so Haskell would be harder to work on than something simpler like C++

again my point is about adoption, hence offering multiple languages in most products, like Stripe for example

edit: it's alright, when they actually ship these things (after putting down $3.5K) I hope I'll take it upon myself to port it to C++

edit: "general purpose" is probably the wrong way to put it, Haskell is harder to read than C++ is my pov

3 days ago ge96

If you know Haskell and don't know C++ then C++ will be harder to read. Haskell is definitely less widely used than C++, but that doesn't make it more complex.

2 days ago Hasnep

Idk, they're different, e.g. imperative vs. declarative and that monad thing.

Still... I'm working with someone who came from a Swift background and thinks JavaScript is hard so that goes against my thought.

2 days ago ge96

these anime kids are going to take everyone's job

3 days ago brcmthrowaway

Anya from Spy x Family

3 days ago ge96

said no one ever

3 days ago rfl890

To the poster who wrote: "Hey Saurabh, will you be willing to teach me this on a call? I'm willing to pay for it (im not rich, so, dont expect much please). I will be having a lot of questions, mostly related to core concepts of transformers and jax in general."

This is the wrong way to ask for help.

Instead, consider offering your help and time, apprenticing and learning along the way. Can't code that well? Write test cases and clean up. Or help with blog writing, etc. You certainly have some valuable skill you could trade up.

3 days ago bravura

I mean I’m no Saurabh but that didn’t seem too unreasonable to me? In fact I’ll put my money where my mouth is and offer half an hour for free just to spite you

3 days ago saagarjha

[flagged]

3 days ago harovoy

Cool blog