
I built a large language model "from scratch"

My wife and I are working through this book, and while I see its value and plan to finish it, I've found it (thus far) a bit underwhelming. It feels like a collection of Jupyter notebooks stitched together with a loosely edited narrative. Concepts are sometimes introduced without explanation, instructions lack context, and the growing errata list on Manning's website makes me question if I'm absorbing the right information.

2 days ago | withinrafael

Idk if I would ever bother reading a book on a field/subject that changes week by week, sometimes hour by hour.

2 days ago | fennecfoxy

The fundamentals are not changing that often so the knowledge is extremely applicable.

2 days ago | akmiller

Also went through this book recently and I agree with you. I found it difficult to build a working LLM using the code snippets included in the book.

2 days ago | 1ncunabula

Worthwhile read. It helps to learn (or confirm) how things are working under the hood. What's interesting is that understanding how it all works makes it clear that all the hype around models "thinking" or being "sentient" is just marketing fluff for "the math works and it's really impressive how that translates to human-like cognition."

2 days ago | rglover

The interesting thing is how it becomes harder and harder over time to draw a line between what human brains do and what these models do. Once you have seen chain-of-thought reasoning in action (does the book even cover that?) you realize that defense of human superiority is strictly a "God of the gaps" game. Arguments that work today won't hold up tomorrow, because they're built on unreliable definitions, special pleading fallacies, obviously-temporary tech limitations, and general hand-waving.

It's not that the technology is magical or special, it's that we're not. That being said, finding new ways to study the nature and limits of cognition and consciousness after 2000+ years of unproductive navel-gazing feels very magical and special.

2 days ago | CamperBob2

I've found none of the explanations of how LLMs are built satisfying, especially considering how impressive their applications are.

2 days ago | sakesun

Karpathy's recent video[1] is quite good.

1. Deep Dive into LLMs like ChatGPT (https://youtube.com/watch?v=7xTGNNLPyMI)

2 days ago | sieve

It's a good introduction at the pop-sci level, but most technically-inclined HN'ers are probably going to get more benefit from his earlier Zero to Hero series.

2 days ago | CamperBob2

Many thanks for this link.

2 days ago | tharmas

To get a rough idea of how things work, you can watch Karpathy's series on YouTube. To get an actual understanding of how things work, you will have to read through the papers. You probably want both. Finally, to really understand how it works, there's no better way than implementing an inference engine yourself. All the other material I've found has been superficial and unsatisfying ... too much information at the hand-waving level.
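
To give a sense of scale: the core of a toy inference engine is just a token-by-token decode loop. Here's a minimal greedy-decoding sketch (my own illustration, using Hugging Face's GPT-2 weights rather than a from-scratch model); a real engine adds KV caching, batching, and proper sampling strategies on top of this.

    # Minimal greedy-decoding loop: forward pass, pick the most likely
    # next token, append it, repeat. Uses Hugging Face's GPT-2 purely
    # for illustration.
    import torch
    from transformers import GPT2LMHeadModel, GPT2TokenizerFast

    tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
    model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

    ids = tokenizer("The capital of France is", return_tensors="pt").input_ids
    with torch.no_grad():
        for _ in range(20):                          # generate 20 tokens
            logits = model(ids).logits               # (1, seq_len, vocab)
            next_id = logits[:, -1, :].argmax(-1, keepdim=True)  # greedy pick
            ids = torch.cat([ids, next_id], dim=1)   # append and continue

    print(tokenizer.decode(ids[0]))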

2 days ago | menaerus

Curious: which of your questions are really still unanswered?

2 days ago | kk58

I'm still amazed by how intelligent the outcome is after these number-crunching processes. I really can't connect its ability to generalize information to the theory behind it.

a day ago | sakesun

I don't see any description of the resulting model in the post. Or any results for that matter. Reads more like a book plug.

Am I missing something?

2 days ago | eps

I get the same vibe, especially after reading the update where the book author contacts him to clarify stuff.

The whole post reads like hasty, clumsy grey marketing.

2 days ago | MonkeyClub

I wrote the blog post of my own free will, and I'm receiving no compensation. The main reason I wrote it was to help cement my own learning from the book. I've heard that the best way to learn something is to teach it, so I wanted to see how much I could regurgitate on my own. Turns out, not a whole lot. It was hastily written, more of a "brain dump" than anything else. I'm entering a field that's new to me and wanted a place to document the things I'm learning. If anyone finds it interesting, great. If not, no big deal.

As for the specifics of the model I trained, I'd be hard pressed to recall them off the top of my head. I believe I trained a small model locally, but after completing that as a proof of concept, I downloaded the GPT-2 model weights and then fine-tuned those locally. That is what the book directed. All the steps are in my GitHub repo, which (unsurprisingly) looks a lot like the author's repo. His repo actually has more explanation; mine is more or less just code.
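
If it's useful, the gist of that fine-tuning step looks roughly like the sketch below. This is not the book's exact code (the book loads the downloaded weights into its own from-scratch GPT class); I'm using Hugging Face transformers here as a stand-in, with a two-sentence toy dataset just to show the shape of the loop.

    # Rough sketch of "download GPT-2 weights, then fine-tune locally".
    # Not the book's code; Hugging Face transformers used as a stand-in.
    import torch
    from transformers import GPT2LMHeadModel, GPT2TokenizerFast

    tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
    model = GPT2LMHeadModel.from_pretrained("gpt2")        # pretrained weights
    optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)

    texts = ["Every effort moves you forward.",            # toy fine-tuning data
             "The verdict was read aloud in court."]

    model.train()
    for epoch in range(3):
        for text in texts:
            batch = tokenizer(text, return_tensors="pt")
            # Passing labels=input_ids makes the model compute the
            # next-token (causal LM) loss internally.
            out = model(**batch, labels=batch["input_ids"])
            out.loss.backward()
            optimizer.step()
            optimizer.zero_grad()
            print(f"epoch {epoch} loss {out.loss.item():.3f}")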