
Running a One Trillion-Parameter LLM Locally on AMD Ryzen AI Max+ Cluster

Cool that it's possible, but the performance is basically unusable. For an 8192-token prompt they report a ~1.5 minute time-to-first-token (TTFT) and then 8.30 tok/s from there. For context, ChatGPT is typically <<1 s TTFT and ~50 tok/s.

5 hours ago ibeckermayer
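To put the numbers in that comment side by side, here is a rough sketch of end-to-end response time as TTFT plus decode time. The 90.5 s TTFT is the 8192-token figure from the table quoted further down the thread; the 500-token reply length and the 1.0 s hosted TTFT (an upper bound on "<<1 s") are assumptions for illustration, not figures from the article.

```python
def response_time_s(ttft_s: float, decode_tok_per_s: float, output_tokens: int) -> float:
    """Seconds until the last output token arrives: TTFT plus decode time."""
    return ttft_s + output_tokens / decode_tok_per_s


OUTPUT_TOKENS = 500  # hypothetical reply length, not from the thread

# Cluster: 90.5 s TTFT at an 8192-token prompt, 8.30 tok/s decode (from the thread).
cluster = response_time_s(ttft_s=90.5, decode_tok_per_s=8.30, output_tokens=OUTPUT_TOKENS)

# Hosted: ~50 tok/s from the thread; 1.0 s TTFT is an assumed upper bound on "<<1 s".
hosted = response_time_s(ttft_s=1.0, decode_tok_per_s=50.0, output_tokens=OUTPUT_TOKENS)

print(f"cluster: {cluster:.1f} s, hosted: {hosted:.1f} s, ratio: {cluster / hosted:.1f}x")
```

Under these assumptions the cluster takes about two and a half minutes per reply against roughly eleven seconds for the hosted service, and most of the gap at this prompt size comes from TTFT rather than decode speed.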

That’s pretty awesome!

Though only 5 Gb Ethernet? Can't they do USB-C / Thunderbolt 40 Gb/s connections like Macs?

4 hours ago elcritch
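For a sense of scale on the 5 Gb/s versus 40 Gb/s question, the following sketch computes ideal wire time for a single transfer at each nominal link rate. The 64 MiB payload is purely hypothetical (the thread does not say how much data the nodes exchange), and the calculation ignores latency and protocol overhead, so real throughput would be lower on both links.

```python
def transfer_time_ms(payload_bytes: int, link_gbit_per_s: float) -> float:
    """Ideal wire time in milliseconds, ignoring latency and protocol overhead."""
    bits = payload_bytes * 8
    return bits / (link_gbit_per_s * 1e9) * 1e3


PAYLOAD_BYTES = 64 * 1024 * 1024  # hypothetical 64 MiB transfer between nodes

for name, gbps in [("5 GbE", 5.0), ("Thunderbolt 40 Gb/s", 40.0)]:
    print(f"{name}: {transfer_time_ms(PAYLOAD_BYTES, gbps):.1f} ms")
```

The ratio is simply 8x at nominal rates; whether that matters in practice depends on how much the workload is bound by inter-node traffic, which the thread does not establish.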

I really wonder if AMD is going to keep getting walloped on the interconnect or if they'll start upping what's available to consumers, at some point.

9 minutes ago jauntywundrkind

I set up Ollama today and can barely run a 3B-parameter model before the lag makes it unbearable.

How much is one of these gonna run me?

4 hours ago tills13

I've been pretty happy with my Framework Desktop, though I managed to snag it before RAM prices shot through the roof. Currently, a tricked out model is around $2500.

https://frame.work/desktop

Mine sees more use as a Steam machine, but it can run decently large models. Ollama was trivial to get working, and qwen3-coder-next spits out paragraphs of text/code in seconds. I don't really do anything with that, but it's fun to mess around with. (LLMs are still pretty bad at assembly language.)

an hour ago zeta0134
[deleted]
3 hours ago

You can buy a 128 GB mainboard from Framework for $2300, so maybe somewhere a bit over $9k by the time you've got power, storage, cables, racks (they sell those too). I was thinking about getting into one of these Strix Halo setups but decided to go a slightly different route with a lot higher TDP, better throughput, and a bit less VRAM.

https://frame.work/products/framework-desktop-mainboard-amd-...

an hour ago jcgrillo

The setup was around $10k, but maybe more now with memory/SSD prices.

This is a good list. I like my Beelink a lot; my Minisforum likes to turn itself off every couple of weeks, not sure why yet.

https://www.techradar.com/pro/there-are-15-amd-ryzen-ai-max-...

---

Performance is pretty bad (<10 tok/s) and context is quite limited. Still, good to see progress.

| Prompt size (tokens) | Time to first token (s), Flash Attention disabled | Time to first token (s), Flash Attention enabled |
|---|---|---|
| 4096 | 53.7 | 39.7 |
| 8192 | Out of memory (OOM) | 90.5 |
| 16384 | Out of memory (OOM) | 239.1 |

5 hours ago verdverm
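The table's figures can be turned into an approximate prompt-processing rate by dividing prompt length by time to first token. This is a rough sketch: it treats the whole time to first token as prefill, which is an approximation, and the helper name `prefill_tok_per_s` is made up for illustration.

```python
# (prompt_tokens, seconds without Flash Attention or None for OOM, seconds with Flash Attention)
# Values copied from the table in the thread.
ROWS = [
    (4096, 53.7, 39.7),
    (8192, None, 90.5),
    (16384, None, 239.1),
]


def prefill_tok_per_s(prompt_tokens: int, ttft_s: float) -> float:
    """Approximate prefill rate, assuming time to first token is all prompt processing."""
    return prompt_tokens / ttft_s


for tokens, no_fa_s, fa_s in ROWS:
    rate = prefill_tok_per_s(tokens, fa_s)
    # A speedup is only defined where the run without Flash Attention completed.
    speedup = f"{no_fa_s / fa_s:.2f}x" if no_fa_s is not None else "n/a (OOM)"
    print(f"{tokens:>6} tokens: {rate:6.1f} tok/s with Flash Attention, speedup {speedup}")
```

On these numbers the effective rate falls as the prompt grows, from roughly 103 tok/s at 4096 tokens to roughly 69 tok/s at 16384, and Flash Attention is worth about 1.35x at the one size where both configurations completed.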

I would try to add a fan or other cooling; my guess is that the CPU is handling thermal properly but something else is not.

44 minutes ago shrubble

> Minisforum likes to turn itself off every couple of weeks, not sure why yet

AFAICT, the answer is "because Minisforum". I don't know if they have a design principle that they should run their systems near the edge of the thermal envelope or what, but Minisforum is the only brand I've had consistent trouble with stability on. My last one got to where it stopped booting altogether, just looped. Since then I've written off Minisforum as a brand, just not worth the hassle.

2 hours ago rootusrootus

Framework has gone fully down the Apple consumerization route of unrepairability and unupgradeability, with a nonstandard machine, soldered-on RAM, and no meaningful PCIe slots. There's only the superficial appearance of longevity and future-proofness when it's really yet another silo. There's no way to add IB, FC, or 100/400 GbE NICs to these machines. 5 GbE is a joke. Non-ECC RAM is a joke.