I will confess to skimming by the end. But I don’t think they explained how they solved the cache issue except to say they rewrote the software in Rust, which is pretty vague.
Was all the code they rewrote originally in Lua? So was it just a matter of moving from a dynamic language with pointer-heavy data structures to a static language with value types and more control over memory layout? Or was there something else going on?
The gains in lower memory footprint and lower demands on memory bandwidth from rewriting stuff to Rust are very real, and they're going to matter a lot with DRAM prices being up 5x or more. It doesn't surprise me at all that they would be getting these results.
I guess I'm a sucker for stories about redesigning data structures, and I'd have liked more detail on that front. Also, they talked about Rust's greater memory safety; it would have been nice to know whether specific language features played into the cache difference, or whether Rust simply made the authors comfortable using a systems language for this application, and that was what made the difference.
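The "pointer-heavy vs. value types" difference being speculated about here is easy to illustrate. This is my own sketch, not anything from the post: a Lua table of numbers is a heap of individually boxed objects, much like a Python list, while a Rust `Vec<u64>` is one flat buffer, much like Python's `array` module.

```python
import array
import sys

n = 100_000

# Pointer-heavy layout: the list holds n pointers, each to a separately
# allocated int object scattered around the heap (Lua tables look similar).
boxed = list(range(n))

# Value layout: one contiguous block of 8-byte unsigned ints (like a Rust
# Vec<u64>); a linear scan stays cache-line and prefetcher friendly.
flat = array.array('Q', range(n))

print(flat.itemsize * len(flat))  # 800000 bytes of payload, contiguous
print(sys.getsizeof(boxed))       # the pointer array alone; the int objects are extra
```

Traversing `boxed` does a dependent pointer load per element, each a potential cache miss; traversing `flat` is a sequential sweep. That, plus the JIT's own footprint, is the kind of thing that would show up as L3 pressure.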
They posted about the Rust rewrite last year. https://blog.cloudflare.com/20-percent-internet-upgrade/
It seems like the unspoken takeaway is just how shockingly performant LuaJIT is, even relative to Rust.
Proves the point that if you aren't reaching for Cloudflare-level performance, the JIT in your language of choice is probably already more than adequate for the work being done; no need to RIIR (rewrite it in Rust).
Which is kind of lost these days, when everyone wants to be the next unicorn.
They said they're replacing 15 years of Nginx+Lua; that's a testament to how good it can be.
Is the Linux scheduler aware of shared CPU cache hierarchies? Is there any way we could make the scheduler improve cache utilization, rather than pinning processes to cores or offloading these decisions to vendor-specific code?
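As far as I know, the kernel's scheduling domains do model shared caches, but they balance for load rather than for keeping one service's working set resident in a single L3, which is why explicit pinning is still common. Linux does expose the cache topology in sysfs, so you can approximate cache-aware placement from userspace without vendor-specific code. A minimal sketch, assuming Linux; on a real box the core set would come from `/sys/devices/system/cpu/cpu0/cache/index3/shared_cpu_list` rather than being guessed:

```python
import os

# CPUs this process is currently allowed to run on.
allowed = sorted(os.sched_getaffinity(0))  # 0 = the calling process

# Pretend the first few allowed CPUs share one L3 slice (a CCX). On real
# hardware, read the set from
# /sys/devices/system/cpu/cpu0/cache/index3/shared_cpu_list instead.
ccx = set(allowed[:4])

# Pin this process to that cache domain, so the scheduler can't migrate it
# across L3 boundaries and throw the working set away.
os.sched_setaffinity(0, ccx)
print(sorted(os.sched_getaffinity(0)) == sorted(ccx))  # True
```

This is still "pinning", just driven by topology read at startup instead of hard-coded core numbers, so it ports across CPU generations.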
That was annoying to read, because there's no easy way to see the impact of each change. It's FL2 + Gen 13 combined.
I.e. what's the FL2 benchmark on Gen 12 compared to FL1?
I'm missing a comparison of FL2 on Gen 13 vs. Gen 12, since that would show the real win (or loss?) from the hardware upgrade. How can they justify the upgrade without this data?
Well, that's because this post is about Gen 13. In the FL2 post (presumably on the same Gen 12 servers), they say 25% lower latency.
It should all be tuned with an AMD CPU expert, with programmers adjusting code under their guidance to leverage all the CPU's features. Did AMD engineers, or seasoned hardware experts from the server vendor, assist in this implementation?
Were the "Nodes Per Socket", "CCX as NUMA", "Last Level Cache as NUMA" settings tested/optimized? I don't see them mentioned in the article. They can make A LOT of difference for different workloads, and there's no single setting/single recommendation that would fit all scenarios.
"The locality of cores, memory, and IO hub/devices in a NUMA-based system is an important factor when tuning for performance" - "AMD EPYC 9005 Processor Architecture Overview", page 7
What was the RAM configuration? 12 DIMM modules (optimal) or 24 (suboptimal)?
Was the virtualization involved? If so, how was it configured? How does bare metal performance compare to virtualized system for this specific code?
So many opportunities to explore not mentioned in the text.
Epyc’s naming is beautiful and consistent
This post sponsored by AMD®.
Someday someone will deploy CXL
Reminds me of that time when the cheap Celeron with a small cache was beating the expensive Pentium with a large cache (if I remember correctly, the Celeron's cache ran at core frequency while the Pentium's was on a separate die at half frequency, and the Celeron was very overclockable).
Lower cache per core is actually a pretty natural outcome, with the latest fabrication nodes shrinking logic while leaving the size of SRAM largely unchanged. We may also see eDRAM (a lot denser than SRAM) for last-level caches.
Pentium 4, the first release. AMD had the same gimmick with their Phenoms.
> trading cache for cores
Viva el Celeron
> for 2x performance
You wish.
Are people at Cloudflare so young that they haven't heard of the Celeron and Duron?
"The tradeoff:" "The opportunity:" "Proving it out:"
Nah, I'm good, thanks. Slop takes more effort to read and raises questions of accuracy. It's disrespectful to your readers to put that work on them, and in a marketing blog post it's just a bad idea.
Cloudflare has excellent (human) technical writers. I don’t see any indication this is “slop”, it’s the standard in-the-weeds but understandable blog post they’ve been doing for years.
AI text is everywhere, but this isn’t it.
Cloudflare definitely does have excellent technical writers, but a) this doesn't seem to be (entirely? substantially?) from them, and b) if there are AI tropes clearly visible, which they are to me, it's putting readers off regardless of whether the content is AI generated, and that's just bad marketing.
Cloudflare had excellent human technical writers. But over the past months/years they've slowly been replaced by AI, and the quality of the posts has dropped drastically.
Remember when they had "implemented a serverless post-quantum Matrix server", where they blatantly lied about it being production-ready when most of the encryption features weren't even implemented? (Then rushed to remove the LLM's 'TODO' tags from the code.) https://tech.lgbt/@JadedBlueEyes/115967791152135761
This is AI, but I can't prove it, lol. :) The bulleted lists are too short, both in total list length and in text per list item. Little drama headers, as the parent noted.
To your point, this would register as "human subtly bloviating for word count" if LLMs didn't exist, and at this point that's probably the most useful feedback. I doubt it's 100% one-shot AI, but someone definitely optimized it in parts, and the AI heard "concise" as "bullets and short sentences."
Agree to disagree. It was likely AI-enhanced somewhere along the path to production. So many phrases reek of AI, but others don't. Whether this is a sprinkling of LLM help or just how a human genuinely writes, I don't know.
Out of curiosity, can you point to specific sections that reek of AI? I read the article and didn't see anything that immediately stuck out, but maybe I need to start looking for different signals.
This is LLM tropey:
> For our FL1 request handling layer, NGINX- and LuaJIT-based code, this cache reduction presented a significant challenge. But we didn't just assume it would be a problem; we measured it.
It's not just the "it's not A, it's B" pattern; it's that humans don't write like that. You don't go, "I didn't just assume; I measured." People usually say something like, "but we didn't know for sure, so we decided to measure it." The LLM text is over-confident.
Here’s an example on HN https://news.ycombinator.com/item?id=47538047
> The choice to never invert raster images isn't a compromise, it's the design decision. The problem veil solves is exactly that: every dark mode reader today inverts everything, and the result on photos, histology, color charts, scans is unusable. Preserving all images is the conservative choice, and for my target (people reading scientific papers, medical reports, technical manuals) it's the right one.
It's like a guy putting together a promo packet or something. A normal person would be a little hesitant and wouldn't just go, "And what I'm doing isn't because of constraints. It's because I am making the right choices!"
It’s just an oddly stilted way of speaking in conversation. Imagine talking to someone like that in real life. It would be all like “And then I thought the problem was that the global variable was set wrong. But I didn’t just assume that, I verified it.”
No one’s accusing you of assuming it, dude. You don’t have to pre-emptively tell us you didn’t just assume it. Normal people don’t say that.
I don't have much of a problem with LLM text, because I just skip over flavor like this to the charts, code, and tables, but this is obviously LLM.
Ah, appreciate it. A year ago it was very clear when something was written by an LLM, but now you've got to look for certain characteristics. I try not to infer too much, especially because LLMs are really helpful for non-native English speakers who want to write faster.
I'd like to make it a bit more normalized for public writing to be transparent about whether LLMs were used, and how. That would make it quite a bit easier for readers to focus on the content instead of debating how something was written, lol.
"deliver more than just a core count increase. The architecture delivers improvements across multiple dimensions"
“But we didn't just assume it would be a problem; we measured it.”
"Instead of compromising, we built FL2."
I don't know if I'm now seeing this pattern everywhere because it's all AI slop, or if people really do write this way.
Skimming it, this looks like they got a partnership with AMD and tacked it onto an ongoing project as if it were planned. That makes it harder to understand how much of the gain was the rewrite in general and how much was the hardware. Man, I used to really enjoy Cloudflare's technical blogs.