
Comparing AWS Lambda ARM64 vs. x86_64 Performance Across Runtimes in Late 2025

I would not be surprised if Rust were faster than Python, but looking at the code of his benchmarks, I'm not sure they really mean anything.

For example, the "light" test will do calls to "dynamodb". Maybe you benchmark python, or you benchmark the aws sdk implementation with different versions of python, or just you benchmark average dynamodb latencies at different times of the day, ...

And, at least for Python, his CPU and memory test code looks especially suspect. For example, if you use "sha256" or anything else from hashlib, you're not really benchmarking Python but the C implementation of hashlib, which may depend on the crypto or GCC libraries or the optimizations AWS used when building the Python binaries.
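
A minimal sketch of the distinction (numbers will vary with the OpenSSL build):

    import hashlib, timeit

    data = b"x" * 1_000_000

    # This mostly measures OpenSSL's C/assembly SHA-256, not the interpreter:
    t_hash = timeit.timeit(lambda: hashlib.sha256(data).digest(), number=100)

    # A pure-Python loop is what actually exercises the interpreter:
    t_loop = timeit.timeit(lambda: sum(i * i for i in range(100_000)), number=100)

    print(f"sha256: {t_hash:.3f}s, pure-Python loop: {t_loop:.3f}s")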

5 hours ago | greatgib

I think the 'light' workload is the most realistic - most people use Lambda as a stateless CRUD backend/REST endpoint; I don't think heavy number crunching is that idiomatic inside a Lambda.

And for that use case, Python seems almost as good as Rust, which is surprising to me, as is the fact that Node runs so slow - I have a ton of Node-based lambdas in prod, and while they're no speed demons, I'm quite surprised how bad it is compared to even interpreted things like Python.

AWS should really look into this and fix it, or at least tell us why it's so slow.

3 hours ago | torginus

Sure. Who is to say where the bottleneck is, but if an application is going to use all those same libraries and runtimes, it’s not an unrealistic test. Obviously, with all benchmarking, the most accurate benchmark is your own application, but this seems pretty reasonable as a generic first cut.

3 hours ago | drob518

Yeah, having a native binary be only 2x faster than CPython, or CPython showing a 4x (!!) advantage over V8, seems really suspicious. This benchmark suite is probably fine for looking at CPU architecture deltas on identical source code (which seems to be what it was designed for), but trying to intuit language/runtime behavior from it seems very dangerous.

4 hours ago | ajross

This is very thorough, but I have a few things to add if you are using Node 22. I benchmarked Node 22 earlier this year using something similar to the light benchmark. https://speedrun.nobackspacecrew.com/blog/2025/07/21/the-fas...

I found that Node 22 had ~50ms slower cold starts than Node 20. This is because the AWS JavaScript v3 SDK loads the HTTP request library, which became much heavier in Node 22. This happens on the newly released Node 24 as well.

I recommend that if you are trying to benchmark cold starts on Lambda, you measure latency from a client as well. The Init Duration in the logs doesn't include things like decrypting environment variables, which adds ~20ms, and other overhead like pulling the function code from S3. The impact of this shows up when comparing runtimes like llrt to Node: llrt's Init Duration is faster, but the E2E time from the client is actually closer, because the llrt bundle size is 4-5MB larger than Node's.
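
For example, a rough client-side measurement with boto3 (the function name is a placeholder):

    import time
    import boto3

    client = boto3.client("lambda")

    # E2E latency as seen by the caller, including overhead the logs don't show
    start = time.perf_counter()
    client.invoke(FunctionName="my-function", Payload=b"{}")
    print(f"E2E: {(time.perf_counter() - start) * 1000:.0f} ms")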

an hour ago | perpil

I'd be much more interested in bare-metal ARM64 vs. x64 workloads on modern hardware. Remember the article that compared Hetzner VPS + bare metal to Amazon workloads a while back?

Amazon was so ridiculously gimped and expensive that it was almost unfair, so comparing ARM and x64 on Amazon runs into whatever arbitrary "savings" AWS applies.

43 minutes ago | whizzter

What I don't get is why Node is so dog slow. Seriously, it seems borderline unusable.

In terms of perf, Node has a pretty snappy JIT, and V8 seems to perform OK in the browser, so something's not right here.

~200ms requests even in the light case are on the very brink of barely acceptable.

On the other hand, Python performs very well, but it's alarming to see it get slower by a significant margin with every release.

5 hours ago | torginus

I have used Node.js quite a bit, and I would say it generally performs quite well. It's not as good at making the most out of monster hardware, though.

34 minutes ago | javier2

> What I don't get is why Node is so dog slow. Seriously, it seems borderline unusable.

This is in line with my experience using anything written in Node.js

3 hours ago | koakuma-chan

Not on a dedicated server - if serving a DB query in a REST endpoint took 100ms in Node, it wouldn't have gotten popular.

In my experience, Node perf is 'okay' - not stellar, but a simple Express handler certainly doesn't take 100ms. This sounds 10x-100x slower than running something similar on a dedicated instance.

2 hours ago | torginus

Would be interesting to add a cold start + "import boto3" benchmark for Python, as importing boto3 takes forever on Lambdas with little memory. For this scenario the only benchmark I know of is from 2021: https://github.com/MauriceBrg/aws-blog.de-projects/tree/mast...
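
A quick-and-dirty way to see the cost, as a sketch (run in the handler module on a low-memory function):

    import time

    start = time.perf_counter()
    import boto3  # on a 128MB Lambda this import alone can take seconds
    print(f"import boto3: {(time.perf_counter() - start) * 1000:.0f} ms")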

7 hours ago | ju-st

I don't really use Python, but most AWS SDKs seem to be autogenerated for each language, and they're pretty much just thin wrappers over REST calls to internal AWS endpoints.

I dunno why a Python impl would be particularly heavy.

3 hours ago | torginus

If imports are slow, one should probably look into pre-compiling .pyc files into the Lambda bundle.
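
One way to do that, assuming the bundle is staged in package/ and built with the same CPython version as the target runtime:

    import compileall
    import py_compile

    # Pre-populate __pycache__ so the runtime skips bytecode compilation on cold
    # start; UNCHECKED_HASH marks the .pyc as always valid, so mtimes mangled by
    # zipping/unzipping the bundle don't invalidate it
    compileall.compile_dir(
        "package/",
        quiet=1,
        invalidation_mode=py_compile.PycInvalidationMode.UNCHECKED_HASH,
    )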

4 hours ago | anentropic

This is a well-known issue, and the fix is to not create any boto3 clients inside the handler. Instead, ensure they're created globally (even if you throw them away), so the work gets done once during the init period. The init period gets additional CPU allocation, so this is essentially "free" CPU.
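
A minimal sketch of the pattern (the table name is hypothetical):

    import boto3

    # Module scope: runs once, during init, which gets the extra CPU allocation
    TABLE = boto3.resource("dynamodb").Table("my-table")

    def handler(event, context):
        # Each invocation reuses the already-initialized client
        return TABLE.get_item(Key={"pk": event["pk"]}).get("Item")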

Source: I'm a former AWS employee.

3 hours ago | coredog64

Thanks for citing your sources. I think your source may be out of date, though! The “free init time hack” was killed in August (unless I’m missing something - never used it myself).

https://aws.amazon.com/blogs/compute/aws-lambda-standardizes...

2 hours ago | ComputerGuru

Good callout that it's no longer free. However, you still get extra CPU, and assuming your execution environment isn't reloaded, that init time is amortized across all the invocations for the execution environment.

SnapStart, the other option for shrinking the billed time spent in init, is now more widely available (when I left, only Java SnapStart was available).

33 minutes ago | coredog64

It’s interesting that the author chose to use SHA256 hashing for the CPU intensive workload. Given they run on hardware acceleration using AES NI, I wonder how generally applicable it is. Still interesting either way though, especially since there were reports of earlier Graviton (pre v3) instances having mediocre AES NI performance.

7 hours ago | tybit

Hardware-accelerated SHA support has a patchy history. I wrote an article some years ago about the prevalence of SHA instructions in x86_64 CPUs [0]. Like the current mess we see with AVX-512, Intel invented something useful, then declined to continue supporting it, while competitors that were late to the party became the real champions.

[0]: https://neosmart.net/blog/will-amds-ryzen-finally-bring-sha-...

an hour ago | ComputerGuru

Does AES NI imply SHA256 acceleration support?

4 hours ago | bobmcnamara

There are some crossed wires here.

AES-NI is x86-specific terminology. It was proposed in 2008. SHA acceleration came later, announced in 2013. The original version covers only SHA-1 and SHA-256 acceleration, but a later extension adds SHA-512 acceleration. At least for x86, AES-NI does not imply SHA support. For example, Westmere, Sandy Bridge, and Ivy Bridge chips from Intel have AES-NI but not SHA.

The equivalent in Arm land is called "Cryptographic Extensions" and was a non-mandatory part of ARMv8 announced in 2011. Both AES and SHA acceleration were announced at the same time. While part of the same extensions, there are separate feature flags for each of AES, SHA-1, and SHA-256.
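
One way to check what a given host actually supports, assuming Linux (note the flag names differ between the two architectures):

    # Crude check of hardware crypto feature flags via /proc/cpuinfo
    with open("/proc/cpuinfo") as f:
        flags = set(f.read().split())

    print("AES:", not flags.isdisjoint({"aes"}))                     # x86 AES-NI and Arm both report 'aes'
    print("SHA:", not flags.isdisjoint({"sha_ni", "sha1", "sha2"}))  # x86 'sha_ni'; Arm 'sha1'/'sha2'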

3 hours ago | kbolino

I saw so many red flags:

- How is Rust only one order of magnitude faster than Python?

- How is Python that much faster than Node.js?

So I looked at the benchmark repo.

These benchmarks mean nothing folks!

Each of these benchmarks is just a SHA256 hash. This is NOT a valid way to compare CPUs, unless the only thing you will ever do with the CPU is execute SHA256 hashes.

Hash functions are not representative of the performance of:

- Compression or decompression (of text, video, or anything really)

- Parsing

- Business logic (which is often dominated by pointer chasing)

So, you can safely ignore the claims of this post. They mean nothing.

an hour ago | pizlonator

I tried to build a very low-latency HTTPS endpoint with Lambda and Rust and wasn't able to get below 30ms, no matter what I tried.

Then I deployed an ECS task with an ALB and got something like <5ms.

Has anybody gotten sub-10ms latencies with Lambda HTTPS functions?

3 hours ago | mrgaro

It would be interesting to see a benchmark with the Rust binary carrying successively more “bloat”, to separate out how much of the cold start is app start time vs. app transfer time. It would also be useful to have the C++ Lambda runtime there too; I expect it performs similarly to Rust.

Tangent: when you have a Lambda returning binary data, it’s pretty painful to have to encode it as base64 just so it can be serialized to JSON for the runtime. To add insult to injury, the base64 encoding is much more likely to put you over the response size limits (6MB normally, 1MB via ALB). The C++ Lambda runtime (and maybe Rust?) lets you return non-JSON and do whatever you want, as it’s just POSTing to an endpoint within the Lambda. So you can return a binary payload and just have your client handle the blob.
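
For reference, the response shape the managed runtimes require for binary data (the payload here is a stand-in):

    import base64

    def handler(event, context):
        payload = b"\x89PNG..."  # stand-in for real binary output
        return {
            "statusCode": 200,
            "headers": {"Content-Type": "image/png"},
            "isBase64Encoded": True,  # API Gateway/ALB decodes this back to bytes
            "body": base64.b64encode(payload).decode("ascii"),  # ~33% size inflation
        }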

5 hours ago | mnutt

Yikes, Node.js did really badly. If this holds up, my take-away would be ...

If I want to use TypeScript for Functions, I should write to the v8 runtimes (Deno, Cloudflare, Supabase, etc) which are much faster due to being much more lightweight.

5 hours ago | tomComb

One of the easiest hacks to reduce your AWS bill is to migrate from x86 to arm64 CPUs. The performance difference is negligible, and cost can be up to 50% lower for ARM machines. This applies to both RDS and general compute (EC2, ECS). Would recommend to all.

6 hours ago | abhashanand1501

I'd say the best price/performance hack on AWS, if you don't need web scale, is to just put your stuff on a tiny EC2 instance, like a t3.micro - it'll likely be faster and more flexible than Lambda, with much more predictable performance.

You can scale up by changing out to a bigger instance - it's surprising how far you can get with this strategy.

3 hours ago | torginus

How is the performance difference negligible? In my experience, for the same generation of hardware, ARM64 performance is better than AMD64's.

AFAIK ARM64 is around 20% cheaper, not sure where you got the 50%.

4 hours ago | watermelon0

The price difference varies by region. In us-east there is a 20% difference; in ap-south it is 50%. You can check Fargate ECS pricing, for example.

3 hours ago | abhashanand1501

Something not mentioned in this article is that the x86_64 and arm64 implementations you get are both relatively frozen in time. I haven't checked recently, but the last time I did, the arm64 implementation was stuck at something like Graviton2.

3 hours ago | coredog64

This is benchmarking `hashlib.sha256` - isn't that normally OpenSSL's heavily hand-optimized assembly implementation? It certainly isn't something written in Python.

3 hours ago | ZiiS

I think that's the CPU benchmark rather than the Python benchmark -- and comparing ARM64 vs. x86_64 CPU performance seems worthwhile.

3 hours ago | schmidtleonard

It is in the CPU-Intensive Workload Results, which compare Python versions and note: "Python 3.11 consistently outperformed newer versions across all memory configurations. It was 9-15% faster than Python 3.12, 3.13, and 3.14. This surprised me." The most obvious conclusion is that the benchmark is simply flawed in some way; if the result is real, it says something about how AWS compiled OpenSSL, but absolutely nothing about the speed of Python versions.
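
A quick way to see what's actually under test on each runtime version:

    import hashlib
    import ssl
    import sys

    # The linked OpenSSL build, not the interpreter, dominates sha256 throughput
    print(sys.version)
    print(ssl.OPENSSL_VERSION)
    print(hashlib.sha256(b"test").hexdigest())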

2 hours ago | ZiiS

Can someone tell me why there are almost no laptops with Linux and ARM? Is it really more efficient than x86, though?

6 hours ago | evilmonkey19

It is a pain to make any new platform useful enough for large-scale adoption. Apple made a lot of effort to get the MacBook M1 usable; the same goes for AWS with Graviton. Eventually it will be adopted for Linux laptops too, even without a specific vendor focusing on it, but it will take time.

an hour ago | CCs

How is the bootloader/peripheral compatibility on non-SBC ARM systems these days? Can you plug a boot disk into a different machine and expect it to just work? My main problem with ARM is that many manufacturers act as if they're special little snowflakes and deserve their own custom-patched kernel/bootloader/whatever.

5 hours ago | PhilipRoman

This is the goal of the Arm SystemReady compliance label. The selection is still pretty limited and what's out there is generally buggy, but there are a few boards you can buy, like the Orion O6 [0]. If you just want a stable system with predictable performance, you're probably better off with a more traditional system, though.

[0] https://radxa.com/products/orion/o6/

3 hours ago | AlotOfReading

AFAIK a lot of bootloaders are proprietary/wonky, and a lot of SoCs run custom bootloaders.

However, if you do manage to boot things up, hardware with open-source drivers should just work. For example, Jeff Geerling has a couple of videos on YouTube about running his RPi with external AMD graphics cards connected via PCIe, and it works.

3 hours ago | torginus

Software/driver compatibility and rational fear of change from users.

(My work laptop is one of the few ARM laptops: a ThinkPad T14s with Qualcomm Snapdragon Elite.)

6 hours ago | dijit

If you don’t mind me asking, what do you think of that laptop? What kind of workloads do you run and how is battery life? What OS? Would you choose it again?

5 hours ago | raddan

I was trying to install Linux on it, though it doesn't work like a standard x86 laptop (for the Debian installer, for example).

Battery life is good, and the hardware is really rock solid (though I dislike the new keyboard plastic).

I really can't complain; it's nearly as good as my MacBook.

It runs Windows 11 today, and everything I need runs fine (JetBrains, rustc, clang, MSVC, Terraform, and of course Python).

I'm a technical CTO with an infrastructure background; most of my time is spent in spreadsheets these days, unfortunately.

5 hours ago | dijit

Chromebooks are essentially this, but not that great for local development.

6 hours ago | jeremyjh

So one solution might be to buy a Chromebook and put regular Linux on it? I don't think Chromebooks are locked down.

2 hours ago | tomComb

Depends on which one, and what you want to locally develop.

5 hours ago | fragmede

Is there one that even has a full keyboard?

5 hours ago | jeremyjh

HP makes a 17" Chromebook if that's what you're after.

5 hours ago | fragmede

> Node.js: Node.js 22 on arm64 was consistently faster than Node.js 20 on x86_64. There's essentially a “free” ~15-20% speedup just by switching architectures!

Not sure why this is phrased like that in the TL;DR, when ARM64 is just strictly faster when running the same Node.js workload and version.

6 hours ago | KeplerBoy

Intel execs after reading this: FAST, more stock buybacks and executive bonuses to mitigate this!!!