I was surprised to see that Java was slower than C++, but the Java code is run with `-XX:+UseSerialGC`, which is the slowest GC, meant to be used only on very small systems, and to optimise for memory footprint more than performance. Also, there's no heap size, which means it's hard to know what exactly is being measured. Java allows trading off CPU for RAM and vice-versa. It would be meaningful if an appropriate GC were used (Parallel, for this batch job) and with different heap sizes. If the rules say the program should take less than 8GB of RAM, then it's best to configure the heap to 8GB (or a little lower). Also, System.gc() shouldn't be invoked.
Don't know if that would make a difference, but that's how I'd run it, because in Java, the heap/GC configuration is an important part of the program and how it's actually executed.
Of course, the most recent JDK version should be used (I guess the most recent compiler version for all languages).
It’s so hard to actually benchmark languages because it so much depends on the dataset, I am pretty sure with simdjson and some tricks I could write C++ (or Rust) that could top the leaderboard (see some of the techniques from the billion row challenge!).
tbh for silly benchmarks like this it will ultimately be hard to beat a language that compiles to machine code, due to jit warmup etc.
It’s hard to due benchmarks right, for example are you testing IO performance? are OS caches flushed between language runs? What kind of disk is used etc? Performance does not exist in a vacuum of just the language or algorithm.
> due to jit warmup
I think this harness actually uses JMH, which measures after warmup.
Why are you surprised? Java always suffers from abstraction penalty for running on a VM. You should be surprised (and skeptical) if Java ever beats C++ on any benchmark.
The only "abstraction penalty" of "running on a VM" (by which I think you mean using a JIT compiler), is the warmup time of waiting for the JIT.
in my opinion, this assertion suffers from the "sufficiently smart compiler" fallacy somewhat.
The linked article makes a specific carveout for Java, on the grounds that its SufficientlySmartCompiler is real, not hypothetical.
c++ certainly also has and needs a similarly sufficiently smart compiler to be compiled at all…
Its a statement of our times that this is getting down voted. JIT is so underrated.
For the most naive code, if you're calling "new" multiple times per row, maybe Java benefits from out of band GC while C++ calls destructors and free() inline as things go out of scope?
Of course, if you're optimizing, you'll reuse buffers and objects in either language.
In the end, even Java code becomes machine code at some point (at least the hot paths).
yes, but that's just one part of the equation. machine code from compiler and/or language A is not necessarily the same as the machine code from compiler and/or language B. the reasons are, among others, contextual information, handling of undefined behavior and memory access issues.
you can compile many weakly typed high level languages to machine code and their performance will still suck.
java's language design simply prohibits some optimizations that are possible in other languages (and also enables some that aren't in others).
I was very surprised to see the results for common lisp. As I scrolled down I just figured that the language was not included until I saw it down there. I would have guessed SBCL to be much faster. I checked it out locally and got: Rust 9ms, D: 16ms, and CL: 80ms.
Looking at the implementation, only adding type annotations, there was a ~10% improvement. Then the tag-map using vectors as values which is more appropriate than lists (imo) gave a 40% improvement over the initial version. By additionally cutting a few allocations, the total time is halved. I'm guessing other languages will have similar easy improvements.
D gets no respect. It's a solid language with a lot of great features and conveniences compared to C++ but it barely gets a passing mention (if that) when language discussions pop up. I'd argue a lot of the problems people have with C++ are addressed with D but they have no idea.
Ecosystem isn't that great, and much of it relies on the GC. If you're going to move out of C++, you might as well go all in on a GC language (Java, C#, Go) or use Rust. D's value proposition isn't enough to compete with those languages.
Garbage collection has never been a major issue for most use cases. However, the Phobos vs. Tango and D1 vs. D2 splits severely slowed D’s adoption, causing it to miss the golden window before C++11, Go, and Rust emerged.
D has a GC and it’s optional. Which should be the best of both worlds in theory.
Also D is older than Go and Rust and only a few months younger than C#. So the question then becomes “why weren’t people using D when your recommended alternatives weren’t an option?” Or “why use the alternatives (when they were new) when D already exists?”
> D has a GC and it’s optional.
This is only true in the most technical sense: you can easily opt-out of the GC, but you will struggle with the standard library, and probably most third-party libraries too. It's the baseline assumption after all, hence why it's opt-out, not opt-in. There was a DConf talk about the future of Phobos which indicated increased support for @nogc, but this is a ways away, and even then. If you're opting-out of the GC, you are giving up a lot. And honestly, if you really don't want the GC, you may be better off with Zig.
Could say the same for Nim.
But popularity/awareness/ecosystem matter.
That's the great thing about LLMs.
Especially with Nim it's so easy to make quality libraries with a Codex/ClaudeCode and a couple hours as a hobby.
Especially when they run fast. I just made Metal bindings and got 120 FPS demos with SDF bitmaps running yesterday while eating Saturday brunch.
[deleted]
If the difference in performance between the target language and C++ is huge, it's probably not the language that's great, but some quirk of implementation.
C# is very fast (see multicore rating). Implementation based on simd (vector), memory spans, stackalloc, source generators and what have you — modern C# allows you go very low-level and very fast.
Probably even faster under .net 10.
Though using stopwatch for benchmark is killing me :-) Wonder if multiple runs via benchmarkdotnet would show better times (also due to jit optimizations). For example, Java code had more warm-up iterations before measuring
This entire benchmark is frankly a joke. As other commenters have pointed out, the compiler flags make no sense, they use pretty egregious ways to measure performance, and ancient versions are being used across the board. Worst of all, the code quality in each sample is extremely variable and some are _really_ bad.
I mean this is only meant to be an iteration if I understand correctly. Its not like someone is going around citing this benchmark yelling rewrite everything in Julia / D. Imo this is a good starting point if you are doubtful or fall into the trap of Java is not fast. For most workloads we can clearly see, Java trades off the control of C++ for "about the same speed" and much much larger and well managed ecosystem. (Except for the other day, when someones OpenJDK PR was left hanging for a month which I am not sure why).
If you get the same speeds for C++ and Java, I'd like to point out that the C++ implementation is likely very sub-optimal.
This can obviously be true for toy problems, but tends not to generalize.
Quality does vary wildly because the languages vary wildly in terms of language constructs and standard libraries. Proficiency in every.single.language. used in the benchmark perhaps should not be taken for granted.
But it is an GitHub repository and the repository owner appears to accept PR's and allows people to raise an issue to provide their feedback, or… it can be forked and improved upon. Feel free to jump in and contribute to make it a better benchmark that will not be «frankly a joke» or «_really_ bad».
I'm completely alright with just having fun and hosting your own little sandboxes online, but what good does it do to post and share this with others in its current state? The picture it paints is certainly not representative, and this sort of thing has been done a million times over with much better consistency. Again, I think it's great to hack around in every language and document your journey all the way, but sharing this is borderline misinformation. It's certainly not my duty to right the wrongs of this benchmark.
The fact that Julia “highly optimized” is 30x faster than the normal Julia implementation, yet still fails to reach for some pretty obvious optimizations, and uses a joke package called “SuperDataStructures” tells me that maybe this benchmark shouldn’t be taken all that seriously.
Benchmarks like this can still be fun and informative
[deleted]
Why is there no C benchmark? The C++ benchmark appears to be "modern C++" which isn't a substitute.
[deleted]
Go being beaten by C# in multicore is quite hard to believe. Also Zig and Odin doing so "poorly" in single core is strange.
The quality of the benchmark code is... not great. This seems like Zig written by someone who doesn't know Zig or asked Claude to write it for them. Hell, actually Claude might do a better job here.
In short, I wouldn't trust these results for anything concrete. If you're evaluating which language is a better fit for your problem, craft your own benchmark tailored for that problem instead.
(Given credits to both sources in the description of this repo)
(Also fair disclosure but it was generated just out of curiosity of how this benchmark data might look if it was on benjdd's ui and I used LLM's for this use case for prototyping purposes. The result looks pretty simiar imo for visualization so full credits to benjdd's awesome visualization, I just wanted this to be in that to see for myself but ended up having it open source/on github pages)
I think benjdd's on hackernews too so hi ben! Your websites really cool!
Someone replied to me in an old comment that for fast Python you have to use numpy. In the folder there is a program in plain python, another with numpy and another with numba. I'm not sure why only one is shown in the data.
Disclaimer: I used numpy and numba, but my level is quite low. Almost as if I just type `import numpy as np` and hope the best.
For what it's worth, I've ported a lot of heavily optimized numpy code to Julia for work, and consistently gotten 10x-100x speedups, largely due to how much easier it is to control memory allocations and parallelize more effectively.
> Almost as if I just type `import numpy as np` and hope the best.
As do we all. If you browse through deep learning code a large majority is tensor juggling.
Isn't that measuring the speed of json encoding instead?
I see some questions around the methodology of the testing. But is this representative of Ruby? Several minutes total when most finish under a second?
Generally Ruby ought to be a tad faster than Python [0] (neither the best choice when run-time performance is of high importance). For the latter chances are however better to find a package implementing the performance critical sections in a more suitable language. See e.g. the pidigts example [1] where the "Python" program performs quite competitively, as the heavy lifting is done there by the GNU MP library (itself implemented in C and ASM).
What's up with the massive jump for 20k to 60k for nearly all languages?
My guess would be cache related. 5k probably fits in L1-L2 cache, whereas 20k might put you into L3.
[deleted]
That’s odd zig concurrent got slower
Contention overhead likely. Performance is more than just the langauge.
Also 3 years old. Zig has been rewritten in that time
I wrote a script (now an app basically haha) to migrate data from EMR #1 to EMR #2 and I chose Nim because it feels like Python but it's fast as hell. Claude Code did a fine job understanding and writing Nim especially when I gave it more explicit instructions in the system prompt.
Genuine question: Are GitHub workflows stable enough to be used for benchmarking? Like CPU time quantum scheduling is guaranteed to be the same from run to run?
No, it’s sloppy benchmarking
Data processing benchmark but somehow R is not even mentioned?
It would be the slowest language result on the list.
Slower than Python? I seriously doubt that
Port the script to R, benchmark and report your results. Python is slow, but R is generally much slower.
So in the D vs Zig vs Rust vs C fight - learn d if speed is your thing?
That only applies in an apples-to-apples comparison, i.e., same data structures, same algorithm, etc.
You can't compare sorting in C and Python, but use bubble sort in C and radix sort in Python.
In here there are different data structures being used.
> D[HO] and Julia [HO] footnote: Uses specialized datastructures meant for demonstration purposes: more ↩ ↩2
You're right of course but it also depends on how long you want to spend on it. If Python gives you radix sort directly and the C implementation you can have with the same time is bubble sort because you spent much time setting up the project and finding the right libs it kinda makes sense.
I was surprised to see that Java was slower than C++, but the Java code is run with `-XX:+UseSerialGC`, which is the slowest GC, meant to be used only on very small systems, and to optimise for memory footprint more than performance. Also, there's no heap size, which means it's hard to know what exactly is being measured. Java allows trading off CPU for RAM and vice-versa. It would be meaningful if an appropriate GC were used (Parallel, for this batch job) and with different heap sizes. If the rules say the program should take less than 8GB of RAM, then it's best to configure the heap to 8GB (or a little lower). Also, System.gc() shouldn't be invoked.
Don't know if that would make a difference, but that's how I'd run it, because in Java, the heap/GC configuration is an important part of the program and how it's actually executed.
Of course, the most recent JDK version should be used (I guess the most recent compiler version for all languages).
It’s so hard to actually benchmark languages because it so much depends on the dataset, I am pretty sure with simdjson and some tricks I could write C++ (or Rust) that could top the leaderboard (see some of the techniques from the billion row challenge!).
tbh for silly benchmarks like this it will ultimately be hard to beat a language that compiles to machine code, due to jit warmup etc.
It’s hard to due benchmarks right, for example are you testing IO performance? are OS caches flushed between language runs? What kind of disk is used etc? Performance does not exist in a vacuum of just the language or algorithm.
> due to jit warmup
I think this harness actually uses JMH, which measures after warmup.
Why are you surprised? Java always suffers from abstraction penalty for running on a VM. You should be surprised (and skeptical) if Java ever beats C++ on any benchmark.
The only "abstraction penalty" of "running on a VM" (by which I think you mean using a JIT compiler), is the warmup time of waiting for the JIT.
in my opinion, this assertion suffers from the "sufficiently smart compiler" fallacy somewhat.
https://wiki.c2.com/?SufficientlySmartCompiler
The linked article makes a specific carveout for Java, on the grounds that its SufficientlySmartCompiler is real, not hypothetical.
c++ certainly also has and needs a similarly sufficiently smart compiler to be compiled at all…
Its a statement of our times that this is getting down voted. JIT is so underrated.
For the most naive code, if you're calling "new" multiple times per row, maybe Java benefits from out of band GC while C++ calls destructors and free() inline as things go out of scope?
Of course, if you're optimizing, you'll reuse buffers and objects in either language.
In the end, even Java code becomes machine code at some point (at least the hot paths).
yes, but that's just one part of the equation. machine code from compiler and/or language A is not necessarily the same as the machine code from compiler and/or language B. the reasons are, among others, contextual information, handling of undefined behavior and memory access issues.
you can compile many weakly typed high level languages to machine code and their performance will still suck.
java's language design simply prohibits some optimizations that are possible in other languages (and also enables some that aren't in others).
I was very surprised to see the results for common lisp. As I scrolled down I just figured that the language was not included until I saw it down there. I would have guessed SBCL to be much faster. I checked it out locally and got: Rust 9ms, D: 16ms, and CL: 80ms.
Looking at the implementation, only adding type annotations, there was a ~10% improvement. Then the tag-map using vectors as values which is more appropriate than lists (imo) gave a 40% improvement over the initial version. By additionally cutting a few allocations, the total time is halved. I'm guessing other languages will have similar easy improvements.
D gets no respect. It's a solid language with a lot of great features and conveniences compared to C++ but it barely gets a passing mention (if that) when language discussions pop up. I'd argue a lot of the problems people have with C++ are addressed with D but they have no idea.
Ecosystem isn't that great, and much of it relies on the GC. If you're going to move out of C++, you might as well go all in on a GC language (Java, C#, Go) or use Rust. D's value proposition isn't enough to compete with those languages.
Garbage collection has never been a major issue for most use cases. However, the Phobos vs. Tango and D1 vs. D2 splits severely slowed D’s adoption, causing it to miss the golden window before C++11, Go, and Rust emerged.
D has a GC and it’s optional. Which should be the best of both worlds in theory.
Also D is older than Go and Rust and only a few months younger than C#. So the question then becomes “why weren’t people using D when your recommended alternatives weren’t an option?” Or “why use the alternatives (when they were new) when D already exists?”
> D has a GC and it’s optional.
This is only true in the most technical sense: you can easily opt-out of the GC, but you will struggle with the standard library, and probably most third-party libraries too. It's the baseline assumption after all, hence why it's opt-out, not opt-in. There was a DConf talk about the future of Phobos which indicated increased support for @nogc, but this is a ways away, and even then. If you're opting-out of the GC, you are giving up a lot. And honestly, if you really don't want the GC, you may be better off with Zig.
Could say the same for Nim.
But popularity/awareness/ecosystem matter.
That's the great thing about LLMs.
Especially with Nim it's so easy to make quality libraries with a Codex/ClaudeCode and a couple hours as a hobby.
Especially when they run fast. I just made Metal bindings and got 120 FPS demos with SDF bitmaps running yesterday while eating Saturday brunch.
If the difference in performance between the target language and C++ is huge, it's probably not the language that's great, but some quirk of implementation.
For comparison here's one from Dec '25
https://niklas-heer.github.io/speed-comparison
Certainly does "look" very interesting.
C# is very fast (see multicore rating). Implementation based on simd (vector), memory spans, stackalloc, source generators and what have you — modern C# allows you go very low-level and very fast.
Probably even faster under .net 10.
Though using stopwatch for benchmark is killing me :-) Wonder if multiple runs via benchmarkdotnet would show better times (also due to jit optimizations). For example, Java code had more warm-up iterations before measuring
This entire benchmark is frankly a joke. As other commenters have pointed out, the compiler flags make no sense, they use pretty egregious ways to measure performance, and ancient versions are being used across the board. Worst of all, the code quality in each sample is extremely variable and some are _really_ bad.
I mean this is only meant to be an iteration if I understand correctly. Its not like someone is going around citing this benchmark yelling rewrite everything in Julia / D. Imo this is a good starting point if you are doubtful or fall into the trap of Java is not fast. For most workloads we can clearly see, Java trades off the control of C++ for "about the same speed" and much much larger and well managed ecosystem. (Except for the other day, when someones OpenJDK PR was left hanging for a month which I am not sure why).
If you get the same speeds for C++ and Java, I'd like to point out that the C++ implementation is likely very sub-optimal.
This can obviously be true for toy problems, but tends not to generalize.
Quality does vary wildly because the languages vary wildly in terms of language constructs and standard libraries. Proficiency in every.single.language. used in the benchmark perhaps should not be taken for granted.
But it is an GitHub repository and the repository owner appears to accept PR's and allows people to raise an issue to provide their feedback, or… it can be forked and improved upon. Feel free to jump in and contribute to make it a better benchmark that will not be «frankly a joke» or «_really_ bad».
I'm completely alright with just having fun and hosting your own little sandboxes online, but what good does it do to post and share this with others in its current state? The picture it paints is certainly not representative, and this sort of thing has been done a million times over with much better consistency. Again, I think it's great to hack around in every language and document your journey all the way, but sharing this is borderline misinformation. It's certainly not my duty to right the wrongs of this benchmark.
The fact that Julia “highly optimized” is 30x faster than the normal Julia implementation, yet still fails to reach for some pretty obvious optimizations, and uses a joke package called “SuperDataStructures” tells me that maybe this benchmark shouldn’t be taken all that seriously.
Benchmarks like this can still be fun and informative
Why is there no C benchmark? The C++ benchmark appears to be "modern C++" which isn't a substitute.
Go being beaten by C# in multicore is quite hard to believe. Also Zig and Odin doing so "poorly" in single core is strange.
The quality of the benchmark code is... not great. This seems like Zig written by someone who doesn't know Zig or asked Claude to write it for them. Hell, actually Claude might do a better job here.
In short, I wouldn't trust these results for anything concrete. If you're evaluating which language is a better fit for your problem, craft your own benchmark tailored for that problem instead.
So far, the best benchmark seems to be the https://plummerssoftwarellc.github.io/PrimeView/
Although it is very single-thread biased test.
This is really interesting. Julia is a beast compared to python.
Nowadays whenever I see benchmarks of different languages. I really compare it to benjdd.com/languages or benjdd.com/languages2
Ended up creating a visualization of this data if anybody's interested
https://serjaimelannister.github.io/data-processing-benchmar...
(Given credits to both sources in the description of this repo)
(Also fair disclosure but it was generated just out of curiosity of how this benchmark data might look if it was on benjdd's ui and I used LLM's for this use case for prototyping purposes. The result looks pretty simiar imo for visualization so full credits to benjdd's awesome visualization, I just wanted this to be in that to see for myself but ended up having it open source/on github pages)
I think benjdd's on hackernews too so hi ben! Your websites really cool!
Someone replied to me in an old comment that for fast Python you have to use numpy. In the folder there is a program in plain python, another with numpy and another with numba. I'm not sure why only one is shown in the data.
Disclaimer: I used numpy and numba, but my level is quite low. Almost as if I just type `import numpy as np` and hope the best.
For what it's worth, I've ported a lot of heavily optimized numpy code to Julia for work, and consistently gotten 10x-100x speedups, largely due to how much easier it is to control memory allocations and parallelize more effectively.
> Almost as if I just type `import numpy as np` and hope the best.
As do we all. If you browse through deep learning code a large majority is tensor juggling.
Isn't that measuring the speed of json encoding instead?
I see some questions around the methodology of the testing. But is this representative of Ruby? Several minutes total when most finish under a second?
Generally Ruby ought to be a tad faster than Python [0] (neither the best choice when run-time performance is of high importance). For the latter chances are however better to find a package implementing the performance critical sections in a more suitable language. See e.g. the pidigts example [1] where the "Python" program performs quite competitively, as the heavy lifting is done there by the GNU MP library (itself implemented in C and ASM).
[0] https://benchmarksgame-team.pages.debian.net/benchmarksgame/... [1] https://benchmarksgame-team.pages.debian.net/benchmarksgame/...
What's up with the massive jump for 20k to 60k for nearly all languages?
My guess would be cache related. 5k probably fits in L1-L2 cache, whereas 20k might put you into L3.
That’s odd zig concurrent got slower
Contention overhead likely. Performance is more than just the langauge.
Also 3 years old. Zig has been rewritten in that time
I wrote a script (now an app basically haha) to migrate data from EMR #1 to EMR #2 and I chose Nim because it feels like Python but it's fast as hell. Claude Code did a fine job understanding and writing Nim especially when I gave it more explicit instructions in the system prompt.
Genuine question: Are GitHub workflows stable enough to be used for benchmarking? Like CPU time quantum scheduling is guaranteed to be the same from run to run?
No, it’s sloppy benchmarking
Data processing benchmark but somehow R is not even mentioned?
It would be the slowest language result on the list.
Slower than Python? I seriously doubt that
Port the script to R, benchmark and report your results. Python is slow, but R is generally much slower.
So in the D vs Zig vs Rust vs C fight - learn d if speed is your thing?
That only applies in an apples-to-apples comparison, i.e., same data structures, same algorithm, etc. You can't compare sorting in C and Python, but use bubble sort in C and radix sort in Python.
In here there are different data structures being used.
> D[HO] and Julia [HO] footnote: Uses specialized datastructures meant for demonstration purposes: more ↩ ↩2
You're right of course but it also depends on how long you want to spend on it. If Python gives you radix sort directly and the C implementation you can have with the same time is bubble sort because you spent much time setting up the project and finding the right libs it kinda makes sense.