For such an interface to be feasible to support in common open source infrastructure, it needs a pure software implementation for testing and development purposes. Even better would be something along the lines of coz, to model performance by throttling down everything else proportionally.
It’s really frustrating that the HotOS paper itself has no details about the benchmarking, and the blog post just says “redis benchmark”. What was the system setup? Persistence options? What was ported to Demikernel? The client writing, or the server reading from the NIC? Based on the problem specified in the paper, I assume it’s the reading from the NIC that was implemented in DemiOS.
7-10us for what is a hashtable set/get is really, really bad
I can get a packet out through a switch to another machine and back in 1-2us
Do you mean 1-2ms?
No, 1-2us is correct for that — in a datacenter, with cut-through switching.
That's really impressive. I need to update myself on this topic. Thanks.
In reality - with decent switches at 25G, and no FEC - node-to-node is reliably under 300ns (0.3 us)
Considering that 300 light-nanoseconds is about 90m, getting a response (or even just one-way) in that time is essentially running right at the limits of physics/causality.
Out of curiosity, how is that measured across machines?
(The first thing that comes to my mind would be to use an oscilloscope with two probes, one to each machine, but I’m guessing that’s not it.)
Measure the round trip and divide by two for the approximate one-way time. It'd be really neat to measure the time it takes for a packet to travel in one direction, but it's somewhere between hard and impossible[1]; a very short path has less room to be asymmetric though.
[1] If the clocks are synchronized, you can measure send time on one end and receive time on the other. But synchronizing clocks involves estimating the time it takes for signals to pass in each direction, typically by assuming each direction takes half the round trip.
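To make the RTT/2 method above concrete, here is a minimal sketch (mine, not from the thread), assuming Linux and UDP; the port number, packet size, and iteration count are arbitrary. Plain kernel sockets will not get anywhere near the sub-microsecond figures discussed here, but the methodology - timestamp, bounce, timestamp, divide by two - is the same one a DPDK or RDMA ping-pong would use.

```c
/* rtt_pingpong.c -- sketch of the RTT/2 measurement described above.
 * Run "./rtt_pingpong server" on one host and "./rtt_pingpong client <ip>"
 * on the other. Port 9000 and the 64-byte payload are arbitrary choices. */
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <stdio.h>
#include <string.h>
#include <time.h>

#define PORT  9000
#define ITERS 10000

static long long now_ns(void) {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (long long)ts.tv_sec * 1000000000LL + ts.tv_nsec;
}

int main(int argc, char **argv) {
    int fd = socket(AF_INET, SOCK_DGRAM, 0);
    struct sockaddr_in addr = {0};
    addr.sin_family = AF_INET;
    addr.sin_port = htons(PORT);
    char buf[64] = {0};

    if (argc > 1 && strcmp(argv[1], "server") == 0) {
        addr.sin_addr.s_addr = htonl(INADDR_ANY);
        bind(fd, (struct sockaddr *)&addr, sizeof(addr));
        struct sockaddr_in peer;
        socklen_t len = sizeof(peer);
        for (;;) {  /* echo every packet straight back to its sender */
            ssize_t n = recvfrom(fd, buf, sizeof(buf), 0,
                                 (struct sockaddr *)&peer, &len);
            if (n > 0) sendto(fd, buf, n, 0, (struct sockaddr *)&peer, len);
        }
    } else if (argc > 2 && strcmp(argv[1], "client") == 0) {
        inet_pton(AF_INET, argv[2], &addr.sin_addr);
        long long best = -1;
        for (int i = 0; i < ITERS; i++) {
            long long t0 = now_ns();
            sendto(fd, buf, sizeof(buf), 0, (struct sockaddr *)&addr, sizeof(addr));
            recv(fd, buf, sizeof(buf), 0);
            long long rtt = now_ns() - t0;
            if (best < 0 || rtt < best) best = rtt;  /* min filters scheduling noise */
        }
        /* one-way estimate = RTT / 2, assuming a symmetric path */
        printf("best RTT %.2f us, one-way ~%.2f us\n", best / 1e3, best / 2e3);
    } else {
        fprintf(stderr, "usage: %s server | client <ip>\n", argv[0]);
        return 1;
    }
    return 0;
}
```

Taking the minimum over many iterations, rather than the mean, is the usual way to approximate the floor of the path while discarding scheduler and interrupt noise.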
Meanwhile, the best network I’ve ever benchmarked was AWS, which measured about 55µs for a round trip!
What on earth are you using that gets you down to single digits!?
I assume 1-3 hops of modern switches without congestion. Given 100Gb/s lanes these numbers are possible if you get all the bottlenecks out of the way. The moment you hit a deep queue the latency explodes.
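To put rough numbers on why single-digit microseconds is plausible, here is a back-of-envelope sketch (my figures, not the commenter's): the frame size, fibre length, hop count, and per-hop switch latency below are illustrative assumptions, not measurements.

```c
/* Back-of-envelope one-way latency budget: serialization onto the wire,
 * propagation through fibre, and a fixed per-hop cost for cut-through switches.
 * All inputs are illustrative assumptions. */
#include <stdio.h>

int main(void) {
    double frame_bits = 512 * 8;   /* a small request in a 512-byte frame      */
    double link_gbps  = 100.0;     /* 100 Gb/s lanes, as mentioned above       */
    double fiber_m    = 30.0;      /* a few racks away                          */
    double hops       = 2.0;       /* within the 1-3 switch hops assumed above */
    double per_hop_ns = 500.0;     /* typical cut-through switch port-to-port  */

    double serialize_ns = frame_bits / link_gbps;  /* bits / (Gb/s) comes out in ns */
    double propagate_ns = fiber_m * 5.0;           /* ~5 ns per metre in fibre      */
    double one_way_ns   = serialize_ns + propagate_ns + hops * per_hop_ns;

    printf("serialization %.0f ns, propagation %.0f ns, switching %.0f ns\n",
           serialize_ns, propagate_ns, hops * per_hop_ns);
    printf("one-way ~%.1f us (NIC and host processing not included)\n",
           one_way_ns / 1e3);
    return 0;
}
```

Under these assumptions the wire itself costs roughly 1.2µs each way; anything beyond that is NIC and host software, which is exactly the part a deep queue or a trip through the kernel blows up.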
So, are you talking about theoretical latencies here based on bandwidths and cable lengths, or actual measured latencies end-to-end between hosts?
I know that "in principle" the physics of the cabling allows single digit microseconds, but I've never seen it anywhere near that low even with cross-over cables with zero switches in-path!
You need high-bandwidth links (time to get the entire packet across starts to matter), run on bare metal (or have very well-working HW virtualisation support), and tune NIC parameters and OS processing appropriately (one OS-side knob is sketched below). But it's practically achievable.
Switches in these scenarios (e.g. 25GE DC targeted) are pretty predictable and add <1μs (unless misconfigured)
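As one concrete example of the OS-side tuning mentioned above (again my sketch, not from the thread), Linux lets a socket busy-poll the device queue instead of sleeping until an interrupt fires. The 50µs budget below is an arbitrary choice, and NIC-side knobs such as interrupt coalescing and ring sizes are configured via ethtool rather than in code.

```c
/* Linux-specific sketch: ask the kernel to busy-poll the receive queue for up
 * to 50 us on blocking reads instead of sleeping on an interrupt. May require
 * CAP_NET_ADMIN depending on kernel version and sysctl settings. */
#include <stdio.h>
#include <sys/socket.h>

static int tune_for_latency(int fd) {
#ifdef SO_BUSY_POLL
    int busy_poll_us = 50;  /* spin this long before giving up and sleeping */
    if (setsockopt(fd, SOL_SOCKET, SO_BUSY_POLL,
                   &busy_poll_us, sizeof(busy_poll_us)) != 0) {
        perror("SO_BUSY_POLL");
        return -1;
    }
#else
    (void)fd;  /* kernel headers too old to expose the option */
#endif
    return 0;
}

int main(void) {
    int fd = socket(AF_INET, SOCK_DGRAM, 0);
    return tune_for_latency(fd);
}
```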
See https://irenezhang.net/papers/demikernel-sosp21.pdf for a more thorough paper on the Demikernel from 2021. There are some great ideas for improving the kernel interface while still allowing efficient DPDK-style pipelines.
This is a super cool idea, and it’s something that sounds fun to play with/try out.
Therefore, I eagerly await the inevitable influx of comments along the lines of:
- “you don’t need it”
- “you’re not FAANG enough to justify it”
- “seems overly complicated, my Python-on-Ubuntu is good enough, who needs more”
telling us why we shouldn’t have fun things like this.
Anyone got any more comments to add to the bingo card?
If you personally want to play with it, go ahead.
I think my personal feeling is that those sorts of comments come out of the woodwork more when the comments section starts turning into an "oh man, this should be the standard for everyone" kind of discussion, which is rarely true, and pushing back on that is usually the point of those kinds of replies.
At least, that's the point when I reply with those kinds of comments, anyway.
Preemptive cynicism is even worse than regular cynicism.
This is great! I think there are a lot of latency-sensitive applications that really do need to be spared the latency the kernel adds.