Agree with the other commenters that the title is a bit too dramatic. The content was well written and got the point across.
I still don’t have enough experience to have a strong opinion on Rust async, but some things did stand out.
On the good side, it’s nice being able to have explicit runtimes. Instead of polluting the whole project to be async, you can do the opposite: be sync first and use the runtime on IO “edges”. This was a great fit for a project that I’m working on, and it seems like a pretty similar strategy to what Zig is doing with IO code. This largely solved the function coloring problem in this particular case. Strict separation of IO-bound and CPU-bound code was a requirement regardless of the async stuff, so using the explicit IO runtime was natural.
On the bad side, it seems crazy to me how much the whole ecosystem depends on tokio. It’s almost as if Java’s GC were optional, but in practice everyone used the same third-party GC runtime, and pulling in any library forced you to use that runtime. This sort of central dependency is simply not healthy.
So depending on your context, it may seem like the whole ecosystem depends on tokio, but if you look at, say, embedded Rust, it makes a little more sense.
The system requirements for an async runtime on a workstation processor compared to, say, an RP2040 look very different. But given the ability to swap out the backend, when I write async IO code for a small ARM M0 microcontroller, that code looks almost identical to what I'd be writing outside that context, just with an embedded-focused runtime, i.e. Embassy.
I can focus less on the runtime specifics since they use the same traits and interfaces. Compared with, say, using a small RTOS or rolling your own async environment, it's quite nice.
Much of what I need to learn to write the async code in embassy can cross over to other domains.
What's the alternative? I'm happy to use tokio, but I'm happy other folks can enjoy other executors (smol, async-std, glommio, etc). I think the situation is OK because tokio is well-maintained, even though it's not part of the standard library, and I'm afraid making it part of the standard library would make it harder to use other executors, and harder to port the standard library to other platforms.
But maybe my fears are unfounded.
> What's the alternative?
Traits in the stdlib for common functionality like "spawn" (a task) and things like async timers. Then executors could implement those traits and libraries could be generic over them.
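A rough sketch of what that could look like (hypothetical traits - nothing like this exists in std today):

```rust
use std::future::Future;
use std::time::Duration;

// Hypothetical std traits an executor could implement:
trait Spawn {
    fn spawn(&self, fut: impl Future<Output = ()> + Send + 'static);
}

trait Timer {
    fn sleep(&self, dur: Duration) -> impl Future<Output = ()> + Send;
}

// Libraries could then be generic over any conforming executor:
async fn retry_later<E: Spawn + Timer>(executor: &E) {
    executor.sleep(Duration::from_millis(100)).await;
    executor.spawn(async { /* do the work */ });
}
```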
Yep. We could have a system like how there's a global system allocator, but you can override it if you want in your app.
We could have something similar for a global async executor which can be overridden. Or maybe you launch your own executor at startup and register it with std, but after that almost all async spawn calls go through std.
And std should have a decent default executor, so if you don't care to customise it, everything should just work out of the box.
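For comparison, the existing allocator hook looks like this; the idea would be an analogous registration point for an executor:

```rust
use std::alloc::System;

// Overriding the global allocator is one attribute; everything else
// in std transparently uses whatever is registered here.
#[global_allocator]
static GLOBAL: System = System;

fn main() {
    let v = vec![1, 2, 3]; // allocates through GLOBAL
    println!("{v:?}");
}
```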
Good point, but the devil lies in the details. How should the timers behave? Is the clock monotonic? Are tasks spawned on the same thread? Different platforms and executors have different opinions. Maybe it's still possible and just a lot of work?
>the devil lies in the details
This is true, but perhaps not uniquely so, when compared to the platform dependence of the standard library already. File semantics, sync primitive guarantees and implementations, timers and timer resolutions, etc. have subtle differences between platforms that the Rust stdlib makes no further guarantees about.
> Maybe it's still possible and just a lot of work?
Yeah, I think that's the current status. I believe it was for a long time (and possibly still is) blocked on language improvements to async traits (which didn't exist at all until relatively recently and still don't support dyn trait).
It would make sense to have an official default async runtime in the standard library while keeping the door open to use any other runtime, just like we already have for the heap allocator or reference counting garbage collection.
There are issues in particular with core traits for IO or Stream being defined in third-party libraries like tokio or futures and its variants. I've seen many cases where libraries have to reexport such types, but they are pinned to the version they have, so you can end up with multiple versions of basic async types in the same codebase that have the same name and are incompatible.
As of now I don’t think there’s an alternative. I’m not a Rust expert, but the core issue to me is that “async” goes beyond just having a futures scheduler. Async stuff usually needs network, disk, OS interaction, and future utilities (spawn), and these are all things the runtime (tokio) provides. It’s pretty hard for runtimes to be compatible with each other unless the language itself provides those.
That's not the core issue at all; it's lifetimes and allocations.
Can you elaborate on this please? Do you mean it's basically impossible for Rust std to provide a default runtime that makes “everyone” (embedded on one end and web on the other) happy?
I think that's the problem in essence, yes. Different executors built on top of different primitives and having different executions strategies will have mutually incompatible constraints.
To spawn a future on tokio, it has to implement `Send`, because tokio is a work-stealing executor. That isn't the case for monoio or other non-work-stealing async executors, where tasks are pinned to the thread they are spawned on and so do not require `Send` or `Sync`, so you can use Rc/RefCell.
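A sketch of the difference (assuming tokio with the rt feature):

```rust
use std::rc::Rc;

#[tokio::main(flavor = "current_thread")]
async fn main() {
    let local = tokio::task::LocalSet::new();
    local
        .run_until(async {
            let data = Rc::new(42); // !Send is fine on a LocalSet
            tokio::task::spawn_local(async move {
                println!("{data}");
            })
            .await
            .unwrap();
        })
        .await;

    // By contrast, tokio::spawn refuses any future that captures
    // an Rc, because work stealing requires the future to be Send.
}
```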
Moreover, the way that async executors schedule execution can be _different_. I have a small executor I made that is based on the runtime model of the JS event loop. It's single-threaded async, with explicit creation of new worker threads. That isn't a model that can "slot in" to a suite of traits that adequately represents the abstraction provided by tokio, because the abstraction of my executor and the way it schedules tasks are fundamentally different.
Any reasonably-usable abstraction for the concept of an async runtime would impose too many constraints on the implementation in the name of ensuring runtime-generic code can execute on any standard runtime. A Future, for better or worse, is a sufficiently minimal abstraction of async executability without assuming anything about how the polling/waking behavior is implemented.
Here are some alternatives for concurrent operations in Rust that don't use async. Which are available depends on the target, e.g. embedded/low-level vs GPOS. I use all of these across my Rust projects:
- Threads and thread pools
- CPU SIMD
- GPU
- DMA, with memory and/or dedicated hardware
- Multiple cores, ICs, or MCUs
- Hardware interrupts
- Event loops
Most of you are already aware. I bring this up because I have observed that in the Rust OSS community (especially embedded), people sometimes refer to not using async as blocking, and are not aware that async isn't the only way to manage concurrency. People new to it are learning it this way: "If you're not using Tokio or Embassy (or some other executor), you are blocking a process."
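For instance, the first item alone covers a lot of ground with no executor in sight - a minimal sketch with std threads and channels:

```rust
use std::sync::mpsc;
use std::thread;

fn main() {
    let (tx, rx) = mpsc::channel();

    // Spawn a few OS threads; each does blocking work concurrently.
    let handles: Vec<_> = (0..4)
        .map(|id| {
            let tx = tx.clone();
            thread::spawn(move || tx.send(id * 10).unwrap())
        })
        .collect();
    drop(tx); // close the channel so the loop below terminates

    for result in rx {
        println!("got {result}");
    }
    for h in handles {
        h.join().unwrap();
    }
}
```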
That's kind of wild... I'm relatively novice with Rust still, but I was pretty aware that the different executors weren't the only async option... I thought it was pretty cool you could opt into tokio for the bulk of async request work, but if I wanted to use a pool for specific workers, or something else on a more monolithic service/application, I could still launch my own threads for that use case pretty easily.
The hardest part for me to grok really came down to lifetimes and memory management: for example, a static/global dictionary as a cache, while still being able to evict/recover entries for expired data... This is probably the use case that IMO is least well documented, or at least lacking in discoverable tutorials etc.
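For what it's worth, here's the minimal shape of that pattern as I understand it - a sketch using std's `LazyLock` (the TTL and types are placeholders):

```rust
use std::collections::HashMap;
use std::sync::{LazyLock, Mutex};
use std::time::{Duration, Instant};

const TTL: Duration = Duration::from_secs(60);

// A process-wide cache: key -> (value, time of insertion).
static CACHE: LazyLock<Mutex<HashMap<String, (String, Instant)>>> =
    LazyLock::new(|| Mutex::new(HashMap::new()));

fn put(key: String, value: String) {
    CACHE.lock().unwrap().insert(key, (value, Instant::now()));
}

fn get(key: &str) -> Option<String> {
    let mut cache = CACHE.lock().unwrap();
    let fresh = cache
        .get(key)
        .filter(|(_, inserted)| inserted.elapsed() < TTL)
        .map(|(value, _)| value.clone());
    if fresh.is_none() {
        cache.remove(key); // evict expired (or absent) entries on read
    }
    fresh
}
```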
As you mentioned Java, it’s interesting to notice that it has had similar problems throughout its history: logging (now it’s settled on slf4j, but you still find libraries using something else), commons (first Apache Commons, now Guava), JSON (it has settled on Jackson, but things like Gson and simple-json are not uncommon to see), nullability annotations (first with unofficial distributions of JSR-305, which never became official, then the Checker Framework, and lately with everything migrating to JSpecify). All this basic stuff needs to be provided by the language to avoid this fragmentation and quasi de facto libraries from appearing.
The traditional approach in Java has been to let those things happen in third-party space, then form an expert group to standardise a shared API for them. That was done with XML parsers and ORM fairly successfully. It doesn't always work, as with your examples - there was an attempt with logging, but it was done badly, JSR-305 ran aground, etc. But I think it's a much better approach than the JDK maintainers trying to get it right first time.
But this fragmentation is what's needed to make good software. If you put things in the standard library, you're just adding a +1 to the fragmented landscape, because it will never be specialized enough to cover all use cases, so people will still use their own libraries - just like C++ has three dozen distinct implementations of hash maps because one cannot fit all cases.
It could also be argued that putting a specific executor model into the standard library would make the problem worse, because it would give library crates license to use it without considering alternatives, because it is standard. At least today, taking a dependency on a specific runtime is a well-known boondoggle.
Not only that, but there kind of is a de facto standard (tokio), which is pretty much the default if you aren't in a specific, resource-constrained use case.
Commons is something that is gradually being migrated into the standard library, at least the parts deemed necessary for most projects. I don't use Apache Commons or Guava at all in Java (now at 25 or 26, depending on the project) - there are still some libs that depend on them, but I would argue that most use them out of inertia rather than actual need.
As for slf4j, I still don't see any justification for an abstraction layer on top of logging. I never, ever migrated from one logger to another, and even if I did need to, it would be very easy, as most loggers are very similar.
E.g. that's why I decided to use log4j2 in my latest project.
The logging implementation should be an application level decision. By using a facade like slf4j a library allows an application using any logging implementation to use it. That’s why libraries should use it.
It's very much possible to use rust for a lot of areas with async without needing to be dependent on tokio. I think it's really just the web/server stuff that's entirely tokio dependent. Writing libraries to be executor agnostic is not terribly difficult but does require some diligence which isn't necessarily present in most of the community.
It really depends on the abstraction model of the library. If the library needs to actually read/write a file, it either needs to depend on a runtime or provide some horrific abstraction over the process it will use to do that. This doesn't apply to sync IO libraries which can just use the Standard Library.
Web/server frameworks have to bind to a runtime because they have to make decisions about how to connect to a socket. Hyper is sufficiently abstract that it doesn't require any runtime, but using hyper directly provides no framework-like benefits and requires that you make those decisions and provide a compatible socket-like implementation for sending requests.
That's the thing though: it's possible, but it makes the simple hello world example more tedious. It's totally possible to make an abstraction layer, provide a tokio implementation out of the box, but leave the door open for other implementations to slot in. Anyone who's written portable code for non-POSIX systems is used to this experience. Standardization is definitely better, but it also has its own share of problems, as it can limit what's possible. I expect that the decision to delay standardizing these interfaces will end up leading to a better long-term design, especially if major improvements to async are on the horizon that could alter the final shape of that standard.
Great article! Love these types of deep dives into optimizations. Hope the project goal works out!
I've felt before that compilers often don't put much effort into optimizing the "trivial" cases.
Overly dramatic title for the content, though. I would have clicked "Async Rust Optimizations the Compiler Still Misses" too, you know.
So on the title, I picked this because it's simply the truth. Since async landed in 2019 or so, not much has changed.
Yes, we can have async in traits and closures now. But those are updates to the typesystem, not to the async machinery itself.
Wakers are a little bit easier to work with, but that's an update to std/core.
As I understand it, the people who landed async Rust were quite burnt out and got less active and no one has picked up the torch. (Though there's 1 PR open from some google folk that will optimize how captured variables are laid out in memory, which is really nice to have)
Since I and the people I work with are heavy async users, I think it's maybe up to me to do it or at least start it. Free as in puppy I guess.
So yeah, the title is a little baitey, but I do stand behind it.
Some of the burnout no doubt being due to the catastrophizing of every decision by the community and the extreme rhetoric used across the board.
Great to see people wanting to get involved with the project, though. That’s the beauty of open source: if it aggravates you, you can fix it.
As an example of this, i remember a huge debate at the time about `await foo()` vs `foo().await` syntax. The community was really divided on that one, and there was a lot of drama because that's the kind of design decision you can't really walk back from.
Retrospectively, I think everyone is satisfied with the adopted syntax.
It makes sense that there was a huge debate, because the postfix .await keyword was both novel (no other languages had done it that way before) and arguably the right call. Of course, one can argue that the ? operator set a relevant precedent.
> Retrospectively, i think everyone is satisfied with the adopted syntax.
Maybe it’s a case of agree and commit, since it can’t really be walked back.
Various prominent people have said years after that .await was the correct choice after all
I'm not prominent but I disagreed with it at the time and I was wrong.
I’m curious - why were you wrong? It still seems like a wart to me, all these years later. What am I missing?
Contrast it with async in JS/ES as an example... now combine it with the using statement for disposeAsync instances.
await using db = await sqlite.connect(await ctx.getConfig("DB_CONN"));
It's not so bad when you have one `await foo` vs `foo.await`, it's when you have several of them on a line in different scopes/contexts.
Another one I've seen a lot is...
const v = await (await fetch(...)).json();
Though that could also be...
const v = await fetch(...).then(r => r.json());
In any case, it still gets ugly very quickly.
I’ve never in my life used JS, so I’ll have to take your word for it.
It's a language I'm familiar with that uses the `await foo` syntax, where you'll often see more than one on a line, per the examples given. C# is the most prominent language with similar semantics that I know well, but it's usually less of an issue there.
I'll give my two cents here. I work with Dart daily, and it also uses the `await future` syntax. I can cite a number of ergonomic issues:
```dart
(await taskA()).doSomething()
(await taskB()) + 1
(await taskC()) as int
```
vs.
```rust
taskA().await.doSomething()
taskB().await + 1
taskC().await as i32
```
It gets worse if you try to compose:
```dart
(await taskA(
(await taskB(
(await taskC()) as int
)) + 1)
).doSomething()
```
This often leads to trading the await syntax for `then`:
```dart
await taskC()
.then((r) => r as int)
.then(taskB)
.then((r) => r + 1)
.then(taskA)
.then((r) => r.doSomething())
```
But this is effectively trading the structured await syntax for a callback one. In Rust, we can write it like this:
```rust
taskA(taskB(taskC().await as i32).await + 1).await.doSomething()
```
Two spaces before a line make it a code block literal
This is a code block
HN has never used markdown so the triple-tick does nothing but create noise here.
I think it's partially accurate, and partially a consequence of how async fractures the design space, so it will always feel like a somewhat separate thing, or at least until we figure out how to make APIs agnostic to async-ness.
I am a beginner to Rust but I've coded with gevent in Python for many years and later moved to Go. Goroutines and gevent greenlets work seamlessly with synchronous code, with no headache. I know there've been tons of blog posts and such saying they're actually far inferior and riskier but I've really never had any issues with them. I am not sure why more languages don't go with a green thread-like approach.
Because they have their own drawbacks. To make them really useful, you need a resizable stack. Something that's a no-go for a runtime-less language like Rust.
You may also need to set up a large stack frame for each C FFI call.
Rust originally came with a green thread library as part of its primary concurrency story but it was removed pre-1.0 because it imposed unacceptable constraints on code that didn’t use it (it’s very much not a zero cost abstraction).
As an Elixir + Erlang developer I agree it’s a great programming model for many applications, it just wasn’t right for the Rust stdlib.
One of Rust's central design goals is to allow zero cost abstractions. Unifying the async model by basically treating all code as being possibly async would make that very challenging, if not impossible. Could be an interesting idea, but not currently tenable.
One problem I have with systems like gevent is that it can make it much harder to look at some code and figure out what execution model it's going to run with. Early Rust actually did have a N:M threading model as part of its runtime, but it was dropped.
I think one thing Rust could do to make async feel less like an MVP is to ship a default executor, much like it has a default allocator.
They could still stop a step short of a default executor and establish some standard traits/types that are typical across executors.
By providing a default, I think you're going to paint yourself into a corner. Maybe have one or two opt-in executors in the box... one that is higher-resource like tokio, and one that is meant for lower-resource environments (like embedded).
[deleted]
As an uninterested 3rd party, it’s a wild exaggeration
Agree on title. Too dramatic.
The author seems to be obsessing over the overhead of trivial functions. He's bothered by the overhead of states for "panicked" and "returned". That's not a big problem.
Most useful async blocks are big enough that the overhead for the error cases disappears.
He may have a point about lack of inlining. But what tends to limit capacity for large numbers of activities is the state space required per activity.
> Most useful async blocks are big enough that the overhead for the error cases disappears.
Is it really though?
In my experience many Rust applications/libraries can be quite heavy on the indirection. One of the points from the article is that contrary to sync Rust, in async Rust each indirection has a runtime cost. Example from the article:
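(Roughly, a thin wrapper like this:)

```rust
// Sketch of the kind of indirection in question, not the
// article's exact code: a wrapper that only forwards to
// another async fn.
async fn inner() -> u32 {
    42
}

async fn outer() -> u32 {
    inner().await
}
```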
I would naively expect the above to be a 'free' indirection, paying only a compile-time cost for the compiler to inline the code. But after reading the article I understand this is not true, and it has a runtime cost as well.
In my experience, it's not uncommon to have an async trait method for which many implementations are actually synchronous. For example, different tables in your DB need to perform some calculations, but only some tables reference other tables. In that case, the method needs to be async and take a handle to the DB as parameter, but many table entries can perform the calculation on their own without using the handle (or any async operation).
This may look like a case of over-optimization, but given how many times I've seen this pattern, I assume it builds up to a lot of unnecessary fluff in huge codebases. To be clear, in that case, the concern is not really about runtime speed (which is super fast), but rather about code bloat for compilation time and binary size.
> Most useful async blocks are big enough that the overhead for the error cases disappears.
Most useful async blocks are deeply nested, so the overhead compounds rapidly. Check the size of futures in a decently large Tokio codebase sometime
He's optimizing for embedded no-std situation. These things do matter in constrained environments.
He briefly mentions microcontrollers, but doesn't go into details. That might make sense, for microcontrollers which implement some kind of protocol talking to something.
> [...] That's not a big problem [...]
Depends somewhat on your expectations, I suppose. Compared to Python or Java, sure, but Rust of course strives to offer "zero-cost" high-level concepts.
I think the critique is in the same realm as C++'s std::function. Convenience, sure, but far from zero-cost.
To the point that C++26 adds std::function_ref.
Exactly. And I guess that is also the gist of the article: async Rust needs additional TLC.
> Agree on title. Too dramatic.
Not just too dramatic. Given that all the things they list are non-essential optimizations, and some fall under "micro-optimizations I wouldn't be sure Rust even wants", and given how far the current async is from its old MVP state, it's more like outright dishonest than overly dramatic.
It's the kind of clickbait that says the author cares neither about respecting the reader nor about honest communication, which for someone wanting to do open source contributions is kinda ... not so clever.
Though in general I agree Rust should have more HIR/MIR optimizations, at least in release mode. E.g. it's very common that an async function is not pub and is directly awaited in all places (or can otherwise be proven to only be called once); in that case neither `Returned` nor `Panicked` is needed, as it can't be called again after either. Similarly, `Unresumed` is not needed either, as you can directly call the code up to the first await (and with such a transform, their points about "inlining" and "async fns without await still having a state machine" would also "just go away"TM, at least in some places). Similarly, the whole `.map_or(a, b)` family of functions is IMHO an anti-pattern: it introduces more functions with unclear operand ordering, removes the signaling `unwrap_`, and has no benefit beyond minimally shortening a `.map(b).unwrap_or(a)` plus some micro-optimization, which is not productive in an already complicated language. Instead, guaranteed optimizations for the kind of patterns `.map(b).unwrap_or(a)` inlines to would be much better.
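For reference, the two spellings compared above (both are real `Option` methods in std):

```rust
fn main() {
    let x: Option<u32> = Some(2);

    // Explicit chain: map, then supply the fallback.
    let a = x.map(|v| v + 1).unwrap_or(0);

    // Fused helper: default comes first, function second.
    let b = x.map_or(0, |v| v + 1);

    assert_eq!(a, b);
}
```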
Async seems like an underbaked idea across the board. Regular code was already async: when you need to wait for an async operation, the thread sleeps until it's ready, and the kernel abstracts it away. But we didn’t like structuring code into logical threads, so we added callback systems for events. Then we realized callbacks are very hard to reason about and that sequential control is better.
So threads was the right programming model.
Now language runtimes prefer “green threads” for portability and performance but most languages don’t provide that properly. Instead we have awkward coloring of async/non-async and all these problems around scheduling, priority, and no-preemption. It’s a worse scheduling and process model than 1970.
> Regular code was already async. When you need to wait for an async operation, the thread sleeps until ready and the kernel abstracts it away
Not really. I’ve observed that async code is often written in a way that doesn’t maximize how much concurrency can be expressed (e.g. instead of writing “here are N I/O operations, do them all concurrently”, it’s “for each operation x, await process(x)”). However, in a threaded world this concurrency problem gets worse, because you have no way to optimize towards such concurrency - threads are inherently and inescapably too heavyweight to express concurrency in an efficient way.
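A sketch of the two shapes (using the futures crate; `Item` and `process` are stand-ins):

```rust
use futures::future::join_all;

struct Item;

async fn process(_x: Item) { /* some I/O */ }

// Sequential: the next operation starts only after the previous finishes.
async fn one_at_a_time(items: Vec<Item>) {
    for x in items {
        process(x).await;
    }
}

// Concurrent: all N operations are in flight at once.
async fn all_at_once(items: Vec<Item>) {
    join_all(items.into_iter().map(process)).await;
}
```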
This is not a new lesson - work-stealing executors have long been known to offer significantly lower latency with a more consistent P99 than traditional threads. This has been known since forever - in the early 00s this is why Apple developed GCD. Threads simply don’t give the kernel's scheduler any richer information about the workload, and kernel threads are an insanely heavy mechanism for achieving fine-grained concurrency - even worse when that concurrency is I/O or a mixed workload instead of pure compute that’s embarrassingly easy to parallelize.
Do all programs need this level of performance? No, probably not. But it is significantly more trivial to achieve a higher performance bar and in practice achieve a latency and throughput level that traditional approaches can’t match with the same level of effort.
You can tell async is directionally kind of correct in that io_uring is the kernel’s approach to high performance I/O and it looks nothing like traditional threading and syscalls and completion looks a lot closer to async concurrency (although granted exploiting it fully is much harder in an async world because async/await is an insufficient number of colors to express how async tasks interrelate)
> work stealing executors have long been known to offer significantly lower latency with more consistent P99 than traditional threads. This has been known since forever - in the early 00s
Well, we've known how to make "traditional threads" fast, with lower latency and more consistent P99, since forever^2 - the early 90s. [1]
Sure, we can't convince that Finnish guy this is worthwhile to include in THE kernel, despite similar ideas having been running in Google datacenters for idk how many years - 15+? But nothing stops us from doing it in userspace, just as you said: a work-stealing executor. And no, no coloring.
Stack is all you need. Just make your "coroutines" stackful. Done. All those attempts to be "zero-cost" and change the programming model dramatically to avoid a stack introduced much more overhead than a stack and a piece of decent context-switch code.
> You can tell async is directionally kind of correct in that io_uring is the kernel’s approach
lol, it is very hard to model anything proactor like io_uring with async Rust due to its defects.
There are always trade-offs and there is never one best way to do something.
Stack-based coroutines are one way to do it. A relevant trade-off here is overhead: requiring a runtime narrows the potential use cases this can serve (e.g. embedded real-time stuff).
If you don’t care about supporting such use cases you can of course just create a copy of goroutines and be pretty happy with the result.
[deleted]
I am not saying threads are the model for all programming problems. For example a dependency graph like an excel spreadsheet can be analyzed and parallelized.
But as you observed, async/await fails to express concurrency any better. It’s also a thread, it’s just a worse implementation.
That’s incorrect. Even when expressed suboptimally, it still tends to result in overall higher throughput and consistently lower latency (work stealing executors specifically). And when you’re in this world, you can always do an optimization pass to better express the concurrency. If you’ve not written it async to start with, then you’re boned and have no easy escape hatch to optimize with.
Why can’t you do the same optimization? Are you maxing out your OS system resources on thread overhead?
That’s part of it. Then you add a thread pool to dispatch your tasks into to mitigate the cost of a thread start. Then you run into blocking problems and are like “I wish I had some keyword to express when a function needed to be run on the thread pool”. Then you’ve done a speed run of the past 40 years of research.
The 40 years of research was actually in OS theory so that you could write normal code and async was abstracted away.
A thread pool is not a research project.
OS thread overhead can be pretty substantial. Starting new threads on Windows is especially expensive.
> threads are inherently and inescapably too heavy weight to express concurrency in an efficient way
Your premise is wrong. There are many counterexamples to this.
Can you explain more ? I always heard this.
The most prominent example is probably Go with its goroutines, but there are many more. You can easily spawn tens of thousands of goroutines with low overhead and great performance.
Goroutines/"fibers"/"green threads" are usually scheduled by the runtime system across a small pool of actual OS threads.
The word "thread" is confusing things. In computer science a thread represents a flow of execution, which in concrete terms where execution is a series of function calls, is typically a program counter and a stack.
There are many ways to implement and manage threads. In Unix-like and Windows systems a "thread" is the above, plus a bunch of kernel context, plus implicit preemptive context switching. Because Unix and Windows added threads to their architectures relatively late in their development, each thread has to behave sort of like its own process, capable of running all the pre-existing software that was thread-agnostic. Which is why they have implicit scheduling, large userspace stacks, etc.
But nothing about "thread" requires it to be implemented or behave exactly like "OS threads" do in popular operating systems. People wax on about Async Rust and state machines. Well, a thread is already a state machine, too. Async Rust has to nest a bunch of state machine contexts along with space for the data manipulated in each function--that's called a stack. So Async Rust is one layer of threading built atop another layer of threading. And it did this not because it's better, but primarily because of legacy FFI concerns and interoperability with non-Rust software that depended on the pre-existing ABIs for stack and scheduling management.
Go largely went in the opposite direction, embracing threads as a first-class concept in a way that makes it no less scalable or cheap than Rust Futures, notwithstanding that Go, too, had to deal with legacy OS APIs and semantics, which they abstracted and modeled with their G (goroutine), M (machine), P (processor) architecture.
I thought it was obvious from context: OS threads are too heavyweight for fine grained concurrency
Go uses userspace threads. It’s also interesting that Go and Java are the only mainstream languages to have gone this route. The reason is that it imposes a huge penalty when calling into FFI code that doesn’t use green threads, whereas this cost isn’t there for async/await.
Also that you have to rewrite the entire standard library, because the kernel knows how to suspend kernel threads on syscalls, but not green threads. (Go and Java already had to do this anyway, of course.)
> the thread sleeps until ready and the kernel abstracts it away.
Sure, but once you involve the kernel and OS scheduler things get 3 to 4 orders of magnitude slower than what they should be.
The last time I was working on our coroutine/scheduling code creating and joining a thread that exited instantly was ~200us, and creating one of our green threads, scheduling it and waiting for it was ~400ns.
You don't need to wait 10 years for someone else to design yet another absurdly complex async framework, you can roll your own green threads/stackful coroutines in any systems language with 20 lines of ASM.
1. Why can’t we have better green threads implementations with better scheduling models?
2. Unchecked array operations are a lot faster. Manual memory management is a lot faster. Shared memory is a lot faster.
Usually when you see someone reach for sharp and less expressive tools it’s justified by a hot code path. But here we jump immediately to the perf hack?
3. How many simultaneous async operations does your program have?
Well, if you offload heavy compute into an async task, then usually it depends strictly on how many concurrent inputs you are given. But even something as “simple” as a text editor benefits from this if done well - that’s why JS text editors have reasonably acceptable performance whereas Java IDEs have always struggled (historically anyway, since even Java has now adopted green threads).
Are you sure Java's UI issues are caused by threading and not just Swing being a glitchy pile of junk?
For example, if you don't explicitly call the java.awt.Toolkit.sync() method after updating the UI state (which according to the docs "is useful for animation"), Swing will in my experience introduce seemingly random delays and UI lag because it just doesn't bother sending the UI updates to the window system.
Only NetBeans is written in Swing. Eclipse and JetBrains use their own thing and still generally struggled.
No, JetBrains use Swing in IntelliJ IDEA. You can tell from how it (for example) fails to layout dialogs correctly the first time they're displayed, just like every other Swing application. And how windows have no minimum size because Swing doesn't expose that functionality. And the various baffling bugs involving window focus that are inherent to Swing applications.
Eclipse uses SWT instead, which wraps the platform's native widgets.
You think IDEs are written in JS because of the performance benefits of the threading model?
I thought it was because they could copy chromium.
Why do you think they don’t struggle with input latency? Because the non blocking nature built into the browser model is so powerful and you cannot get that with threads.
I disagree with the premise. I cannot imagine a better latency experience than blocking loop IDEs like VS6.
Which inputs are getting latency? The keyboard? The files?
Hate to break it to you, but Windows GUI programming, exemplified by VS6, is about as far away from a blocking threaded model as you can get. You literally have a UI event loop, and any compute-intensive work is meant to be offloaded to other threads via messages/COM. This is why, when they failed to do that correctly, the entire UI would lock up - because they didn’t have good hygiene around how to offload compute-intensive operations that also interacted with the GUI.
You’ve literally argued against yourself without realizing.
Wait which programming model are you arguing is the low latency one? I thought you said it was JS because non-blocking.
Are you sure that latency-sensitive parts are written in async JS instead of having a separate UI thread (pool)? I have no idea myself, but without knowing the details it's hard to argue. Note, that browsers themselves, are usually written in languages like C++ or Rust. They run JS, but aren't written in it
Yes they are, the UI layer is mostly JS, outside the rendering and layout engines.
If you implement threads and code that reacts to an input queue (e.g. PostMessage, queue_push, mq_send, ...), you've implemented (probably a bad version of) async threads. And yes, that's exactly what Windows 1.0 did and what made it great.
But God help you if you have to change the code. Async threads are a way to organize it and make it workable for humans.
Maybe you remember performance of IDEs from 15 years ago because that definitely isn't my experience.
> that’s why JS text editors have reasonably acceptable performance
Absolutely not
You also involve the kernel when you are doing async IO.
In this context the interesting thing to measure would be doing IO in your green threads vs OS threads.
A stronger theoretical performance argument for async io is that you can do batching, ala io_uring, and do fewer protection domain crossings per IO that way.
> So threads was the right programming model.
It depends on what you are doing. Threads are the right model for compute-bound workloads. Async is the right model for bandwidth-bound workloads.
Optimization of bandwidth-bound code is an exercise in schedule design. In a classic multithreading model you have limited control over scheduling. In an async model you can have almost perfect control over scheduling. A well-optimized async schedule is much faster than the equivalent multithreaded architecture for the same bandwidth-bound workload. It isn't even close.
Most high-performance code today is bandwidth-bound. Async exists to make optimization of these workloads easier.
If this is a classic exercise can you show me the material?
Why can’t a scheduler be written which optimizes around IO? What additional information is present in code that has async/await annotations?
Threads are a scheduling model that delegates to the OS scheduler. Async style provides a primitive for creating a custom scheduler but is not a scheduler per se.
To use a custom scheduler you must first disable the existing schedulers your code is using by default for both execution and I/O. That means no OS scheduling. Thread-per-core architectures with static allocation and direct userspace I/O is the idiomatic way to do this regardless of programming language.
Optimal scheduling is a profoundly intractable problem -- it is AI-Complete. A generic scheduler is always going to be deeply suboptimal because a remotely decent schedule isn't practically computable in real systems. A more optimal scheduler must continuously rewrite the selection and ordering of thousands of concurrent operations in real-time. Importantly, this dynamic schedule rewriting is based on a model that can see across all operations globally and accurately predict both future operations that haven't happened yet and any ordering dependencies between current and future operations. A modern system can handle tens of millions of these operations per second, so the scheduling needs to be efficient.
A generic scheduler has to allow for almost arbitrary operation graphs and behavior. However, if you are writing e.g. a database engine, you have almost the entire context of how operations relate to each other both concurrently and across time. The design of a somewhat optimal scheduler that only understands your code becomes computationally feasible. It isn't trivial -- scheduler design is properly difficult -- but you build it using async style.
That’s not what I asked.
I believe that's actually how the virtual threads in the newer Java works. It's smart enough to notice IO and properly park it and move to another thread.
I think it's still basically doing epoll behind the scenes [1], but you have straightforward sequential code in the process and the actual implementation is invisible to the user, and you can use old boring blocking code with an object that is a drop-in replacement for Thread.
I personally still kind of prefer the explicit async stuff with Futures and Vert.x since I kind of like the idea that async is encoded into the type itself so you're more directly aware of it, but I'm definitely an outlier for that.
[1] Genuinely, please correct me if I'm wrong, it's very possible that I am.
I think that callbacks are actually easier to reason about:
When it comes time to test your concurrent processing, to ensure you handle race conditions properly, that is much easier with callbacks because you can control their scheduling. Since each callback represents a discrete unit, you see which events can be reordered. This enables you to more easily consider all the different orderings.
Instead with threads it is easy to just ignore the orderings and not think about this complexity happening in a different thread and when it can influence the current thread. It isn't simpler, it is simplistic. Moreover, you cannot really change the scheduling and test the concurrent scenarios without introducing artificial barriers to stall the threads or stubbing the I/O so you can pass in a mock that you will then instrument with a callback to control the ordering...
The problem with callbacks is that the call stack when captured isn't the logical callstack unless you are in one of the few libraries/runtimes that put in the work to make the call stacks make sense. Otherwise you need good error definitions.
You can of course mix the paradigms and have the worst of both worlds.
I agree. I don’t think callbacks are an underbaked language feature.
The problem comes from trying to have it both ways: we want async but want to be able to opt out. This is what causes most of the ugliness, including function colouring. Just look at Golang, where everything is async with no way to change it - it's great. It's probably not well-suited for things like microcontrollers, where every byte matters, but if you can afford the overhead, it's so much better than Rust async. Before async, Rust was an interesting and reasonable language; now it's just a hot mess that makes your eyes bleed for no reason.
> It's, probably, not well-suited for things like microcontrollers, where every byte matters, but if you can afford the overhead, it's so much better than Rust async.
There is one hill I'll die on, as far as programming languages go, which is that more people should study Céu's structured synchronous concurrency model. It specifically was designed to run on microcontrollers: it compiles down to a finite state machine with very little memory overhead (a few bytes per event).
It has some limitations in terms of how its "scheduler" scales when there are many trails activated by the same event, but breaking things up into multiple asynchronous modules would likely alleviate that problem.
I'm certain a language that would support the "Globally Asynchronous, Locally Synchronous" (GALS) paradigm could have its cake and eat it too - meaning something that combines support for a green threading model of choice for async events with structured local reactivity a la Céu.
Francisco Sant'Anna, the creator of Céu, actually has been chipping away at a new programming language called Atmos that does support the GALS paradigm. However, it's a research language that compiles to Lua 5.4, so it won't really compete with the low-level programming languages there.
If your threads are "free", you can just run 400 copies of synchronous code, and blocking in one just frees the underlying thread to work on others. Async within the same goroutine is still very much opt-in (you have to manually create a goroutine that writes to a channel that you then receive on); it just isn't needed where "spawn a thread for each connection" costs you barely a few KB per connection.
What GP meant - what everyone means when they say this - is that goroutines are always M:N threading and so there is no such thing as function coloring. In Rust to get M:N threading you have to use async and in practice every library you use has to use async. Hence function coloring, and two separate ecosystems of libraries in the same language.
> not well-suited for things like microcontrollers, where every byte matters
except when a RAM fetch is so expensive a load is basically an async call - and it's a single machine code instruction at the same time
Threads are neither better nor worse than async+callbacks. They are different. There are problems which map nicely to threads, and there are problems which are much nicer to express with async.
Such as? The entire premise of async is that callbacks were a mistake because they broke sequential reasoning and control.
Every explanation of the feature starts with managing callback hell.
Beware, they are different concepts.
Threads offer concurrent execution, async (futures) offer concurrent waiting. Loosely speaking, threads make sense for CPU bound problems, while async makes sense for IO bound problems.
Why? You write the same code with async await but with a keyword at the beginning of every function.
Because if you go down the callstack eventually you won't get the await keyword anymore; you'll get the actual 'waiters' and 'wakers' which define your scheduling
Yeah. The OS handles scheduling and preemption so it’s done for you rather than a call in the stack.
Async/await implementations usually also come with a runtime to handle the work scheduling as well as manage thread context. You can say that you can do that with just threads and callbacks but that's also essentially implementing async/await.
The entire premise of callbacks is that threads were a mistake because they broke sequential reasoning and control.
JK, obviously callbacks became prominent as a result of folks looking for creative solutions to the C10K[0] problem, but threads have a long history of haters[1][2][3].
Callbacks should just be hidden from the programmer; that's what async/await is for.
> So threads was the right programming model.
For problems that aren't overly concerned with performance/memory, yes. You should probably reach for threads as a default, unless you know a priori that your problem is not in this common bucket.
Unfortunately there is quite a lot of bookkeeping overhead in the kernel for threads, and context switches are fairly expensive, so in a number of high performance scenarios we may not be able to afford kernel threading
In that sentence I’m referring to the abstract idea of a thread of execution as a model of programming, not OS threads. A green thread implementation could do it too.
But what you said about kernel implementation is true. But are we really saying that the primary motivation for async/await is performance? How many programmers would give that answer? How many programs are actually hitting that bottleneck?
Doesn’t that buck the trend of every other language development in the past 20 years, emphasizing correctness and expressiveness over raw performance?
> But are we really saying that the primary motivation for async/await is performance?
Of course - what else would it be? The whole async trend started because moving away from each http request spawning (or being bound to) an OS thread gave quite extreme improvements in requests/second metrics, didn't it?
I agree. Managing many http requests or responses was a motivating problem.
What I question is: 1. whether most programs resemble that, such that it warrants being an invasive feature of every general-purpose language; 2. whether programmers are making a conscious choice because they've ruled out the perf overhead of the simpler model we have by default.
That is why we have the function colouring problem and a split ecosystem in the first place - if it were obviously better in all cases, we'd make async the default, and get rid of the split altogether (and there are languages, like Erlang, that fall on this side of the fence)
It was not for performance reasons, but for scaling up.
That's the same thing?
> But are we really saying that the primary motivation for async/await is performance?
The original motivation for not using OS threads was indeed performance. Async/await is mostly syntax sugar to fix some of the ergonomic problems of writing continuation-based code (Rust more or less skipped the intermediate "callback hell" with futures that Javascript/Python et al suffered through).
In some languages, yes; in others (JS/Python), async is just a workaround for not having proper threading.
Python used multiple threads to handle I/O long before async/await was a glimmer in anyone's mind (despite the GIL). nodejs is one of the very few languages I can think of that was born single-threaded and used an asynchronous runtime from the get-go
Importantly though, performance might be worse depending on use case and program. Specifically with scheduling in user space it can negatively impact branch prediction as your CPU is already hyper optimized for doing things differently.
It's all nuanced and what to choose requires careful evaluation.
As I understand, "green threads" are also expensive, for example you either need to allocate a large stack for each "thread", or hook stack allocation to grow the stack dynamically (like Go does), and if you grow the stack, you might have to move it and cannot have pointers to stack objects.
Green threads are fine for large servers with memory overcommit. Even with static stack sizes, you get benefits over OS threads due to the simpler scheduling. But the post was about embedded and green threads really suck there. Only using as much stack as you need for the task is the perfect solution for embedded systems.
>and if you grow the stack, you might have to move it
Most stacks are tiny and have bounded growth. Really large stacks usually happen with deep recursion, but it's not a very common pattern in non-functional languages (and functional languages have tail call optimization). OS threads allocate megabytes upfront to accommodate the worst case, which is not that common. And a tiny stack is very fast to copy. The larger the stack becomes, the less likely it is to grow further.
>cannot have pointers to stack objects
In Go, pointers that escape from a function force heap allocation, because it's unsafe to refer to the contents of a destroyed stack frame later on in principle. And if we only have pointers that never escape, it's relatively trivial to relocate such pointers during stack copying: just detect that a pointer is within the address range of the stack being relocated and recalculate it based on the new stack's base address.
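The relocation arithmetic is simple in principle - an illustrative sketch, not Go's actual code:

```rust
// Adjust a pointer found on the old stack so it refers to the same
// offset within the newly copied stack; leave everything else alone.
fn relocate(ptr: usize, old_base: usize, old_len: usize, new_base: usize) -> usize {
    if (old_base..old_base + old_len).contains(&ptr) {
        new_base + (ptr - old_base)
    } else {
        ptr
    }
}
```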
Works fine in Go.
Yes, you're not getting Rust performance (though a good part of that is Go's own compiler vs using all the LLVM goodness), but performance is good enough and the benefits for developers are great. Having goroutines be so cheap means you don't even need to do anything explicitly async to get what you want.
Rust chose a different design space for their async implementation though, so what works well for Go wouldn't work well for Rust. In particular, the Rust devs wanted zero-cost FFI that external code doesn't need to know about, which precludes Go-like green threads.
That immediately falls apart if you want to attempt the extremely common pattern of wait free usage of a main/UI thread.
Awaiting allows you to efficiently yield the thread to other tasks instead of blocking it. That's one of its biggest advantages.
When you block, the OS does the same thing - yields to other threads.
Yes, and it is extremely expensive. This is a well-known design problem in database engines.
The computational cost of context-switching threads at yield points is often many times higher than the actual workload executed between yield points. To address this you either need fewer yield points, which reduces concurrency, or you need to greatly reduce the cost of yielding. An async architecture reduces the cost of yielding by multiple orders of magnitude relative to threads.
And how much slower is that? What happens when I run a thousand async tasks? I'll give you a hint, with async/await, it has barely any overhead.
Proper modern languages offer both: you can keep your threads and reach for async only when it makes sense to.
Now the languages that don't offer choice is another matter.
I’m just waiting for them to try co-operative multithreading again.
That's what async/await is, no? Yielding by awaiting is co-operative.
You don’t have threads on embedded, but you want a way to express concurrent waiting. Different problems altogether.
You can, though. We used pthreads (well, a pthreads-compatible API) in production at massive scale on the ESP32-S3.
I think you are correct, in so far that often N:M threading is overkill for the problem at hand. However, some IO bound problems truly do require it. I haven't kept up with the details, but AFAIK the fallout from Spectre and Meltdown also means context switches are more expensive than they were historically, which is another downside with regular threads.
I also want to address something that I've seen in several sub-threads here: Rust's specific async implementation. The key limitation, compared to the likes of Go and JS, is that Rust attempts to implement async as a zero-cost abstraction, which is a much harder problem than what Go and JS does. Saying some variant of "Rust should just do the same thing as Go", is missing the point.
I think rust didn’t need async at all.
The question then becomes what, if anything, should take its place, and what are the corresponding tradeoffs?
What is the kernel in this context?
> Now language runtimes prefer “green threads” for portability and performance
"Green threads" only exist in crappy interpreted languages, and only because they have stop-the-world single-threaded garbage collection.
Go and Java both have green threads, and are not interpreted nor limited to single threaded GC.
I don’t understand why Rust even has panics if its primary goal is safety. We should be able to prove that the code has no paths that may panic ever. I’ve been looking at this all week. It’s very difficult to make a program that is guaranteed not to panic. My understanding is that the panic handler is about 300kb, and the only way to exclude it is if your code has no paths that can panic when it compiles. And after it compiles you can check the binary to see if the panic handler was included. It’s hacky.
Yes, you can lint out unwraps and other panicking operations, but if there were a subset of no-panic Rust, a large part of the issue detailed in this post would go away. But it’s frustrating working with a language that has so many operations that can, in theory, panic, even if in practice they should only do so if a bit flips - like proving an array is non-empty, or working with async. You either end up with a lot of error handling for situations which will never happen, or really strange patterns like the non-empty list pattern (a structure with a first field and then your list; see the sketch below), which of course ends up adding its own bloat.
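For the unfamiliar, a minimal sketch of that pattern:

```rust
/// A list guaranteed to hold at least one element, so accessing
/// the first element can never panic and needs no unwrap.
struct NonEmptyVec<T> {
    head: T,
    tail: Vec<T>,
}

impl<T> NonEmptyVec<T> {
    fn new(head: T) -> Self {
        Self { head, tail: Vec::new() }
    }

    fn first(&self) -> &T {
        &self.head // total: no empty case exists by construction
    }

    fn len(&self) -> usize {
        1 + self.tail.len()
    }
}
```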
> I don’t understand why Rust even has panics if its primary goal is safety. We should be able to prove that the code has no paths that may panic ever. I’ve been looking at this all week. It’s very difficult to make a program that is guaranteed not to panic.
The Rust-in-Linux folks are working on this with things like fallible memory operations. It's required for their own use. Increased use of proof (such as proving that an array is non-empty) is also being slowly worked on.
> I don’t understand why Rust even has panics if its primary goal is safety.
Rust's goal is memory safety. Panics are perfectly memory safe.
Panicking is fairly important for ergonomics and safety. If panicking wasn’t available and execution had to proceed in all situations, recovering from a situation like memory corruption where invariants have been violated would require a lot of error handling anywhere an invariant is checked. This is exactly the sort of large amounts of error handling for situations that will almost certainly never arise than you are concerned about.
The OS running the program isn't even perfect.
I tire so much of complainers who want someone else to make all their tools infallible yet want to do nothing. Let's just full-stop there. They not only want to avoid working on the tools. They prefer if the tool does everything for them, and they prefer having things done for them without bound.
Complainers want easy APIs. When the API isn't easy enough, they want easy Kubernetes containers "programmed" by YAML. When that isn't easy enough, it's all point-and-click hosted services on GCP and Amazon. You people don't want to program. You want apps. Infallible apps. You want to be consumers, fed from the sky like little birds who endeavor only never to fledge, never to fly. And you want to pay nothing for it.
The secret you people need to figure out is that the lifestyle you think is sustainable is actually a commensal relationship with people building things for you. There is no vast alliance to wrest power from corporations, to dissolve capitalism, no grass roots movement to "shake things up." There is food falling from higher in the water column from an ecosystem filled with people who do things. Those above do not have time to look down, but if they did, all they would feel is overwhelming contempt, so they only look across at the horizon.
But why do people seek to confirm comments like this? Because Rust scary. Churn on, little ant mill. Let be free any who understand the pointlessness of this performance.
I recently started working with Rust async. The main issue I am currently facing is code duplication: I have to duplicate every function that I want to support both asynchronous and blocking APIs. It would be great to have a `maybe-async`. I took a look at the available crates to work around this (maybe-async, bisync), but they all have issues or hard limitations.
There is work happening on keyword generics[0], which would let a function be generic over keywords like `async` and `const`.
For now the best option to write code that wants to live in both worlds is sans-io. Thomas Eizinger at Firezone has written a good article about this pattern[1]. Not only does it nicely solve the sync/async issue, but it also makes testing easier and opens the door to techniques like DST[2].
I have my own writing on the topic[3], which highlights that the problem is wider than just async vs sync due to different executors.
Keyword generics are probably not happening because they're kind of a hack.
Algebraic effects are the way forward, but that's a long way off.
Yes I hope in the future we can get to what OCaml 5 has with their algebraic effects system, and hopefully fix any flaws we see in there, so that async will just be syntactic sugar over the underlying effects system.
Considering the latest commits and issues in effects-initiative are about 2 years old, the keyword generics initiative seems effectively dead.
Rust uses Zulip for lang-related discussions. The 't-lang/effects' channel is still somewhat active.
I may have missed something, but how does “sans-io” deal with CPU heavy code? For example, if there’s some heavy decoding/encoding required on the data? Does the event loop only drive the network side and the heavy part is done after the loop is finished?
This is a great question and there isn't a definitive answer provided in the sources I linked.
Broadly I think there are three approaches:
1. For frequent and small CPU heavy tasks, just run them on the IO threads. As long as you don't leave too long between `.await` points (~10ms) it seems to work okay.
2. Run your sans-io code on a dedicated CPU thread and do IO from an async runtime. This introduces overhead that needs to be weighed against the amount of CPU work.
3. Have the sans-io code output something like `Output::DoHeavyCompute { .. }` and later feed the result back as `Input::HeavyComputeResult { .. }`, in the middle run the work on a thread pool.
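For option 3, the interface might look roughly like this (hypothetical types, not from the linked sources):

```rust
// The sans-io core is a pure state machine: heavy work leaves as an Output
// and its result comes back as an Input, so the driver decides where to run it.
enum Output {
    SendPacket(Vec<u8>),
    DoHeavyCompute { id: u64, payload: Vec<u8> },
}

enum Input {
    PacketReceived(Vec<u8>),
    HeavyComputeResult { id: u64, result: Vec<u8> },
}

struct Proto;

impl Proto {
    fn handle(&mut self, input: Input) -> Vec<Output> {
        match input {
            // Don't decode inline; ask the driver to run it on a worker.
            Input::PacketReceived(payload) => {
                vec![Output::DoHeavyCompute { id: 1, payload }]
            }
            // Once the driver hands the result back, continue the protocol.
            Input::HeavyComputeResult { result, .. } => {
                vec![Output::SendPacket(result)]
            }
        }
    }
}
```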
You won't get any benefits using async with CPU heavy code. Quite the opposite really.
It'll depend immensely on what you're actually doing, but if it's simple enough you may be able to make a macro that subs out the types & awaits.
One of the issues I face is a blocking function that takes a generic constrained by a `trait`, while its async version takes a generic constrained by an `async trait`.
From my perspective, an "async" function is already a "maybe-async". The distinction between a `fn -> void` and a `fn -> Future<void>` is that the former executes to its end immediately, whereas the other may only finish at another time. If you want to run an async fn in a blocking manner, you use a blocking executor.
This is the type of ugly but necessary discussion that has been happening in C++ for a while.
I never really liked the viral nature of async in rust when it was introduced.
I wish Rust the best of luck; with more people like this, Rust could have a brighter future.
I like this article already because it took me to the goals of Rust for 2026. We use the language in our team, but we haven't needed to go very deep to do the stuff we need. Yet, I really enjoy witnessing the development of a language from ground up with so much community feedback.
I somehow missed noticing that in C++, and I have no idea how it is working in other domains.
My only gripe is that a lot of it is feeling a bit kick-starter-y, with each of the goals needing specific funding. Is that the best model we've found so far?
> My only gripe is that a lot of it is feeling a bit kick-starter-y
IMO the term "project goals" is quite misleading for what this actually is. A project goal is a system for one person (or a small group of people) to express that they'd like to work on something and ask for Rust project volunteers to commit ongoing time and effort to supporting them through code review, answering questions, etc. It doesn't mean that the Rust project itself has set the goal, or even necessarily endorsed it.
So it's not quite right to treat it as a formal roadmap for Rust, just a "there are some contributors interested in working on these areas".
> I somehow miss noticing that in C++ and I have no idea how it is working in other domains.
There seems to be some consensus even within the C++ ISO committee that the evolution process of that language is somewhat broken, mostly due to its size and the way it is organized.
> My only gripe is that a lot of it is feeling a bit kick-starter-y, with each of the goals needing specific funding. Is that the best model we've found so far?
Sadly, this seems to be the way things go once a technology catches on, commercially. Can't blame large donors for sponsoring only the parts they are interested in. Fortunately, considerable funding of TweedeGolf comes from the (Dutch) government, I think.
In open source I guess there's two types of work:
1. features
2. maintenance
You can 'sell' new features. They cost money to create, but they solve real problems. Those problems also cost money and if that's more than the cost of creating the feature, companies are willing to put in money (generally).
Maintenance is harder. But there are now some maintainer funds! Like the one from RustNL: https://rustnl.org/maintainers/
These are broader ongoing work and backed by many orgs chipping in a little bit.
Idk if it's the best model, but at least it seems to kinda work
If you read the documentation around Rust async and Tokio, you'll find proper explanations of why CPU-intensive parts should not be part of the async stack, how to use primitives efficiently (like std::sync::Mutex in async blocks), and how to glue sync and async code together.
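For instance, that guidance boils down to patterns like this (a sketch assuming the `tokio` crate): CPU-heavy work goes to spawn_blocking, and std::sync::Mutex is used for short critical sections that are never held across an .await.

```rust
use std::sync::{Arc, Mutex};

async fn handle_request(stats: Arc<Mutex<u64>>, data: Vec<u8>) -> u64 {
    // Move the CPU-intensive part off the async worker threads.
    let digest = tokio::task::spawn_blocking(move || {
        data.iter().map(|&b| b as u64).sum::<u64>() // stand-in for real work
    })
    .await
    .expect("blocking task panicked");

    // std::sync::Mutex is fine here: the guard is dropped before any .await.
    *stats.lock().unwrap() += 1;

    digest
}
```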
A lot of code doesn't follow these guidelines because the authors don't care about efficiency and don't need it. But there are numerous projects that care about performance and efficiency, and they realize the pitfalls once the code runs in production (ScyllaDB is one example).
LLMs don't help either, generating everything async up to main, using the wrong primitives, and not properly designing the system.
> Futures aren't (trivially) inlined
In my programming language I wrote a custom pass for inlining async function calls within other async functions. It generally works and allows removing some boilerplate, but it increases the resulting binary size a lot.
Technically Rust can do the same.
The duplicate-state collapse (hoisting the match out of the await branches like in his process_command example) is the single easiest pattern anyone can apply to existing async code today. No compiler work needed, just a refactor.
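Concretely, the refactor looks something like this (illustrative names, not the article's actual code; `Either` is from the `futures` crate):

```rust
use futures::future::Either;

enum Command { Get(String), Del(String) }

async fn fetch(key: String) -> String { key }
async fn delete(key: String) -> String { key }

// Before: one await point per match arm duplicates suspend states.
async fn process_command(cmd: Command) -> String {
    match cmd {
        Command::Get(key) => fetch(key).await,
        Command::Del(key) => delete(key).await,
    }
}

// After: select the future first, then await once.
async fn process_command_collapsed(cmd: Command) -> String {
    let fut = match cmd {
        Command::Get(key) => Either::Left(fetch(key)),
        Command::Del(key) => Either::Right(delete(key)),
    };
    fut.await
}
```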
At the very least, you'd want to have a custom lint that can surface the places where it's applicable. That's pretty close to compiler work.
    async fn bar(input: u32) -> i32 {
        let blah = input > 10; // Preamble
        let result = foo(blah).await;
        result * 2 // Postamble
    }
> If only we were allowed to execute the code up to the first await point, then we could get rid of the Unresumed state. But "futures don't do anything unless polled" is guaranteed, so we can't change that.
Is that actually valid reasoning? If we know that foo(blah) doesn't do "anything" until polled, then why can't bar call foo, without polling it, before bar itself is polled? After all, there's no "anything" that will happen.
Because foo might call process::abort().
I disagree. If the codegen / optimizer is trying to preserve the rule that futures don’t have side effects until polled, then it seems fine to assume that the future being wrapped also follows that rule.
So if I call a foo() that violates the rule, it seems odd to complain that the generated bar() also violates it.
Does this kind of thing make a noticeable difference when applied to more complicated async functions?
The examples in the blog seem too simple to draw any conclusions.
Hi, author here. I mention in the blog that I've tried to quickly hack two of the simplest optimizations in the compiler and it resulted in 2%-5% binary size savings in real embedded (async) codebases. And a quick and probably deeply flawed synthetic benchmark on the desktop showed a 3% perf increase.
So yes, it does really matter. Keep in mind that optimizations stack. We're preventing LLVM from doing its thing, so if we make the futures themselves smaller, LLVM will be able to optimize more. Small changes really compound.
Saw that but couldn't find what code it gives that improvement on. Is it some embedded application written with Embassy?
Yes, but I can't share the codebases, since they're our clients' and proprietary. But there aren't a lot of big embedded codebases that are also open source.
Got it, thanks
Rust's ownership model is perfect for threading, yet it went all in on async.
Rust in my opinion needs an algebraic effects system to truly fix the function coloring problem. We have OCaml 5 which has one in production as well as a few other languages like Koka experimenting with it but hopefully we can add that to Rust as well. I'm not sure how the keyword generics initiative is going though, haven't heard any news on that.
It seems like inlining futures that aren't holding variables over the await point might solve a large part of these issues?
Async Rust is still better than any other language's async feature.
sad but true
Async Rust on small embedded chips like ESP32 feels revolutionary. This project looks promising.
This has been on my mind lately too with the talk of the new CPUs. Zen 7 sounds like it'll be a beast & coding against 1 out of dozens of cores would be a pity
Any solution that involves having to use a keyword to get the value returned from a function is such a poor design choice to me. Nearly every time I call a function I don't want to have to care if it is synchronous or not. I want the syntax and grammar (and illusion?) of one continuous thread of execution. The few times where I explicitly want to not wait are the places that should be special. This is why new languages should build in green threading from the start.
> Any solution that involves having to use a keyword to get the value returned from a function is such a poor design choice to me.
Technically speaking Rust didn't have to use a keyword (and in fact didn't for quite some time between 1.0 and when async was added), but the ergonomics of the library-based keyword-less solutions was considered to be less than optimal compared to building in support to the language.
> This is why new languages should build in green threading from the start.
This, just like most other decisions one can make when designing a language, is a tradeoff. Green threads have their niceties for sure, but they also have drawbacks which made them a nonstarter for the point in the design space the Rust devs were aiming for. In particular, the Rust devs wanted something that did not require overhead for FFI and also did not require foreign code to know that something async-related is involved. Green threads don't work here because they either have overhead when copying stuff between the green-thread stack and the foreign stack, or they need foreign code to understand how to handle the green-thread stack.
> Nearly every time I call a function I don't want to have to care if it is synchronous or not.
The problem is that "nearly every time" bit. There are times when you are looking at the code and you absolutely want to be aware of where the function is suspending. Similar to the use of ? in error handling to surface all fallible operations that might do an abnormal return.
> I want the syntax and grammar (and illusion?) of one continuous thread of execution
Then you shouldn't be using a low-level systems language? You can simply choose a higher-level abstraction language that better matches your programming preferences.
What you want is quite the opposite of what Rust is -- Rust enforces rules.
Look at how the borrow checker works. Most of the time, the compiler can _suggest_ the fix... and instead of applying it silently, it wants you to fix it.
This is the design choice they made.
There are many more problems, like async drop.
what's the modern "absolute beginner's guide to async in Rust" - ideally something dense that can bring someone motivated from beginner to expert in ~1 week of intense hacking on it?
there is a chapter on async in comprehensive rust and the rust book which ought to bring you up to speed
there is the async book but it is largely unfinished
you can watch Jon Gjengset's crust of rust on async, decrusting tokio, and the why, what, and how of pinning in rust
then there are tokio-lessons and tokio tutorial which teach how to use tokio runtime
and there are also good blog posts by phil-opp and rose wright on how async works
add async keyword to functions
add .await when calling them
use tokio in your main function (easy to look up)
use the async recursion crate if you need to use recursion but don't want to box everything
There are some bonuses like calling functions in parallel, but there you go.
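Putting that recipe together (a minimal sketch, assuming the `tokio` crate with the "full" feature enabled in Cargo.toml):

```rust
async fn fetch_number() -> u32 {
    // Stand-in for real IO.
    tokio::time::sleep(std::time::Duration::from_millis(10)).await;
    42
}

#[tokio::main]
async fn main() {
    // Add .await when calling async functions...
    let n = fetch_number().await;

    // ...and one of the "bonuses": run two calls concurrently.
    let (a, b) = tokio::join!(fetch_number(), fetch_number());
    println!("{n} {a} {b}");
}
```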
And then you want to do something trivial like an async callback
I prefer how Zig is approaching async with the new IO. It avoids function coloring.
Does it, though?
Whether your function has the `async` keyword attached or has a function argument of type `IO` doesn't really change anything substantial.
The whole "function color" argument seems pretty overblown to me. You can't call `foo(int, string)` if you don't have both an int and a string, so is it now a different "color" than the function `bar(int)`? If you want to call `foo` from `bar`, you have to somehow procure a string, and the same is true for `IO` in Zig, and the same is true for async in Rust, where what you have to procure is an async executor.
The `async` keyword can be seen as syntactic sugar for introducing a hidden function parameter (very literally, it's called `&mut std::task::Context`), as well as rewriting the function as a state machine.
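A hand-written equivalent makes that concrete; this is roughly the shape an `async fn` desugars to (heavily simplified — a real generated future is a state machine with one variant per suspend point):

```rust
use std::future::Future;
use std::pin::Pin;
use std::task::{Context, Poll};

struct AddOne {
    input: u32,
}

impl Future for AddOne {
    type Output = u32;
    // The "hidden parameter": every poll receives a &mut Context.
    fn poll(self: Pin<&mut Self>, _cx: &mut Context<'_>) -> Poll<u32> {
        Poll::Ready(self.input + 1)
    }
}
```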
Yeah, I tend to agree. What does improve quality-of-life substantially is having a proper effect system, especially when it comes to composing higher-order functions.
Having to write copies of List.map and List.async_map in the stdlib is a smell, but the real cost is potentially having to duplicate every function in your code that calls either.
E.g, if you have the 'async' effect, List.map can work with async functions or synchronous ones, without modification. It's the caller's responsibility to provide that async handler/environment at whatever level of abstraction makes sense, instead of explicitly wiring IO or async all the way through for a function that may or may not need it. The compiler (or runtime, if necessary) will keep you from calling a function that requires the async effect if you don't have a handler for it.
great article
Async Rust is a big wart in the language. There was this one guy posting about "I want off Go's wild ride" a few years back, but IIRC he never considered just how bad async Rust is.
I will take Go concurrency over rust async any day of the week.
Love Rust. They simply missed the mark with async. Swing and a miss.
The risk they took was very calculated. Unfortunately they’re bad at math and chose the wrong trade-offs.
Ah well. Shit happens.
I think Rust has a pretty solid async implementation, compared to other systems languages. I struggle to point out another systems language with a working and actually used async implementation.
> Unfortunately they’re bad at math and chose the wrong trade-offs
They chose the exact same tradeoffs as C++'s async/await (and the same overall model as Python/NodeJS), so I'm not sure what that says about programming as a whole.
the C++ committee makes consistently god awful terrible decisions
Source: am professional C++ developer
Async in Rust and C++ is nothing like it is in Python or NodeJS. Choose your own runtime is a very different model than having a default one.
Not to mention Tokio (most popular runtime for Rust) is multi-threaded by default. So you have to deal with multithreading bugs as well as normal async ones. That is not the case with most async languages. For example both Python and NodeJS use a single thread to execute async code.
> Async in Rust and C++ is nothing like it is in Python or NodeJS. Choose your own runtime is a very different model than having a default one.
Python still has pluggable event loops - this is sort of mandatory to interact with weird things like GUI toolkits, and Python's standard event loop was standardised pretty late in the game. Early on there was even an ecosystem split between Twisted and competing event loop implementations.
> For example both Python and NodeJS use a single thread to execute async code
I'd argue this is more a historical artefact of how the languages functioned before futures were introduced, rather than an inherent limitation.
It is an inherent limitation. Multithreading is not free after all. One of the big pros of async programming is the concurrency you get within a single thread. When you make the async runtime multithreaded by default (like Tokio) you don't get this advantage anymore.
You can put tokio in single-threaded mode if you prefer - it's an explicit performance tradeoff. The multithreaded work-stealing executor has higher throughput at the expense of needing more synchronisation.
Or you can schedule your thread-local tasks in a LocalSet to run them all on the owning thread, while keeping the other threads around to handle tasks that are intentionally parallel.
The general theme here is that tokio (and C++ equivalents) provide you the flexibility to do more things than the native Python/Node runtime does (and yes, the defaults take advantage of this). But the underlying intention is the same (and post-GIL we expect to see some movement in this direction on the Python front as well).
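A sketch combining both options (assuming the `tokio` crate):

```rust
use std::rc::Rc;
use tokio::task::LocalSet;

fn main() {
    // Single-threaded Tokio: no work stealing across threads.
    let rt = tokio::runtime::Builder::new_current_thread()
        .enable_all()
        .build()
        .unwrap();

    let local = LocalSet::new();
    local.spawn_local(async {
        let data = Rc::new(1); // !Send is fine on a LocalSet
        tokio::task::yield_now().await;
        println!("{data}");
    });

    // LocalSet implements Future; block until all local tasks finish.
    rt.block_on(local);
}
```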
Response to title: so you’re saying it’s viable
It's so funny that people will do anything to hate on Rust, including nitpicking a few bytes of overhead for a future while they reach for an entire thread or runtime to handle async in their favourite language.
I know the people and the company behind this article. They do anything but "hate on Rust".
You could've deduced that from the fact that someone who puts this amount of energy into a detailed article about the intricacies of an area of "foo" quite certainly does not "hate on foo".
Not the article, the comments here man.
The article is fine besides the bait title.
It's more that I and people I know love Rust, and enjoy it, and want it to be better. I want it to be relentlessly optimized.
I _love_ Rust and use it whenever I can. I still find the comments in here to be quite appropriate. Async Rust leaves me with a (subjective!) feeling that something isn't quite right. Not that I know how it _should_ be, but that feeling is very different from the non-async parts of the language that almost always leaves me with a warm fuzzy feeling of joy.
I don't know enough about the domain to be objectively helpful, so it's all wishy-washy feelings on my part. I keep reaching for orchestrating things with threads in Rust where most people would probably reach for async these days. The only language where I've felt fine embracing the blessed async system is Haskell and its green threads (which I understand come with their own host of problems).
You realize this article talks about Rust on embedded hardware specifically, where you don’t have threads or big runtimes? There is no hate going on here either, just attempts to make things better. Might I suggest you click through to the homepage and I think you’ll figure out the rest.
Nobody seriously tries to run Golang or Java on an MCU. But they do run Rust code.
J2ME existed before most of the current crop of Rust programmers were born.
Which doesn't work on low-end MCUs even in the CLDC profile (< 16 bit, < 32k RAM, < 160k ROM) and has been dead since 2007, proving my point that nobody wants to run it.
That's a bit rich given the abuse that Rust evangelists dish out to every other language in the world.
Rust is a passing fad; safe C will just overtake it.
> Maybe it's still possible and just a lot of work?
Yeah, I think that's the current status. I believe it was for a long time (and possibly still is) blocked on language improvements to async traits (which didn't exist at all until relatively recently and still don't support dyn trait).
It would make sense to have an official default async runtime in the standard library while keeping the door open to use any other runtime, just like we already have for the heap allocator or reference counting garbage collection.
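For comparison, the allocator precedent looks like this today -- std ships a default but lets you override it (std-only):

```rust
use std::alloc::System;

// std provides a default global allocator, but any program can override it;
// a std default executor could plausibly follow the same opt-out pattern.
#[global_allocator]
static GLOBAL: System = System;

fn main() {
    let v = vec![1, 2, 3]; // allocations go through GLOBAL
    println!("{v:?}");
}
```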
There are issues in particular with core traits for IO or Stream being defined in third-party libraries like tokio, futures or its variants. I've seen many cases where libraries have to reexport such types, but they are pinned to the version they have, so you can end up with multiple versions of basic async types in the same codebase that have the same name and are incompatible.
As of now I don't think there's an alternative. I'm not a Rust expert, but the core issue to me is that "async" goes beyond just having a futures scheduler. Async stuff usually needs network, disk, OS interaction, and future utilities (spawn), and these are all things the runtime (tokio) provides. It's pretty hard to be compatible with each other unless the language itself provides those.
That's not the core issue at all; it's lifetimes and allocations.
Can you elaborate on this please? Do you mean that’s basically impossible for rust std to provide a default runtime that makes “everyone” (embedded on one end and web on the other) happy?
I think that's the problem in essence, yes. Different executors built on top of different primitives and having different execution strategies will have mutually incompatible constraints.
To spawn a future on tokio, it has to implement `Send`, because tokio is a work-stealing executor. That isn't the case for monoio or other non-work-stealing async executors, where tasks are pinned to the thread they are spawned on and so do not require `Send` or `Sync`, so you can use Rc/RefCell.
Moreover, the way that async executors schedule execution can be _different_. I have a small executor I made that is based on the runtime model of the JS event loop. It's single-threaded async, with explicit creation of new worker threads. That isn't a model that can "slot in" to a suite of traits that adequately represents the abstraction provided by tokio, because the abstraction of my executor and the way it schedules tasks are fundamentally different.
Any reasonably-usable abstraction for the concept of an async runtime would impose too many constraints on the implementation in the name of ensuring runtime-generic code can execute on any standard runtime. A Future, for better or worse, is a sufficiently minimal abstraction of async executability without assuming anything about how the polling/waking behavior is implemented.
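To make that concrete, here is roughly how the spawn bounds differ (simplified sketches, not the real APIs):

```rust
use std::future::Future;

// A work-stealing executor (like tokio's default) must require Send,
// because a task may migrate between worker threads at any .await point:
fn spawn_work_stealing<F>(fut: F)
where
    F: Future<Output = ()> + Send + 'static,
{
    let _ = fut; // sketch only; a real executor would queue and poll this
}

// A thread-per-core executor can drop the Send bound, so tasks may hold
// Rc/RefCell across await points:
fn spawn_pinned_to_thread<F>(fut: F)
where
    F: Future<Output = ()> + 'static,
{
    let _ = fut; // sketch only
}
```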
Here are some alternatives for concurrent operations in Rust that don't use async. Which are available depends on the target, e.g. embedded/low-level vs GPOS. I use all of these across my Rust projects:
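For instance, the most basic std-only option is plain OS threads plus a channel, no executor required (a sketch):

```rust
use std::sync::mpsc;
use std::thread;

fn main() {
    let (tx, rx) = mpsc::channel();
    let worker = thread::spawn(move || {
        for i in 0..3 {
            tx.send(i).unwrap(); // do work, report results
        }
        // tx is dropped here, which ends the receiving loop below
    });

    for msg in rx {
        println!("got {msg}");
    }
    worker.join().unwrap();
}
```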
Most of you are already aware. I bring this up because I have observed that in the Rust OSS community (especially embedded) people sometimes refer to not using async as blocking, and are not aware that async isn't the only way to manage concurrency. People new to it are learning it this way: "If you're not using Tokio or Embassy (or some other executor), you are blocking a process."

That's kind of wild... I'm still relatively novice with Rust, but I was pretty aware that the different executors weren't the only async option... I thought it was pretty cool that you could opt into tokio for the bulk of async request work, but if I wanted to use a pool for specific workers, or something else on a more monolithic service/application, I could still launch my own threads for that use case pretty easily.
The hardest parts for me to grok really came down to lifetime memory management, for example a static/global dictionary as a cache, but being able to evict/recover entries from that dictionary for expired data... This is probably the use case that IMO is one of the least well documented, or at least lacking in discoverable tutorials etc.
Since you mentioned Java, it's interesting to note that it has had similar problems throughout its history: logging (now it's settled on slf4j, but you still find libraries using something else), commons (first Apache Commons, now Guava), JSON (it has settled on Jackson, but things like Gson and Simple-json are not uncommon to see), and nullability annotations (first with unofficial distributions of JSR-305, which never became official, then the Checker Framework, and lately with everything migrating to JSpecify). All this basic stuff needs to be provided by the language to prevent this fragmentation and quasi de facto libraries from appearing.
The traditional approach in Java has been to let those things happen in third-party space, then form an expert group to standardise a shared API for them. That was done with XML parsers and ORM fairly successfully. It doesn't always work, as with your examples - there was an attempt with logging, but it was done badly, JSR-305 ran aground, etc. But I think it's a much better approach than the JDK maintainers trying to get it right the first time.
But this fragmentation is what's needed to make good software. If you put things in the standard library, you're just adding a +1 to the fragmented landscape, because, for instance, it will never be specialized enough to cover all use cases, so people will still use their own libraries - just like C++ has three dozen distinct implementations of hash maps because one cannot fit all cases.
It could also be argued that putting a specific executor model into the standard library will make the problem worse, because it will give library crates license to use it without considering alternatives, because it is standard. At least today, taking a dependency on a specific runtime is a well-known boondoggle.
Not only that, but there kind of is a defacto standard (tokio), which is pretty much the default if you aren't in a specific, resource constrained use case.
Commons is something that has gradually been migrated into the JDK proper, at least the parts deemed necessary for most projects. I don't use Apache Commons or Guava at all in Java (now at 25 or 26, depending on the project) - there are still some libs that depend on them, but I would argue that most use them out of inertia rather than actual need.
As for slf4j, I still don't see any justification for an abstraction layer on top of logging. I never, ever migrated from one logger to another, and even if I did need to do it - it is very easy as most loggers are very similar. E.g. that's why I decided to use log4j2 in my latest project.
The logging implementation should be an application level decision. By using a facade like slf4j a library allows an application using any logging implementation to use it. That’s why libraries should use it.
It's very much possible to use rust for a lot of areas with async without needing to be dependent on tokio. I think it's really just the web/server stuff that's entirely tokio dependent. Writing libraries to be executor agnostic is not terribly difficult but does require some diligence which isn't necessarily present in most of the community.
It really depends on the abstraction model of the library. If the library needs to actually read/write a file, it either needs to depend on a runtime or provide some horrific abstraction over the process it will use to do that. This doesn't apply to sync IO libraries which can just use the Standard Library.
Web/server frameworks have to bind to a runtime because they have to make decisions about how to connect to a socket. Hyper is sufficiently abstract that it doesn't require any runtime, but using hyper directly provides no framework-like benefits and requires that you make those decisions and provide a compatible socket-like implementation for sending requests.
That's the thing though, it's possible, but it makes the simple hello-world example more tedious. It's totally possible to make an abstraction layer, provide a tokio implementation out of the box, but leave the door open for other implementations to slot in. Anyone who's written portable code for non-POSIX systems is used to this experience. Standardization is definitely better, but it also has its own share of problems, as it can limit what's possible. I expect that the decision not to standardize these interfaces too early will end up leading to a better long-term design, especially if major improvements to async are on the horizon that can alter the final shape of that standard.
Great article! Love these types of deep dives into optimizations. Hope the project goal works out!
I've felt before that compilers often don't put much effort into optimizing the "trivial" cases.
Overly dramatic title for the content, though. I would have clicked "Async Rust Optimizations the Compiler Still Misses" too you know
So on the title, I picked this because it's simply the truth. Since async landed in 2019 or so, not much has changed.
Yes, we can have async in traits and closures now. But those are updates to the typesystem, not to the async machinery itself. Wakers are a little bit easier to work with, but that's an update to std/core.
As I understand it, the people who landed async Rust were quite burnt out and got less active and no one has picked up the torch. (Though there's 1 PR open from some google folk that will optimize how captured variables are laid out in memory, which is really nice to have) Since I and the people I work with are heavy async users, I think it's maybe up to me to do it or at least start it. Free as in puppy I guess.
So yeah, the title is a little baitey, but I do stand behind it.
Some of the burnout no doubt being due to the catastrophizing of every decision by the community and the extreme rhetoric used across the board.
Great to see people wanting to get involved with the project, though. That’s the beauty of open source: if it aggravates you, you can fix it.
As an example of this, i remember a huge debate at the time about `await foo()` vs `foo().await` syntax. The community was really divided on that one, and there was a lot of drama because that's the kind of design decision you can't really walk back from.
Retrospectively, i think everyone is satisfied with the adopted syntax.
It makes sense that there was a huge debate, because the postfix .await keyword was both novel (no other languages had done it that way before) and arguably the right call. Of course, one can argue that the ? operator set a relevant precedent.
> Retrospectively, i think everyone is satisfied with the adopted syntax.
Maybe it’s a case of agree and commit, since it can’t really be walked back.
Various prominent people have said years after that .await was the correct choice after all
I'm not prominent but I disagreed with it at the time and I was wrong.
I’m curious - why were you wrong? It still seems like a wart to me, all these years later. What am I missing?
Contrast it with async in JS/ES as an example... now combine it with the using statement for disposeAsync instances.
It's not so bad when you have one `await foo` vs `foo.await`; it's when you have several of them on a line in different scopes/contexts. Another one I've seen a lot is...

Though that could also be... In any case, it still gets ugly very quickly.

I’ve never in my life used JS, so I’ll have to take your word for it.
It's a language I'm familiar with that uses the `await foo` syntax and often will see more than one in a line, per the examples given. C# is the most prominent language that has similar semantics that I know well, but is usually less of an issue there.
I'll give my two cents here. I work with Dart daily, and it also uses the `await future` syntax. I can cite a number of ergonomic issues:
```dart
(await taskA()).doSomething()
(await taskB()) + 1
(await taskC()) as int
```
vs.
```rust
taskA().await.doSomething()
taskB().await + 1
taskC().await as i32
```
It gets worse if you try to compose:
```dart
(await taskA(
  (await taskB(
    (await taskC()) as int
  )) + 1
)).doSomething()
```
This often leads to trading the await syntax for `then`:
```dart
await taskC()
    .then((r) => r as int)
    .then(taskB)
    .then((r) => r + 1)
    .then(taskA)
    .then((r) => r.doSomething())
```
But this is effectively trading the await structured syntax for a callback one. In Rust, we can write it as this:
```rust
taskA(taskB(taskC().await as i32).await + 1).await.doSomething()
```
Two spaces before a line make it a code block literal
HN has never used markdown, so the triple-tick does nothing but create noise here.

I think it's partially accurate, and partially a consequence of how async fractures the design space, so it will always feel like a somewhat separate thing, at least until we figure out how to make APIs agnostic to async-ness.
I am a beginner to Rust but I've coded with gevent in Python for many years and later moved to Go. Goroutines and gevent greenlets work seamlessly with synchronous code, with no headache. I know there've been tons of blog posts and such saying they're actually far inferior and riskier but I've really never had any issues with them. I am not sure why more languages don't go with a green thread-like approach.
Because they have their own drawbacks. To make them really useful, you need a resizable stack. Something that's a no-go for a runtime-less language like Rust.
You may also need to setup a large stack frame for each C FFI call.
Rust originally came with a green thread library as part of its primary concurrency story but it was removed pre-1.0 because it imposed unacceptable constraints on code that didn’t use it (it’s very much not a zero cost abstraction).
As an Elixir + Erlang developer I agree it’s a great programming model for many applications, it just wasn’t right for the Rust stdlib.
One of Rust's central design goals is to allow zero cost abstractions. Unifying the async model by basically treating all code as being possibly async would make that very challenging, if not impossible. Could be an interesting idea, but not currently tenable.
One problem I have with systems like gevent is that it can make it much harder to look at some code and figure out what execution model it's going to run with. Early Rust actually did have a N:M threading model as part of its runtime, but it was dropped.
I think one thing Rust could do to make async feel less like an MVP is to ship a default executor, much like it has a default allocator.
They could still stop a step short of a default executor and establish some standard traits/types that are typical across executors.
By providing a default, I think you're going to paint yourself into a corner. Maybe have one or two opt-in executors in the box... one that is higher-resource like tokio and one that is meant for lower-resource environments (like embedded).
As an uninterested 3rd party, it’s a wild exaggeration
Agree on title. Too dramatic.
The author seems to be obsessing over the overhead for trivial functions. He's bothered by the overhead of the "panicked" and "returned" states. That's not a big problem. Most useful async blocks are big enough that the overhead for the error cases disappears.
He may have a point about lack of inlining. But what tends to limit capacity for large numbers of activities is the state space required per activity.
> Most useful async blocks are big enough that the overhead for the error cases disappears.
Is it really though?
In my experience many Rust applications/libraries can be quite heavy on the indirection. One of the points from the article is that, contrary to sync Rust, in async Rust each indirection has a runtime cost. The example from the article is along these lines (an illustrative stand-in, not the exact snippet):
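```rust
// A "trivial" async forwarding layer: outer() only wraps inner(), yet it
// still gets its own generated future and its own poll indirection.
async fn inner() -> u32 {
    42
}

async fn outer() -> u32 {
    inner().await
}
```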
I would naively expect the above to be a 'free' indirection, paying only a compile-time cost for the compiler to inline the code. But after reading the article I understand this is not true, and it has a runtime cost as well.

In my experience, it's not uncommon to have an async trait method for which many implementations are actually synchronous. For example, different tables in your DB need to perform some calculations, but only some tables reference other tables. In that case, the method needs to be async and take a handle to the DB as a parameter, but many table entries can perform the calculation on their own without using the handle (or any async operation).
This may look like a case of over-optimization, but given how many times i've seen this pattern, i assume it builds up to a lot of unnecessary fluff in huge codebases. To be clear, in that case, the concern is not really about runtime speed (which is super fast), but rather about code bloat for compilation time and binary size.
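With async fn in traits (stable since Rust 1.75), that pattern looks something like this (hypothetical names):

```rust
struct Db;

trait TableEntry {
    // The trait must be async because *some* entries need the DB handle...
    async fn compute(&self, db: &Db) -> u64;
}

struct SelfContained(u64);

impl TableEntry for SelfContained {
    // ...but this impl never awaits, yet still pays for the future plumbing.
    async fn compute(&self, _db: &Db) -> u64 {
        self.0 * 2
    }
}
```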
> Most useful async blocks are big enough that the overhead for the error cases disappears.
Most useful async blocks are deeply nested, so the overhead compounds rapidly. Check the size of futures in a decently large Tokio codebase sometime
He's optimizing for embedded no-std situation. These things do matter in constrained environments.
He briefly mentions microcontrollers, but doesn't go into details. That might make sense, for microcontrollers which implement some kind of protocol talking to something.
> [...] That's not a big problem [...]
Depends somewhat on your expectations, I suppose. Compared to Python or Java, sure, but Rust of course strives to offer "zero-cost" high-level concepts.
I think the critique is in the same realm of C++'s std::function. Convenience, sure, but far from zero-cost.
To the point that C++26 adds std::function_ref as a lightweight alternative.
Exactly. And I guess that is also the gist of the article: async Rust needs additional TLC.
> Agree on title. Too dramatic.
not just too dramatic: given that all the things they list are non-essential optimizations, and some fall under "micro-optimizations I wouldn't be sure Rust even wants", and given how far the current async is from its old MVP state, it's more outright dishonest than overly dramatic. It's the kind of clickbait that says the author cares neither about respecting the reader nor about honest communication, which for someone wanting to do open source contributions is kinda... not so clever.

Though in general I agree Rust should have more HIR/MIR optimizations, at least in release mode. E.g. it's very common that an async function is not pub and is directly awaited everywhere (or can otherwise be proven to only be called once); in that case neither `Returned` nor `Panicked` is needed, as the future can't be polled again after either. Similarly, `Unresumed` isn't needed either, as you can directly run the code up to the first await (and with such a transform, their points about "inlining" and "async fns without await still having a state machine" would also "just go away"™, at least in some places). Similarly, the whole `.map_or(a, b)` family of functions is IMHO an anti-pattern: it introduces more functions with unclear operand ordering, drops the signalling `unwrap_`, and has no benefit beyond minimally shortening `.map(b).unwrap_or(a)` plus some micro-optimization, which is not productive in an already complicated language. Guaranteed optimizations for the kinds of patterns `.map(b).unwrap_or(a)` inlines to would be much better.
Async seems like an underbaked idea across the board. Regular code was already async: when you need to wait for an async operation, the thread sleeps until ready and the kernel abstracts it away. But we didn't like structuring code into logical threads, so we added callback systems for events. Then we realized callbacks are very hard to reason about and that sequential control flow is better.
So threads was the right programming model.
Now language runtimes prefer “green threads” for portability and performance but most languages don’t provide that properly. Instead we have awkward coloring of async/non-async and all these problems around scheduling, priority, and no-preemption. It’s a worse scheduling and process model than 1970.
> Regular code was already async. When you need to wait for an async operation, the thread sleeps until ready and the kernel abstracts it away
Not really. I've observed that async code is often written in a way that doesn't maximize how much concurrency can be expressed (e.g. instead of writing "here are N I/O operations, do them all concurrently", it's "for each operation x, await process(x)"). However, in a threaded world this concurrency problem gets worse, because you have no way to optimize towards such concurrency - threads are inherently and inescapably too heavyweight to express concurrency in an efficient way.
This is not a new lesson - work-stealing executors have long been known to offer significantly lower latency with more consistent P99 than traditional threads. This has been known since forever - in the early 00s this is why Apple developed GCD. Threads simply don't give the kernel's scheduler any richer information about the workload, and kernel threads are an insanely heavy mechanism for achieving fine-grained concurrency - even worse when that concurrency is I/O or a mixed workload instead of pure compute that's embarrassingly easy to parallelize.
Do all programs need this level of performance? No, probably not. But it is significantly more trivial to achieve a higher performance bar and in practice achieve a latency and throughput level that traditional approaches can’t match with the same level of effort.
You can tell async is directionally kind of correct in that io_uring is the kernel’s approach to high performance I/O and it looks nothing like traditional threading and syscalls and completion looks a lot closer to async concurrency (although granted exploiting it fully is much harder in an async world because async/await is an insufficient number of colors to express how async tasks interrelate)
> work stealing executors have long been known to offer significantly lower latency with more consistent P99 than traditional threads. This has been known since forever - in the early 00s
Well, we know how to make "traditional threads" fast, with lower latency and more consistent P99 since forever^2, in the early 90s. [1]
Sure, we can't convince that Finnish guy this is worthwhile to include in THE kernel, despite similar ideas having been running in Google datacenters for idk how many years, 15+? But nothing stops us from doing it in userspace, just as you said, with a work-stealing executor. And no, no coloring.
Stack is all you need. Just make your "coroutines" stackful. Done. All those attempts trying to be "zero-cost" and change programming model dramatically to avoid a stack, introduced much more overhead than a stack and a piece of decent context switch code.
> You can tell async is directionally kind of correct in that io_uring is the kernel’s approach
lol, it is very hard to model anything proactor like io_uring with async Rust due to its defects.
[1] https://dl.acm.org/doi/10.1145/121132.121151
There are always trade-offs and there is never one best way to do something.
Stack-based coroutines are one way to do it. A relevant trade-off here is overhead: requiring a runtime narrows the potential use cases this can serve (e.g. embedded real-time stuff).
If you don’t care about supporting such use cases you can of course just create a copy of goroutines and be pretty happy with the result.
I am not saying threads are the model for all programming problems. For example a dependency graph like an excel spreadsheet can be analyzed and parallelized.
But as you observed, async/await fails to express concurrency any better. It’s also a thread, it’s just a worse implementation.
That’s incorrect. Even when expressed suboptimally, it still tends to result in overall higher throughput and consistently lower latency (work stealing executors specifically). And when you’re in this world, you can always do an optimization pass to better express the concurrency. If you’ve not written it async to start with, then you’re boned and have no easy escape hatch to optimize with.
Why can’t you do the same optimization? Are you maxing out your OS's resources on thread overhead?
That’s part of it. Then you add a thread pool to dispatch your tasks into to mitigate the cost of a thread start. Then you run into blocking problems and are like “I wish I had some keyword to express when a function needed to be run on the thread pool”. Then you’ve done a speed run of the past 40 years of research.
The 40 years of research was actually in OS theory so that you could write normal code and async was abstracted away.
A thread pool is not a research project.
OS thread overhead can be pretty substantial. Starting new threads on Windows is especially expensive.
> threads are inherently and inescapably too heavy weight to express concurrency in an efficient way
Your premise is wrong. There are many counterexamples to this.
Can you explain more ? I always heard this.
The most prominent example is probably Go with its goroutines, but there are many more. You can easily spawn tens of thousands of goroutines, with low overhead and great performance.
Goroutines/"fibers"/"green threads" are usually scheduled by the runtime system across a small pool of actual OS threads.
The word "thread" is confusing things. In computer science a thread represents a flow of execution, which in concrete terms where execution is a series of function calls, is typically a program counter and a stack.
There are many ways to implement and manage threads. In Unix-like and Windows systems a "thread" is the above, plus a bunch of kernel context, plus implicit preemptive context switching. Because Unix and Windows added threads to their architectures relatively late in their development, each thread has to behave sort of like its own process, capable of running all the pre-existing software that was thread-agnostic. Which is why they have implicit scheduling, large userspace stacks, etc.
But nothing about "thread" requires it to be implemented or behave exactly like "OS threads" do in popular operating systems. People wax on about Async Rust and state machines. Well, a thread is already a state machine, too. Async Rust has to nest a bunch of state machine contexts along with space for data manipulated in each function -- that's called a stack. So Async Rust is one layer of threading built atop another layer of threading. And it did this not because it's better, but primarily because of legacy FFI concerns and interoperability with non-Rust software that depended on the pre-existing ABIs for stack and scheduling management.
Go largely went in the opposite direction, embracing threads as a first-class concept in a way that makes it no less scalable or cheap than Rust Futures, notwithstanding that Go, too, had to deal with legacy OS APIs and semantics, which they abstracted and modeled with their G (goroutine), M (machine), P (processor) architecture.
I thought it was obvious from context: OS threads are too heavyweight for fine grained concurrency
Go uses userspace threads. It’s also interesting that Go and Java are the only mainstream languages to have gone this route. The reason is that it has a huge penalty when calling FFI of code that doesn’t use green threads whereas this cost isn’t there for async/await.
Also that you have to rewrite the entire standard library, because the kernel knows how to suspend kernel threads on syscalls, but not green threads. (Go and Java already had to do this anyway, of course.)
> the thread sleeps until ready and the kernel abstracts it away.
Sure, but once you involve the kernel and OS scheduler things get 3 to 4 orders of magnitude slower than what they should be.
The last time I was working on our coroutine/scheduling code creating and joining a thread that exited instantly was ~200us, and creating one of our green threads, scheduling it and waiting for it was ~400ns.
You don't need to wait 10 years for someone else to design yet another absurdly complex async framework, you can roll your own green threads/stackful coroutines in any systems language with 20 lines of ASM.
1. Why can’t we have better green threads implementations with better scheduling models?
2. Unchecked array operations are a lot faster. Manual memory management is a lot faster. Shared memory is a lot faster.
Usually when you see someone reach for sharp and less expressive tools it’s justified by a hot code path. But here we jump immediately to the perf hack?
3. How many simultaneous async operations does your program have?
Well, if you offload heavy compute into an async task, then usually it depends strictly on how many concurrent inputs you are given. But even something as "simple" as a performant editor benefits from this if done well - that's why JS text editors have reasonably acceptable performance, whereas Java IDEs have always struggled (historically anyway, since even Java has now adopted green threads).
Are you sure Java's UI issues are caused by threading and not just Swing being a glitchy pile of junk?
For example, if you don't explicitly call the java.awt.Toolkit.sync() method after updating the UI state (which according to the docs "is useful for animation"), Swing will in my experience introduce seemingly random delays and UI lag because it just doesn't bother sending the UI updates to the window system.
Only NetBeans is written in Swing. Eclipse and JetBrains use their own thing and still generally struggled.
No, JetBrains use Swing in IntelliJ IDEA. You can tell from how it (for example) fails to layout dialogs correctly the first time they're displayed, just like every other Swing application. And how windows have no minimum size because Swing doesn't expose that functionality. And the various baffling bugs involving window focus that are inherent to Swing applications.
Eclipse uses SWT instead, which wraps the platform's native widgets.
You think IDEs are written in JS because of the performance benefits of the threading model?
I thought it was because they could copy chromium.
Why do you think they don’t struggle with input latency? Because the non blocking nature built into the browser model is so powerful and you cannot get that with threads.
I disagree with the premise. I cannot imagine a better latency experience than blocking loop IDEs like VS6.
Which inputs are getting latency? The keyboard? The files?
> the non blocking nature
https://youtu.be/bzkRVzciAZg?si=BuBXxHTgN0OqsAhI
Hate to break it to you, but Windows GUI programming, exemplified by VS6, is about as far away from a blocking threaded model as you can get. You literally have a UI event loop, and any compute-intensive work is meant to be offloaded to other threads via messages/COM. This is why, when they failed to do that correctly, the entire UI would lock up - because they didn't have good hygiene around how to offload compute-intensive operations that also interacted with the GUI.
You’ve literally argued against yourself without realizing.
Wait which programming model are you arguing is the low latency one? I thought you said it was JS because non-blocking.
Are you sure that latency-sensitive parts are written in async JS instead of having a separate UI thread (pool)? I have no idea myself, but without knowing the details it's hard to argue. Note, that browsers themselves, are usually written in languages like C++ or Rust. They run JS, but aren't written in it
Yes they are, the UI layer is mostly JS, outside the rendering and layout engines.
If you implement threads and code that reacts to an input queue (e.g. PostMessage, queue_push, mq_send, ...), you've implemented (probably a bad version of) async threads. And yes, that's exactly what Windows 1.0 did and what made it great.
But God help you if you have to change the code. Async threads are a way to organize it and make it workable for humans.
Maybe you're remembering the performance of IDEs from 15 years ago, because that definitely isn't my experience.
> that’s why JS text editors have reasonably acceptable performance
Absolutely not
You involve the kernel also when you are doing async io.
In this context the interesting thing to measure would be doing IO in your green threads vs OS threads.
A stronger theoretical performance argument for async io is that you can do batching, ala io_uring, and do fewer protection domain crossings per IO that way.
> So threads was the right programming model.
It depends on what you are doing. Threads are the right model for compute-bound workloads. Async is the right model for bandwidth-bound workloads.
Optimization of bandwidth-bound code is an exercise in schedule design. In a classic multithreading model you have limited control over scheduling. In an async model you can have almost perfect control over scheduling. A well-optimized async schedule is much faster than the equivalent multithreaded architecture for the same bandwidth-bound workload. It isn't even close.
Most high-performance code today is bandwidth-bound. Async exists to make optimization of these workloads easier.
If this is a classic exercise can you show me the material?
Why can’t a scheduler be written which optimizes around IO? What additional information is present in code that has async/await annotations?
Threads are a scheduling model that delegates to the OS scheduler. Async style provides a primitive for creating a custom scheduler but is not a scheduler per se.
To use a custom scheduler you must first disable the existing schedulers your code is using by default for both execution and I/O. That means no OS scheduling. Thread-per-core architectures with static allocation and direct userspace I/O is the idiomatic way to do this regardless of programming language.
Optimal scheduling is a profoundly intractable problem -- it is AI-Complete. A generic scheduler is always going to be deeply suboptimal because a remotely decent schedule isn't practically computable in real systems. A more optimal scheduler must continuously rewrite the selection and ordering of thousands of concurrent operations in real-time. Importantly, this dynamic schedule rewriting is based on a model that can see across all operations globally and accurately predict both future operations that haven't happened yet and any ordering dependencies between current and future operations. A modern system can handle tens of millions of these operations per second, so the scheduling needs to be efficient.
A generic scheduler has to allow for almost arbitrary operation graphs and behavior. However, if you are writing e.g. a database engine, you have almost the entire context of how operations relate to each other both concurrently and across time. The design of a somewhat optimal scheduler that only understands your code becomes computationally feasible. It isn't trivial -- scheduler design is properly difficult -- but you build it using async style.
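To make the thread-per-core setup mentioned above concrete, a minimal sketch assuming the `core_affinity` crate: one OS thread is spawned and pinned per core, and each would run its own run queue, leaving the OS scheduler essentially nothing to decide.

```rust
fn main() {
    let cores = core_affinity::get_core_ids().expect("could not query cores");
    let handles: Vec<_> = cores
        .into_iter()
        .map(|core| {
            std::thread::spawn(move || {
                // pin this thread to its core for its whole lifetime
                core_affinity::set_for_current(core);
                // a per-core executor and run queue would live here; with no
                // cross-core migration, the custom scheduler owns all ordering
            })
        })
        .collect();
    for h in handles {
        h.join().unwrap();
    }
}
```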
That’s not what I asked.
I believe that's actually how the virtual threads in newer Java work. The runtime is smart enough to notice IO, park the virtual thread, and move on to another.
I think it's still basically doing epoll behind the scenes [1], but you have straightforward sequential code in the process and the actual implementation is invisible to the user, and you can use old boring blocking code with an object that is a drop-in replacement for Thread.
I personally still kind of prefer the explicit async stuff with Futures and Vert.x since I kind of like the idea that async is encoded into the type itself so you're more directly aware of it, but I'm definitely an outlier for that.
[1] Genuinely, please correct me if I'm wrong, it's very possible that I am.
I think that callbacks are actually easier to reason about:
When it comes time to test your concurrent processing, to ensure you handle race conditions properly, that is much easier with callbacks because you can control their scheduling. Since each callback represents a discrete unit, you see which events can be reordered. This enables you to more easily consider all the different orderings.
Instead with threads it is easy to just ignore the orderings and not think about this complexity happening in a different thread and when it can influence the current thread. It isn't simpler, it is simplistic. Moreover, you cannot really change the scheduling and test the concurrent scenarios without introducing artificial barriers to stall the threads or stubbing the I/O so you can pass in a mock that you will then instrument with a callback to control the ordering...
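A toy sketch of that testing point (all names here are hypothetical): callbacks land in an explicit queue, and the test decides which pending callback runs next, so every interleaving is reproducible.

```rust
type Callback = Box<dyn FnOnce()>;

#[derive(Default)]
struct Sched {
    queue: Vec<Callback>,
}

impl Sched {
    // enqueue a callback instead of running it immediately
    fn post(&mut self, cb: impl FnOnce() + 'static) {
        self.queue.push(Box::new(cb));
    }
    // run the pending callback at `index`; the test chooses the order
    fn run(&mut self, index: usize) {
        let cb = self.queue.remove(index);
        cb();
    }
}

fn main() {
    let mut s = Sched::default();
    s.post(|| println!("write completed"));
    s.post(|| println!("connection closed"));
    // force the suspicious interleaving: the close overtakes the write
    s.run(1); // connection closed
    s.run(0); // write completed
}
```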
The problem with callbacks is that the call stack when captured isn't the logical callstack unless you are in one of the few libraries/runtimes that put in the work to make the call stacks make sense. Otherwise you need good error definitions.
You can of course mix the paradigms and have the worst of both worlds.
I agree. I don’t think callbacks are an underbaked language feature.
The problem comes from trying to sit on two chairs at once: we want async but also want to be able to opt out. This is what causes most of the ugliness, including function colouring. Just look at Go, where everything is async with no way to change it - it's great. It's probably not well-suited for things like microcontrollers, where every byte matters, but if you can afford the overhead, it's so much better than Rust async. Before async, Rust was an interesting and reasonable language; now it's just a hot mess that makes your eyes bleed for no reason.
> It's probably not well-suited for things like microcontrollers, where every byte matters, but if you can afford the overhead, it's so much better than Rust async.
There is one hill I'll die on, as far as programming languages go, which is that more people should study Céu's structured synchronous concurrency model. It specifically was designed to run on microcontrollers: it compiles down to a finite state machine with very little memory overhead (a few bytes per event).
It has some limitations in terms of how its "scheduler" scales when there are many trails activated by the same event, but breaking things up into multiple asynchronous modules would likely alleviate that problem.
I'm certain a language that supported the "Globally Asynchronous, Locally Synchronous" (GALS) paradigm could have its cake and eat it too. Meaning something that combines support for a green-threading model of choice for async events with structured local reactivity a la Céu.
Francisco Sant'Anna, the creator of Céu, has actually been chipping away at a new programming language called Atmos that does support the GALS paradigm. However, it's a research language that compiles to Lua 5.4, so it won't really compete with the low-level programming languages there.
[0] https://ceu-lang.org/
[1] https://github.com/atmos-lang/atmos
Everything is not async in Go.
If your threads are "free" you can just run 400 copies of synchronous code, and blocking in one just frees the thread to work on the others. Async within the same goroutine is still very much opt-in (you have to manually create a goroutine that writes to a channel that you then receive on); it just isn't needed where "spawn a thread for each connection" costs you barely a few KB per connection.
What GP meant - what everyone means when they say this - is that goroutines are always M:N threading and so there is no such thing as function coloring. In Rust to get M:N threading you have to use async and in practice every library you use has to use async. Hence function coloring, and two separate ecosystems of libraries in the same language.
> not well-suited for things like microcontrollers, where every byte matters
except when a RAM fetch is so expensive a load is basically an async call - and it's a single machine code instruction at the same time
Threads are neither better nor worse than async+callbacks. They are different. There are problems which map nicely to threads, and there are problems which are much nicer to express with async.
Such as? The entire premise of async is that callbacks were a mistake because they broke sequential reasoning and control.
Every explanation of the feature starts with managing callback hell.
Beware, they are different concepts.
Threads offer concurrent execution, async (futures) offer concurrent waiting. Loosely speaking, threads make sense for CPU bound problems, while async makes sense for IO bound problems.
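A small sketch of "concurrent waiting", assuming tokio: both sleeps are pending at the same time on a single thread, so the total is about the longest wait rather than the sum.

```rust
use std::time::{Duration, Instant};

#[tokio::main]
async fn main() {
    let start = Instant::now();
    // neither wait occupies a thread; both futures are simply pending
    tokio::join!(
        tokio::time::sleep(Duration::from_secs(2)),
        tokio::time::sleep(Duration::from_secs(1)),
    );
    println!("both done in {:?}", start.elapsed()); // ~2s, not ~3s
}
```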
Why? You write the same code with async await but with a keyword at the beginning of every function.
Because if you go down the callstack eventually you won't get the await keyword anymore; you'll get the actual 'waiters' and 'wakers' which define your scheduling
Yeah. The OS handles scheduling and preemption so it’s done for you rather than a call in the stack.
Async/await implementations usually also come with a runtime to handle the work scheduling as well as manage thread context. You can say that you can do that with just threads and callbacks but that's also essentially implementing async/await.
The entire premise of callbacks is that threads were a mistake because they broke sequential reasoning and control.
JK, obviously callbacks became prominent as a result of folks looking for creative solutions to the C10K[0] problem, but threads have a long history of haters[1][2][3].
[0] https://en.wikipedia.org/wiki/C10k_problem
[1] https://brendaneich.com/2007/02/threads-suck/
[2] https://web.stanford.edu/~ouster/cgi-bin/papers/threads.pdf
[3] https://www2.eecs.berkeley.edu/Pubs/TechRpts/2006/EECS-2006-...
Callbacks should just be hidden from the programmer; that's what async/await is for.
> So threads was the right programming model.
For problems that aren't overly concerned with performance/memory, yes. You should probably reach for threads as a default, unless you know a priori that your problem is not in this common bucket.
Unfortunately there is quite a lot of bookkeeping overhead in the kernel for threads, and context switches are fairly expensive, so in a number of high performance scenarios we may not be able to afford kernel threading
In that sentence I’m referring to the abstract idea of a thread of execution as a model of programming, not OS threads. A green thread implementation could do it too.
But what you said about kernel implementation is true. But are we really saying that the primary motivation for async/await is performance? How many programmers would give that answer? How many programs are actually hitting that bottleneck?
Doesn't that buck the trend of every other language development in the past 20 years, emphasizing correctness and expressiveness over raw performance?
> But are we really saying that the primary motivation for async/await is performance?
Of course - what else would it be? The whole async trend started because moving away from each http request spawning (or being bound to) an OS thread gave quite extreme improvements in requests/second metrics, didn't it?
I agree. Managing many http requests or responses was a motivating problem.
What I question is whether 1) most programs resemble that enough to justify making it an invasive feature of every general-purpose language, and 2) programmers are making a conscious choice, having ruled out the perf overhead of the simpler model we have by default.
That is why we have the function colouring problem and a split ecosystem in the first place - if it were obviously better in all cases, we'd make async the default, and get rid of the split altogether (and there are languages, like Erlang, that fall on this side of the fence)
It was not for performance reasons, but for scaling up.
That's the same thing?
> But are we really saying that the primary motivation for async/await is performance?
The original motivation for not using OS threads was indeed performance. Async/await is mostly syntax sugar to fix some of the ergonomic problems of writing continuation-based code (Rust more or less skipped the intermediate "callback hell" with futures that Javascript/Python et al suffered through).
In some languages, yes; in others (JS/Python), async is just a workaround for not having proper threading.
Python used multiple threads to handle I/O long before async/await was a glimmer in anyone's mind (despite the GIL). nodejs is one of the very few runtimes I can think of that was born single-threaded and used an asynchronous runtime from the get-go.
Importantly though, performance might be worse depending on the use case and program. Specifically, scheduling in user space can negatively impact branch prediction, as your CPU is already hyper-optimized for the conventional way of doing things.
It's all nuanced and what to choose requires careful evaluation.
As I understand it, "green threads" are also expensive: for example, you either need to allocate a large stack for each "thread", or hook stack allocation to grow the stack dynamically (like Go does); and if you grow the stack, you might have to move it, and then you cannot have pointers to stack objects.
Green threads are fine for large servers with memory overcommit. Even with static stack sizes, you get benefits over OS threads due to the simpler scheduling. But the post was about embedded and green threads really suck there. Only using as much stack as you need for the task is the perfect solution for embedded systems.
>and if you grow the stack, you might have to move it
Most stacks are tiny and have bounded growth. Really large stacks usually happen with deep recursion, but it's not a very common pattern in non-functional languages (and functional languages have tail call optimization). OS threads allocate megabytes upfront to accommodate the worst case, which is not that common. And a tiny stack is very fast to copy. The larger the stack becomes, the less likely it is to grow further.
>cannot have pointers to stack objects
In Go, pointers that escape from a function force heap allocation, because it's unsafe to refer to the contents of a destroyed stack frame later on in principle. And if we only have pointers that never escape, it's relatively trivial to relocate such pointers during stack copying: just detect that a pointer is within the address range of the stack being relocated and recalculate it based on the new stack's base address.
works fine in Go.
Yes, you're not getting Rust performance (though a good part of that is Go using its own compiler vs. all the LLVM goodness), but performance is good enough and the benefits for developers are great. Having goroutines be so cheap means you don't even need to do anything explicitly async to get what you want.
Rust chose a different design space for their async implementation though, so what works well for Go wouldn't work well for Rust. In particular, the Rust devs wanted zero-cost FFI that external code doesn't need to know about, which precludes Go-like green threads.
That immediately falls apart if you want to attempt the extremely common pattern of wait-free usage of a main/UI thread.
Awaiting allows you to efficiently yield the thread to other tasks instead of blocking it. That's one of its biggest advantages.
When you block the OS does the same thing - yields to other threads.
Yes, and it is extremely expensive. This is a well-known design problem in database engines.
The computational cost of context-switching threads at yield points is often many times higher than the actual workload executed between yield points. To address this you either need fewer yield points, which reduces concurrency, or you need to greatly reduce the cost of yielding. An async architecture reduces the cost of yielding by multiple orders of magnitude relative to threads.
And how much slower is that? What happens when I run a thousand async tasks? I'll give you a hint, with async/await, it has barely any overhead.
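For a sense of scale, a sketch assuming tokio: a thousand concurrent tasks are a thousand small heap-allocated state machines multiplexed over a few threads, not a thousand OS threads with a stack each.

```rust
use std::time::Duration;

#[tokio::main]
async fn main() {
    // 1000 concurrent tasks; each is just a small state machine
    let handles: Vec<_> = (0..1000)
        .map(|i| {
            tokio::spawn(async move {
                tokio::time::sleep(Duration::from_millis(10)).await;
                i
            })
        })
        .collect();
    for h in handles {
        h.await.unwrap();
    }
}
```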
Proper modern languages offer both: you can keep your threads and reach for async only when it makes sense to do so.
The languages that don't offer the choice are another matter.
I’m just waiting for them to try co-operative multithreading again.
That's what async/await is, no? Yielding by awaiting is co-operative.
You don’t have threads on embedded, but you want a way to express concurrent waiting. Different problems altogether
You can, though. We used pthreads (well, a pthreads-compatible API) in production at massive scale on the ESP32-S3.
I think you are correct, insofar as N:M threading is often overkill for the problem at hand. However, some IO-bound problems truly do require it. I haven't kept up with the details, but AFAIK the fallout from Spectre and Meltdown also means context switches are more expensive than they were historically, which is another downside of regular threads.
I also want to address something that I've seen in several sub-threads here: Rust's specific async implementation. The key limitation, compared to the likes of Go and JS, is that Rust attempts to implement async as a zero-cost abstraction, which is a much harder problem than what Go and JS do. Saying some variant of "Rust should just do the same thing as Go" is missing the point.
I think rust didn’t need async at all.
The question then becomes what, if anything, should take its place, and what are the corresponding tradeoffs?
What is kernel in this context?
> Now language runtimes prefer “green threads” for portability and performance
"Green threads" only exist in crappy interpreted languages, and only because they have stop-the-world single-threaded garbage collection.
Go and Java both have green threads, and are not interpreted nor limited to single threaded GC.
I don’t understand why Rust even has panics if its primary goal is safety. We should be able to prove that the code has no paths that may panic ever. I’ve been looking at this all week. It’s very difficult to make a program that is guaranteed not to panic. My understanding is that the panic handler is about 300kb, and the only way to exclude it is if your code has no paths that can panic when it compiles. And after it compiles you can check the binary to see if the panic handler was included. It’s hacky.
Yes, you can lint out unwraps and other panicking operations, but if there were a subset of no-panic Rust, a large part of the issue detailed in this post would go away. It's frustrating working with a language that has so many operations that can, in theory, panic, even if in practice they should only do so if a bit flips. Like proving an array is non-empty, or working with async. You either end up with a lot of error handling for situations that will never happen, or with really strange patterns like the non-empty-list pattern (a structure with a first field and then your list), which of course adds its own bloat.
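A small sketch of coding around panicking operations: the Option-returning forms push the "impossible" cases into the type system instead of into a potential panic.

```rust
fn first_doubled(xs: &[i32]) -> Option<i32> {
    let first = xs.first()?; // no panic on an empty slice, unlike xs[0]
    first.checked_mul(2)     // no panic on overflow, unlike `first * 2` in debug builds
}

fn main() {
    assert_eq!(first_doubled(&[21, 5]), Some(42));
    assert_eq!(first_doubled(&[]), None);
}
```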
> I don’t understand why Rust even has panics if its primary goal is safety. We should be able to prove that the code has no paths that may panic ever. I’ve been looking at this all week. It’s very difficult to make a program that is guaranteed not to panic.
The Rust-in-Linux folks are working on this with things like fallible memory operations. It's required for their own use. Increased use of proofs (such as proving that an array is non-empty) is also slowly being worked on.
> I don’t understand why Rust even has panics if its primary goal is safety.
Rust's goal is memory safety. Panics are perfectly memory safe.
Panicking is fairly important for ergonomics and safety. If panicking weren't available and execution had to proceed in all situations, recovering from a situation like memory corruption, where invariants have been violated, would require a lot of error handling anywhere an invariant is checked. This is exactly the sort of large amount of error handling for situations that will almost certainly never arise that you are concerned about.
The OS running the program isn't even perfect.
I tire so much of complainers who want someone else to make all their tools infallible yet want to do nothing. Let's just full-stop there. They not only want to avoid working on the tools. They prefer if the tool does everything for them, and they prefer having things done for them without bound.
Complainers want easy APIs. When the API isn't easy enough, they want easy Kubernetes containers "programmed" by YAML. When that isn't easy enough, it's all point-and-click hosted services on GCP and Amazon. You people don't want to program. You want apps. Infallible apps. You want to be consumers, fed from the sky like little birds who endeavor only never to fledge, never to fly. And you want to pay nothing for it.
The secret you people need to figure out is that the lifestyle you think is sustainable is actually a commensal relationship with people building things for you. There is no vast alliance to wrest power from corporations, to dissolve capitalism, no grass roots movement to "shake things up." There is food falling from higher in the water column from an ecosystem filled with people who do things. Those above do not have time to look down, but if they did, all they would feel is overwhelming contempt, so they only look across at the horizon.
But why do people seek to confirm comments like this? Because Rust scary. Churn on, little ant mill. Let be free any who understand the pointlessness of this performance.
I recently started working with Rust async. The main issue I am currently facing is code duplication: I have to duplicate every function for which I want to support both asynchronous and blocking APIs. It would be great to have a `maybe-async`. I took a look at the available crates to work around this (maybe-async, bisync), but they all have issues or hard limitations.
There is work happening on keyword generics[0], which would let a function be generic over keywords like `async` and `const`.
For now the best option for code that wants to live in both worlds is sans-io. Thomas Eizinger at Firezone has written a good article[1] about this pattern. Not only does it nicely solve the sync/async issue, but it also makes testing easier and opens the door to techniques like DST[2].
I have my own writing on the topic[3], which highlights that the problem is wider than just async vs sync due to different executors.
0: https://github.com/rust-lang/effects-initiative
1: https://www.firezone.dev/blog/sans-io
2: https://notes.eatonphil.com/2024-08-20-deterministic-simulat...
3: https://hugotunius.se/2024/03/08/on-async-rust.html
Keyword generics are probably not happening because they're kind of a hack.
Algebraic effects are the way forward, but that's a long way off.
Yes I hope in the future we can get to what OCaml 5 has with their algebraic effects system, and hopefully fix any flaws we see in there, so that async will just be syntactic sugar over the underlying effects system.
Considering the latest commits and issues in effects-initiative are about 2 years old, the keyword generics initiative seems effectively dead.
Rust uses Zulip for lang-related discussions. The 't-lang/effects' channel is still somewhat active.
I may have missed something, but how does “sans-io” deal with CPU heavy code? For example, if there’s some heavy decoding/encoding required on the data? Does the event loop only drive the network side and the heavy part is done after the loop is finished?
This is a great question and there isn't a definitive answer provided in the sources I linked.
Broadly I think there are three approaches:
1. For frequent and small CPU heavy tasks, just run them on the IO threads. As long as you don't leave too long between `.await` points (~10ms) it seems to work okay.
2. Run your sans-io code on a dedicated CPU thread and do IO from an async runtime. This introduces overhead that needs to be weighed against the amount of CPU work.
3. Have the sans-io code output something like `Output::DoHeavyCompute { .. }` and later feed the result back as `Input::HeavyComputeResult { .. }`, in the middle run the work on a thread pool.
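A minimal sketch of approach 3, with entirely hypothetical types: the sans-io core is a pure state machine that requests heavy work as data, so the driver loop decides where that work actually runs (thread pool, async runtime, wherever).

```rust
enum Input {
    PacketReceived(Vec<u8>),
    HeavyComputeResult { job_id: u64, result: Vec<u8> },
}

enum Output {
    DoHeavyCompute { job_id: u64, payload: Vec<u8> },
    SendPacket(Vec<u8>),
}

struct Core {
    next_job: u64,
}

impl Core {
    // pure state transition: no IO, no threads, trivially testable
    fn handle(&mut self, input: Input) -> Vec<Output> {
        match input {
            Input::PacketReceived(payload) => {
                let job_id = self.next_job;
                self.next_job += 1;
                vec![Output::DoHeavyCompute { job_id, payload }]
            }
            Input::HeavyComputeResult { result, .. } => {
                vec![Output::SendPacket(result)]
            }
        }
    }
}

fn main() {
    let mut core = Core { next_job: 0 };
    let outs = core.handle(Input::PacketReceived(vec![1, 2, 3]));
    // the driver would now ship this job to a thread pool and feed the
    // result back in as Input::HeavyComputeResult
    assert!(matches!(outs[0], Output::DoHeavyCompute { .. }));
}
```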
You won't get any benefits using async with CPU heavy code. Quite the opposite really.
It'll depend immensely on what you're actually doing, but if it's simple enough you may be able to make a macro that subs out the types & awaits
One of the issues I face is a blocking function that takes a generic constrained by a `trait`, while its async version takes a generic constrained by an `async trait`.
In my perspective, an "async" function is already a "maybe-async". The distinction between a `fn -> void` and a `fn -> Future<void>` is that the former runs to completion immediately, whereas the other may only finish at a later time. If you want to run an async fn in a blocking manner, you use a blocking executor.
The classic function coloring problem. https://journal.stuffwithstuff.com/2015/02/01/what-color-is-...
This is the type of ugly but necessary discussion that has been happening in C++ for a while.
I never really liked the viral nature of async in rust when it was introduced.
I wish rust the best of luck and with more people like this rust could have a brighter future.
I like this article already because it took me to the goals of Rust for 2026. We use the language in our team, but we haven't needed to go very deep to do the stuff we need. Yet, I really enjoy witnessing the development of a language from ground up with so much community feedback.
I somehow missed noticing that in C++, and I have no idea how it is working in other domains.
My only gripe is that a lot of it is feeling a bit kick-starter-y, with each of the goals needing specific funding. Is that the best model we've found so far?
> My only gripe is that a lot of it is feeling a bit kick-starter-y
IMO the term "project goals" is quite misleading for what this actually is. A project goal is a system for one person (or a small group of people) to express that they'd like to work on something and ask for Rust project volunteers to commit ongoing time and effort to supporting them through code review, answering questions, etc. It doesn't mean that the Rust project itself has set the goal, or even necessarily endorsed it.
So it's not quite right to treat it as a formal roadmap for Rust, just a "there are some contributors interested in working on these areas".
> I somehow miss noticing that in C++ and I have no idea how it is working in other domains.
There seems to be some consensus even within the C++ ISO committee that the evolution process of that language is somewhat broken, mostly due to its size and the way it is organized.
> My only gripe is that a lot of it is feeling a bit kick-starter-y, with each of the goals needing specific funding. Is that the best model we've found so far?
Sadly, this seems to be the way things go once a technology catches on, commercially. Can't blame large donors for sponsoring only the parts they are interested in. Fortunately, considerable funding of TweedeGolf comes from (Dutch) government, I think.
In open source I guess there's two types of work: 1. features 2. maintenance
You can 'sell' new features. They cost money to create, but they solve real problems. Those problems also cost money and if that's more than the cost of creating the feature, companies are willing to put in money (generally).
Maintenance is harder. But there are now some maintainer funds! Like the one from RustNL: https://rustnl.org/maintainers/ These are broader ongoing work and backed by many orgs chipping in a little bit.
Idk if it's the best model, but at least it seems to kinda work
If you read documentation around Rust Async and Tokio, you'll find proper explanation why CPU intensive parts should not be part of async stack, how to use primitives efficiently (like std::sync::Mutex in async blocks), how to glue sync and async code.
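For example, the gist of the std::sync::Mutex guidance, as a sketch: the lock is fine inside async code as long as the guard is dropped before any `.await` (the function names here are hypothetical).

```rust
use std::sync::{Arc, Mutex};

async fn record(counter: Arc<Mutex<u64>>) {
    {
        let mut n = counter.lock().unwrap();
        *n += 1;
    } // guard dropped here, before the task can suspend

    do_io().await; // suspending while holding the guard would stall other tasks
}

async fn do_io() { /* hypothetical async IO */ }
```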
A lot of code doesn't follow these guidelines because its authors don't care about efficiency and don't need it. But there are numerous projects that care about performance and efficiency, and realize the pitfalls once the code runs in production (ScyllaDB is one example).
LLMs don't help either, generating everything async all the way up to main, using the wrong primitives, and not properly designing the system.
> Futures aren't (trivially) inlined
In my programming language I wrote a custom pass for inlining async function calls within other async functions. It generally works and allows removing some boilerplate, but it increases the resulting binary size a lot.
Technically Rust can do the same.
The duplicate-state collapse (hoisting the match out of the await branches like in his process_command example) is the single easiest pattern anyone can apply to existing async code today. No compiler work needed, just a refactor.
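A sketch of the pattern with made-up types (the article's exact process_command example isn't reproduced here): hoisting the single `.await` out of the match arms collapses one suspend state per arm into one suspend state total.

```rust
enum Command { Read(u32), Write(u32, u32) }

struct Bus;
impl Bus {
    async fn send(&mut self, _frame: [u8; 8]) { /* driver IO elided */ }
}

fn encode_read(_addr: u32) -> [u8; 8] { [0; 8] }
fn encode_write(_addr: u32, _value: u32) -> [u8; 8] { [0; 8] }

// before: an .await in each arm, so the future carries a state per arm
async fn process_before(cmd: Command, bus: &mut Bus) {
    match cmd {
        Command::Read(addr) => bus.send(encode_read(addr)).await,
        Command::Write(addr, v) => bus.send(encode_write(addr, v)).await,
    }
}

// after: the match only computes a value; one .await means one state
async fn process_after(cmd: Command, bus: &mut Bus) {
    let frame = match cmd {
        Command::Read(addr) => encode_read(addr),
        Command::Write(addr, v) => encode_write(addr, v),
    };
    bus.send(frame).await;
}
```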
At the very least, you'd want to have a custom lint that can surface the places where it's applicable. That's pretty close to compiler work.
Is that actually valid reasoning? If we know that foo(blah) doesn’t do “anything” until polled, then why can’t bar call foo without polling it before foo itself is polled? After all, there’s no “anything” that will happen.
Because foo might call process::abort().
I disagree. If the codegen / optimizer is trying to preserve the rule that futures don’t have side effects until polled, then it seems fine to assume that the future being wrapped also follows that rule.
So if I call a foo() that violates the rule, it seems odd to complain that the generated bar() also violates it.
Does this kind of thing make a noticeable difference when applied to more complicated async functions?
The examples in the blog seem too simple to draw any conclusions from.
Hi, author here. As I mention in the blog, I tried quickly hacking two of the simplest optimizations into the compiler, and it resulted in 2%-5% binary size savings in real embedded (async) codebases. A quick and probably deeply flawed synthetic benchmark on the desktop also showed a 3% perf increase.
So yes, it does really matter. Keep in mind that optimizations stack: we're preventing LLVM from doing its thing, so if we make the futures themselves smaller, LLVM will be able to optimize more. Small changes really compound.
Saw that but couldn't find what code it gives that improvement on. Is it some embedded application written with Embassy?
Yes, but I can't share the codebases since they're our clients' and proprietary. There aren't a lot of big embedded codebases that are also open source.
Got it, thanks
Rust's ownership model is perfect for threading yet it went all in on async
Rust in my opinion needs an algebraic effects system to truly fix the function coloring problem. We have OCaml 5 which has one in production as well as a few other languages like Koka experimenting with it but hopefully we can add that to Rust as well. I'm not sure how the keyword generics initiative is going though, haven't heard any news on that.
It seems like inlining futures that aren't holding variables over the await point might solve a large part of these issues?
Async Rust is still better than any language async feature.
sad but true
Async Rust on small embedded chips like ESP32 feels revolutionary. This project looks promising.
This has been on my mind lately too with the talk of the new CPUs. Zen 7 sounds like it'll be a beast & coding against 1 out of dozens of cores would be a pity
Any solution that involves having to use a keyword to get the value returned from a function is such a poor design choice to me. Nearly every time I call a function I don't want to have to care if it is synchronous or not. I want the syntax and grammar (and illusion?) of one continuous thread of execution. The few times where I explicitly want to not wait are the places that should be special. This is why new languages should build in green threading from the start.
> Any solution that involves having to use a keyword to get the value returned from a function is such a poor design choice to me.
Technically speaking Rust didn't have to use a keyword (and in fact didn't for quite some time between 1.0 and when async was added), but the ergonomics of the library-based keyword-less solutions was considered to be less than optimal compared to building in support to the language.
> This is why new languages should build in green threading from the start.
This, just like most other decisions one can make when designing a language, is a tradeoff. Green threads have their niceties for sure, but they also have drawbacks which made them a nonstarter for what the point in the design space the Rust devs were aiming for. In particular, the Rust devs wanted something that did not require overhead for FFI and also did not require foreign code to know that something async-related is involved. Green threads don't work here because they either have overhead when copying stuff between the green thread stack and the foreign stack or need foreign code to understand how to handle the green thread stack.
> Nearly every time I call a function I don't want to have to care if it is synchronous or not.
The problem is that "nearly every time" bit. There are times when you are looking at the code and you absolutely want to be aware of where the function is suspending. Similar to the use of ? in error handling to surface all fallible operations that might do an abnormal return.
> I want the syntax and grammar (and illusion?) of one continuous thread of execution
Then you shouldn't be using a low-level systems language? You can simply choose a higher-level abstraction language that better matches your programming preferences.
What you want is quite the opposite of what Rust is -- Rust enforces rules.
Look at how the borrow checker works. Most of the time the compiler can _suggest_ the fix, and instead of applying it silently, it wants you to fix it yourself.
This is the design choice they made.
There are many more problems, like async drop.
what's the modern "absolute beginner's guide to async in Rust" - ideally something dense that can bring someone motivated from beginner to expert in ~1 week of intense hacking on it?
there is a chapter on async in Comprehensive Rust and the Rust Book, which ought to bring you up to speed
there is the async book, but it is largely unfinished
you can watch Jon Gjengset's Crust of Rust on async, Decrusting tokio, and the why, what, and how of pinning in Rust
then there are tokio-lessons and the tokio tutorial, which teach how to use the tokio runtime
and there are also good blog posts by phil-opp and Rose Wright on how async works
https://doc.rust-lang.org/book/ch17-00-async-await.html https://google.github.io/comprehensive-rust/concurrency/welc...
https://rust-lang.github.io/async-book/intro.html
https://youtu.be/ThjvMReOXYM https://youtu.be/o2ob8zkeq2s https://youtu.be/DkMwYxfSYNQ
https://github.com/freddiehaddad/tokio-lessons https://tokio.rs/tokio/tutorial
https://os.phil-opp.com/async-await/ https://dev.to/rosewrightdev/from-futures-to-runtimes-how-as...
It doesn't take a week to learn the async basics.
- add the async keyword to functions
- add .await when calling them
- use tokio in your main function (easy to look up)
- use the async-recursion crate if you need recursion but don't want to box everything

(see the sketch below)
There are some bonuses like calling functions in parallel, but there you go.
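A minimal sketch of those steps, assuming tokio (the function name is made up):

```rust
#[tokio::main] // tokio runtime wraps main
async fn main() {
    let greeting = fetch_greeting().await; // .await at each call site
    println!("{greeting}");
}

async fn fetch_greeting() -> String {
    // hypothetical async work lives here
    "hello".to_string()
}
```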
And then you want to do something trivial like an async callback
I prefer how Zig is approaching async with the new IO. It avoids function coloring.
Does it, though?
Whether your function has the `async` keyword attached or has a function argument of type `IO` doesn't really change anything substantial.
The whole "function color" argument seems pretty overblown to me. You can't call `foo(int, string)` if you don't have both an int and a string, so is it now a different "color" than the function `bar(int)`? If you want to call `foo` from `bar`, you have to somehow procure a string, and the same is true for `IO` in Zig, and the same is true for async in Rust, where what you have to procure is an async executor.
The `async` keyword can be seen as syntactic sugar for introducing a hidden function parameter (very literally, it's called `&mut std::task::Context`), as well as rewriting the function as a state machine.
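Roughly, a hand-written analogue of `async fn answer() -> u32 { 42 }`, showing the otherwise-hidden `Context` parameter as a sketch:

```rust
use std::future::Future;
use std::pin::Pin;
use std::task::{Context, Poll};

struct Answer;

impl Future for Answer {
    type Output = u32;
    // the "hidden parameter": every poll receives &mut Context
    fn poll(self: Pin<&mut Self>, _cx: &mut Context<'_>) -> Poll<u32> {
        // a real async fn with .await points would have one state per
        // suspension and return Poll::Pending until the value is ready
        Poll::Ready(42)
    }
}
```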
Yeah, I tend to agree. What does improve quality-of-life substantially is having a proper effect system, especially when it comes to composing higher-order functions.
Having to write copies of List.map and List.async_map in the stdlib is a smell, but the real cost is potentially having to duplicate every function in your code that calls either.
E.g, if you have the 'async' effect, List.map can work with async functions or synchronous ones, without modification. It's the caller's responsibility to provide that async handler/environment at whatever level of abstraction makes sense, instead of explicitly wiring IO or async all the way through for a function that may or may not need it. The compiler (or runtime, if necessary) will keep you from calling a function that requires the async effect if you don't have a handler for it.
great article
Async Rust is a big wart on the language. There was this one guy posting "I want off Mr. Golang's Wild Ride" a few years back, but IIRC he never considered just how bad async Rust is.
I will take Go concurrency over rust async any day of the week.
Love Rust. They simply missed the mark with async. Swing and a miss.
The risk they took was very calculated. Unfortunately they’re bad at math and chose the wrong trade-offs.
Ah well. Shit happens.
I think Rust has a pretty solid async implementation, compared to other systems languages. I struggle to point out another systems language with a working and actually used async implementation.
> Unfortunately they’re bad at math and chose the wrong trade-offs
They chose the exact same tradeoffs as C++'s async/await (and the same overall model as Python/NodeJS), so I'm not sure what that says about programming as a whole.
the C++ committee makes consistently god awful terrible decisions
Source: am professional C++ developer
Async in Rust and C++ is nothing like it is in Python or NodeJS. Choose your own runtime is a very different model than having a default one.
Not to mention Tokio (the most popular runtime for Rust) is multi-threaded by default. So you have to deal with multithreading bugs as well as normal async ones. That is not the case with most async languages. For example, both Python and NodeJS use a single thread to execute async code.
> Async in Rust and C++ is nothing like it is in Python or NodeJS. Choose your own runtime is a very different model than having a default one.
Python still has pluggable event loops - this is sort of mandatory to interact with weird things like GUI toolkits, and Python's standard event loop was standardised pretty late in the game. Early on there was even an ecosystem split between Twisted and competing event loop implementations.
> For example both Python and NodeJS use a single thread to execute async code
I'd argue this is more a historical artefact of how the languages functioned before futures were introduced, rather than an inherent limitation.
It is an inherent limitation. Multithreading is not free after all. One of the big pros of async programming is the concurrency you get within a single thread. When you make the async runtime multithreaded by default (like Tokio) you don't get this advantage anymore.
You can put tokio in single-threaded mode if you prefer - it's an explicit performance tradeoff. The multithreaded work-stealing executor has higher throughput at the expense of needing more synchronisation.
Or you can schedule your thread-local tasks in a LocalSet to run them all on the owning thread, while keeping the other threads around to handle tasks that are intentionally parallel.
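A sketch of both knobs, assuming tokio: a current-thread runtime disables work stealing entirely, and a LocalSet pins !Send tasks to the thread that owns them.

```rust
use tokio::task::LocalSet;

fn main() {
    // single-threaded runtime: no work stealing, no cross-thread sync
    let rt = tokio::runtime::Builder::new_current_thread()
        .enable_all()
        .build()
        .unwrap();

    let local = LocalSet::new();
    local.spawn_local(async {
        // guaranteed to stay on this thread, so it may hold
        // !Send values (Rc, RefCell, ...) across .await points
    });
    rt.block_on(local); // drives the local tasks to completion
}
```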
The general theme here is that tokio (and C++ equivalents) provide you the flexibility to do more things than the native Python/Node runtime does (and yes, the defaults take advantage of this). But the underlying intention is the same (and post-GIL we expect to see some movement in this direction on the Python front as well).
Response to title: so you’re saying it’s viable
It's so funny that people will do anything to hate on Rust, including nitpicking a few bytes of overhead for a future while they reach for an entire thread or runtime to handle async in their favourite language.
I know the people and the company behind this article. They do anything but "hate on Rust".
You could've deduced that from the fact that someone who puts this amount of energy into a detailed article about the intricacies of an area of "foo" quite certainly does not "hate on foo".
Not the article, the comments here man.
The article is fine besides the bait title.
It's more that I and people I know love Rust, and enjoy it, and want it to be better. I want it to be relentlessly optimized.
I _love_ Rust and use it whenever I can. I still find the comments in here to be quite appropriate. Async Rust leaves me with a (subjective!) feeling that something isn't quite right. Not that I know how it _should_ be, but that feeling is very different from the non-async parts of the language, which almost always leave me with a warm fuzzy feeling of joy.
I don't know enough about the domain to be objectively helpful, so it's all wishy-washy feelings on my part. I keep reaching for orchestrating things with threads in Rust where most people would probably reach for async these days. The only language where I've felt fine embracing the blessed async system is Haskell and its green threads (which I understand come with their own host of problems).
You realize this article talks about Rust on embedded hardware specifically, where you don’t have threads or big runtimes? There is no hate going on here either, just attempts to make things better. Might I suggest you click through to the homepage and I think you’ll figure out the rest.
Nobody seriously tries to run Golang or Java on an MCU. But they do run Rust code.
J2ME existed before most of the current crop of Rust programmers were born.
Which doesn't work on low-end MCUs even in the CLDC profile (< 16 bit, < 32k ram, < 160k rom), and which has been dead since 2007, proving my point that nobody wants to run it.
That's a bit rich given the abuse that Rust evangelists dish out to every other language in the world.
Rust is a passing fad; safe C will just overtake it.