Async Stack Traces in Rust

One neat result of Rust’s futures and async/await design is that all of the async callers are on the stack below the async callees. In most other languages, only the youngest async callee is on the stack, and none of the async callers. Because the youngest frame is most often not where a bug’s root cause lies, this extra context makes debugging async code easier in Rust.
How others do it
Before we look at the extra context Rust gives you in its async stack traces, let’s look at a stack trace from async code in another language. We’ll walk through an example in JavaScript, but this applies equally well to any async model that is callback-driven under the hood.
Here’s our async JavaScript snippet. foo waits for a short timeout asynchronously, then calls bar, which also waits for a short timeout before calling blowUp, which again waits before finally throwing an error. This example, while simple, is nonetheless representative of an async task that involves multiple async functions with multiple await suspension points.
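Here is a sketch matching that description. The function names foo, bar, and blowUp come from the prose; the sleep helper, the 100 ms timeout lengths, and the error message are assumptions:

```javascript
// Sketch of the snippet described above. Helper, timeouts, and error
// message are assumed; only the call shape is taken from the prose.
const sleep = ms => new Promise(resolve => setTimeout(resolve, ms));

async function blowUp() {
  await sleep(100);
  throw new Error("uh oh!");
}

async function bar() {
  await sleep(100);
  await blowUp();
}

async function foo() {
  await sleep(100);
  await bar();
}

// Run the task and print whatever stack trace the engine gives us.
foo().catch(err => console.log(err.stack));
```

Depending on the engine and its “async stacks” support (see the appendix), you may see more than the single callback-level frame discussed next.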
If we run foo, we get an error with a stack trace like this:
blowUp@example.js:22:9
That’s the complete stack: just a single stack frame. It doesn’t give us any help with identifying who called blowUp.
Why do we have just this lonely stack frame? Because what async/await boils down to in JavaScript is, roughly speaking[0], a suspendable generator function and a series of Promise callbacks that keep pumping the generator function to completion. The async function’s desugared generator suspends at await points, yielding the Promise it is waiting on. The runtime takes that Promise and enqueues a callback to resume the generator (by calling next) once the promise is resolved:
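One way to sketch that pumping is with an illustrative drive helper standing in for the engine’s machinery. This is not the actual desugaring (see the footnote), just the shape of it:

```javascript
// Hand-wavy sketch of the desugaring: an async function becomes a
// generator that yields the promises it awaits, plus a driver that
// pumps the generator with Promise callbacks. `drive` is illustrative,
// not part of any real engine API.
function drive(makeGenerator) {
  const gen = makeGenerator();
  function pump(resumeValue) {
    const { value, done } = gen.next(resumeValue);
    if (!done) {
      // `value` is the awaited promise. Resume the generator from a
      // promise callback on a later tick -- with no callers on the stack.
      value.then(pump);
    }
  }
  pump(undefined);
}

const sleep = ms => new Promise(resolve => setTimeout(resolve, ms));

// `async function foo() { await sleep(100); ... }` becomes, roughly:
drive(function* foo() {
  yield sleep(100); // suspension point: yield the awaited promise
  console.log("resumed with an empty stack");
});
```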
Whenever the Promise is resolved, its queued callbacks are run on a new tick of the event loop[1], without any caller on the stack.
If we visualize the call stack over time, with youngest frames on top and oldest frames on bottom, it looks like this:
╱╱ ╱╱ ╱╱
^ ╲╲ ┌─────┐ ╲╲ ┌────────┐ ╲╲
│ ╱╱ │ bar │ ╱╱ │ blowUp │ ╱╱
│ ┌─────┐ ╲╲ ┌┴─────┤ ╲╲ ┌┴────────┤ ╲╲ ┌────────┐
Stack │ foo │ ╱╱ │ foo │ ╱╱ │ bar │ ╱╱ │ blowUp │
│ └─────┘ ╲╲ └──────┘ ╲╲ └─────────┘ ╲╲ └────────┘
│ ╱╱ ╱╱ ╱╱
│ +100 ms +200 ms +300 ms
────────────────Time──────────────────>
After blowUp sleeps for 300 milliseconds and throws an error, it is the only frame on the stack, and there is nothing left pointing to its async callers foo and bar.
How Rust does it
Now let’s look at some async Rust code that is roughly equivalent to our previous JavaScript snippet:
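A sketch of such a program follows. The original would use a real async runtime’s sleep and executor; to keep this self-contained, a toy busy-polling block_on and a toy Sleep future (standard library only, and assumptions of mine) stand in for the runtime:

```rust
// Sketch of the Rust program described in the prose. The toy executor
// and sleep future below are stand-ins for a real async runtime.
use std::future::Future;
use std::pin::Pin;
use std::task::{Context, Poll, RawWaker, RawWakerVTable, Waker};
use std::time::{Duration, Instant};

/// A future that is Pending until its deadline has passed.
struct Sleep(Instant);

impl Future for Sleep {
    type Output = ();

    fn poll(self: Pin<&mut Self>, _cx: &mut Context<'_>) -> Poll<()> {
        if Instant::now() >= self.0 {
            Poll::Ready(())
        } else {
            // A real sleep future would arrange a wakeup here; our toy
            // executor just polls again in a loop.
            Poll::Pending
        }
    }
}

fn sleep(dur: Duration) -> Sleep {
    Sleep(Instant::now() + dur)
}

async fn blow_up() {
    sleep(Duration::from_millis(100)).await;
    panic!("nested async panic!");
}

async fn bar() {
    sleep(Duration::from_millis(100)).await;
    blow_up().await;
}

async fn foo() {
    sleep(Duration::from_millis(100)).await;
    bar().await;
}

/// Toy executor: repeatedly poll the future with a no-op waker.
fn block_on<F: Future>(fut: F) -> F::Output {
    fn raw_waker() -> RawWaker {
        fn no_op(_: *const ()) {}
        fn clone(_: *const ()) -> RawWaker {
            raw_waker()
        }
        static VTABLE: RawWakerVTable = RawWakerVTable::new(clone, no_op, no_op, no_op);
        RawWaker::new(std::ptr::null(), &VTABLE)
    }
    let waker = unsafe { Waker::from_raw(raw_waker()) };
    let mut cx = Context::from_waker(&waker);
    let mut fut = Box::pin(fut);
    loop {
        if let Poll::Ready(output) = fut.as_mut().poll(&mut cx) {
            return output;
        }
    }
}

fn main() {
    // Run with RUST_BACKTRACE=1 to see foo and bar beneath blow_up.
    block_on(foo());
}
```

A real runtime parks the task and uses wakers instead of busy-polling; the toy executor just keeps the example dependency-free.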
When we compile and run this Rust program with RUST_BACKTRACE=1 to capture a stack trace on panic, we get the following stack trace[2], which has the async callers foo and bar in addition to the youngest async callee blow_up:
thread 'async-task-driver' panicked at 'nested async panic!', src/main.rs:42:5
stack backtrace:
...
8: async_stack_traces::blow_up::{{closure}}
at src/main.rs:42
...
15: async_stack_traces::bar::{{closure}}
at src/main.rs:37
...
22: async_stack_traces::foo::{{closure}}
at src/main.rs:32
...
In Rust, async functions desugar into suspendable generator functions, which in turn desugar into state machine enums. Rather than yielding futures the way JavaScript’s desugared async functions yield promises, Rust’s desugared async functions’ state machines implement the Future trait themselves[3]. Just as futures nest, so that it is common to have MyOuterFuture::poll calling MyInnerFuture::poll, awaiting a future desugars into the generator polling the nested future.
For example, if an activation of our async fn foo is awaiting a task::sleep future, it will poll that future inside its own poll implementation, and if it is awaiting an activation of async fn bar, then it will poll its bar activation inside its own poll implementation:
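To make that nesting concrete, here is a hand-written pair of futures in the same shape. Inner and Outer are illustrative stand-ins of mine, not the compiler’s actual output:

```rust
// Hand-written sketch of nested polling: the outer future polls its
// inner future from inside its own poll, so the outer poll frame sits
// on the stack beneath the inner one.
use std::future::Future;
use std::pin::Pin;
use std::task::{Context, Poll, RawWaker, RawWakerVTable, Waker};

/// Stands in for an inner future, such as a sleep future.
struct Inner;

impl Future for Inner {
    type Output = u32;

    fn poll(self: Pin<&mut Self>, _cx: &mut Context<'_>) -> Poll<u32> {
        Poll::Ready(42)
    }
}

/// Stands in for the state machine generated for an async fn that is
/// awaiting `Inner`.
struct Outer {
    inner: Inner,
}

impl Future for Outer {
    type Output = u32;

    fn poll(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<u32> {
        // `self.inner.await` desugars, roughly, into polling the nested
        // future right here -- Outer::poll is on the stack beneath
        // Inner::poll.
        match Pin::new(&mut self.inner).poll(cx) {
            Poll::Ready(n) => Poll::Ready(n + 1),
            Poll::Pending => Poll::Pending,
        }
    }
}

/// A waker that does nothing, just so we can poll by hand.
fn noop_waker() -> Waker {
    fn raw() -> RawWaker {
        fn no_op(_: *const ()) {}
        fn clone(_: *const ()) -> RawWaker {
            raw()
        }
        static VTABLE: RawWakerVTable = RawWakerVTable::new(clone, no_op, no_op, no_op);
        RawWaker::new(std::ptr::null(), &VTABLE)
    }
    unsafe { Waker::from_raw(raw()) }
}

fn main() {
    let waker = noop_waker();
    let mut cx = Context::from_waker(&waker);
    let result = Pin::new(&mut Outer { inner: Inner }).poll(&mut cx);
    println!("{:?}", result); // prints "Ready(43)"
}
```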
With this nested polling, we have the async caller on the stack below its async callee.
To make it visual, here is the stack of Future::poll
trait method calls over
time for our Rust example:
╱╱ ╱╱ ╱╱
^ ╲╲ ╲╲ ┌────────┐ ╲╲
│ ╱╱ ╱╱ │ sleep │ ╱╱
│ ╲╲ ┌───────┐ ╲╲ ┌┴────────┤ ╲╲ ┌─────────┐
│ ╱╱ │ sleep │ ╱╱ │ blow_up │ ╱╱ │ blow_up │
│ ┌───────┐ ╲╲ ┌┴───────┤ ╲╲ ┌┴─────────┤ ╲╲ ├─────────┤
Stack │ sleep │ ╱╱ │ bar │ ╱╱ │ bar │ ╱╱ │ bar │
│ ┌┴───────┤ ╲╲ ┌┴────────┤ ╲╲ ├──────────┤ ╲╲ ├─────────┤
│ │ foo │ ╱╱ │ foo │ ╱╱ │ foo │ ╱╱ │ foo │
│ └────────┘ ╲╲ └─────────┘ ╲╲ └──────────┘ ╲╲ └─────────┘
│ ╱╱ ╱╱ ╱╱
│ +100 ms +200 ms +300 ms
────────────────Time──────────────────>
After blow_up has slept for 300 milliseconds and panics, we still have its async callers bar and foo on the stack, which is super useful for debugging.
Conclusion
Rust’s readiness-based polling model for asynchronous code means that the async caller waiting on an async callee’s completion is always on the stack during polling. Therefore async callers show up in async Rust’s stack traces, not just the youngest async callee like in most other languages. In turn, this property makes debugging async code that much easier in Rust.
I wasn’t the first to notice this property of Rust’s futures and async/await design, nor was I involved in its design process. However, I don’t think most folks have discovered this property yet, so I’m doing my part to help explain it and spread the knowledge.
Thanks to Alex Crichton, Jason Orendorff, and Jim Blandy for providing feedback on an early draft of this blog post. Their feedback only improved this text, and any errors that remain are my own.
Appendix
What about “async stacks” in JavaScript?
To help users debug asynchronous code in JavaScript, engines introduced “async stacks”. The way this works, at least in SpiderMonkey, is to capture the stack when a promise is allocated or when some async operation begins, and then when running the async callbacks append that captured stack to any new stacks captured.
Async stacks come with both time and space overheads. These aren’t small costs, and there has been significant engineering work put into making the captured stack traces compact in SpiderMonkey. There are also limits on how many frames and how often async stacks are captured because of these costs.
Async stacks are also not a zero-cost abstraction, since the engine must capture the async portions of the stack and save them in memory regardless of whether any execution actually ends up using them. On the other hand, since Rust’s async callers appear on the normal call stack, nothing needs to be captured eagerly and stored in memory.
What if I want even more context for my async Rust code?
The tracing facade looks like a promising framework for annotating async tasks, creating a hierarchy between tasks, and recording structured data about program execution.
[0] This is not the actual desugaring, but is close enough to get the point across for our purposes. Read the actual ECMAScript standard if you want to know what is really going on under the hood of async/await in JavaScript. The exact details are too nitpicky for (and largely irrelevant to) this blog post. ↩
[1] I’m ignoring micro vs. macro task queues in this blog post, because (again) those details are irrelevant here. ↩
[2] I edited this stack trace to remove extra stack frames that come from the async runtime library and the stack unwinding library. In Rust, we have the opposite problem compared to other languages: we have too many stack frames! 🤣 ↩
[3] Technically they are wrapped in a std::future::GenFuture via std::future::from_generator, and there is this Future implementation for GenFuture wrappers:
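From memory, that implementation looked roughly like the following. These were nightly-only standard library internals at the time, so the details here are approximate and will not compile on stable Rust:

```rust
// Approximate historical std internals (nightly-only; details may
// differ from the real code).
impl<T: Generator<Yield = ()>> Future for GenFuture<T> {
    type Output = T::Return;

    fn poll(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<Self::Output> {
        // Pin-project to the inner generator and resume it: this runs the
        // async fn body until its next `await` (a `yield`) or until it
        // returns.
        let gen = unsafe { self.map_unchecked_mut(|s| &mut s.0) };
        set_task_context(cx, || match gen.resume() {
            GeneratorState::Yielded(()) => Poll::Pending,
            GeneratorState::Complete(x) => Poll::Ready(x),
        })
    }
}
```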
But this is an implementation detail that’s mostly irrelevant to this discussion. ↩