When profiling an async Python application, it's useful to see both the
stack for the currently executing task and the chain of coroutines
that are transitively awaiting that task. Consider the following example,
where T represents a task, C represents a coroutine, and A '->' B
indicates that A is awaiting B.

    T0    +---> T1
    |     |     |
    C0    |     C2
    |     |     |
    v     |     v
    C1    |     C3
    |     |
    +-----|

The async stack from C3 would be C3, C2, C1, C0. In contrast, the
synchronous call stack while C3 is executing is only C3, C2. It's
possible to reconstruct this view in most cases using what is
currently available in CPython. However, it is difficult to do so
efficiently, and it would be very challenging to do at all, let alone
efficiently, in an out-of-process profiler that leverages eBPF.
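
As a rough illustration of what is available today: a suspended coroutine's
`cr_await` attribute points forward to the object it is awaiting, so the chain
can be followed from the outermost coroutine downward, but there is no
corresponding link from a running coroutine back to its awaiters. The names
`suspend`, `c0`, `c1`, and `forward_chain` are hypothetical, and the coroutine
is driven by hand here rather than by an event loop.

```python
import inspect
import types


@types.coroutine
def suspend():
    # Generator-based awaitable that suspends the whole chain once.
    yield


async def c1():
    await suspend()


async def c0():
    await c1()


def forward_chain(coro):
    """Follow cr_await links from an outer coroutine toward the innermost one."""
    names = []
    node = coro
    # Stop at the first object that is not a native coroutine.
    while inspect.iscoroutine(node):
        names.append(node.cr_code.co_name)
        node = node.cr_await
    return names


outer = c0()
outer.send(None)  # run until suspend() yields; c0 and c1 are now suspended
chain = forward_chain(outer)
outer.close()  # clean up the suspended coroutines
```

A profiler that starts from the innermost running coroutine has to search for
whichever object holds it in `cr_await`, which is where the cost comes from.
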

This change introduces a new field on coroutines and async generators
that makes it easy to reconstruct the async call stack efficiently.
The field stores an owned reference, set by the interpreter, to
the coroutine or async generator that is awaiting the field's owner.
To reconstruct the chain of coroutines/async generators, one only
needs to follow the new field backwards from the currently executing
coroutine.
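
The backward walk can be sketched with a small pure-Python model. This is not
the interpreter implementation: `FakeCoroutine`, its `awaiter` attribute, and
`async_stack` are hypothetical stand-ins, since the description above does not
name the new field.

```python
class FakeCoroutine:
    """Stand-in for a coroutine; `awaiter` models the new field (name assumed)."""

    def __init__(self, name, awaiter=None):
        self.name = name
        self.awaiter = awaiter


def async_stack(running):
    """Walk awaiter links from the running coroutine back to the root."""
    names = []
    node = running
    while node is not None:
        names.append(node.name)
        node = node.awaiter
    return names


# Chain from the example: C0 awaits C1, C1 awaits T1 (which runs C2),
# and C2 awaits C3.
c0 = FakeCoroutine("C0")
c1 = FakeCoroutine("C1", awaiter=c0)
c2 = FakeCoroutine("C2", awaiter=c1)  # link that crosses the task boundary
c3 = FakeCoroutine("C3", awaiter=c2)

stack = async_stack(c3)
```

Here `stack` is the list C3, C2, C1, C0, matching the async stack in the
example, and the walk is a single pointer chase per frame.
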

Intermediate awaitables (e.g. `Task`, `_GatheringFuture`) complicate
maintaining a complete chain of awaiters. A special method, `__set_awaiter__`,
is introduced to simplify the process. Types can implement this method to
forward the awaiter to their child objects.
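
A minimal sketch of the forwarding idea, under the assumption that
`__set_awaiter__` takes the awaiter as its only argument (the signature is not
given above). `FakeCoroutine`, `FakeTask`, `FakeGather`, and the `awaiter`
attribute are hypothetical models, not the real `asyncio` types.

```python
class FakeCoroutine:
    """Stand-in for a coroutine; `awaiter` models the new field (name assumed)."""

    def __init__(self, name):
        self.name = name
        self.awaiter = None


class FakeTask:
    """Models an intermediate awaitable that wraps a single coroutine."""

    def __init__(self, coro):
        self.coro = coro

    def __set_awaiter__(self, awaiter):
        # Forward the awaiter to the coroutine this task is running.
        self.coro.awaiter = awaiter


class FakeGather:
    """Models an intermediate awaitable that fans out to several children."""

    def __init__(self, *children):
        self.children = children

    def __set_awaiter__(self, awaiter):
        # Forward the awaiter to every child awaitable.
        for child in self.children:
            child.__set_awaiter__(awaiter)


# C1 awaits a gather of two tasks, which run C2 and C4 respectively.
c1 = FakeCoroutine("C1")
c2 = FakeCoroutine("C2")
c4 = FakeCoroutine("C4")
gathered = FakeGather(FakeTask(c2), FakeTask(c4))

# Models the call that would be made when C1 starts awaiting `gathered`.
gathered.__set_awaiter__(c1)
```

After the call, both child coroutines point back at C1, so a backward walk
from either one skips the intermediate objects and still reaches C1.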