…verflow

The depth-bounded recursion still overflowed the debugger's C stack on
platforms with a small default thread stack (1 MiB on Windows). Every
level keeps a SIZEOF_TASK_OBJ buffer alive in process_task_awaited_by, so
reaching the MAX_TASK_AWAITED_BY_DEPTH limit of 1000 would have required
several MiB of stack, and the process aborted with a stack overflow
before the limit could fire.

Walk the awaited_by graph with an explicit, heap-allocated work stack
instead of mutual recursion, so the C stack depth stays constant no matter
how deep the graph is. The depth limit is retained as a guard against
cycles in corrupted or concurrently mutated remote memory.
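
For illustration only, the shape of the traversal can be sketched in
Python rather than C. This is a minimal sketch, not the code in this
change: walk_awaited_by and the awaited_by dict are hypothetical names,
and MAX_DEPTH merely mirrors MAX_TASK_AWAITED_BY_DEPTH.

```python
MAX_DEPTH = 1000  # mirrors MAX_TASK_AWAITED_BY_DEPTH (assumed value from the text)


def walk_awaited_by(root, awaited_by, max_depth=MAX_DEPTH):
    """Visit every task reachable from root without recursing.

    awaited_by is a hypothetical dict mapping a task id to the list of
    task ids that await it; it stands in for the remote memory reads.
    """
    visited = []
    # Explicit work stack of (task, depth) pairs; it grows on the heap,
    # so the native call depth stays constant however deep the graph is.
    stack = [(root, 0)]
    while stack:
        task, depth = stack.pop()
        if depth > max_depth:
            # Cycle guard: a corrupted or concurrently mutated graph can
            # loop forever, so bail out once the depth limit is exceeded.
            raise RuntimeError("awaited_by graph exceeds depth limit")
        visited.append(task)
        for waiter in awaited_by.get(task, ()):
            stack.append((waiter, depth + 1))
    return visited
```

A linear chain of any length is walked in a single loop, while a cycle
such as {0: [1], 1: [0]} trips the depth guard instead of spinning.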

Also make the regression test deterministic under load: signal readiness
from the leaf task itself, immediately before it busy-spins, so the
observer always inspects the graph while the full chain is built and
rooted at a running task. The previous handshake was sent before the leaf
started running and could race, letting the observer see a shallow graph.
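
The handshake ordering can be sketched as follows. This is a simplified,
single-process stand-in, assuming a thread as the observer; the real
test inspects another process, and run_chain, built, ready and stop are
hypothetical names.

```python
import asyncio
import threading


def run_chain(depth):
    """Build a chain of tasks and report how deep the observer saw it."""
    ready = threading.Event()   # set by the leaf, right before it spins
    stop = threading.Event()    # set by the observer once it has looked
    built = []                  # one entry per chain level already running
    observed = {}

    async def level(n):
        built.append(n)
        if n > 0:
            # Each level awaits the next, forming the awaited_by chain.
            await asyncio.create_task(level(n - 1))
            return
        # Leaf: signal only now, when the whole chain exists, then
        # busy-spin so the chain stays rooted at a running task.
        ready.set()
        while not stop.is_set():
            pass

    def observer():
        if ready.wait(timeout=30):
            observed["depth"] = len(built)
        stop.set()  # always release the leaf, even on timeout

    thread = threading.Thread(target=observer)
    thread.start()
    asyncio.run(level(depth))
    thread.join()
    return observed.get("depth")
```

Because the leaf is the last level to start, setting the event there
means the observer cannot run before every level has been created.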