gh-83151: Make closure work on pdb by gaogaotiantian · Pull Request #111094 · python/cpython

Closure on pdb has been an issue for years, and is often considered "too difficult to fix". PEP 709 alleviated the issue by inlining all list, dict and set comprehensions, but the fundamental issue is still there - you can easily reproduce it with lambda functions ((lambda: x)()) or generators (any(x for x in lst if x != z)). Now that pdb supports multi-line statements, you can even create your own function scope. None of these work in current pdb.

The related (open) issues I can find are #83151, #80052, #76987, #70260 and #65360.

This PR fixes all the issues mentioned above, while keeping (almost?) all the current correct behaviors.

Spoiler alert, black magic involved.

The fundamental reason for the issue above is that when Python compiles code that generates nested functions, the inner function has no idea about the context. For example:

def f():
    z = 1
    any([x for x in range(2) if x != z])

The code above works fine because Python compiles any([x for x in range(2) if x != z]) with the awareness that z is a local variable of f, so a closure should be used. However, in pdb, if you execute/compile any([x for x in range(2) if x != z]), there's no way for Python to know that z is an outer local variable and should be passed as a cell variable.
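The failure can be reproduced without pdb. This is a minimal sketch, assuming that executing a statement in pdb boils down to calling exec with the frame's globals and locals as separate mappings; the two dicts below are stand-ins, not real frame data. A lambda is used instead of a list comprehension because comprehensions are inlined since PEP 709.

```python
# Stand-ins for a debugged frame's namespaces (illustrative values only).
frame_globals = {}
frame_locals = {"z": 1}

# A plain name lookup at the top level finds z in the locals mapping.
exec("seen = z", frame_globals, frame_locals)

# A lambda gets its own code object. It was compiled without knowing that z
# is a local of an enclosing function, so it looks z up as a global and fails.
try:
    exec("(lambda: z)()", frame_globals, frame_locals)
except NameError as exc:
    error = exc
else:
    error = None
```

Here `seen` ends up bound to 1 in `frame_locals`, while the lambda call raises NameError.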

So, the solution is - trick the compiler.

What we can do is force the compiler to compile the code we need in a closure-aware environment. Taking the code above as an example, instead of compiling any([x for x in range(2) if x != z]), we compile the following code:

def outer():
    z = None
    def __pdb_scope():
        nonlocal z
        any([x for x in range(2) if x != z])

Then we get the code object of outer.__pdb_scope, and that is the code object we actually want - it considers z a free variable. Then, we pass the value of z through the closure argument of exec:

exec(code, globals, locals, closure=(types.CellType(1),))

The solution is not complete yet, as we still need to write the local variables back (locals won't be changed, as all the variables are treated as free variables). Also, it's possible for the code itself to create new variables, and we want those in locals as well (k=1 for example). So, we just write all the local variables in the scope back by appending some new code to the end of the code.

To keep the behavior as close to the original as possible, if any exception is raised in the new method, we fall back to the original simple exec solution (it might be a valid exception, but we will always have the same exception as before).
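The fallback can be pictured roughly like this. It is only a sketch: `exec_with_fallback` is a hypothetical name, and `closure_exec` is a hypothetical callable standing in for the new closure-aware method.

```python
def exec_with_fallback(source, globals_, locals_, closure_exec):
    """Try the closure-aware path first; on any error, use plain exec.

    `closure_exec` is a hypothetical callable standing in for the new method.
    """
    try:
        closure_exec(source, globals_, locals_)
    except Exception:
        # Run the statement the old way, so the user sees the same
        # result or the same exception as before.
        exec(compile(source, "<stdin>", "exec"), globals_, locals_)


def always_fails(source, globals_, locals_):
    # Stand-in that simulates the new method giving up.
    raise RuntimeError("closure path not applicable")


ns = {}
exec_with_fallback("y = 2", {}, ns, always_fails)

try:
    exec_with_fallback("missing_name", {}, ns, always_fails)
except NameError as exc:
    surfaced = exc
else:
    surfaced = None
```

The RuntimeError from the stand-in never reaches the caller; what surfaces is the NameError that plain exec raises.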

Why not ...

  • just combine globals() and locals() as the global namespace?
    • result of globals() would be wrong
    • result of global x; print(x) would be wrong if x appears in both the global and local namespaces (the local value is printed if locals are preferred; the mirror-image problem appears if globals are preferred).
    • impossible to do writeback correctly
  • use the new method only?
    • the new method will give a different exception in some cases (UnboundLocalError vs NameError)
    • the new method does not work well with global x; print(x) as x would be defined as both global and nonlocal (it will fall back to the original method and work)
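Two of the points above can be checked with a short sketch. The namespaces are illustrative stand-ins, and the wrapper mirrors the outer/__pdb_scope shape described earlier rather than the exact source this PR generates.

```python
frame_globals = {"x": "from globals"}
frame_locals = {"x": "from locals"}

# Separate namespaces (current behavior): `global x` reads the real global.
separate_locals = dict(frame_locals)
exec("global x\nresult = x", dict(frame_globals), separate_locals)

# Merged namespace with locals preferred: `global x` now sees the local value.
merged = {**frame_globals, **frame_locals}
exec("global x\nresult = x", merged)

# New method only: x ends up declared both nonlocal and global in one scope,
# which the compiler rejects.
wrapper = (
    "def outer():\n"
    "    x = None\n"
    "    def __pdb_scope():\n"
    "        nonlocal x\n"
    "        global x\n"
    "        print(x)\n"
)
try:
    compile(wrapper, "<sketch>", "exec")
except SyntaxError as exc:
    conflict = exc
else:
    conflict = None
```

With separate namespaces `result` is the global value, with the merged namespace it is the local one, and the wrapper fails to compile with a SyntaxError.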

This is not a perfect solution, but it's pretty close. It passes all the existing tests and fixes all the cases in the issues mentioned above. I have not come up with an example that does not work with this solution (I would guess there will be some dark corners).

Yes, I know this is complicated, and it kind of depends on the implementation of CPython, but I still believe we should do this, because this issue has been repeatedly raised by users of pdb for probably > 10 years.