gh-83151: Make closure work on pdb by gaogaotiantian · Pull Request #111094 · python/cpython
Closure support in pdb has been an issue for years, and is often considered "too difficult to fix". PEP 709 alleviated the issue by inlining all list, dict and set comprehensions, but the fundamental issue is still there: you can easily reproduce it with lambda functions (`(lambda: x)()`) or generator expressions (`any(x for x in lst if x != z)`). Now that pdb supports multi-line statements, you can even define your own function and so create a new function scope. None of these work in the current pdb.
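The failure can be reproduced outside pdb. The sketch below assumes, as a simplification, that the debugger runs a statement roughly as `exec(code, frame_globals, frame_locals)` with the frame's locals passed as a separate mapping; the helper `run_like_pdb` is hypothetical. Top-level code sees the locals mapping, but a lambda or generator expression does not.

```python
def run_like_pdb(source: str, frame_locals: dict) -> str:
    """Hypothetical helper: exec `source` with a fresh globals dict and a
    separate locals mapping, roughly like a debugger evaluating a statement
    in a function frame. Returns the exception name, or "ok"."""
    frame_globals: dict = {}  # exec() inserts __builtins__ automatically
    try:
        exec(compile(source, "<stdin>", "exec"), frame_globals, frame_locals)
    except Exception as exc:
        return type(exc).__name__
    return "ok"


frame_locals = {"x": 1, "z": 1, "lst": [0, 1]}

# Top-level names are looked up with LOAD_NAME, which consults frame_locals.
print(run_like_pdb("y = x + 1", frame_locals))  # ok
# Names not bound inside the lambda or generator (x and z here) compile to
# global lookups, so frame_locals is never consulted.
print(run_like_pdb("(lambda: x)()", frame_locals))  # NameError
print(run_like_pdb("any(x for x in lst if x != z)", frame_locals))  # NameError
```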
The related (open) issues I can find are #83151, #80052, #76987, #70260 and #65360.
This PR fixes all the issues mentioned above, while keeping (almost?) all of the current correct behavior.
Spoiler alert: black magic involved.
The fundamental reason for the issue is that when Python compiles code that creates nested functions, the inner function knows nothing about the surrounding context beyond what is compiled together with it. For example:
```python
def f():
    z = 1
    any([x for x in range(2) if x != z])
```
The code above works fine because Python compiles `any([x for x in range(2) if x != z])` knowing that `z` is a local variable of `f`, so it knows a closure should be used. However, in pdb, if you execute or compile `any([x for x in range(2) if x != z])` on its own, Python has no way to know that `z` is a local variable of an outer scope and should be passed in as a cell variable.
So the solution is: trick the compiler.
What we can do is force the compiler to compile the code we need in a closure-aware environment. Taking the code above as an example, instead of compiling `any([x for x in range(2) if x != z])` directly, we compile the following code:
```python
def outer():
    z = None
    def __pdb_scope():
        nonlocal z
        any([x for x in range(2) if x != z])
```
Then we take the code object of `__pdb_scope` (nested inside `outer`), which is the code object we actually want: it treats `z` as a free variable. Finally, we pass the value of `z` through the `closure` argument of `exec`:
```python
exec(code, globals, locals, closure=(types.CellType(1),))
```
That is not the whole solution, as we still need to write the local variables back (`locals` won't be changed, because all the variables are treated as free variables). The code itself may also create new variables (`k = 1`, for example), and we want those in `locals` as well. So we write all the local variables in the scope back by appending some extra code to the end of the source.
To keep the behavior as close to the original as possible, if any exception is raised by the new method, we fall back to the original plain `exec` (the exception might be a legitimate one, but this way we always get the same exception as before).
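The fallback can be sketched as a thin wrapper. Here `run_with_fallback` and `closure_path_stub` are hypothetical; the stub stands in for the closure-aware path failing on an input it cannot handle, such as `global x` clashing with the wrapper's `nonlocal x`.

```python
from typing import Callable


def run_with_fallback(
    source: str,
    frame_globals: dict,
    frame_locals: dict,
    closure_exec: Callable[[str, dict, dict], None],
) -> None:
    """Try the closure-aware path first; if it raises anything, re-run the
    statement with a plain exec so the user sees the pre-existing behavior
    (including the pre-existing exception, if there is one)."""
    try:
        closure_exec(source, frame_globals, frame_locals)
    except Exception:
        exec(compile(source, "<stdin>", "exec"), frame_globals, frame_locals)


def closure_path_stub(source: str, frame_globals: dict, frame_locals: dict) -> None:
    # Hypothetical stand-in for the closure-aware path rejecting this input.
    raise SyntaxError("name 'x' is nonlocal and global")


frame_globals = {"x": "global"}
frame_locals = {"x": "local"}
run_with_fallback("global x\nseen = x", frame_globals, frame_locals, closure_path_stub)
print(frame_locals["seen"])  # global
```

Because this naive version re-executes the statement, anything the first attempt did before failing would happen a second time; the sketch does not try to guard against that.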
Why not ...
- just combine `globals()` and `locals()` as the global namespace?
  - the result of `globals()` would be wrong
  - the result of `global x; print(x)` would be wrong if `x` appears in both the global and the local namespace (assuming locals take precedence in the merge; the mirror-image issue exists if globals take precedence)
  - it is impossible to do the writeback correctly
- use the new method only?
  - the new method raises a different exception in some cases (`UnboundLocalError` vs `NameError`)
  - the new method does not work well with `global x; print(x)`, as `x` would be declared both `global` and `nonlocal` (with the fallback, this case uses the original method and works)
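The drawbacks listed above can be observed directly. In this sketch, `compile_scope` and `outcome` are hypothetical helpers; `compile_scope` builds a `__pdb_scope` wrapper like the one described earlier, and the merged-namespace case is modeled as a single dict passed to `exec`.

```python
import types


def compile_scope(source: str, local_names: list[str]) -> types.CodeType:
    """Hypothetical helper: compile `source` inside the __pdb_scope wrapper."""
    bindings = "".join(f"    {name} = None\n" for name in local_names)
    decl = f"        nonlocal {', '.join(local_names)}\n" if local_names else ""
    body = "".join(f"        {line}\n" for line in source.splitlines())
    wrapper = "def outer():\n" + bindings + "    def __pdb_scope():\n" + decl + body
    module_code = compile(wrapper, "<pdb>", "exec")
    outer = next(c for c in module_code.co_consts if isinstance(c, types.CodeType))
    return next(c for c in outer.co_consts if isinstance(c, types.CodeType))


def outcome(fn) -> str:
    """Run `fn` and report the name of the exception it raises, or "ok"."""
    try:
        fn()
    except Exception as exc:
        return type(exc).__name__
    return "ok"


# Merged namespace: `global x` sees the local value, and globals() leaks z.
merged = {**{"x": "global"}, **{"x": "local", "z": 1}}  # locals win on conflicts
exec("global x\nseen = x\nleaks = 'z' in globals()", merged)
print(merged["seen"], merged["leaks"])  # local True

# New method only: reading a name before assigning it changes exception type.
src = "y = k\nk = 1"
print(outcome(lambda: exec(src, {}, {})))  # NameError
print(outcome(lambda: exec(compile_scope(src, []), {}, {})))  # UnboundLocalError

# New method only: `global x` conflicts with the wrapper's `nonlocal x`.
print(outcome(lambda: compile_scope("global x\ny = x", ["x"])))  # SyntaxError
```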
This is not a perfect solution, but it's pretty close. It passes all the existing tests and fixes all the cases in the issues mentioned above. I have not come up with an example that does not work with this solution (though I would guess there are some dark corners).
Yes, I know this is complicated, and it depends to some extent on CPython implementation details, but I still believe we should do this, because pdb users have repeatedly raised this issue for probably more than 10 years.