gh-105922: PyImport_AddModule() uses Py_DECREF() by vstinner · Pull Request #105998 · python/cpython
@serhiy-storchaka: In gh-86160, you added this code using PyWeakref_New() + PyWeakref_GET_OBJECT() get a borrowed reference, whereas the module is known to be stored in the sys.modules dictionary. What's the rationale for using a weak reference, rather than just using Py_DECREF()?
Rewrite PyImport_AddModule() to simply call Py_DECREF(), rather than creating a weak reference, to get a borrowed reference to the module.
Oh, ./python -m test -j1 -R 3:3 test_import leaks references ("test_import leaked [11, 13, 10] references, sum=34")... but it also leaks in the main branch. Ah! It's unrelated to this PR. Also, ./python -m test -R 3:3 test_import doesn't leak. It's inteesting :-)
sys.modules can be not a dict, and it is not guaranteed that it keeps reference. It can be a custom mapping with __getitem__ which returns a new object or removes returned object or __setitem__ which does nothing. There is so much unknown. With weakref we can be sure that we get either a reference to live object or NULL. Please do not change this code until you find better solution.
Also, ./python -m test -j1 -R 3:3 test_import does crash, but again, it's unrelated to this PR. It does also crash in the main branch. It's a recent regression: #91095 (comment)
PyImport_AddModule() history, oldest to newest:
-
In 1990, commit 85a5fbb (first commit which added C code to CPython, so oldest C code tracked by the Git repository) added
new_module()function which usesmtab = sysget("modules");to get modules.sysget()is implemented asdictlookup(sysdict, name)which looks intostatic object *sysdict;. This version respected sys.modules :-) -
In 1990, commit 3f5da24 added
static object *modules;variable toPython/import.c, variable used byadd_module()(old name of PyImport_AddModule()). sys.modules is no longer used by add_modules(). -
In 1997, commit 25ce566 added PyImport_GetModuleDict() implemented as
PyThreadState_Get()->interp. PyImport_AddModules() usesPyObject *modules = PyImport_GetModuleDict();and then works on this dictionary withPyDict_GetItemString()andPyDict_SetItemString(). -
In 2017, commit 3f9eee6 modified import_add_module() to "Support other mappings in PyInterpreterState.modules": issue [subinterpreters] Eliminate PyInterpreterState.modules. #72597.
-
In 2021, commit 4db8988 changes PyImport_AddModule() to use PyWeakref_New() to create a borrowed reference.
-
In 2023, commit b2fc549 changes
Python/import.cto use a new MODULES(interp) macro which getsinterp->imports.modules.
In 2017, commit 3f9eee6 modified import_add_module() to "Support other mappings in PyInterpreterState.modules": issue GH-72597.
I'm not convinced that GH-72597 was fully implemented. At least, PyImport_AddModule() doesn't support overriding sys.modules at runtime since 1990... so since the creation of Python :-)
I added a test to my PR to check this special case: custom sys.modules which doesn't hold a strong reference to the newly created module.
According to this test, PyImport_AddModule() always uses tstate->interp->imports.modules directly, it doesn't take in account that sys.modules was overriden. Moreover, setting sys.modules at runtime doesn't update this internal tstate->interp->imports.modules member: the sys module has no custom __setattr__() function to update it.
Calling PyImport_AddModule() writes into the "original" modules dict, even if sys.modules was overriden.
In the main branch, there is a single line which sets PyInterpreter.imports.modules: MODULES(interp) = PyDict_New(); of _PyImport_InitModules(). I don't see any C API, even internal C API, to changes this PyInterpreterState member. And the public C API PyImport_GetModuleDict() just returns this member.
It means that all C code using directly PyInterpreter.imports.modules can only get a dict, not a custom type.
Obviously, C code and Python code getting sys.modules attribute can get any type, not only dict, if sys.modules is overriden. For example, the _pickle extension (implemented in C) gets the sys.modules attribute and then... uses the PyDict C API, oops. It is likely to crash if sys.modules type is not dict or a subclass of dict.
Well, my goal here was to get rid of the PyWeakref_GET_OBJECT() since I would like to deprecate it. So I wrote a different change: PR #106001 uses _PyWeakref_GET_REF().
But maybe this whole code was always useless and so can be removed, since it seems impossible to get a type other than dict in this code path.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is Py_REFCNT any better than PyWeakref_GetObject? I preferred to use later because the former is a macro and more tightly bound to internal representation.
I have great doubts that it was worth to add support of non-dict as sys.modules. It complicated a code a lot, it is not supported everywhere in the C code, and causes issues with borrowed references, like in this code.
I have great doubts that it was worth to add support of non-dict as sys.modules. It complicated a code a lot, it is not supported everywhere in the C code, and causes issues with borrowed references, like in this code.
The problem is that modules have no __setattr__() method which could be used to prevent setting sys.modules to a non-sense value like sys.modules = 123 or sys.modules = ["my", "list"]. Maybe some developers made the wrong assumption that internal C functions really access sys.modules attributes, whereas the implementation access directly a PyInterpreterState member and ignores sys.modules.
I created issue #106016 "Support defining setattr() and delattr() in a module". It would allow to either disallow setting sys.modules to a type different than dict, or at least to update PyInterpreterState.imports.modules reference.
|
|
||
| // gh-86160: PyImport_AddModuleObject() returns a borrowed reference | ||
| PyObject *ref = PyWeakref_NewRef(mod, NULL); | ||
| // Convert to a borrowed reference |
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| // Convert to a borrowed reference | |
| // Convert to a borrowed reference. |
I fixed the issue differently: commit ee52158
I don't think that PyImport_AddModuleObject() requires this complicated PyWeakref dance, since the code always gets a strong reference from a Python dict: it's not possible to override PyInterpreterState.imports.modules, and this function access directly PyInterpreterState.imports.modules. But I don't feel the need to change the code. I propose to continue the discussion in the issue #106016.