Under-estimated stack size for recursion limit (macOS + python >= 3.14)
Bug report
Bug description:
GH-130398 introduced estimation of stack size (for the purpose of recursion-limit management) via pthread_getattr_np() on platforms that support it. This non-portable function is not available on macOS, and so the following fallback codepath is used:
    _tstate->c_stack_top = _Py_SIZE_ROUND_UP(here_addr, 4096);
    _tstate->c_stack_soft_limit = _tstate->c_stack_top - Py_C_STACK_SIZE;
    _tstate->c_stack_hard_limit = _tstate->c_stack_top - (Py_C_STACK_SIZE + _PyOS_STACK_MARGIN_BYTES);
with Py_C_STACK_SIZE set to 4000000 bytes (about 4 MB) on macOS, via the final #else branch:
    #if defined(__s390x__)
    #  define Py_C_STACK_SIZE 320000
    #elif defined(_WIN32)
       // Don't define Py_C_STACK_SIZE, ask the O/S
    #elif defined(__ANDROID__)
    #  define Py_C_STACK_SIZE 1200000
    #elif defined(__sparc__)
    #  define Py_C_STACK_SIZE 1600000
    #elif defined(__hppa__) || defined(__powerpc64__)
    #  define Py_C_STACK_SIZE 2000000
    #else
    #  define Py_C_STACK_SIZE 4000000
    #endif
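The fallback arithmetic can be modelled in a few lines of Python. This is only a sketch of the three assignments quoted above: `here_addr` is a made-up address chosen for illustration, and the value used for `_PyOS_STACK_MARGIN_BYTES` is an assumption, not taken from the CPython headers.

```python
PAGE_SIZE = 4096
PY_C_STACK_SIZE = 4_000_000  # the #else branch, which is what macOS gets


def round_up(addr: int, align: int) -> int:
    """Round addr up to a multiple of align (align must be a power of two)."""
    return (addr + align - 1) & ~(align - 1)


def fallback_limits(here_addr: int, margin_bytes: int) -> tuple[int, int, int]:
    """Mirror the fallback: derive top, soft limit and hard limit from one address."""
    top = round_up(here_addr, PAGE_SIZE)
    soft = top - PY_C_STACK_SIZE
    hard = top - (PY_C_STACK_SIZE + margin_bytes)
    return top, soft, hard


# Hypothetical address of a local variable close to the stack base.
here_addr = 0x16B0E3F00
# Assumed margin, for illustration only.
margin_bytes = 16 * 1024

top, soft, hard = fallback_limits(here_addr, margin_bytes)
print(f"top=0x{top:X} soft=0x{soft:X} hard=0x{hard:X}")
print(f"usable before soft limit: {(top - soft) / 1024:.2f} kB")
```

Whatever address is passed in, the distance between the estimated top and the soft limit is always Py_C_STACK_SIZE, so the real size of the stack never enters the calculation.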
However, the 4 MB assumption is too conservative, because Python's build configuration explicitly increases the stack size to 16 MB on macOS:
    Darwin/*|iOS/*)
        LINKFORSHARED="$extra_undefs -framework CoreFoundation"
        # Issue #18075: the default maximum stack size (8MBytes) is too
        # small for the default recursion limit. Increase the stack size
        # to ensure that tests don't crash
        stack_size="1000000"  # 16 MB
        if test "$with_ubsan" = "yes"
        then
            # Undefined behavior sanitizer requires an even deeper stack
            stack_size="4000000"  # 64 MB
        fi
        AC_DEFINE_UNQUOTED([THREAD_STACK_SIZE],
            [0x$stack_size],
            [Custom thread stack size depending on chosen sanitizer runtimes.])
        if test $ac_sys_system = "Darwin"; then
            LINKFORSHARED="-Wl,-stack_size,$stack_size $LINKFORSHARED"
            if test "$enable_framework"; then
                LINKFORSHARED="$LINKFORSHARED "'$(PYTHONFRAMEWORKDIR)/Versions/$(VERSION)/$(PYTHONFRAMEWORK)'
            fi
            LINKFORSHARED="$LINKFORSHARED"
        elif test $ac_sys_system = "iOS"; then
            LINKFORSHARED="-Wl,-stack_size,$stack_size $LINKFORSHARED "'$(PYTHONFRAMEWORKDIR)/$(PYTHONFRAMEWORK)'
        fi
        ;;
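The `stack_size` strings in the build script are hexadecimal digits (note the `0x$stack_size` in the `AC_DEFINE_UNQUOTED` call), whereas Py_C_STACK_SIZE is a decimal byte count. A short sketch makes the mismatch concrete; the helper name `parse_stack_size` is made up for this example.

```python
MIB = 1024**2


def parse_stack_size(hex_digits: str) -> int:
    """Interpret a stack_size string from the build script as hexadecimal bytes."""
    return int(hex_digits, 16)


linked_default = parse_stack_size("1000000")  # regular build
linked_ubsan = parse_stack_size("4000000")    # build with UBSan
assumed = 4_000_000                           # Py_C_STACK_SIZE (decimal)

print(f"linked (default): {linked_default} bytes = {linked_default / MIB} MiB")
print(f"linked (ubsan):   {linked_ubsan} bytes = {linked_ubsan / MIB} MiB")
print(f"assumed at runtime: {assumed} bytes = {assumed / MIB:.2f} MiB")
print(f"linked / assumed: {linked_default / assumed:.2f}x")
```

On a default build the linked stack is a little over four times larger than the size the fallback path assumes.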
The under-estimation was observed in GH-131543, but was not investigated further, because the primary issue there turned out to be excessive stack consumption due to inlining (GH-137573).
Nevertheless, the under-estimated stack size still negatively impacts the recursion limit.
The following reproducer, slightly modified from a comment on GH-131543, keeps raising the recursion limit dynamically as it recurses:
    // stackpointer.c
    //
    // gcc -shared -fpic -o stackpointer.dylib stackpointer.c
    //
    // import ctypes
    // sp = ctypes.CDLL('./stackpointer.dylib')
    // address = sp.get_machine_stack_pointer()
    #include <stdint.h>

    uintptr_t get_machine_stack_pointer(void) {
        return (uintptr_t)__builtin_frame_address(0);
    }

    # recursion_limit_test.py
    import sys
    import ctypes
    import ctypes.util

    # Obtain stack address and stack size as reported by non-portable pthread functions:
    libc = ctypes.CDLL(ctypes.util.find_library('c'))
    libc.pthread_self.restype = ctypes.c_void_p
    libc.pthread_get_stackaddr_np.argtypes = [ctypes.c_void_p]
    libc.pthread_get_stackaddr_np.restype = ctypes.c_void_p
    libc.pthread_get_stacksize_np.argtypes = [ctypes.c_void_p]
    libc.pthread_get_stacksize_np.restype = ctypes.c_ulonglong

    this_thread = libc.pthread_self()
    stack_address = libc.pthread_get_stackaddr_np(this_thread)
    stack_size = libc.pthread_get_stacksize_np(this_thread)
    print(f"Stack address: {stack_address} = 0x{stack_address:X}")
    print(f"Stack size: {stack_size} = {stack_size / 1024.0} kB = {stack_size / 1024**2} MB")

    # Helper for tracking stack pointer location
    splib = ctypes.CDLL('./stackpointer.dylib')
    splib.get_machine_stack_pointer.restype = ctypes.c_void_p

    stack_pointer = splib.get_machine_stack_pointer()
    print(f"Stack pointer: 0x{stack_pointer:X}, depth: {(stack_address - stack_pointer)/1024:.2f} kB")

    # Recursion limit test
    limit = sys.getrecursionlimit()
    counter = 0

    class A:
        def __getattribute__(self, name):
            global counter
            counter += 1

            stack_pointer = splib.get_machine_stack_pointer()
            print(f"Recursion level: {counter}, stack pointer: 0x{stack_pointer:X}, depth: {(stack_address - stack_pointer)/1024:.2f} kB")

            # Increase recursion limit, if necessary
            global limit
            if counter + 1 >= limit:
                limit *= 2
                print(f"Increasing recursion limit: {limit}")
                sys.setrecursionlimit(limit)

            # Recurse
            return getattr(self, name)

    a = A()

    print("Testing Recursion Limit")
    print(f"Initial limit: {limit}")
    try:
        a.test
    except RecursionError:
        print(f"Recursion Limit ok (reached level {counter})")

Running with Python 3.13:
% python3.13 recursion_limit_test.py
Stack address: 6101024768 = 0x16BA64000
Stack size: 16777216 = 16384.0 kB = 16.0 MB
Stack pointer: 0x16BA622A0, depth: 7.34 kB
Testing Recursion Limit
Initial limit: 1000
Recursion level: 1, stack pointer: 0x16BA61CD0, depth: 8.80 kB
Recursion level: 2, stack pointer: 0x16BA616A0, depth: 10.34 kB
Recursion level: 3, stack pointer: 0x16BA610B0, depth: 11.83 kB
...
Recursion level: 4997, stack pointer: 0x16B323CD0, depth: 7424.80 kB
Recursion level: 4998, stack pointer: 0x16B3236E0, depth: 7426.28 kB
Recursion Limit ok (reached level 4999)
Note that even with the old approach, only about half of the actual 16 MB stack (about 7.25 MB) was used before the recursion limit kicked in, as is evident from the estimated stack depth.
With 3.14(.0rc2):
% python3.14 recursion_limit_test.py
Stack address: 6091063296 = 0x16B0E4000
Stack size: 16777216 = 16384.0 kB = 16.0 MB
Stack pointer: 0x16B0E1CC0, depth: 8.81 kB
Testing Recursion Limit
Initial limit: 1000
Recursion level: 1, stack pointer: 0x16B0E1120, depth: 11.72 kB
Recursion level: 2, stack pointer: 0x16B0E0550, depth: 14.67 kB
...
Recursion level: 1320, stack pointer: 0x16AD13470, depth: 3906.89 kB
Recursion level: 1321, stack pointer: 0x16AD128A0, depth: 3909.84 kB
Recursion Limit ok (reached level 1322)
With 3.14, recursion stops at level 1322, not far above the initial limit of 1000. The limit kicks in at a depth of about 3.9 MB, which is consistent with the assumed 4 MB stack rather than the actual 16 MB.
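To put the two runs side by side, the per-level stack cost and the fraction of the real stack in use can be derived from the first and last "Recursion level" lines shown in each log. This is a rough sketch: the figures are copied from the output above, the helper `kb_per_level` is made up for the example, and the final projection ignores any safety margin and assumes the per-level cost stays constant.

```python
STACK_KB = 16384.0  # 16 MB, as reported by pthread_get_stacksize_np above


def kb_per_level(level_a: int, depth_a: float, level_b: int, depth_b: float) -> float:
    """Average stack growth in kB per recursion level between two log lines."""
    return (depth_b - depth_a) / (level_b - level_a)


# (first level, depth in kB, last level, depth in kB), taken from the logs above
py313 = kb_per_level(1, 8.80, 4998, 7426.28)
py314 = kb_per_level(1, 11.72, 1321, 3909.84)

print(f"3.13: {py313:.2f} kB/level, {7426.28 / STACK_KB:.1%} of the stack used")
print(f"3.14: {py314:.2f} kB/level, {3909.84 / STACK_KB:.1%} of the stack used")

# Hypothetical upper bound if 3.14 were allowed to use the whole 16 MB.
print(f"3.14 levels that would fit in 16 MB: about {int(STACK_KB / py314)}")
```

By this estimate 3.14 uses roughly twice as much stack per level as 3.13 in this test, and stops after using less than a quarter of the available stack.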
CPython versions tested on:
3.14
Operating systems tested on:
macOS