Issue 46600: Python built with clang -O0 allocates 10x more stack memory than clang -O3 on a Python function call
Created on 2022-02-01 13:06 by vstinner, last changed 2022-04-11 14:59 by admin.
| Files | | |
|---|---|---|
| File name | Uploaded | Description |
| stack_overflow-4.py | vstinner, 2022-02-01 13:40 | |
| Pull Requests | | |
|---|---|---|
| URL | Status | Linked |
| PR 31052 | merged | vstinner, 2022-02-01 13:12 |
| PR 31058 | merged | vstinner, 2022-02-01 16:24 |
| Messages (14) | |||
|---|---|---|---|
| msg412252 | Author: STINNER Victor (vstinner) |
Date: 2022-02-01 13:06 | |
Measure using this script on the main branch (commit 108e66b6d23efd0fc2966163ead9434b328c5f17):
---
import _testcapi
def f():
    yield _testcapi.stack_pointer()
print(_testcapi.stack_pointer() - next(f()))
---
Stack usage depending on the compiler and compiler optimization level:
* clang -O0: 9,104 bytes
* clang -Og: 736 bytes
* gcc -O0: 6,784 bytes
* gcc -Og: 624 bytes
-O0 allocates around 10x more memory. Moreover, "./configure --with-pydebug CC=clang" uses -O0 in CFLAGS, because the "clang --help" output doesn't contain "-Og". I'm working on a configure change to use -Og with clang versions that support it. |
|||
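As a quick check of the "around 10x" figure, the ratios can be computed directly from the four measurements above. This is only arithmetic on the reported numbers; the dictionary layout and variable names are illustrative.

```python
# Stack usage in bytes, as reported in msg412252.
usage = {
    "clang": {"-O0": 9104, "-Og": 736},
    "gcc": {"-O0": 6784, "-Og": 624},
}

# Ratio of -O0 to -Og stack usage for each compiler.
ratios = {cc: m["-O0"] / m["-Og"] for cc, m in usage.items()}

for cc, ratio in ratios.items():
    print(f"{cc}: -O0 uses {ratio:.1f}x the stack of -Og")
```

The measured gap works out to roughly 12x for clang and roughly 11x for gcc.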
| msg412253 | Author: STINNER Victor (vstinner) |
Date: 2022-02-01 13:15 | |
GH-31052 enables -Og when using clang and ./configure --with-pydebug and so the example uses 736 bytes instead of 9,104 bytes. |
|||
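The first message notes that the earlier check looked for "-Og" in the "clang --help" output. A more direct way to probe a flag is to try compiling a trivial file with it. The sketch below illustrates that idea only; `compiler_accepts_flag` is a hypothetical helper and is not the check used by CPython's configure script.

```python
import os
import shutil
import subprocess
import tempfile


def compiler_accepts_flag(cc: str, flag: str) -> bool:
    """Return True if `cc` compiles a trivial C file with `flag`.

    Hypothetical helper for illustration, not CPython's configure logic.
    """
    if shutil.which(cc) is None:
        return False  # compiler is not installed
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "probe.c")
        obj = os.path.join(tmp, "probe.o")
        with open(src, "w") as f:
            f.write("int main(void) { return 0; }\n")
        # -Werror makes a flag that the compiler only warns about
        # count as unsupported.
        result = subprocess.run(
            [cc, flag, "-Werror", "-c", src, "-o", obj],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
        return result.returncode == 0


if __name__ == "__main__":
    for cc in ("gcc", "clang"):
        print(cc, "accepts -Og:", compiler_accepts_flag(cc, "-Og"))
```

Probing by compilation does not depend on how the help text is worded, which is the weakness described in the first message.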
| msg412254 | Author: STINNER Victor (vstinner) |
Date: 2022-02-01 13:16 | |
This issue is a follow-up of bpo-46542 "test_json and test_lib2to3 crash on s390x Fedora Clang 3.x buildbot". |
|||
| msg412255 | Author: STINNER Victor (vstinner) |
Date: 2022-02-01 13:22 | |
Previous issues about stack memory usage, work done in 2017:
* bpo-28870: Reduce stack consumption of PyObject_CallFunctionObjArgs() and like
* bpo-29227: Reduce C stack consumption in function calls
* bpo-29465: Modify _PyObject_FastCall() to reduce stack consumption 29464
I summarized the results in the "Stack consumption" section of my article: https://vstinner.github.io/contrib-cpython-2017q1.html |
|||
| msg412256 | Author: STINNER Victor (vstinner) |
Date: 2022-02-01 13:25 | |
See also bpo-30866: "Add _testcapi.stack_pointer() to measure the C stack consumption". |
|||
| msg412258 | Author: STINNER Victor (vstinner) |
Date: 2022-02-01 13:40 | |
stack_overflow-4.py: Updated script from bpo-30866 to measure stack memory usage before Python crashes or raises a RecursionError. I had to modify the script since calling a Python function from a Python function no longer allocates (additional) memory on the stack! See bpo-45256 "Remove the usage of the C stack in Python to Python calls". |
|||
| msg412260 | Author: STINNER Victor (vstinner) |
Date: 2022-02-01 13:46 | |
stack_overflow-4.py output depending on the compiler and compiler flags.
gcc -O3 (./configure):
---
test_python_call: 11904 calls before crash, stack: 704.1 bytes/call
test_python_iterator: 17460 calls before crash, stack: 480.0 bytes/call
test_python_getitem: 245760 calls before recursion error, stack: 0.2 bytes/call
=> total: 275124 calls, 1184.3 bytes per call
---
It's better than stack memory usage in 2017: https://bugs.python.org/issue30866#msg297826
clang -O3 (./configure CC=clang):
---
test_python_call: 10270 calls before crash, stack: 816.1 bytes/call
test_python_iterator: 14155 calls before crash, stack: 592.0 bytes/call
test_python_getitem: 245760 calls before recursion error, stack: 0.3 bytes/call
=> total: 270185 calls, 1408.4 bytes per call
---
clang allocates a little bit more memory on the stack than gcc. I didn't try PGO or LTO yet. |
|||
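The "calls before crash" and "bytes/call" columns above can be cross-checked against each other: their product is the total stack consumed at the moment of the crash. The sketch below does that multiplication. The 8 MiB limit is an assumption (a common Linux default for the main thread) and is not stated anywhere in this thread; the names are illustrative.

```python
# (calls before crash, stack bytes per call), copied from msg412260.
runs = {
    "gcc -O3 test_python_call": (11904, 704.1),
    "gcc -O3 test_python_iterator": (17460, 480.0),
    "clang -O3 test_python_call": (10270, 816.1),
    "clang -O3 test_python_iterator": (14155, 592.0),
}

# Assumption: 8 MiB main-thread stack limit (not stated in the thread).
ASSUMED_STACK_LIMIT = 8 * 1024 * 1024

# Total stack consumed when each test crashed.
totals = {name: calls * per_call for name, (calls, per_call) in runs.items()}

for name, total in totals.items():
    share = total / ASSUMED_STACK_LIMIT
    print(f"{name}: {total:,.0f} bytes ({share:.1%} of assumed limit)")
```

Each product comes out at about 8.38 million bytes, just under 8 MiB. Under that assumption, the crashes are consistent with the stack being exhausted, and the higher bytes/call figure for clang directly explains the lower call count.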
| msg412261 | Author: STINNER Victor (vstinner) |
Date: 2022-02-01 13:47 | |
New changeset 0515eafe55ce7699e3bbc3c1555f08073d43b790 by Victor Stinner in branch 'main': bpo-46600: ./configure --with-pydebug uses -Og with clang (GH-31052) https://github.com/python/cpython/commit/0515eafe55ce7699e3bbc3c1555f08073d43b790 |
|||
| msg412278 | Author: Pablo Galindo Salgado (pablogsal) |
Date: 2022-02-01 15:25 | |
PR 31052 seems to have broken a bunch of buildbots. If no fix is provided in 24 hours, we will need to revert :( |
|||
| msg412286 | Author: STINNER Victor (vstinner) |
Date: 2022-02-01 16:06 | |
> PR 31052 seems to have broken a bunch of buildbots. If no fix is provided in 24 hours, we will need to revert :(

test_gdb fails if Python is built with clang -Og. I don't think that it's a regression. It's just that previously, buildbots using clang only built Python with -O0 or -O3. I'm investigating the test_gdb issue: it's easy to reproduce on Linux (clang 13.0.0). I may skip test_gdb if Python is built with clang -Og. |
|||
| msg412294 | Author: STINNER Victor (vstinner) |
Date: 2022-02-01 17:12 | |
New changeset bebaa95fd0f44babf8b6bcffd8f2908c73ca259e by Victor Stinner in branch 'main': bpo-46600: Fix test_gdb.test_pycfunction() for clang -Og (GH-31058) https://github.com/python/cpython/commit/bebaa95fd0f44babf8b6bcffd8f2908c73ca259e |
|||
| msg412334 | Author: Inada Naoki (methane) |
Date: 2022-02-02 02:50 | |
FWIW, it seems -O0 doesn't merge local variables that live in different code paths or have different lifetimes.
For example, see _Py_abspath:
```
if (path[0] == '\0' || !wcscmp(path, L".")) {
wchar_t cwd[MAXPATHLEN + 1];
//(snip)
}
//(snip)
wchar_t cwd[MAXPATHLEN + 1];
```
wchar_t is 4 bytes and MAXPATHLEN is 4096 on Linux, so each cwd is 16388 bytes.
-O0 allocates 32856 bytes for it and -Og allocates 16440 bytes for it.
I don't know which specific optimization in -Og merges local variables, but I think -Og is very important for _PyEval_EvalFrameDefault() since it has many local variables in huge switch-case statements.
-Og allocates 312 bytes for it and -O0 allocates 8280 bytes for it.
By the way, clang 13 has the `-fstack-usage` option like gcc, but clang 12 doesn't have it.
Since Ubuntu 20.04 has only clang 12, I use `-fstack-size-segment` and https://github.com/mvanotti/stack-sizes to get the stack size.
|
|||
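The _Py_abspath figures above can be checked with a little arithmetic. The sketch below reads the larger frame size (32856 bytes) as the -O0 build with two separate cwd buffers and the smaller one (16440 bytes) as the -Og build with a single shared buffer. The variable names are illustrative, and the "remaining bytes" are simply whatever is left after subtracting the buffers.

```python
WCHAR_SIZE = 4      # bytes per wchar_t on Linux, per the message
MAXPATHLEN = 4096   # on Linux, per the message

# Size of one `wchar_t cwd[MAXPATHLEN + 1]` buffer.
cwd_size = (MAXPATHLEN + 1) * WCHAR_SIZE

# Stack frame sizes reported for _Py_abspath.
frame_O0 = 32856
frame_Og = 16440

# Bytes left once the cwd buffers are accounted for.
other_O0 = frame_O0 - 2 * cwd_size  # two separate buffers
other_Og = frame_Og - 1 * cwd_size  # one merged buffer

print(f"cwd buffer: {cwd_size} bytes")
print(f"-O0: 2 buffers + {other_O0} other bytes")
print(f"-Og: 1 buffer + {other_Og} other bytes")
```

Both frames are almost entirely cwd storage, which matches the explanation that -Og lets the two buffers share one slot while -O0 keeps them separate.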
| msg412348 | Author: STINNER Victor (vstinner) |
Date: 2022-02-02 10:04 | |
> For example, see _Py_abspath

For functions which are commonly called in Python at runtime, it may be worth manually merging large local variables to save a few bytes on the stack when Python is built with -O0. As for _Py_abspath(), this function is only called at startup, if I recall correctly, so it shouldn't be a big issue in practice. |
|||
| msg412407 | Author: Inada Naoki (methane) |
Date: 2022-02-03 00:00 | |
I didn't mean that _Py_abspath is a problem. I just used it to illustrate why -O0 and -Og are so different. We can reduce its stack usage easily, but it is less of a problem than _PyEval_EvalFrameDefault. It is difficult to reduce the stack usage of _PyEval_EvalFrameDefault with -O0. |
|||
| History | |||
|---|---|---|---|
| Date | User | Action | Args |
| 2022-04-11 14:59:55 | admin | set | github: 90758 |
| 2022-02-03 00:00:28 | methane | set | messages: + msg412407 |
| 2022-02-02 10:04:49 | vstinner | set | messages: + msg412348 |
| 2022-02-02 05:16:18 | corona10 | set | nosy: + corona10 |
| 2022-02-02 02:50:12 | methane | set | nosy: + methane messages: + msg412334 |
| 2022-02-01 17:12:36 | vstinner | set | messages: + msg412294 |
| 2022-02-01 16:24:05 | vstinner | set | pull_requests: + pull_request29240 |
| 2022-02-01 16:06:29 | vstinner | set | messages: + msg412286 |
| 2022-02-01 15:25:55 | pablogsal | set | nosy: + pablogsal messages: + msg412278 |
| 2022-02-01 13:47:26 | vstinner | set | messages: + msg412261 |
| 2022-02-01 13:46:28 | vstinner | set | messages: + msg412260 |
| 2022-02-01 13:40:57 | vstinner | set | files: + stack_overflow-4.py messages: + msg412258 |
| 2022-02-01 13:25:28 | vstinner | set | messages: + msg412256 |
| 2022-02-01 13:22:24 | vstinner | set | messages: + msg412255 |
| 2022-02-01 13:16:29 | vstinner | set | messages: + msg412254 |
| 2022-02-01 13:15:31 | vstinner | set | messages: + msg412253 |
| 2022-02-01 13:12:00 | vstinner | set | keywords: + patch stage: patch review pull_requests: + pull_request29234 |
| 2022-02-01 13:06:00 | vstinner | create | |
