gh-146455: Fix O(N²) in add_const() after constant folding moved to CFG#146456
Merged
Eclips4 merged 3 commits into python:main on Apr 26, 2026
…d to CFG

The add_const() function in flowgraph.c uses a linear search over the consts list to find the index of a constant. After gh-126835 moved constant folding from the AST optimizer to the CFG optimizer, this function is now called N times for N inner tuple elements during fold_tuple_of_constants(), resulting in O(N²) total time.

Fix by maintaining an auxiliary _Py_hashtable_t that maps object pointers to their indices in the consts list, providing O(1) lookup.

For a file with 100,000 constant 2-tuples:
- Before: 10.38s (add_const occupies 83.76% of CPU time)
- After: 1.48s
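To make the lookup change concrete, here is a minimal Python model of the idea. It is only an illustration under stated assumptions: the real code is C in flowgraph.c and uses `_Py_hashtable_t`, and the names `add_const_linear` and `ConstTable` are hypothetical. A dict keyed on `id(obj)` stands in for the pointer-to-index hashtable.

```python
# Illustrative model only; CPython's actual implementation is C code in
# flowgraph.c. All names below are hypothetical.

def add_const_linear(consts, obj):
    """Return the index of obj in consts, appending it if absent.

    Scans the list by identity, so each call costs O(len(consts)).
    """
    for i, existing in enumerate(consts):
        if existing is obj:
            return i
    consts.append(obj)
    return len(consts) - 1


class ConstTable:
    """A consts list plus an auxiliary identity -> index map."""

    def __init__(self):
        self.consts = []
        self._index = {}  # id(obj) -> position in self.consts

    def add_const(self, obj):
        """Return the index of obj, appending it if absent (O(1) average)."""
        key = id(obj)
        idx = self._index.get(key)
        if idx is None:
            idx = len(self.consts)
            # The list holds a reference to obj, so its id() stays valid
            # for as long as the table exists.
            self.consts.append(obj)
            self._index[key] = idx
        return idx
```

Both versions return the same indices for the same sequence of objects. With N distinct constants, the linear version performs on the order of N²/2 identity comparisons in total, while the map-based version performs N dictionary lookups.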
Most changes to Python require a NEWS entry. Add one using the blurb_it web app or the blurb command-line tool. If this change has little impact on Python users, wait for a maintainer to apply the …
Eclips4
reviewed
Apr 21, 2026
Member
FWIW, I've run refleak tests and everything seems to be fine.
Member
The results from the issue are confirmed. After this PR (I took the example from the issue), MacBook Pro M3 Pro:
Eclips4 merged commit 5d41632 into python:main on Apr 26, 2026
54 checks passed
iritkatriel
reviewed
Apr 26, 2026
Eclips4 pushed a commit that referenced this pull request on Apr 26, 2026
…ed to CFG (GH-146456) (#149011)

gh-146455: Fix O(N²) in add_const() after constant folding moved to CFG (GH-146456)

The add_const() function in flowgraph.c uses a linear search over the consts list to find the index of a constant. After gh-126835 moved constant folding from the AST optimizer to the CFG optimizer, this function is now called N times for N inner tuple elements during fold_tuple_of_constants(), resulting in O(N²) total time.

Fix by maintaining an auxiliary _Py_hashtable_t that maps object pointers to their indices in the consts list, providing O(1) lookup.

For a file with 100,000 constant 2-tuples:
- Before: 10.38s (add_const occupies 83.76% of CPU time)
- After: 1.48s

(cherry picked from commit 5d41632)

Co-authored-by: zSirius <107359899+zSirius@users.noreply.github.com>
Fix O(N²) performance regression in `add_const()` introduced by moving constant folding from the AST optimizer to the CFG optimizer (gh-126835).

Problem

After gh-130769 moved tuple folding to the CFG, `fold_tuple_of_constants()` calls `add_const()` once per inner tuple element. `add_const()` does a linear scan over the `consts` list to find the index, so N calls × O(N) scan = O(N²).

The same issue affects unary/binary op folding moved in gh-129550 (`fold_const_unaryop`, `fold_const_binop`).

`perf` profiling shows `add_const` taking 83.76% of CPU time when compiling 100K nested constant tuples.

Fix

Maintain an auxiliary `_Py_hashtable_t` (pointer → index mapping) alongside the `consts` list, providing O(1) constant lookup. The hashtable:

- uses `_Py_hashtable_hash_ptr` / `_Py_hashtable_compare_direct`: pure pointer operations, with no Python object overhead
- is created in `_PyCfg_OptimizeCodeUnit()` and destroyed after `optimize_cfg()`, before `remove_unused_consts()` reindexes the list
- is safe to key on pointers because `_PyCompile_ConstCacheMergeOne()` already guarantees identity uniqueness (equal-valued constants share the same pointer)

All modified functions are `static`, so there are no public API changes.

Performance (N=100K)

- `((f, f), ...)`
- `(-1, -2, ..., -N)`
- `(0+1, 0+2, ..., 0+N)`

All existing tests pass: `test_compile`, `test_peepholer`, `test_ast`, `test_dis`.
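A rough way to observe the regression is to time `compile()` on generated source. The sketch below is an assumption-laden reproducer, not the benchmark used in the PR: the helper names `make_source` and `time_compile` are hypothetical, and the input shape (one outer tuple of N integer 2-tuples) is a guess at the "100K nested constant tuples" case.

```python
import time


def make_source(n):
    """Build a module whose body assigns one tuple of n constant 2-tuples.

    The exact shape of the PR's test file is not given; this layout is an
    assumption chosen so that every inner tuple is foldable.
    """
    inner = ", ".join(f"({i}, {i + 1})" for i in range(n))
    return f"x = ({inner},)\n"


def time_compile(n):
    """Compile the generated source and return (elapsed_seconds, code)."""
    src = make_source(n)
    start = time.perf_counter()
    code = compile(src, "<bench>", "exec")
    elapsed = time.perf_counter() - start
    return elapsed, code
```

Calling `time_compile(n)` for increasing `n` (for example 1,000, 10,000 and 100,000) on builds with and without the fix should show whether compile time grows roughly linearly or quadratically. Absolute timings depend on the machine and the build configuration.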