Optimize unpack, str.__add__ and fastlocals by youknowone · Pull Request #7293 · RustPython/RustPython
Push elements directly from tuple/list slice in reverse order instead of cloning into a temporary Vec first.
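A minimal sketch of the idea, not the PR's actual code: the value stack is modeled as a plain `Vec<T>`, and `push_reversed` is a hypothetical helper name. It extends the stack straight from a reversed slice iterator, so no intermediate `Vec` is built.

```rust
/// Push `items` onto `stack` in reverse order without a temporary Vec.
/// After the call, the first element of `items` sits on top of the stack.
/// (`push_reversed` is an illustrative name, not a RustPython function.)
pub fn push_reversed<T: Clone>(stack: &mut Vec<T>, items: &[T]) {
    // Reserve once so the extend below does not reallocate repeatedly.
    stack.reserve(items.len());
    // Clone each element directly from the slice, last element first.
    stack.extend(items.iter().rev().cloned());
}
```

Popping from the stack afterwards yields the elements in their original order, which is what an unpack operation needs.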
Add Relaxed load guard before the Acquire swap to avoid cache-line invalidation on every instruction dispatch when no signal is pending.
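The pattern can be sketched with a bare `AtomicBool`, assuming the pending-signal flag is a single boolean; `take_pending` is a hypothetical name and the real `check_signals` may differ in detail.

```rust
use std::sync::atomic::{AtomicBool, Ordering};

/// Returns true if a signal was pending, clearing the flag in that case.
/// (`take_pending` is an illustrative name, not the RustPython function.)
pub fn take_pending(flag: &AtomicBool) -> bool {
    // Read-only check first: a plain load does not take exclusive
    // ownership of the cache line, so the common "nothing pending"
    // case stays cheap.
    if !flag.load(Ordering::Relaxed) {
        return false;
    }
    // Something may be pending: only now do the read-modify-write.
    // The swap result is authoritative, since another thread could have
    // cleared the flag between the load and the swap.
    flag.swap(false, Ordering::Acquire)
}
```

An unconditional `swap` is a write on every call, which forces the cache line into exclusive state even when no signal arrived; the guard avoids that on the hot path.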
Pre-compute builtins.downcast_ref::<PyDict>() at frame entry and reuse the cached reference in load_global_or_builtin and LoadBuildClass. Also add get_chain_exact to skip redundant exact_dict type checks.
binary_op1 can now resolve str+str addition directly via the number slot instead of falling through to the sequence concat path.
Address CodeRabbit review: f_locals() could access fastlocals without synchronization when called from another thread. Use try_lock on the state mutex so concurrent access is properly serialized.
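A rough sketch of the `try_lock` guard using `std::sync::Mutex`. The `Frame` type, the `snapshot_locals` name, and the choice to return `None` when the lock is busy are all assumptions for illustration; the PR text does not say what happens when the lock cannot be taken.

```rust
use std::sync::Mutex;

/// Hypothetical stand-in for a frame: the locals live behind the state mutex.
pub struct Frame {
    pub state: Mutex<Vec<i64>>,
}

/// Copy the locals only if the state mutex can be taken without blocking.
/// Returns None when another holder (for example, the executing thread)
/// currently owns the lock. That fallback is an assumption of this sketch.
pub fn snapshot_locals(frame: &Frame) -> Option<Vec<i64>> {
    match frame.state.try_lock() {
        Ok(guard) => Some(guard.clone()),
        // WouldBlock or a poisoned lock: do not touch the locals.
        Err(_) => None,
    }
}
```

Using `try_lock` rather than `lock` means the caller never blocks on a frame that is busy executing.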
downcast_ref::<PyDict>() matches dict subclasses, causing get_chain_exact to bypass custom __getitem__ overrides. Use downcast_ref_if_exact to only fast-path exact dict types.
Move the recursion depth check to wrap the entire _cmp body instead of each individual call_cmp direction, reducing Cell read/write pairs and scopeguard overhead per comparison.
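The consolidation can be illustrated with a toy model. Everything here is hypothetical: `Vm`, `call_cmp`, and `cmp_eq` only mimic the shape of the real code, `call_cmp` returning `None` stands in for `NotImplemented`, and the sketch omits the panic-safe scopeguard the commit message mentions.

```rust
use std::cell::Cell;

/// Toy VM tracking recursion depth in a Cell.
pub struct Vm {
    depth: Cell<usize>,
    limit: usize,
    /// Counts how many times the guard was entered (for illustration).
    pub guard_entries: Cell<usize>,
}

impl Vm {
    pub fn new(limit: usize) -> Self {
        Vm { depth: Cell::new(0), limit, guard_entries: Cell::new(0) }
    }

    /// Run `f` one level deeper, failing if the limit is already reached.
    pub fn with_recursion<R>(&self, f: impl FnOnce() -> R) -> Result<R, &'static str> {
        if self.depth.get() >= self.limit {
            return Err("maximum recursion depth exceeded");
        }
        self.depth.set(self.depth.get() + 1);
        self.guard_entries.set(self.guard_entries.get() + 1);
        let out = f();
        self.depth.set(self.depth.get() - 1);
        Ok(out)
    }
}

/// One comparison direction; None models a NotImplemented result.
fn call_cmp(lhs: i64, rhs: i64) -> Option<bool> {
    if lhs < 0 { None } else { Some(lhs == rhs) }
}

/// Both directions run inside a single guard instead of one guard each.
pub fn cmp_eq(vm: &Vm, a: i64, b: i64) -> Result<bool, &'static str> {
    vm.with_recursion(|| {
        call_cmp(a, b).or_else(|| call_cmp(b, a)).unwrap_or(false)
    })
}
```

Even when the forward direction declines and the reflected direction runs, the depth counter is read and written once per comparison rather than once per direction.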
- FOR_ITER: detect PyRangeIterator and bypass generic iterator protocol (atomic slot load + indirect call)
- COMPARE_OP: inline int/float comparison for exact types, skip rich_compare dispatch and with_recursion overhead
- BINARY_OP: inline int add/sub with i64 checked arithmetic to avoid BigInt heap allocation and binary_op1 dispatch
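The BINARY_OP fast path can be sketched as below. This covers only the integer add/sub case, and it uses `i128` as a stand-in for the arbitrary-precision slow path so the example stays dependency-free; `int_add` and `int_sub` are hypothetical names.

```rust
/// Add two machine-word integers, taking the cheap path when the result
/// fits in i64. The i128 fallback is only a stand-in for a BigInt path.
pub fn int_add(a: i64, b: i64) -> i128 {
    match a.checked_add(b) {
        // Fast path: no overflow, no wide arithmetic needed.
        Some(v) => i128::from(v),
        // Slow path: overflow detected, widen before adding.
        None => i128::from(a) + i128::from(b),
    }
}

/// Same structure for subtraction.
pub fn int_sub(a: i64, b: i64) -> i128 {
    match a.checked_sub(b) {
        Some(v) => i128::from(v),
        None => i128::from(a) - i128::from(b),
    }
}
```

`checked_add` and `checked_sub` return `None` on overflow instead of wrapping or panicking, so the fast path never produces a wrong result; it only declines.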
get_chain_exact bypasses __missing__ on dict subclasses. Move get_chain_exact to PyExact<PyDict> impl with debug_assert, and have get_chain delegate to it. Store builtins_dict as Option<&PyExact<PyDict>> to enforce exact type at compile time. Use PyRangeIterator::next_fast() instead of pub(crate) fields. Fix comment style issues.
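The "exact type enforced at compile time" idea can be modeled with a wrapper type that can only be obtained from an exact dict. This is a loose analogy, not RustPython's `PyExact`: `Mapping`, `ExactDict`, and `load_global` are hypothetical, and the `Subclass` variant's `default` field merely imitates a `__missing__` override.

```rust
use std::collections::HashMap;

/// Toy object model: an exact dict, or a subclass with a __missing__-like default.
pub enum Mapping {
    Dict(HashMap<String, i64>),
    Subclass { data: HashMap<String, i64>, default: i64 },
}

/// Proof-carrying wrapper: only constructible from the exact Dict variant.
pub struct ExactDict<'a>(&'a HashMap<String, i64>);

impl Mapping {
    /// Succeeds only for the exact type, never for the subclass.
    pub fn as_exact(&self) -> Option<ExactDict<'_>> {
        match self {
            Mapping::Dict(m) => Some(ExactDict(m)),
            Mapping::Subclass { .. } => None,
        }
    }

    /// Generic lookup that honors the subclass override.
    pub fn get(&self, key: &str) -> Option<i64> {
        match self {
            Mapping::Dict(m) => m.get(key).copied(),
            Mapping::Subclass { data, default } => {
                Some(data.get(key).copied().unwrap_or(*default))
            }
        }
    }
}

impl<'a> ExactDict<'a> {
    /// Chained lookup that skips all override handling; safe because both
    /// sides are known to be exact dicts by construction.
    pub fn get_chain_exact(&self, other: &ExactDict<'_>, key: &str) -> Option<i64> {
        self.0.get(key).copied().or_else(|| other.0.get(key).copied())
    }
}

/// Fast path only when globals AND builtins are both exact.
pub fn load_global(globals: &Mapping, builtins: &Mapping, key: &str) -> Option<i64> {
    match (globals.as_exact(), builtins.as_exact()) {
        (Some(g), Some(b)) => g.get_chain_exact(&b, key),
        _ => globals.get(key).or_else(|| builtins.get(key)),
    }
}
```

Because `get_chain_exact` lives on the wrapper, calling it on a subclass is a type error rather than a silent behavior change.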
youknowone added a commit to youknowone/RustPython that referenced this pull request
* Remove intermediate Vec allocation in unpack_sequence fast path

  Push elements directly from tuple/list slice in reverse order instead of cloning into a temporary Vec first.

* Use read-only atomic load before swap in check_signals

  Add Relaxed load guard before the Acquire swap to avoid cache-line invalidation on every instruction dispatch when no signal is pending.

* Cache builtins downcast in ExecutingFrame for LOAD_GLOBAL

  Pre-compute builtins.downcast_ref::<PyDict>() at frame entry and reuse the cached reference in load_global_or_builtin and LoadBuildClass. Also add get_chain_exact to skip redundant exact_dict type checks.

* Add number Add slot to PyStr for direct str+str dispatch

  binary_op1 can now resolve str+str addition directly via the number slot instead of falling through to the sequence concat path.

* Guard FastLocals access in locals() with try_lock on state mutex

  Address CodeRabbit review: f_locals() could access fastlocals without synchronization when called from another thread. Use try_lock on the state mutex so concurrent access is properly serialized.

* Use exact type check for builtins_dict cache

  downcast_ref::<PyDict>() matches dict subclasses, causing get_chain_exact to bypass custom __getitem__ overrides. Use downcast_ref_if_exact to only fast-path exact dict types.

* Consolidate with_recursion in _cmp to single guard

  Move the recursion depth check to wrap the entire _cmp body instead of each individual call_cmp direction, reducing Cell read/write pairs and scopeguard overhead per comparison.

* Add opcode-level fast paths for FOR_ITER, COMPARE_OP, BINARY_OP

  - FOR_ITER: detect PyRangeIterator and bypass generic iterator protocol (atomic slot load + indirect call)
  - COMPARE_OP: inline int/float comparison for exact types, skip rich_compare dispatch and with_recursion overhead
  - BINARY_OP: inline int add/sub with i64 checked arithmetic to avoid BigInt heap allocation and binary_op1 dispatch

* Also check globals is exact dict for LOAD_GLOBAL fast path

  get_chain_exact bypasses __missing__ on dict subclasses. Move get_chain_exact to PyExact<PyDict> impl with debug_assert, and have get_chain delegate to it. Store builtins_dict as Option<&PyExact<PyDict>> to enforce exact type at compile time. Use PyRangeIterator::next_fast() instead of pub(crate) fields. Fix comment style issues.