Implement LOAD_ATTR inline caching with adaptive specialization by youknowone · Pull Request #7292 · RustPython/RustPython
Add a type version counter (tp_version_tag) to PyType with subclass invalidation cascade. Add cache read/write methods (u16/u32/u64) to CodeUnits. Implement adaptive specialization in load_attr that replaces the opcode with specialized variants on first execution:
- LoadAttrMethodNoDict: cached method lookup for slotted types
- LoadAttrMethodWithValues: cached method with dict shadow check
- LoadAttrInstanceValue: direct dict lookup skipping descriptors
Specialized opcodes guard on type_version_tag and deoptimize back to generic LOAD_ATTR with a backoff counter on cache miss.
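The guard-and-deoptimize cycle described above can be sketched roughly as follows. This is a simplified model, not the PR's code: `Site`, `Type`, `load_attr`, the slot index standing in for a cached method pointer, and both backoff constants are hypothetical names and values chosen for illustration.

```rust
// Hypothetical, simplified model of one LOAD_ATTR site; not RustPython's real types.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Op {
    LoadAttr,             // generic, adaptive form
    LoadAttrMethodNoDict, // specialized form, guarded by a type version
}

const INITIAL_BACKOFF: u16 = 2; // assumed value
const MAX_BACKOFF: u16 = 64; // assumed value

struct Site {
    op: Op,
    counter: u16,        // generic executions left before re-specializing
    backoff: u16,        // cooldown to apply after the next guard failure
    cached_version: u32, // type version recorded when specializing
    cached_slot: usize,  // stand-in for the cached method pointer
}

impl Site {
    fn new() -> Self {
        Site {
            op: Op::LoadAttr,
            counter: 0, // specialize on first execution
            backoff: INITIAL_BACKOFF,
            cached_version: 0,
            cached_slot: 0,
        }
    }
}

struct Type {
    version: u32, // 0 means "no valid version assigned"
    methods: Vec<&'static str>,
}

impl Type {
    // Slow path: linear search standing in for an MRO lookup.
    fn lookup(&self, name: &str) -> Option<usize> {
        self.methods.iter().position(|m| *m == name)
    }
}

fn load_attr(site: &mut Site, ty: &Type, name: &str) -> Option<usize> {
    if site.op == Op::LoadAttrMethodNoDict {
        // Guard: the cached entry is only valid for the recorded type version.
        if ty.version != 0 && ty.version == site.cached_version {
            return Some(site.cached_slot);
        }
        // Cache miss: fall back to the generic opcode and back off.
        site.op = Op::LoadAttr;
        site.counter = site.backoff;
        site.backoff = site.backoff.saturating_mul(2).min(MAX_BACKOFF);
        return ty.lookup(name);
    }
    let found = ty.lookup(name);
    if site.counter > 0 {
        site.counter -= 1; // still cooling down
    } else if let (Some(slot), true) = (found, ty.version != 0) {
        // Rewrite the opcode in place and record the guard value.
        site.op = Op::LoadAttrMethodNoDict;
        site.cached_version = ty.version;
        site.cached_slot = slot;
    }
    found
}
```

In this model, bumping `Type::version` (as a type mutation or subclass invalidation would) makes the next specialized execution fail its guard, and each failure doubles the number of generic executions required before the site tries to specialize again.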
BINARY_OP: Specialize int add/subtract/multiply and float add/subtract/multiply with type guards and deoptimization.
CALL: Add func_version to PyFunction, specialize simple function calls (CallPyExactArgs, CallBoundMethodExactArgs) with an invoke_exact_args fast path that skips FuncArgs allocation and fill_locals_from_args.
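As an illustration of the BINARY_OP half only, the following sketch shows an add instruction that specializes on operand types and deoptimizes when the type guard fails. `Value`, `BinOp`, `generic_add`, and `execute_add` are hypothetical; integers are modelled as wrapping `i64` for brevity, whereas real Python ints are arbitrary precision.

```rust
// Hypothetical value and opcode model; not RustPython's object types.
#[derive(Debug, Clone, PartialEq)]
enum Value {
    Int(i64),
    Float(f64),
    Str(String),
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum BinOp {
    Add,      // generic, adaptive form
    AddInt,   // specialized: both operands are ints
    AddFloat, // specialized: both operands are floats
}

// Slow path: full dispatch over every supported operand combination.
fn generic_add(a: &Value, b: &Value) -> Option<Value> {
    match (a, b) {
        (Value::Int(x), Value::Int(y)) => Some(Value::Int(x.wrapping_add(*y))),
        (Value::Float(x), Value::Float(y)) => Some(Value::Float(x + y)),
        (Value::Str(x), Value::Str(y)) => Some(Value::Str(format!("{x}{y}"))),
        _ => None, // unsupported operand types
    }
}

fn execute_add(op: &mut BinOp, a: &Value, b: &Value) -> Option<Value> {
    match (*op, a, b) {
        // Fast paths: the pattern match itself is the type guard.
        (BinOp::AddInt, Value::Int(x), Value::Int(y)) => Some(Value::Int(x.wrapping_add(*y))),
        (BinOp::AddFloat, Value::Float(x), Value::Float(y)) => Some(Value::Float(x + y)),
        // Generic form: pick a specialization from the observed operand types.
        (BinOp::Add, _, _) => {
            *op = match (a, b) {
                (Value::Int(_), Value::Int(_)) => BinOp::AddInt,
                (Value::Float(_), Value::Float(_)) => BinOp::AddFloat,
                _ => BinOp::Add,
            };
            generic_add(a, b)
        }
        // Guard failed on a specialized form: deoptimize, then take the slow path.
        _ => {
            *op = BinOp::Add;
            generic_add(a, b)
        }
    }
}
```

The sketch omits the backoff counter, so a site that alternates between operand types would flip between forms on every execution; a cooldown after deoptimization avoids that churn.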
Move counter initialization from compile-time to RESUME execution, matching CPython's _PyCode_Quicken pattern. Store counter in CACHE entry's arg byte to preserve op=Instruction::Cache for dis/JIT. Add PyCode.quickened flag for one-time initialization.
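A minimal sketch of one-time lazy quickening, under stated assumptions: the opcode set, the number of CACHE entries per instruction, and `INITIAL_COUNTER` are all hypothetical. It shows only the idea from the paragraph above, namely that the counter lives in the first CACHE entry's arg byte while the entry's op remains `Cache`, and that a `quickened` flag makes initialization happen once.

```rust
// Hypothetical opcode set and code layout; not RustPython's definitions.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Op {
    Resume,
    LoadAttr,
    Cache,
    ReturnValue,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct CodeUnit {
    op: Op,
    arg: u8,
}

struct Code {
    units: Vec<CodeUnit>,
    quickened: bool, // set once the adaptive counters have been initialized
}

const INITIAL_COUNTER: u8 = 1; // assumed warm-up value

// Number of CACHE entries that follow an instruction (assumed: 2 for LoadAttr).
fn cache_entries(op: Op) -> usize {
    match op {
        Op::LoadAttr => 2,
        _ => 0,
    }
}

impl Code {
    // Intended to run when RESUME first executes; later calls are no-ops.
    fn quicken(&mut self) {
        if self.quickened {
            return;
        }
        self.quickened = true;
        let mut i = 0;
        while i < self.units.len() {
            let n = cache_entries(self.units[i].op);
            if n > 0 && i + 1 < self.units.len() {
                // Only the arg byte changes, so the entry still reads as Cache
                // to a disassembler or JIT.
                self.units[i + 1].arg = INITIAL_COUNTER;
            }
            i += 1 + n; // skip over this instruction's CACHE entries
        }
    }
}
```

Because the flag is checked first, counters that have already been decremented or rewritten by specialization are not reset when the same code object resumes again.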
- deoptimize() maps specialized opcodes back to their base adaptive variant
- original_bytes() produces deoptimized bytecode with zeroed CACHE entries
- co_code now returns deoptimized bytes, _co_code_adaptive returns current bytes
- Marshal serialization uses original_bytes() instead of raw transmute
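The relationship between deoptimize() and original_bytes() might look like the sketch below. The opcode numbering and the two-byte (op, arg) unit layout are assumptions made for the example, and only a few opcodes are modelled.

```rust
// Hypothetical opcode numbering; real opcode values differ.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
#[repr(u8)]
enum Op {
    Cache = 0,
    LoadAttr = 1,
    LoadAttrMethodNoDict = 2,
    LoadAttrInstanceValue = 3,
    BinaryOp = 4,
    BinaryOpAddInt = 5,
    ReturnValue = 6,
}

impl Op {
    // Map a specialized opcode back to its base adaptive variant.
    fn deoptimize(self) -> Op {
        match self {
            Op::LoadAttrMethodNoDict | Op::LoadAttrInstanceValue => Op::LoadAttr,
            Op::BinaryOpAddInt => Op::BinaryOp,
            other => other, // already a base opcode
        }
    }
}

#[derive(Debug, Clone, Copy)]
struct CodeUnit {
    op: Op,
    arg: u8,
}

// Produce bytecode as it looked before any specialization: base opcodes
// only, and CACHE entries zeroed so counters and cached values do not leak.
fn original_bytes(units: &[CodeUnit]) -> Vec<u8> {
    let mut out = Vec::with_capacity(units.len() * 2);
    for unit in units {
        let op = unit.op.deoptimize();
        let arg = if op == Op::Cache { 0 } else { unit.arg };
        out.push(op as u8);
        out.push(arg);
    }
    out
}
```

Serializing through a function like this, rather than reinterpreting the live code units as bytes, keeps the output independent of whatever specialization state the interpreter has reached.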
- Add bounds checks to read_cache_u16/u32/u64
- Fix quicken() aliasing UB by using &mut directly
- Add JumpBackwardJit/JumpBackwardNoJit to deoptimize()
- Guard can_specialize_call with NEWLOCALS flag check
- Use compare_exchange_weak for version tag to prevent wraparound
- Propagate dict lookup errors in LoadAttrMethodWithValues
- Apply adaptive backoff on version tag assignment failure
- Remove duplicate imports in frame.rs
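Two of the review fixes above, the bounds-checked cache reads and the wraparound-safe version counter, can be illustrated with the sketch below. The function signatures, the `Option` return types, the little-endian split of a u32 across two 16-bit units, and the convention that version 0 means "unassigned" are assumptions for the example, not the PR's actual API.

```rust
use std::sync::atomic::{AtomicU32, Ordering};

// Bounds-checked read of one 16-bit cache unit; None instead of a panic
// or out-of-bounds access.
fn read_cache_u16(units: &[u16], index: usize) -> Option<u16> {
    units.get(index).copied()
}

// A u32 spans two consecutive units (assumed low half first).
fn read_cache_u32(units: &[u16], index: usize) -> Option<u32> {
    let lo = u32::from(*units.get(index)?);
    let hi = u32::from(*units.get(index.checked_add(1)?)?);
    Some(lo | (hi << 16))
}

// Allocate the next version tag. A plain fetch_add would wrap to 0 after
// u32::MAX and start handing out tags again; the CAS loop refuses to
// advance once the counter is exhausted.
fn next_version(counter: &AtomicU32) -> Option<u32> {
    let mut current = counter.load(Ordering::Relaxed);
    loop {
        let next = current.checked_add(1)?; // None when exhausted
        match counter.compare_exchange_weak(current, next, Ordering::Relaxed, Ordering::Relaxed) {
            Ok(_) => return Some(next),
            // Lost a race or failed spuriously: retry with the observed value.
            Err(observed) => current = observed,
        }
    }
}
```

A caller that receives `None` from `next_version` cannot install a version guard, which is the situation where the last-but-one bullet applies backoff instead of retrying the specialization on every execution.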
youknowone added a commit to youknowone/RustPython that referenced this pull request
…Python#7292)

* Implement LOAD_ATTR inline caching with adaptive specialization

Add type version counter (tp_version_tag) to PyType with subclass invalidation cascade. Add cache read/write methods (u16/u32/u64) to CodeUnits. Implement adaptive specialization in load_attr that replaces the opcode with specialized variants on first execution:
- LoadAttrMethodNoDict: cached method lookup for slotted types
- LoadAttrMethodWithValues: cached method with dict shadow check
- LoadAttrInstanceValue: direct dict lookup skipping descriptors
Specialized opcodes guard on type_version_tag and deoptimize back to generic LOAD_ATTR with backoff counter on cache miss.

* Add BINARY_OP and CALL adaptive specialization

BINARY_OP: Specialize int add/subtract/multiply and float add/subtract/multiply with type guards and deoptimization.
CALL: Add func_version to PyFunction, specialize simple function calls (CallPyExactArgs, CallBoundMethodExactArgs) with invoke_exact_args fast path that skips FuncArgs allocation and fill_locals_from_args.

* Lazy quickening for adaptive specialization counters

Move counter initialization from compile-time to RESUME execution, matching CPython's _PyCode_Quicken pattern. Store counter in CACHE entry's arg byte to preserve op=Instruction::Cache for dis/JIT. Add PyCode.quickened flag for one-time initialization.

* Add Instruction::deoptimize() and CodeUnits::original_bytes()

- deoptimize() maps specialized opcodes back to their base adaptive variant
- original_bytes() produces deoptimized bytecode with zeroed CACHE entries
- co_code now returns deoptimized bytes, _co_code_adaptive returns current bytes
- Marshal serialization uses original_bytes() instead of raw transmute

* Fix monitoring and specialization interaction

- cache_entries() returns correct count for instrumented opcodes
- deoptimize() maps instrumented opcodes back to base
- quicken() skips adaptive counter for instrumented opcodes
- instrument_code Phase 3 deoptimizes specialized opcodes and clears CACHE entries to prevent stale pointer dereferences

* Address review: bounds checks, UB fix, version overflow, error handling

- Add bounds checks to read_cache_u16/u32/u64
- Fix quicken() aliasing UB by using &mut directly
- Add JumpBackwardJit/JumpBackwardNoJit to deoptimize()
- Guard can_specialize_call with NEWLOCALS flag check
- Use compare_exchange_weak for version tag to prevent wraparound
- Propagate dict lookup errors in LoadAttrMethodWithValues
- Apply adaptive backoff on version tag assignment failure
- Remove duplicate imports in frame.rs