◐ Shell
clean mode source ↗

Support pickle better with __getnewargs_ex__ by youknowone · Pull Request #6749 · RustPython/RustPython

📝 Walkthrough

Walkthrough

Refactors object pickling/reduction to add a new reduce_newobj path and helpers for extracting getnewargs/getstate/slot and item data, adds getstate/setstate to IO wrapper classes for pickling, interns getnewargs_ex in VM context, and removes a legacy Python reducelib module.

Changes

Cohort / File(s) Summary
Object pickling & reduction
crates/vm/src/builtins/object.rs
Adds get_new_arguments, is_getstate_overridden, object_getstate, get_items_iter, reduce_newobj; refactors protocol>=2 reduction to use reduce_newobj; tighter itemsize and basicsize validation; clearer pickle error messages.
IO wrapper pickling
crates/vm/src/stdlib/io.rs
Adds __getstate__ / __setstate__ implementations to IO wrapper classes (BytesIO, TextIOWrapper, StringIO and other wrappers), exports PyDict, and enforces state-shape validation and closed-state checks.
VM context constants
crates/vm/src/vm/context.rs
Adds __getnewargs_ex__ field to ConstName for interned magic method name.
Removed legacy module
crates/vm/Lib/python_builtins/__reducelib.py
Deletes the Python-side reducelib module that previously provided reduce_2/slotnames/_abstract_method_error helpers.
Thread cleanup minor refactor
crates/vm/src/stdlib/thread.rs
Replaced map_or(false, ...) with is_some_and(...) in a retain filter; no semantic change.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Suggested reviewers

  • fanninpm

Poem

🐰
I nibbled bytes and stored a state,
Gathered args to reconstruct fate,
IO trunks now pack and send,
Basicsize checked from end to end,
Hopping home with pickles great! 🥕

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title directly and specifically describes the main change: adding support for getnewargs_ex to improve pickle functionality, which aligns with the primary modifications across object.rs, io.rs, and context.rs.
Docstring Coverage ✅ Passed Docstring coverage is 81.25% which is sufficient. The required threshold is 80.00%.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing touches
  • 📝 Generate docstrings


📜 Recent review details

Configuration used: Path: .coderabbit.yml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 8c5d1dd and a418c68.

⛔ Files ignored due to path filters (11)
  • Lib/test/test_copy.py is excluded by !Lib/**
  • Lib/test/test_csv.py is excluded by !Lib/**
  • Lib/test/test_descr.py is excluded by !Lib/**
  • Lib/test/test_enum.py is excluded by !Lib/**
  • Lib/test/test_lzma.py is excluded by !Lib/**
  • Lib/test/test_memoryio.py is excluded by !Lib/**
  • Lib/test/test_pickle.py is excluded by !Lib/**
  • Lib/test/test_pickletools.py is excluded by !Lib/**
  • Lib/test/test_typing.py is excluded by !Lib/**
  • Lib/test/test_unittest/test_runner.py is excluded by !Lib/**
  • Lib/test/test_zlib.py is excluded by !Lib/**
📒 Files selected for processing (5)
  • crates/vm/Lib/python_builtins/__reducelib.py
  • crates/vm/src/builtins/object.rs
  • crates/vm/src/stdlib/io.rs
  • crates/vm/src/stdlib/thread.rs
  • crates/vm/src/vm/context.rs
💤 Files with no reviewable changes (1)
  • crates/vm/Lib/python_builtins/__reducelib.py
🚧 Files skipped from review as they are similar to previous changes (1)
  • crates/vm/src/vm/context.rs
🧰 Additional context used
📓 Path-based instructions (1)
**/*.rs

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

**/*.rs: Follow the default rustfmt code style using cargo fmt to format Rust code
Always run clippy to lint code (cargo clippy) before completing tasks and fix any warnings or lints introduced by changes
Follow Rust best practices for error handling and memory management
Use the macro system (pyclass, pymodule, pyfunction, etc.) when implementing Python functionality in Rust

Files:

  • crates/vm/src/stdlib/thread.rs
  • crates/vm/src/builtins/object.rs
  • crates/vm/src/stdlib/io.rs
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (9)
  • GitHub Check: Run snippets and cpython tests on wasm-wasi
  • GitHub Check: Run rust tests (ubuntu-latest)
  • GitHub Check: Run rust tests (windows-2025)
  • GitHub Check: Check Rust code with clippy
  • GitHub Check: Run snippets and cpython tests (macos-latest)
  • GitHub Check: Run rust tests (macos-latest)
  • GitHub Check: Run snippets and cpython tests (ubuntu-latest)
  • GitHub Check: Ensure compilation on various targets
  • GitHub Check: Check the WASM package and demo
🔇 Additional comments (12)
crates/vm/src/stdlib/io.rs (5)

161-162: LGTM!

The PyDict import is correctly added to support the __setstate__ implementations that need to restore the __dict__ attribute.


4081-4103: LGTM!

The __getstate__ implementation correctly captures the buffer content, position, and optional __dict__. The TODO comment appropriately documents that the actual newline setting will be stored once implemented.


4105-4144: LGTM! Previous feedback addressed.

The __setstate__ implementation now includes the closed-file guard at line 4108, which was the concern raised in the previous review. The implementation correctly validates the tuple length, restores buffer content and position, and handles the __dict__ restoration properly.


4294-4313: LGTM!

The __getstate__ implementation correctly captures bytes content, position, and optional __dict__. Using a 3-tuple (without newline) is appropriate for binary I/O.


4315-4351: LGTM!

The __setstate__ implementation correctly:

  • Checks closed state before modification
  • Uses try_resizable to ensure no active buffer exports exist (line 4332)
  • Restores content and position
  • Handles __dict__ restoration properly
crates/vm/src/stdlib/thread.rs (1)

518-523: LGTM! Cleaner idiomatic Rust.

Using is_some_and instead of map_or(false, ...) is the preferred pattern for this check. Functionally equivalent with improved readability.

crates/vm/src/builtins/object.rs (6)

564-624: LGTM - Proper implementation of new arguments extraction.

The function correctly handles both __getnewargs_ex__ and __getnewargs__ with appropriate type validation and error messages matching CPython's behavior.


626-644: LGTM!

The identity comparison correctly detects whether __getstate__ is overridden by comparing method objects from the MRO.


646-656: LGTM!

Correct delegation: uses the default implementation with the required flag when not overridden, otherwise invokes the user-defined __getstate__.


658-674: LGTM!

Correctly provides item iterators for list/dict subclasses as required by the pickle protocol for proper content restoration.


736-743: LGTM!

Clean routing: protocol >= 2 uses the new reduce_newobj path while older protocols fall back to the copyreg helper.


187-239: LGTM - pickle validation logic is correct.

The itemsize check (line 190) and basicsize calculation correctly implement pickle validation. The logic properly accounts for dict size (if HAS_DICT flag), weakref support detection (with separate handling for heap vs built-in types), and slot sizes before comparing against the actual class basicsize.

✏️ Tip: You can disable this entire section by setting review_details to false in your review settings.


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.