
Add more unicode functions to c-api by bschoenmaeckers · Pull Request #8044 · RustPython/RustPython


No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yml

Review profile: CHILL

Plan: Pro

Run ID: 8ba77755-8543-4cae-89d2-bc2800ebb7d5

📥 Commits

Reviewing files that changed from the base of the PR and between 66105ff and c832e32.

📒 Files selected for processing (1)
  • crates/capi/src/unicodeobject.rs
🚧 Files skipped from review as they are similar to previous changes (1)
  • crates/capi/src/unicodeobject.rs

📝 Walkthrough


This PR extends the RustPython C-API layer by adding four new Unicode FFI functions that expose UTF-8 encoding, filesystem-default codec operations, and bytes-to-string decoding. Import statements are consolidated, and disabled test cases cover the new functionality for interning, UTF-8 wrapping, bytes decoding, and filesystem codec round-trips.

Changes

Unicode C-API FFI Extensions

  • Unicode FFI functions and imports (crates/capi/src/unicodeobject.rs): Four new public C-API functions are added. PyUnicode_AsUTF8String encodes a string to UTF-8, PyUnicode_DecodeFSDefaultAndSize and PyUnicode_EncodeFSDefault handle filesystem-default codec operations, and PyUnicode_FromEncodedObject decodes from bytes-like objects. All functions handle null inputs, validate parameters, downcast to PyStr where needed, and delegate encoding/decoding to the codec registry. Imports are consolidated at the top of the file.
  • Unicode encoding and decoding tests (crates/capi/src/unicodeobject.rs): A test suite (currently disabled) validates string interning, UTF-8 encoding via the wrapper, decoding from encoded bytes objects, and round-trip filesystem-default encoding/decoding on Unix for both non-UTF-8 and UTF-8 filenames.
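For context on the round-trip tests: in CPython, PyUnicode_DecodeFSDefaultAndSize and PyUnicode_EncodeFSDefault use the filesystem error handler, which on Unix is `surrogateescape` (PEP 383). Any byte that is not valid in the filesystem encoding decodes to a lone surrogate in U+DC80..U+DCFF and encodes back to the same byte. The Rust sketch below shows that mapping, assuming a UTF-8 filesystem encoding. `fs_decode` and `fs_encode` are hypothetical names, not RustPython's code, and they work on `u32` code points because Rust's `char` cannot hold surrogates.

```rust
/// Decode `bytes` as UTF-8, escaping every undecodable byte `b` as the lone
/// surrogate U+DC00 + b (the PEP 383 "surrogateescape" scheme). Code points
/// are returned as `u32` because Rust's `char` cannot represent surrogates.
fn fs_decode(mut bytes: &[u8]) -> Vec<u32> {
    let mut out = Vec::with_capacity(bytes.len());
    loop {
        match std::str::from_utf8(bytes) {
            Ok(valid) => {
                out.extend(valid.chars().map(|c| c as u32));
                return out;
            }
            Err(err) => {
                let (valid, rest) = bytes.split_at(err.valid_up_to());
                // `valid_up_to` guarantees this prefix is well-formed UTF-8.
                let valid = std::str::from_utf8(valid).expect("validated prefix");
                out.extend(valid.chars().map(|c| c as u32));
                // `error_len` is None when the input ends mid-sequence.
                let bad_len = err.error_len().unwrap_or(rest.len());
                // Invalid UTF-8 bytes are always >= 0x80, so escapes land in DC80..DCFF.
                out.extend(rest[..bad_len].iter().map(|&b| 0xDC00 + u32::from(b)));
                bytes = &rest[bad_len..];
            }
        }
    }
}

/// Inverse of `fs_decode`: lone surrogates U+DC80..=U+DCFF become the raw
/// byte they escaped; any other surrogate is reported as an error.
fn fs_encode(code_points: &[u32]) -> Result<Vec<u8>, u32> {
    let mut out = Vec::new();
    for &cp in code_points {
        if (0xDC80_u32..=0xDCFF).contains(&cp) {
            out.push((cp - 0xDC00) as u8);
        } else if let Some(c) = char::from_u32(cp) {
            let mut buf = [0u8; 4];
            out.extend_from_slice(c.encode_utf8(&mut buf).as_bytes());
        } else {
            return Err(cp); // unpaired surrogate outside the escape range
        }
    }
    Ok(out)
}

fn main() {
    // Non-UTF-8 filename: a Latin-1 "é" (0xE9) followed by 0xFF.
    let raw: &[u8] = b"caf\xe9\xff.txt";
    let decoded = fs_decode(raw);
    // The two undecodable bytes became U+DCE9 and U+DCFF.
    assert_eq!(decoded[3..5].to_vec(), vec![0xDCE9_u32, 0xDCFF]);
    assert_eq!(fs_encode(&decoded).unwrap(), raw);

    // Valid UTF-8 decodes without any escapes and also round-trips.
    let utf8 = "café.txt".as_bytes();
    let decoded = fs_decode(utf8);
    assert!(decoded.iter().all(|cp| !(0xDC80_u32..=0xDCFF).contains(cp)));
    assert_eq!(fs_encode(&decoded).unwrap(), utf8);
    println!("round-trips ok");
}
```

Only the bytes → str → bytes direction is guaranteed to round-trip: a string whose escaped surrogates happen to spell valid UTF-8 (for example U+DCC3 U+DCA9) encodes to bytes that decode back to a different string.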

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

  • RustPython/RustPython#7904: Modifies the same Unicode FFI module to add PyUnicode_AsEncodedString and other C-API encoding functions with similar null-handling and codec-registry delegation patterns.
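The null-handling and codec-registry delegation pattern mentioned here and in the walkthrough can be illustrated with a toy Rust sketch. Everything in it is hypothetical: `CodecRegistry`, its simplified name normalization, and `as_encoded_string` are not RustPython's or CPython's API, and a NUL-terminated C string stands in for a `PyObject *`. The fallback follows CPython's documented rule for the built-in codecs that a NULL encoding selects the default encoding, UTF-8.

```rust
use std::collections::HashMap;
use std::ffi::{c_char, CStr, CString};

/// Encoder signature in this toy registry: text in, bytes or an error out.
type Encoder = fn(&str) -> Result<Vec<u8>, String>;

fn encode_utf8(s: &str) -> Result<Vec<u8>, String> {
    Ok(s.as_bytes().to_vec())
}

fn encode_latin1(s: &str) -> Result<Vec<u8>, String> {
    s.chars()
        .map(|c| u8::try_from(c).map_err(|_| format!("cannot encode {c:?} as latin-1")))
        .collect()
}

/// Hypothetical, minimal codec registry keyed by normalized encoding name.
struct CodecRegistry {
    encoders: HashMap<String, Encoder>,
}

impl CodecRegistry {
    fn with_builtins() -> Self {
        let mut encoders: HashMap<String, Encoder> = HashMap::new();
        encoders.insert("utf_8".into(), encode_utf8);
        encoders.insert("latin_1".into(), encode_latin1);
        CodecRegistry { encoders }
    }

    /// Simplified normalization: lowercase, and map '-' and ' ' to '_'.
    fn lookup(&self, name: &str) -> Option<Encoder> {
        let key: String = name
            .chars()
            .map(|c| if c == '-' || c == ' ' { '_' } else { c.to_ascii_lowercase() })
            .collect();
        self.encoders.get(&key).copied()
    }
}

/// A null object pointer is rejected instead of dereferenced, a null encoding
/// falls back to UTF-8, and the actual encoding is delegated to the registry.
///
/// # Safety
/// Non-null pointers must point to valid NUL-terminated strings.
unsafe fn as_encoded_string(
    registry: &CodecRegistry,
    text: *const c_char,     // stand-in for a `PyObject *` holding a str
    encoding: *const c_char, // NULL means "use the default encoding"
) -> Result<Vec<u8>, String> {
    if text.is_null() {
        return Err("bad internal call: NULL object".into());
    }
    let text = unsafe { CStr::from_ptr(text) }.to_str().map_err(|e| e.to_string())?;
    let name = if encoding.is_null() {
        "utf-8"
    } else {
        unsafe { CStr::from_ptr(encoding) }.to_str().map_err(|e| e.to_string())?
    };
    let encoder = registry
        .lookup(name)
        .ok_or_else(|| format!("unknown encoding: {name}"))?;
    encoder(text)
}

fn main() {
    let registry = CodecRegistry::with_builtins();
    let text = CString::new("café").unwrap();
    let latin1 = CString::new("Latin-1").unwrap();
    unsafe {
        // NULL encoding: default UTF-8, so "é" becomes two bytes.
        let utf8 = as_encoded_string(&registry, text.as_ptr(), std::ptr::null());
        assert_eq!(utf8, Ok(b"caf\xc3\xa9".to_vec()));
        // Named encoding, normalized from "Latin-1" to "latin_1".
        let l1 = as_encoded_string(&registry, text.as_ptr(), latin1.as_ptr());
        assert_eq!(l1, Ok(b"caf\xe9".to_vec()));
        // NULL object: an error result, nothing is dereferenced.
        assert!(as_encoded_string(&registry, std::ptr::null(), std::ptr::null()).is_err());
    }
    println!("ok");
}
```

Routing every lookup through one registry keeps each entry point thin: it only validates pointers and chooses a codec name, while the codec itself lives in one place.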

Suggested reviewers

  • youknowone

Poem

🐰 A rabbit hops through Unicode streams,
Encoding strings in filesystem dreams—
UTF-8 paths and bytes set free,
Four new functions, tested with glee! 🌟

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
  • Description check: ✅ Passed. Check skipped because CodeRabbit's high-level summary is enabled.
  • Title check: ✅ Passed. The title 'Add more unicode functions to c-api' accurately describes the main change: adding new Unicode-related FFI entry points (PyUnicode_AsUTF8String, PyUnicode_DecodeFSDefaultAndSize, PyUnicode_EncodeFSDefault, PyUnicode_FromEncodedObject) to the C-API module.
  • Docstring coverage: ✅ Passed. Docstring coverage is 100.00%, above the required 80.00% threshold.
  • Linked issues check: ✅ Passed. Check skipped because no linked issues were found for this pull request.
  • Out of scope changes check: ✅ Passed. Check skipped because no linked issues were found for this pull request.
