◐ Shell
reader mode source ↗
Skip to content

fix pyexpat#6582

Merged
youknowone merged 1 commit into
RustPython:mainfrom
youknowone:pyexpat
Dec 29, 2025
Merged

fix pyexpat#6582
youknowone merged 1 commit into
RustPython:mainfrom
youknowone:pyexpat

Conversation

@youknowone

@youknowone youknowone commented Dec 29, 2025

Copy link
Copy Markdown
Member

Summary by CodeRabbit

  • New Features

    • Parser now accepts both text and binary XML input.
    • Optional namespace-separator support for namespaced element/attribute names.
    • Interning behavior now respects a provided intern mapping for names.
  • Bug Fixes / Behavior Changes

    • Empty input is treated as a successful parse.
    • Parse and file-parse operations return consistent integer status.
    • Namespace-separator input is validated to a single character; explicit encoding argument is ignored.

✏️ Tip: You can customize this high-level summary in your review settings.

@coderabbitai

coderabbitai Bot commented Dec 29, 2025

Copy link
Copy Markdown
Contributor

Warning

Rate limit exceeded

@youknowone has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 5 minutes and 48 seconds before requesting another review.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

📥 Commits

Reviewing files that changed from the base of the PR and between 6d84c60 and 5e7f7ef.

⛔ Files ignored due to path filters (4)
  • Lib/test/test_pyexpat.py is excluded by !Lib/**
  • Lib/test/test_xml_dom_xmlbuilder.py is excluded by !Lib/**
  • Lib/test/test_xmlrpc.py is excluded by !Lib/**
  • Lib/xmlrpc/client.py is excluded by !Lib/**
📒 Files selected for processing (1)
  • crates/stdlib/src/pyexpat.rs
📝 Walkthrough

Walkthrough

Parse now accepts either string or bytes and converts to bytes before parsing. A new optional namespace_separator is stored and used by a make_name helper to produce names; intern is initialized from the provided argument. do_parse now returns Results, Parse/ParseFile surface integer status, and ParserCreate validates namespace_separator and forwards intern/namespace_separator to the parser constructor.

Changes

Cohort / File(s) Summary
pyexpat core changes
crates/stdlib/src/pyexpat.rs
Large functional update: parse accepts Either<PyStrRef, PyBytesRef> and returns PyResult<i32>; do_parse now returns Result<_, xml::reader::Error>; ParseFile returns integer status and guards empty input.
Parser configuration & constructors
crates/stdlib/src/pyexpat.rs
ParserCreateArgs fields changed from OptionalArg to Option/Option<PyObjectRef>; PyExpatLikeXmlParser::new signature updated to accept namespace_separator: Option<String> and intern: Option<PyObjectRef>; parser_create validates namespace_separator length.
Name handling & intern behavior
crates/stdlib/src/pyexpat.rs
Added namespace_separator: Option<String> field, make_name(&self, name: &xml::name::OwnedName) -> String, and logic to use namespace+separator for element/attribute names; intern is initialized from the provided argument and used for name internment.
Imports & API surface
crates/stdlib/src/pyexpat.rs
Added imports: Either and PyBytesRef; adjusted public signatures and exported types to match new behavior.
Error handling / compatibility
crates/stdlib/src/pyexpat.rs
Parsing flow changed to propagate xml reader errors from do_parse; Parse/ParseFile now map certain errors to the legacy integer success value (1) for compatibility; encoding parameter is ignored.

Sequence Diagram(s)

(omitted)

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

Poem

🐰
I hopped through names, a tiny code sprite,
Joined bytes and strings in soft moonlight.
A separator stitched namespaces tight,
Interns snug in a dictionary bed,
Parsers purr as XML dreams are read.

Pre-merge checks and finishing touches

❌ Failed checks (1 warning, 1 inconclusive)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 57.14% which is insufficient. The required threshold is 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.
Title check ❓ Inconclusive The title 'fix pyexpat' is vague and generic, using non-descriptive language that doesn't convey what specific issue or functionality is being addressed in this substantial refactoring. Replace with a more specific title that describes the main change, such as 'Add namespace support and bytes handling to pyexpat parser' or 'Extend pyexpat parser with namespace separator and bytes data support'.
✅ Passed checks (1 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@youknowone youknowone marked this pull request as ready for review December 29, 2025 14:56

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hide comment

Actionable comments posted: 1

🧹 Nitpick comments (1)
crates/stdlib/src/pyexpat.rs (1)

362-368: ParserCreate arguments are currently unused.

The encoding, namespace_separator, and intern parameters defined in ParserCreateArgs are not being used in the parser_create function. This means XML encoding declarations and namespace handling are not currently supported. While this appears to be existing technical debt (indicated by the _args prefix), users may expect these parameters to have an effect based on CPython's pyexpat API.

Would you like me to open an issue to track implementing support for these ParserCreate parameters?

📜 Review details

Configuration used: Path: .coderabbit.yml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 8059960 and b7b81a4.

⛔ Files ignored due to path filters (1)
  • Lib/xmlrpc/client.py is excluded by !Lib/**
📒 Files selected for processing (1)
  • crates/stdlib/src/pyexpat.rs
🧰 Additional context used
📓 Path-based instructions (1)
**/*.rs

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

**/*.rs: Follow the default rustfmt code style by running cargo fmt to format Rust code
Always run clippy to lint code (cargo clippy) before completing tasks and fix any warnings or lints introduced by changes
Follow Rust best practices for error handling and memory management
Use the macro system (pyclass, pymodule, pyfunction, etc.) when implementing Python functionality in Rust

Files:

  • crates/stdlib/src/pyexpat.rs
🧠 Learnings (1)
📚 Learning: 2025-12-27T14:03:49.011Z
Learnt from: CR
Repo: RustPython/RustPython PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2025-12-27T14:03:49.011Z
Learning: Applies to **/*.rs : Use the macro system (`pyclass`, `pymodule`, `pyfunction`, etc.) when implementing Python functionality in Rust

Applied to files:

  • crates/stdlib/src/pyexpat.rs
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: Run rust tests (windows-latest)
  • GitHub Check: Run snippets and cpython tests (windows-latest)
  • GitHub Check: Run snippets and cpython tests (macos-latest)
🔇 Additional comments (2)
crates/stdlib/src/pyexpat.rs (2)

52-52: LGTM: Import additions support the new functionality.

The addition of PyBytesRef and Either imports correctly enables the Parse method to accept both string and bytes inputs.

Also applies to: 54-54


354-360: Option with #[pyarg(any, optional)] is a valid pattern in RustPython.

The codebase uses both OptionalArg<T> and Option<T> with #[pyarg(any, optional)]. While OptionalArg is more commonly used, Option is also a supported and legitimate pattern, as evidenced by its use in multiple stdlib modules (codecs.rs, sys.rs, lzma.rs, openssl.rs) and elsewhere. The change is consistent with established patterns in the codebase.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hide comment

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
crates/stdlib/src/pyexpat.rs (1)

114-152: Validate the intern parameter type.

The intern parameter is accepted as PyObjectRef without type validation. If a non-dict object is passed, it could cause runtime errors when dict operations are attempted later. Consider validating that intern is a dict type when Some is provided.

🔎 Proposed validation
 fn new(
     namespace_separator: Option<String>,
     intern: Option<PyObjectRef>,
     vm: &VirtualMachine,
 ) -> PyResult<PyExpatLikeXmlParserRef> {
-    let intern_dict = intern.unwrap_or_else(|| vm.ctx.new_dict().into());
+    let intern_dict = match intern {
+        Some(obj) => {
+            // Verify it's a dict
+            if !obj.is(&vm.ctx.types.dict_type) {
+                return Err(vm.new_type_error("intern must be a dictionary".to_owned()));
+            }
+            obj
+        }
+        None => vm.ctx.new_dict().into(),
+    };
     Ok(Self {
♻️ Duplicate comments (2)
crates/stdlib/src/pyexpat.rs (2)

342-362: Silently ignoring all parsing errors may hide legitimate issues.

While the comment indicates this is for compatibility with existing Python code, silently ignoring all errors from do_parse (line 360) means legitimate parsing errors are never surfaced to the caller. Consider at least logging errors or handling specific error types that should be propagated.

This concern was previously raised in past review comments.


365-377: Same error handling concern as in Parse.

Like the parse method, this silently ignores all parsing errors (line 375). The same concerns about hiding legitimate issues apply here.

📜 Review details

Configuration used: Path: .coderabbit.yml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between b7b81a4 and 6d84c60.

⛔ Files ignored due to path filters (4)
  • Lib/test/test_pyexpat.py is excluded by !Lib/**
  • Lib/test/test_xml_dom_xmlbuilder.py is excluded by !Lib/**
  • Lib/test/test_xmlrpc.py is excluded by !Lib/**
  • Lib/xmlrpc/client.py is excluded by !Lib/**
📒 Files selected for processing (1)
  • crates/stdlib/src/pyexpat.rs
🧰 Additional context used
📓 Path-based instructions (1)
**/*.rs

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

**/*.rs: Follow the default rustfmt code style by running cargo fmt to format Rust code
Always run clippy to lint code (cargo clippy) before completing tasks and fix any warnings or lints introduced by changes
Follow Rust best practices for error handling and memory management
Use the macro system (pyclass, pymodule, pyfunction, etc.) when implementing Python functionality in Rust

Files:

  • crates/stdlib/src/pyexpat.rs
🧠 Learnings (4)
📚 Learning: 2025-12-24T23:47:38.444Z
Learnt from: youknowone
Repo: RustPython/RustPython PR: 0
File: :0-0
Timestamp: 2025-12-24T23:47:38.444Z
Learning: CPython's xml.parsers.expat xmlparser exposes a namespace_prefixes boolean attribute (implemented in the pyexpat C module) that controls namespace prefix reporting via XML_SetReturnNSTriplet. This attribute is used internally by xml.dom.expatbuilder and xml.etree.ElementTree.

Applied to files:

  • crates/stdlib/src/pyexpat.rs
📚 Learning: 2025-11-10T06:27:41.954Z
Learnt from: youknowone
Repo: RustPython/RustPython PR: 6243
File: vm/src/function/buffer.rs:123-129
Timestamp: 2025-11-10T06:27:41.954Z
Learning: In vm/src/function/buffer.rs, line 36 in the try_rw_bytes_like method intentionally uses TypeError (not BufferError) for the "buffer is not a read-write bytes-like object" error case, even though a similar error message in ArgMemoryBuffer::try_from_borrowed_object uses BufferError. The TypeError is the intended exception type for this specific code path.

Applied to files:

  • crates/stdlib/src/pyexpat.rs
📚 Learning: 2025-12-27T14:03:49.011Z
Learnt from: CR
Repo: RustPython/RustPython PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2025-12-27T14:03:49.011Z
Learning: Applies to **/*.rs : Use the macro system (`pyclass`, `pymodule`, `pyfunction`, etc.) when implementing Python functionality in Rust

Applied to files:

  • crates/stdlib/src/pyexpat.rs
📚 Learning: 2025-12-24T23:47:38.444Z
Learnt from: youknowone
Repo: RustPython/RustPython PR: 0
File: :0-0
Timestamp: 2025-12-24T23:47:38.444Z
Learning: CPython's xml.parsers.expat xmlparser does not expose an intern attribute. String interning for XML parsing in CPython is handled through SAX's expatreader feature_string_interning mechanism, not as a direct xmlparser.intern attribute.

Applied to files:

  • crates/stdlib/src/pyexpat.rs
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (11)
  • GitHub Check: Run snippets and cpython tests (windows-latest)
  • GitHub Check: Run rust tests (windows-latest)
  • GitHub Check: Ensure compilation on various targets
  • GitHub Check: Check the WASM package and demo
  • GitHub Check: Run snippets and cpython tests (ubuntu-latest)
  • GitHub Check: Run rust tests (ubuntu-latest)
  • GitHub Check: Run snippets and cpython tests (macos-latest)
  • GitHub Check: Run snippets and cpython tests on wasm-wasi
  • GitHub Check: Run tests under miri
  • GitHub Check: Run rust tests (macos-latest)
  • GitHub Check: Check Rust code with clippy
🔇 Additional comments (5)
crates/stdlib/src/pyexpat.rs (5)

52-54: LGTM!

The new imports are necessary for accepting both string and bytes input in the parse method.


69-70: LGTM!

The namespace_separator field is properly added with the correct #[pytraverse(skip)] annotation for non-Python object data.


291-297: LGTM!

The make_name helper correctly constructs element names with namespace prefixes when both separator and namespace are present.


299-339: LGTM!

The error handling is now properly structured with Result return type, and make_name is consistently used for namespace-aware naming.


381-388: LGTM!

The change from OptionalArg<T> to Option<T> is appropriate here since these are parsed arguments with #[pyarg(any, optional)]. The intern type change to PyObjectRef is also correct since it accepts dict objects.

Hide details View details @youknowone youknowone merged commit f91ffe3 into RustPython:main Dec 29, 2025
1 of 2 checks passed
@youknowone youknowone deleted the pyexpat branch December 29, 2025 15:36
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant