◐ Shell
clean mode source ↗

Code nits in `csv.rs` by ShaharNaveh · Pull Request #7834 · RustPython/RustPython

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@crates/stdlib/src/csv.rs`:
- Around line 613-623: The lineterminator handling currently forces a single
byte and rejects CRLF; before converting to a single-byte Terminator::Any, check
the extracted string value (from args.kwargs.swap_remove("lineterminator") ->
try_to_value::<&str>(vm)) for the two-character "\r\n" sequence and if matched
set res.lineterminator = Some(csv_core::Terminator::CRLF) (or the equivalent
CRLF variant on csv_core::Terminator); otherwise proceed with the existing
behavior that converts the &str to bytes and enforces exactly_one to create
Terminator::Any, returning the same type error on other invalid lengths.

---

Outside diff comments:
In `@crates/stdlib/src/csv.rs`:
- Around line 1004-1024: The current pre-pass using split + trim_spaces
(functions/variables: trim_spaces, skipinitialspace, delimiter, input) rewrites
quoted fields and trims trailing spaces incorrectly before csv_core parsing;
remove this pre-pass and instead implement skipinitialspace behavior inside the
CSV parser/tokenizer (the csv_core parsing logic) so you only ignore a single
optional space immediately after a delimiter when not inside a quoted field,
without trimming trailing spaces or touching characters inside quotes; adjust
the parser's state machine to, on seeing a delimiter and then a space, skip that
single space only if not in a quoted context, and delete the trim_spaces/split
usage and the String::from_utf8 roundtrip so raw input is preserved for
csv_core.
- Around line 866-927: In to_writer, avoid the reachable panic from the todo!()
when self.quotechar == Some(None) (produced by from_args) and ensure a dialect's
quoting is preserved: first resolve an effective dialect/options by merging
DialectItem::Str (lookup in GLOBAL_HASHMAP) or DialectItem::Obj with the
explicit self.* overrides (delimiter, quotechar, doublequote, lineterminator,
escapechar, quoting), validate that an effective quotechar of None is turned
into a Python-level error (return a PyErr) instead of calling todo!(), and then
build the csv_core::Writer once using those resolved values (extract the
differing quotechar/quoting values first, then call
builder.delimiter(...).quote(...).double_quote(...).terminator(...).escape(...).quote_style(...)
as appropriate) so the dialect quoting is preserved when self.quoting is not
provided; refer to the to_writer function, DialectItem::Str/Obj, GLOBAL_HASHMAP,
and the fields quotechar and quoting when making these changes.
- Around line 793-863: The to_reader function currently applies reader settings
per-branch and then layers overrides in a way that ignores the selected
dialect's quoting and lineterminator when those fields are unset; fix it by
first resolving one effective PyDialect (handle DialectItem::Str via
GLOBAL_HASHMAP lookup, DialectItem::Obj, and the default "excel") to extract
base values for delimiter, doublequote, quotechar, quoting, and lineterminator,
then create a single csv_core::ReaderBuilder from those resolved base values and
finally apply explicit overrides from self (self.delimiter, self.quotechar,
self.quoting, self.lineterminator, self.doublequote, self.escapechar) only when
they are Some, removing duplicated builder logic and the redundant
reader.terminator(...) at the end so the final reader reflects the selected
dialect unless explicitly overridden.