Harden commit trailer subprocess handling and align trailer I/O paths by Copilot · Pull Request #2125 · gitpython-developers/GitPython
- Retrieve the latest PR review comments and inspect the affected trailer code
- Evaluate which new comments are substantive versus non-meritorious or nit-level
- Fix the substantive subprocess-lifetime issue in trailer handling
- Add focused regression coverage for the new failure path if needed
- Install local validation dependencies if required in this environment
- Re-run focused lint, typecheck, and tests
- Run final validation and summarize the comments left unaddressed
Byron
marked this pull request as ready for review
Copilot AI review requested due to automatic review settings
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
This PR refactors how commit trailers are generated/parsed via git interpret-trailers, aiming to harden subprocess handling and to better align trailer processing with commit message encoding behavior.
Changes:
- Centralize
git interpret-trailersinvocation in a newCommit._interpret_trailers()helper and reuse it from both trailer parsing and trailer application. - Adjust trailer parsing (
trailers_list) and trailer application (create_from_tree) to use the shared helper. - Add a test that exercises trailer creation/parsing when
i18n.commitencodingis set to a non-default encoding (ISO-8859-1).
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
git/objects/commit.py |
Introduces _interpret_trailers() helper and updates trailer parse/apply paths to use it. |
test/test_commit.py |
Adds coverage for trailer handling when i18n.commitencoding is configured to ISO-8859-1. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| ) | ||
| trailer: str = proc.communicate(str(self.message).encode())[0].decode("utf8") | ||
| trailer = trailer.strip() | ||
| trailer = self._interpret_trailers(self.repo, self.message, ["--parse"]).strip() |
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
trailers_list calls _interpret_trailers without passing/using the commit's actual encoding (self.encoding). Since commit messages can be configured via i18n.commitencoding, trailer parsing should encode/decode using that encoding to avoid mis-decoding or UnicodeDecodeError for non-UTF-8 commit messages. Consider passing self.encoding through to _interpret_trailers (or have the helper derive the encoding from the commit/repo config).
| trailer = self._interpret_trailers(self.repo, self.message, ["--parse"]).strip() | |
| proc = Popen( | |
| [Git.GIT_PYTHON_GIT_EXECUTABLE, "interpret-trailers", "--parse"], | |
| stdin=PIPE, | |
| stdout=PIPE, | |
| stderr=PIPE, | |
| cwd=self.repo.working_tree_dir or self.repo.git_dir, | |
| ) | |
| stdout, _stderr = proc.communicate(self.message.encode(self.encoding)) | |
| finalize_process(proc) | |
| trailer = stdout.decode(self.encoding).strip() |
Copilot uses AI. Check for mistakes.
Comment on lines +466 to +476
| def _interpret_trailers(cls, repo: "Repo", message: Union[str, bytes], trailer_args: Sequence[str]) -> str: | ||
| cmd = [repo.git.GIT_PYTHON_GIT_EXECUTABLE, "interpret-trailers", *trailer_args] | ||
| proc: Git.AutoInterrupt = repo.git.execute( # type: ignore[call-overload] | ||
| cmd, | ||
| as_process=True, | ||
| istream=PIPE, | ||
| ) | ||
| message_bytes = message if isinstance(message, bytes) else message.encode(cls.default_encoding, errors="strict") | ||
| stdout_bytes, _ = proc.communicate(message_bytes) | ||
| finalize_process(proc) | ||
| return stdout_bytes.decode(cls.default_encoding, errors="strict") |
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
_interpret_trailers always encodes/decodes using cls.default_encoding (UTF-8) even when the repo/commit uses a different i18n.commitencoding. This can break trailer handling for commits whose bytes are not valid UTF-8 (e.g., ISO-8859-1), especially if callers pass bytes. Recommend adding an explicit encoding parameter (defaulting to the repo’s configured commit encoding) and using it consistently for both .encode() and .decode().
Copilot uses AI. Check for mistakes.
| stdout_bytes, _ = proc.communicate(str(message).encode()) | ||
| finalize_process(proc) | ||
| message = stdout_bytes.decode("utf8") | ||
| message = cls._interpret_trailers(repo, str(message), trailer_args) |
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
create_from_tree has already computed conf_encoding from i18n.commitencoding, but trailer application calls _interpret_trailers without using it. To keep trailer I/O aligned with how the commit will be serialized, pass conf_encoding (or have _interpret_trailers derive it from the repo) when applying trailers.
| message = cls._interpret_trailers(repo, str(message), trailer_args) | |
| message = cls._interpret_trailers( | |
| repo, | |
| str(message), | |
| trailer_args, | |
| conf_encoding=conf_encoding, | |
| ) |
Copilot uses AI. Check for mistakes.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| stdout_bytes, _ = proc.communicate(str(message).encode()) | ||
| finalize_process(proc) | ||
| message = stdout_bytes.decode("utf8") | ||
| message = cls._interpret_trailers(repo, str(message), trailer_args) |
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In create_from_tree(), trailers are applied via _interpret_trailers() without passing the repo-configured commit encoding (conf_encoding). This means interpret-trailers I/O is always UTF-8 even when the commit will be serialized using a different encoding, which is inconsistent with trailers_list() and can produce different on-disk bytes. Pass encoding=conf_encoding when calling _interpret_trailers here (and avoid double str() conversion if possible).
| message = cls._interpret_trailers(repo, str(message), trailer_args) | |
| if not isinstance(message, str): | |
| message = str(message) | |
| message = cls._interpret_trailers(repo, message, trailer_args, encoding=conf_encoding) |
Copilot uses AI. Check for mistakes.
Comment on lines +473 to +482
| cmd = [repo.git.GIT_PYTHON_GIT_EXECUTABLE, "interpret-trailers", *trailer_args] | ||
| proc: Git.AutoInterrupt = repo.git.execute( # type: ignore[call-overload] | ||
| cmd, | ||
| as_process=True, | ||
| istream=PIPE, | ||
| ) | ||
| message_bytes = message if isinstance(message, bytes) else message.encode(encoding, errors="strict") | ||
| stdout_bytes, _ = proc.communicate(message_bytes) | ||
| finalize_process(proc) | ||
| return stdout_bytes.decode(encoding, errors="strict") |
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
_interpret_trailers() starts the subprocess before encoding message to bytes. If message.encode(...) raises (e.g., UnicodeEncodeError under a legacy commitencoding), the git process is left running until GC/AutoInterrupt finalization. Compute message_bytes before launching the process, or wrap subprocess lifetime in try/finally to ensure it is waited/terminated on all failure paths.
| cmd = [repo.git.GIT_PYTHON_GIT_EXECUTABLE, "interpret-trailers", *trailer_args] | |
| proc: Git.AutoInterrupt = repo.git.execute( # type: ignore[call-overload] | |
| cmd, | |
| as_process=True, | |
| istream=PIPE, | |
| ) | |
| message_bytes = message if isinstance(message, bytes) else message.encode(encoding, errors="strict") | |
| stdout_bytes, _ = proc.communicate(message_bytes) | |
| finalize_process(proc) | |
| return stdout_bytes.decode(encoding, errors="strict") | |
| message_bytes = message if isinstance(message, bytes) else message.encode(encoding, errors="strict") | |
| cmd = [repo.git.GIT_PYTHON_GIT_EXECUTABLE, "interpret-trailers", *trailer_args] | |
| proc: Git.AutoInterrupt = repo.git.execute( # type: ignore[call-overload] | |
| cmd, | |
| as_process=True, | |
| istream=PIPE, | |
| ) | |
| try: | |
| stdout_bytes, _ = proc.communicate(message_bytes) | |
| return stdout_bytes.decode(encoding, errors="strict") | |
| finally: | |
| finalize_process(proc) |
Copilot uses AI. Check for mistakes.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Comment on lines +480 to +484
| try: | ||
| stdout_bytes, _ = proc.communicate(message_bytes) | ||
| return stdout_bytes.decode(encoding, errors="strict") | ||
| finally: | ||
| finalize_process(proc) |
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In _interpret_trailers, proc.communicate() captures stderr but it’s discarded, and then finalize_process(proc) may raise GitCommandError without including the subprocess stderr (pipes may already be drained/closed). Capture stderr_bytes from communicate() and pass it into finalize_process(proc, stderr=stderr_bytes) so failures surface with a useful error message.
| try: | |
| stdout_bytes, _ = proc.communicate(message_bytes) | |
| return stdout_bytes.decode(encoding, errors="strict") | |
| finally: | |
| finalize_process(proc) | |
| stderr_bytes = None | |
| try: | |
| stdout_bytes, stderr_bytes = proc.communicate(message_bytes) | |
| return stdout_bytes.decode(encoding, errors="strict") | |
| finally: | |
| finalize_process(proc, stderr=stderr_bytes) |
Copilot uses AI. Check for mistakes.
| stdout_bytes, _ = proc.communicate(str(message).encode()) | ||
| finalize_process(proc) | ||
| message = stdout_bytes.decode("utf8") | ||
| message = cls._interpret_trailers(repo, str(message), trailer_args) |
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
create_from_tree reads i18n.commitencoding into conf_encoding, but the trailer application path calls _interpret_trailers(...) without passing that encoding (so it always encodes/decodes as UTF-8). To keep trailer I/O aligned with the configured commit encoding (and avoid byte-length differences affecting wrapping/formatting), pass encoding=conf_encoding when calling _interpret_trailers here.
| message = cls._interpret_trailers(repo, str(message), trailer_args) | |
| message = cls._interpret_trailers(repo, str(message), trailer_args, encoding=conf_encoding) |
Copilot uses AI. Check for mistakes.