Respect `os.Pathlike` by George-Ogden · Pull Request #2086 · gitpython-developers/GitPython
Fixes #2085
Replaces instances of str(path) with path.__fspath__() for more general usage.
I also moved the clone tests into a separate file that existed before but only contained a single test.
Also, I notice you skip mypy because it produces errors. Why not pin the version and increment it periodically?
That's a great incentive, thanks so much! Using __fspath__() seems like it's using a private API, so it looks strange to my untrained eye. But if it's correct, I guess it's worth doing anyway?
Also, I notice you skip mypy because it produces errors. Why not pin the version and increment it periodically?
I don't see why this wouldn't work, great idea! Please feel free to set this up in your next PR :).
It seems that I won't get my copilot review here, so I wonder why you'd not be calling os.fspath(path) instead?
It seems that I won't get my copilot review here, so I wonder why you'd not be calling
os.fspath(path)instead?
I agree that's much nicer and also gets rid of theif not isinstance(path, str)checks.
I don't see why this wouldn't work, great idea! Please feel free to set this up in your next PR :).
Sure thing
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks better indeed.
Something I think will be desirable is to actually add tests for non-decodable paths for the functions/types that are affected. Otherwise, how do we know it's working?
Feel free to use AI for that, it's quite good at this usually.
The idea is to prove that the changes actually make something possible that wasn't possible before.
| # When pathlib.Path or other class-based path is passed | ||
| if not isinstance(path, str): | ||
| path = str(path) | ||
| url = os.fspath(url) |
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
URL is not a path though.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here, url can be a path if you're cloning a local repo, and if it's not os.fspath will leave strings alone.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've added a few tests, but lots of the calls work on internal APIs, so they won't make much difference. In these cases, calling os.fspath is still better as it makes the intention clearer. I also made a tool to check for redundant uses and the only one is git/util.py:420.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's great, thanks a lot!
What should be shown in the tests is that it can now handle filepaths that don't decode with the standard encoding.
Candidate tests are:
- clone from a filesystem path
index.addwith a strange path- and probably more - you could intentionally break a piece of code that changed
fspathand see which tests fail, then it's clear where to add a test with a 'special' path.
Thanks for making this happen, and thanks for your understanding - we can't just make changes hoping it will work or doesn't make things work, but there must be real evidence that this is desirable. And we can only have that with tests that fail without this change.
Byron
marked this pull request as draft
I am putting the PR back to draft until there are tests that use the new "can handle paths encoding independently from the runtime encoding" to prove the changes are effective. Thanks again.
Sorry, I meant to add this comment with the changes I made yesterday:
I've gone through all the changes and made sure that they apply (and removed the ones that didn't). As a result, there are more tests for submodules and index, as well as the existing ones I added.
For example, the clone_from_pathlike test means that all the fs.path conversions in Repo.base are required and removing them will cause that test to fail. Is that what you mean by "can handle paths encoding independently from the runtime encoding"? If not, I can add tests that will do that.
No problem!
What I mean is that the point of this conversion is to prevent decoding issues related to paths. I.e. the user passes a filesystem path, but internally GitPython runs str() on it which then tries to re-encode the path-bytes to the python runtime default encoding. This typically fails as soon as there is one non-ascii character.
You'd have to go back to main and add such tests which should fail, to then show that this now works with the changes in this branch, for a particular scenario - like one of the ones I mentioned, but there might be more.
I've added some non-ASCII characters into the tests. However, the main point of the pull request is with objects that follow the os.Pathlike type hint, such as this:
import os from dataclasses import dataclass @dataclass class CustomPathlike: path: str def __fspath__(self) -> str: return self.path custom_pathlike = CustomPathlike("folder/file") str(custom_pathlike) # "CustomPathlike(path='folder/file')" - fails in "git add CustomPathlike(path='folder/file')" os.fspath(custom_pathlike) # "folder/file" - works in "git add folder/file"
I realise that that was unclear in both the PR and the original issue (so I've added this example to the original issue).
Thanks a lot, this definitely helped me understand that these changes aren't meant to address that one thing I thought they do 😅.
My feeling is that the added tests don't actually fail on master, and thus should be removed. But if they do, please feel free to put them into a separate PR so I see them failing.
Either way, once I hear from you I think the PR can be merged.
See #2088 for failing tests
Thanks for giving it a shot. Admittedly, @2088 is surprisingly complex and fails in more places than just the two I thought it would fail in. To my mind, a cherry-pick of 1710626 onto master would have done the trick (or a manual port of these tests), instead there is a lot of new commits.
Do you want to try again, or remove the added tests here in absence of a demonstration of failure?
Thanks a lot for your patience, just one more thing: 1710626 adds tests which I believe should have shown that they won't work on master, but work here. The idea is that the encoding of the files can't be determined if Python would try to decode them, which a lot of code here does.
If you don't feel like this has a chance to be fixed, then I think these tests can be removed by force-pushing without this commit for this PR to be merged. Otherwise, there can be a PR on master that shows how these two tests are failing because of some decoding issue, and this PR will be merged as well.
Thank you, and… we will get there :).
However, both
| def test_clone_from_with_path_contains_unicode(self): | |
| with tempfile.TemporaryDirectory() as tmpdir: | |
| unicode_dir_name = "\u0394" | |
| path_with_unicode = os.path.join(tmpdir, unicode_dir_name) | |
| os.makedirs(path_with_unicode) | |
| try: | |
| Repo.clone_from( | |
| url=self._small_repo_url(), | |
| to_path=path_with_unicode, | |
| ) | |
| except UnicodeEncodeError: | |
| self.fail("Raised UnicodeEncodeError") |
and
| def test_add_utf8P_path(self, rw_dir): | |
| # NOTE: fp is not a Unicode object in Python 2 | |
| # (which is the source of the problem). | |
| fp = osp.join(rw_dir, "ø.txt") | |
| with open(fp, "wb") as fs: | |
| fs.write("content of ø".encode("utf-8")) | |
| r = Repo.init(rw_dir) | |
| r.index.add([fp]) | |
| r.index.commit("Added orig and prestable") |
Ran on
main successfully (and were included before this PR)
The difference in
| def test_index_add_pathlike_unicode(self, rw_repo): | |
| git_dir = Path(rw_repo.git_dir) | |
| file = git_dir / "file-áēñöưḩ̣.txt" | |
| file.touch() | |
| rw_repo.index.add(PathLikeMock(str(file))) |
which fails on
main is that the path is wrapped in a PathLikeMock, which has an interface similar to the one described above:| @dataclass | |
| class PathLikeMock: | |
| path: str | |
| def __fspath__(self) -> str: | |
| return self.path |
I see, thanks for clarifying and digging out the non-unicode tests on main which have been there before and seemed to be working already.
Then I even more so think that the tests in 1710626 as duplicates of the tests above serve no purpose. The PathLike test is already done.
What am I missing?