GH-77609: Add follow_symlinks argument to pathlib.Path.glob()#102616
GH-77609: Add follow_symlinks argument to pathlib.Path.glob()#102616barneygale merged 17 commits into
pathlib.Path.glob()#102616Conversation
Add a keyword-only *follow_symlinks* parameter to `pathlib.Path.glob()` and `rglob()`, defaulting to false. When set to true, symlinks to directories are followed as if they were directories. Previously these methods followed symlinks except when evaluating "`**`" wildcards; on Windows they returned paths in filesystem casing except when evaluating non-wildcard tokens. Both these problems are solved here. This will allow us to address pythonGH-102613 and pythonGH-81079 in future commits.
|
Table showing whether symlinks to directories are followed:
Table showing whether filesystem case is returned:
|
Sorry, something went wrong.
|
I wrote a fairly long post about my concerns on the changed behaviour, but then I realised that my main question is whether the symlink following changes applies to the entire path or only the bit in pattern? If symlinks in the base path are going to be followed regardless of the option, I'm far less concerned. However, I'd like to be able to figure this out from the documentation :) So there's a change required there to clarify (and maybe I'll be more concerned if it clarifies the wrong way) |
Sorry, something went wrong.
|
Symlinks in the base path (the p in |
Sorry, something went wrong.
|
Thanks for the feedback btw! I'm also nervous about changing behaviour. I could revise this to make |
Sorry, something went wrong.
|
I don't want to add I think the behaviour of path parts before any wildcard are easily assumed to be equivalent of being part of the base path. That is: Everything from the first wildcard becomes a match pattern. We need to recurse deep enough to create paths with the same number of segments, and then match all the candidates against the pattern. Since we're now recursing, choosing not to follow symlinks by default is justifiable. Given the current behaviour is to follow I'm pretty much leaning towards needing a deprecation period though. I think that'll be safest, and probably necessary, so the best way to do that is to support |
Sorry, something went wrong.
|
Right! I think I agree that leading literals should always be followed. What about trailing literals, e.g. ... but that optimization only works if the |
Sorry, something went wrong.
|
Also agree on the need to preserve existing behaviour and emit deprecation warnings before we change defaults. I'd argue the default should become true, not false, in future. It's the more useful behaviour. |
Sorry, something went wrong.
|
I've switched the default to Passing
I mention this only because the names "glob" and "rglob" are a bit opaque unless you've done shell scripting before. The suggested names align with |
Sorry, something went wrong.
|
Passing
Yeah, and so if we have Maybe we can optimise the case where the |
Sorry, something went wrong.
👍 sounds good. As things stand with this PR, This PR still breaks backwards compat in two (minor?) ways:
|
Sorry, something went wrong.
|
Matching filesystem case is an improvement, IMHO, even for back-compat. On the I also don't have any real feeling for whether we should |
Sorry, something went wrong.
Pathlib uniformly avoids collapsing I've fixed handling of An updated table showing whether symlinks are followed:
Note that, for the moment, literals are handled exactly the same whether they appear before or after recursive wildcards. |
Sorry, something went wrong.
|
Looks good. We should still add the deprecation warning for |
Sorry, something went wrong.
|
Done. But honestly I don't feel great about |
Sorry, something went wrong.
|
Oh right, I also need to update the docs examples to pass |
Sorry, something went wrong.
|
Alternative PR that adds support for a |
Sorry, something went wrong.
|
Closing this PR as I believe #104176 is the right way forward. |
Sorry, something went wrong.
|
@zooba I'd be much more comfortable with this PR if we delayed deprecating |
Sorry, something went wrong.
|
We can document it as deprecated with no planned removal date (or planned change of default, in this case). Or just strongly recommend the use of If we think (or people start saying) that the default is harmful/a trap, we can deprecate properly at that point (and wait two releases before changing it). |
Sorry, something went wrong.
|
In fact, it doesn't really matter whether #104512 lands first. Marking as ready for review! |
Sorry, something went wrong.
|
On |
Sorry, something went wrong.
Interesting. I think if you can clearly explain the different behaviour ("following symlinks matched by |
Sorry, something went wrong.
|
I think we're in agreement, then! The docs in this PR say: By default, or when the *follow_symlinks* keyword-only argument is set to
``None``, this method follows symlinks except when expanding "``**``"
wildcards. Set *follow_symlinks* to ``True`` to always follow symlinks, or
``False`` to treat all symlinks as files.Does that sound OK to you? |
Sorry, something went wrong.
|
Sounds good to me |
Sorry, something went wrong.
|
Mega, thank you. Would you be up for reviewing the PR as a whole, if you have the time? No worries if not, and no urgency from my side. The patch is somewhat repetitive due to the way globbing is currently implemented. If you find it offensive, we might want to simplify things first via #104512 |
Sorry, something went wrong.
|
Woot! Thank you Steve! |
Sorry, something went wrong.
Add a keyword-only follow_symlinks parameter to
pathlib.Path.glob()andrglob().When follow_symlinks is
None(the default), these methods follow symlinks except when evaluating "**" wildcards. When set to true or false, symlinks are always or never followed, respectively.This allows us to address GH-102613 in future.