◐ Shell
clean mode source ↗

GH-81079: Add case_sensitive argument to `pathlib.Path.glob()` by barneygale · Pull Request #102710 · python/cpython

@barneygale

This argument allows case-sensitive matching to be enabled on Windows, and case-insensitive matching to be enabled on Posix.

This PR removes _PreciseSelector and uses _WildcardSelector to select non-wildcard patterns. This allows case sensitivity rules to be varied, and ensures the paths returned from glob() use filesystem casing.

This argument allows case-sensitive matching to be enabled on Windows, and
case-insensitive matching to be enabled on Posix.

@barneygale barneygale changed the title GH-81079: Add case_sensitive argument to pathlib.Path.glob() GH-81079: Add case_sensitive argument to pathlib.Path.glob()

Mar 15, 2023

@barneygale

Hey @zooba, would you be available to review this? I think you already looked at a similar patch in #102616, so you may already have some useful context. No worries if not. Thanks!

@zooba

It might be nice to describe the =None value as meaning "using the native case comparison rules for the filesystem". Windows allows marking directories as case-sensitive, and one day we might actually respect that. Similarly, pretty sure you can use FAT from any OS, and we ought to treat that as case insensitive by default.

We don't have to implement those yet, but let's not lock ourselves out by documenting the specific rules. We can also file a feature request now to implement proper case support - it might help find someone motivated to make it happen.

Also, like with the symlink change, I'd like to see clearly stated which parts of the pattern are affected, as I assume it only applies to segments from the first wildcard (that is, foo/bar*/baz with case_sensitive=False on a case sensitive drive matches foo/BARF/BAZ but not FOO/barf/baz).

I trust you (and the tests) on the implementation. It takes me a good few hours to get this code into my head in a way I can judge it, and I don't have time for that today, but also don't want to block on it when all the tests look good :)

@barneygale

Thanks for taking a look!

It might be nice to describe the =None value as meaning "using the native case comparison rules for the filesystem". Windows allows marking directories as case-sensitive, and one day we might actually respect that. Similarly, pretty sure you can use FAT from any OS, and we ought to treat that as case insensitive by default.

We don't have to implement those yet, but let's not lock ourselves out by documenting the specific rules. We can also file a feature request now to implement proper case support - it might help find someone motivated to make it happen.

Good shout, will do!

Also, like with the symlink change, I'd like to see clearly stated which parts of the pattern are affected, as I assume it only applies to segments from the first wildcard (that is, foo/bar*/baz with case_sensitive=False on a case sensitive drive matches foo/BARF/BAZ but not FOO/barf/baz).

It affects all parts of the pattern! So it would match both your examples. I think this is the behaviour users expect when passing case_sensitive=False.

(The same thing will apply when we add support for follow_symlinks=True and follow_symlinks=False -- it will affect all parts of the pattern uniformly)

I trust you (and the tests) on the implementation. It takes me a good few hours to get this code into my head in a way I can judge it, and I don't have time for that today, but also don't want to block on it when all the tests look good :)

I'll expand the PR description to (hopefully) make it easier for you (or anyone else) to get into the patch.

@zooba

It affects all parts of the pattern! So it would match both your examples. I think this is the behaviour users expect when passing case_sensitive=False.

(The same thing will apply when we add support for follow_symlinks=True and follow_symlinks=False -- it will affect all parts of the pattern uniformly)

I was thinking of the =None case, especially since for case_sensitive that's a legitimate and arguably more useful value (whereas for symlinks it's deprecated/legacy). But applying to the whole pattern, even if the first few parts are constants, seems fine.

@barneygale

@barneygale

@barneygale

zooba

Co-authored-by: Steve Dower <steve.dower@microsoft.com>

zooba

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM :shipit:

carljm added a commit to carljm/cpython that referenced this pull request

May 5, 2023