GH-73435: Implement recursive wildcards in pathlib.PurePath.match()#101398
barneygale merged 40 commits
…ch() Add a new *recursive* argument to `pathlib.PurePath.match()`, defaulting to `False`. If set to true, `match()` handles the `**` wildcard as in `Path.glob()`, i.e. it matches any number of path segments. We now compile a `re.Pattern` object for the entire pattern. This is made more difficult by `fnmatch` not treating directory separators as special when evaluating wildcards (`*`, `?`, etc), and so we arrange the path parts onto separate *lines* in a string, and ensure we don't set `re.DOTALL`.
zooba left a comment:
Approved, but consider adding a couple of comments (as suggested) so that the next person who has to trace through this code is grateful rather than mad at you ;-)
Perfect! Ship it
Co-authored-by: Alex Waygood <Alex.Waygood@Gmail.com>
Thank you for your help Alex, Hugo and Steve!
Hey, if it interests anyone, I have a follow-up PR that simplifies a bunch of the code added in this PR. It does this by adding a new `seps` parameter to
`PurePath.match()` now handles the `**` wildcard as in `Path.glob()`, i.e. it matches any number of path segments.

We now compile a `re.Pattern` object for the entire pattern. This is made more difficult by `fnmatch` not treating directory separators as special when evaluating wildcards (`*`, `?`, etc.), and so we arrange the path parts onto separate *lines* in a string, and ensure we don't set `re.DOTALL`.

This improves performance of `match()` by around 2x-3x for simple patterns, and more for complex patterns:

```shell
$ ./python -m timeit \
    -s 'from pathlib import PureWindowsPath as P; path = P("C:/foo/bar.py"); pattern = P("c:/*/*.py")' \
    'path.match(pattern)'
50000 loops, best of 5: 8.13 usec per loop  # before
1000000 loops, best of 5: 297 nsec per loop  # after
```
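For anyone tracing through the approach, here is a minimal standalone sketch of the "one path segment per line" idea described above. It is not the code from this PR: `compile_pattern` and `matches` are hypothetical helpers, only `*`, `?` and `**` are handled, and the whole path is matched rather than matching from the right as `PurePath.match()` does for relative patterns. It only illustrates why leaving `re.DOTALL` unset keeps single-segment wildcards from crossing a separator.

```python
import re


def compile_pattern(pattern: str) -> re.Pattern:
    """Hypothetical helper: build one regex for a '/'-separated pattern.

    Each path segment is assumed to occupy its own line in the text being
    matched, so '.' (without re.DOTALL) can never cross a segment boundary.
    """
    pieces = []
    for segment in pattern.split("/"):
        if segment == "**":
            # Zero or more complete lines, i.e. any number of path segments.
            pieces.append(r"(?:.+\n)*")
        else:
            # Translate '*' and '?' by hand; escape every other character.
            translated = "".join(
                ".*" if ch == "*" else "." if ch == "?" else re.escape(ch)
                for ch in segment
            )
            pieces.append(translated + r"\n")
    # Deliberately no re.DOTALL: '.' must not match the newline separator.
    return re.compile("".join(pieces))


def matches(path: str, pattern: str) -> bool:
    """Hypothetical helper: match a '/'-separated path against a pattern."""
    # Arrange the path parts onto separate lines, one segment per line.
    text = "".join(part + "\n" for part in path.split("/"))
    return compile_pattern(pattern).fullmatch(text) is not None


if __name__ == "__main__":
    print(matches("foo/bar.py", "*/*.py"))
    print(matches("foo/baz/bar.py", "*/*.py"))
    print(matches("foo/baz/bar.py", "**/*.py"))
```

In this sketch `*` cannot span two segments because `.*` stops at the newline, while `**` consumes whole lines. The pattern is compiled once and can be reused across many paths.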