gh-83714: Use statx on Linux 4.11 and later in os.stat#136334
gh-83714: Use statx on Linux 4.11 and later in os.stat#136334jbosboom wants to merge 19 commits into
Conversation
The libc statx wrapper is required at build time, but weakly linked, and posix_do_stat will fall back to calling stat if statx is not available at runtime due to an old libc or an old kernel.
Sorry, something went wrong.
|
Most changes to Python require a NEWS entry. Add one using the blurb_it web app or the blurb command-line tool. If this change has little impact on Python users, wait for a maintainer to apply the |
Sorry, something went wrong.
|
Please update the title to use the gh issue number. |
Sorry, something went wrong.
|
Most changes to Python require a NEWS entry. Add one using the blurb_it web app or the blurb command-line tool. If this change has little impact on Python users, wait for a maintainer to apply the |
Sorry, something went wrong.
Co-authored-by: Erin of Yukis <erin-dev@ninetailed.ninja>
cmaloney
left a comment
There was a problem hiding this comment.
Some docs suggestions / CPython sphinx has special tags for a couple of the things; haven't looked at main implementation a lot yet
Sorry, something went wrong.
|
!buildbot .Android. |
Sorry, something went wrong.
|
🤖 New build scheduled with the buildbot fleet by @gpshead for commit b0e8276 🤖 Results will be shown at: https://buildbot.python.org/all/#/grid?branch=refs%2Fpull%2F136334%2Fmerge The command will test the builders whose names match following regular expression: The builders matched are:
|
Sorry, something went wrong.
|
FYI, we recently added Android to the GitHub Actions CI, so there's no longer much need to invoke the buildbots manually. The exception is if a PR has something which might be specific to Android aarch64, because GitHub Actions can't run the tests on that platform yet, although it will still perform a complete build. |
Sorry, something went wrong.
|
Looking at the failures, the Android buildbots didn't run the most up-to-date code from this PR. Looking at the "Build Properties" tab, the "revision" is "b0e827681f45588ce67cd0924b3c320bb64351ff" (the latest), but "got_revision" is "a387390693c4f4ff793717a91a5137b7f608f9e4", a merge commit attributed to me but that I don't remember making of d53650a (the commit just before I added the configure check) with main. |
Sorry, something went wrong.
|
Both the buildbots and GitHub Actions don't actually test the head of the PR branch, but a speculative commit showing what you would get if you merged the PR into main. Because of a merge conflict, it's not possible to merge the PR into main at the moment. It looks like GitHub Actions deals with this by not running any tests at all, while the buildbots test the last commit before the conflict. |
Sorry, something went wrong.
|
I dislike this approach: modifying Python os.stat() to use statx. I would prefer to expose C statx() as Python os.statx() instead. |
Sorry, something went wrong.
… raw kwarg We no longer return None for invalid attributes for the benefit of drop-in replacement of os.stat when not all basic stats are required or sync=False is acceptable. Such code was already accepting faked values from os.stat, so can also accept faked values from os.statx. Remove the raw kwarg as it would have no effect.
These members were removed from os.stat_result in a previous commit.
|
I've marked this ready for review. @vstinner I'd also appreciate a review from you; GitHub isn't showing me the interface to formally request one. This PR has changed quite a bit from the first commit, and as it did so, I wrote quite a few words in the issue #83714 that may be informative (probably best to read from the bottom up). |
Sorry, something went wrong.
cmaloney
left a comment
There was a problem hiding this comment.
Reviewed the pieces I have reasonable knowledge of; don't have enough experience with macros and defining new structs to go in more detail through those pieces. For first contributions to CPython really impressive in getting things to the CPython style (docs, C code style, etc)
High level:
- This PR is large for CPython; Smaller chunks will make it easier to review and land; My thought would be adding one part adding os.statx; second updating os.stat to use statx when available
- The
os.statchange will need (on newer + older kernels). Just adding statx doesn't need as much performance validationpyperfmicrobenchmark that shows stat didn't get slower- Validate that Python startup time does not change significantly (I believe github.com/python/pyperformance has two benchmarks around that)
statxwill probably make sense to add topathlib.Pathfor easy access likestatis (but that should probably be another PR with the same issue)
With the thoroughness you're doing definitely feels like will get to the confidence to change os.stat to point to os.statx. Again; really impressive for a first CPython contributions.
Sorry, something went wrong.
I don't recommend this. |
Sorry, something went wrong.
|
👍 / agree with you, meant to say |
Sorry, something went wrong.
glibc doesn't remember that statx previously failed with ENOSYS, so on old kernels, glibc's emulation of statx via stat has the overhead of a failed syscall. Only call the statx wrapper under the same conditions that we make os.statx available. statx_works was always a global C variable, but now its scope is the file instead of just posix_do_stat. Whether statx works is a process-global property, so using a global C variable is appropriate. Previously access to statx_works was unsynchronized, following the pattern of dup3_works. Now we rely on posixmodule_exec happening-before any thread can call os.stat.
Also two minor fixes to statx tests in test_os.py.
|
Some benchmark numbers comparing
The 'PR nobtime' row replaces the last I don't have a VM running a pre-statx kernel handy, but I did find the The following numbers are from under
We're maybe slightly slower, and this time not due to an extra float and int, but if you look at the histograms in the gist, there are more outliers here. It's hard to conclude much. pyperformance startup benchmark, main vs this PR: main vs PR nobtime: PR nobtime vs PR: This PR doesn't change any of the code in fileutils.c to use statx, so if we're slower, it's because of Python main vs this PR running under |
Sorry, something went wrong.
|
As I wrote previously, I would prefer to leave I support the idea of adding a new I suggest creating a new PR which only adds a new |
Sorry, something went wrong.
Hmm. In 2024 a user asked for an update on the issue and you replied that ntninja's PR needed to be updated. That PR changed
That's a non sequitur. The reason I am pushing so hard for
And this PR does in fact add a |
Sorry, something went wrong.
|
Victor and I synced on the above at our ongoing core team sprint yesterday, I'm in agreement: Lets start with only While that could be done in this PR you may find it cleaner for review purposes to make a new PR focused on just that.
I really appreciate that you're focusing on the performance of this. I'm interested to see what you find! Long term, we envision there becoming a higher level API than the os module as a path stat interface. It could use os.stat, os.statx, and other APIs under the hood. But os.stat itself should not become such an API. I realize this is proposing that we ultimately create a new higher level API for stat. But things like We should track a new higher level API with its own issue. That may wind up becoming a PEP-style discussion as it'd be a new high level API - to be determined. |
Sorry, something went wrong.
|
Sorry, something went wrong.
|
@gpshead @vstinner While I agree pathlib should provide a nicer interface for stat, you are more optimistic than I am as to how soon that nicer interface will arrive. I guess as core team members you have more control over when it happens. :) I created a new PR #139178. |
Sorry, something went wrong.
|
Pathlib updates for stat attributes are currently being discussed -- https://discuss.python.org/t/expose-file-size-mode-etc-from-pathlib-path-info/103828 questions/criticism/ideas welcome :). |
Sorry, something went wrong.
I've posted some questions in gh-83714. For now, this PR should be considered earnest effort (in the sense of earnest money). It will change based on the answers to the questions in the issue. It needs tests for the new features and the fallback that are not obvious to implement. I need guidance as to when it should strictly conform to PEP 7 and when it should match the surrounding code style (especially with regards to brace placement). But it should demonstrate I'm willing and able.
The libc statx wrapper is required at build time, but weakly linked. There is no configure test; we just look for
STATX_BASIC_STATSbeing defined bysys/stat.h.posix_do_statwill fall back to calling stat if statx is not available at runtime due to an old libc or an old kernel. The fallback variablestatx_worksis based ondup3_works, for the thread-safety of which see @ericsnowcurrently's comment. The values are (to me) nonobvious: -1 means statx has never failed, 0 means its first failure was ENOSYS and 1 means its first failure was something other than ENOSYS.This PR is not directly based on gh-19125, but was greatly informed by it, and I'd be happy to give Co-authored-by and/or blurb credit to @ntninja.
📚 Documentation preview 📚: https://cpython-previews--136334.org.readthedocs.build/