◐ Shell
reader mode source ↗
Skip to content

gh-131253: free-threaded build support for pystats#137189

Merged
nascheme merged 18 commits into
python:mainfrom
nascheme:gh-131253-pystats-ft
Nov 3, 2025
Merged

gh-131253: free-threaded build support for pystats#137189
nascheme merged 18 commits into
python:mainfrom
nascheme:gh-131253-pystats-ft

Conversation

@nascheme

@nascheme nascheme commented Jul 28, 2025

Copy link
Copy Markdown
Member

Allow the --enable-pystats build option to be used with free-threading. For the free-threaded builds, the stats structure is allocated per-thread and then periodically merged into a per-interpter stats structure (on thread exit or when the reporting function is called).

Summary of changes:

  • Add pystats to the PyThreadState structure. This is a pointer to the PyStats structure if recording of stats is enabled. This is used for both the free-threaded and default builds.

  • For the free-threaded build, the _PyThreadStateImpl structure gains a pystats_struct member. This is allocated per-thread when recording of pystats is first enabled.

  • replace _Py_stats references with _PyStats_GET()

  • move pystats logic from Python/specialize.c into Python/pystats.c

  • move the pystats global state into the interpreter structure

  • add some free-threaded specific stat counters

Notes and potential issues:

  • The FTStats counts will need review. I wasn't sure about the naming or even how useful these might be. For mutex_sleeps, we need to determine what is the most useful thing to measure. I increment that count whenever we have to "spin" on a mutex.

  • The verbose code for print_* and merge_* is perhaps not ideal. I considered making this data driven instead. However, I think that actually would add complexity, not reduce it.

  • I plan on adding two levels of pystats recording: default and extended. The "default" level would be enabled by default (or at least could be) and would only include stats that are relatively cheap to record. The "extended" level would be like what the current --enable-pystats does (counting things like INCREF/DECREF, which is expensive).

Allow the --enable-pystats build option to be used with
free-threading.  For the free-threaded builds, the stats structure is
allocated per-thread and then periodically merged into a global stats
structure (on thread exit or when the reporting function is called).

Summary of changes:

* introduce _Py_tss_stats thread-local variable.  This is set
  when stats are on, replacing the _Py_stats global that's used
  in the non-free-threaded build.

* replace _Py_stats references with _PyStats_GET()

* move pystats logic from Python/specialize.c into Python/pystats.c

* add some free-threaded specific stat counters
@nascheme nascheme requested a review from colesbury July 30, 2025 19:17

@colesbury colesbury left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hide comment

The management of the thread local variable seems complex. Can we keep the thread-local stats on PyThreadState and have _PyStats_GET() return something like _PyThreadState_GET()->stats? How much would the extra indirection slow down the stats build?

@nascheme

Copy link
Copy Markdown
Member Author

The management of the thread local variable seems complex. Can we keep the thread-local stats on PyThreadState and have _PyStats_GET() return something like _PyThreadState_GET()->stats? How much would the extra indirection slow down the stats build?

Yeah, it's maybe overly complex. I don't think there would be much performance difference since tstate is already used so heavily. Is there a problem with adding a pystats entry to the public PyThreadState structure?

@colesbury

Copy link
Copy Markdown
Contributor

If you are considered about exposing it publicly, you can add to it _PyThreadStateImpl. Every PyThreadState* is actually a _PyThreadStateImpl and the extra fields are places after the publicly visible PyThreadState fields.

Need to do a merge before reporting (I lost that bit of code on a
re-factor).

Fix various issues with data races.  When merging from all threads, we
need to stop-the-world to avoid races.  When toggling on or off, also
need to stop-the-world.  Remove the need for locking for
_PyStats_Attach().
@nascheme

Copy link
Copy Markdown
Member Author

I don't think there would be much performance difference since tstate is already used so heavily.

After some profiling, it seems that having the pystats pointer in tstate, rather than it's own thread-local adds about 5% extra overhead. Obviously that would depend on a lot of factors. However, given that the full pystats recording in very heavyweight (increments stat counts on incref/decref, for example), I think that's a totally acceptable performance cost in exchange for some simpler code.

Add pystats pointer to PyThreadState and use from there instead.  This
is slightly slower but shouldn't matter in practice.  This simplifies
the attach/detach logic as well.
@nascheme nascheme force-pushed the gh-131253-pystats-ft branch from 3f58012 to 488f0be Compare August 6, 2025 23:12
@markshannon

Copy link
Copy Markdown
Member

Looks good in general. Moving the stats to its own file makes a lot of sense.

How have you tested this? We used to be able to use the Microsoft test runner to show the diff in stats, but that's not available any more.

@nascheme

Copy link
Copy Markdown
Member Author

How have you tested this? We used to be able to use the Microsoft test runner to show the diff in stats, but that's not available any more.

I ran various scripts with pystats enabled, then compared the output (with and without this PR), ignoring differences in the counts.

16 hidden items Load more…
Add the _PyStats_InterpInit() function which will set pystats_enabled
flag and allocate pystats_struct as required.  We shouldn't be putting
logic in _PyObject_InitState().  This also fixes a bug where
interp->pystats_struct was allocated repeated rather than once per
interpreter.
This is a bit more efficient, with the downside of a bit extra code.
For _PyStats_Clear(), we don't need to do the merging operation.  I
think the code is a bit more clear this way.
@nascheme nascheme requested a review from emmatyping as a code owner October 23, 2025 00:23

@sergey-miryanov sergey-miryanov left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hide comment

LGTM

Need to allocate in stats_toggle_on_off() if the thread never had
stats enabled yet.  For zeroing, handle case that struct is not
allocated.  Remove bogus ts->_status.active check.
Co-authored-by: Kumar Aditya <kumaraditya@python.org>
nascheme and others added 2 commits November 3, 2025 10:04
Co-authored-by: Kumar Aditya <kumaraditya@python.org>
Hide details View details @nascheme nascheme merged commit c98c5b3 into python:main Nov 3, 2025
52 checks passed
StanFromIreland pushed a commit to StanFromIreland/cpython that referenced this pull request Dec 6, 2025
…7189)

Allow the --enable-pystats build option to be used with free-threading.  The
stats are now stored on a per-interpreter basis, rather than process global.
For free-threaded builds, the stats structure is allocated per-thread and
then periodically merged into the per-interpreter stats structure (on thread
exit or when the reporting function is called). Most of the pystats related
code has be moved into the file Python/pystats.c.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants