bpo-45565: Specialize LOAD_ATTR_CLASS by Fidget-Spinner · Pull Request #29146 · python/cpython
Do you have stats for the standard benchmark suite (or something of similar scale), and have you benchmarked this?
No, I'll get some soon (pyperformance is a pain on Windows :().
- For the stats you give, the hits increase by ~10k and the misses increase by ~2k. The cost of a miss can be higher than the benefit of a hit so this might not be a win. Is this an artifact of the test scripts? Would you expect better numbers on "real" programs?
The only way a deopt can happen is for tp_version_tag to change, and that requires the class variable to be written to. So things like:
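A hypothetical pattern of this kind, with made-up names (`Counter`, `bump`) used purely for illustration: a class variable is read and then rebound inside a loop, so every iteration writes to the class.

```python
class Counter:
    count = 0  # class variable, stored in Counter.__dict__


def bump(n):
    """Read and rebind a class variable n times (illustrative only)."""
    for _ in range(n):
        # The read on the right is a class-attribute load; the assignment
        # is a write to the class, which is the event described above as
        # changing tp_version_tag.
        Counter.count = Counter.count + 1
    return Counter.count
```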
Unfortunately, I have no clue how common something like this is in the real world. An alternative approach (with far fewer invalidations):
- Store owner.tp_mro tuple ID.
- Store the index in owner.tp_mro of the real type we need to look into.
- Store the dict hint for the real type.__dict__ and the dk version.
At runtime, look into owner.tp_mro[mro_index].__dict__ + hint to get our attribute.
The benefit is that, although tp_version_tag is invalidated every time a write occurs, this approach doesn't care about that, since the attribute's actual index in the dict doesn't change.
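A rough pure-Python model of the idea, not the C implementation: `ClassAttrCache`, `MISS`, `Base` and `Child` are hypothetical names, the MRO tuple itself stands in for the "tuple ID", and a plain name lookup stands in for the dict hint. It leaves out the dk version check and descriptor handling, so it only covers plain values.

```python
MISS = object()  # sentinel: the guard failed, fall back to the generic path


class ClassAttrCache:
    """Toy cache entry for loading one attribute from a class object."""

    def __init__(self, owner, name):
        self.name = name
        # Keep the MRO tuple itself; comparing with `is` models the ID check.
        self.mro = owner.__mro__
        self.mro_index = None
        for i, klass in enumerate(self.mro):
            if name in vars(klass):
                self.mro_index = i  # first class in the MRO defining `name`
                break
        if self.mro_index is None:
            raise AttributeError(name)

    def load(self, owner):
        # Guard: same MRO tuple as when the entry was created.
        if owner.__mro__ is not self.mro:
            return MISS
        # Model of owner.tp_mro[mro_index].__dict__ + hint.
        return vars(self.mro[self.mro_index])[self.name]


class Base:
    limit = 10


class Child(Base):
    pass


cache = ClassAttrCache(Child, "limit")
first = cache.load(Child)
Base.limit = 20             # a write to the class variable
second = cache.load(Child)  # still served from the same MRO slot
```

In this model the write to `Base.limit` does not invalidate the entry, because the guard only looks at the MRO tuple and the slot where the attribute lives.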
- The cache use for LOAD_ATTR has increased from 2 to 3. Only 4 bytes of each of the first two cache entries are being used, so this could be reduced back to 2 entries.
Indeed, I failed to see that _PyAdaptiveEntry still had 4 bytes of unused space, so we can pack tp_version_tag into there if we go with the old approach.
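The byte arithmetic can be checked with a small sketch. The field layout here is an assumption (two 1-byte fields and one 2-byte field in the adaptive entry, an 8-byte cache entry, and a 32-bit tp_version_tag); only the "4 bytes used, 4 bytes spare" figures come from the discussion above.

```python
import struct

# Assumed current layout: two 1-byte fields plus one 2-byte field.
ADAPTIVE_FIELDS = "=BBH"
# Same layout with a 4-byte version tag appended (assumed 32-bit).
ADAPTIVE_WITH_VERSION = "=BBHI"
ENTRY_SIZE = 8  # assumed size of one cache entry, in bytes

used = struct.calcsize(ADAPTIVE_FIELDS)          # bytes used today
packed = struct.calcsize(ADAPTIVE_WITH_VERSION)  # bytes with the tag added
spare = ENTRY_SIZE - used                        # unused bytes per entry
```

Under these assumptions the version tag fits exactly in the spare space, so no extra cache entry is needed.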