GH-135904: Improve the JIT's performance on macOS by brandtbucher · Pull Request #136528 · python/cpython
This PR makes a couple of minor tweaks to the JIT that result in 1.7% faster performance on macOS overall:
- Our AArch64 code doesn't need to be 8-byte aligned, just the data. Currently, we guarantee this by aligning all code anyways, since the data follows immediately after it. This is wasteful, since it means about half of all stencils end in a `nop`. Instead, don't pad any stencils, and just align the data when it's compiled. 🤦🏼
- The textual assembly "optimizer" pass has a bug where it interprets lines that are commented with `;` as instructions. By recognizing these commented lines, we can remove more zero-length jumps at the end of stencils. 🤦🏼
- During this same pass, we can represent the address of the next instruction (the end of the template, or the `_JIT_CONTINUE` label) as a "local" label, which allows the assembler to resolve it at compile time and encode it more efficiently. There's a special (platform-dependent) prefix to signal this.
- Finally, instead of declaring jump targets (`_JIT_CONTINUE`, `_JIT_ERROR_TARGET`, and `_JIT_JUMP_TARGET`) as `extern` symbols, just declare them as local functions. This results in more efficient jumps (and also allows us to remove a somewhat hacky pre-processing step for the textual assembly on Windows to force these efficient jumps).
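For readers who want a concrete picture of the first two bullets, here is a rough sketch in Python. It is not the code from this PR: the helper names `align_up`, `is_comment_or_blank`, and `drop_trailing_jump` are hypothetical, and the sketch assumes the stencil arrives as a list of assembly lines whose final real instruction may be an unconditional branch `b` to the continue label.

```python
def align_up(offset: int, alignment: int = 8) -> int:
    """Round offset up to the next multiple of alignment (a power of two).

    Illustrates aligning only the data, at the point where it is placed,
    instead of padding every code stencil.
    """
    return (offset + alignment - 1) & ~(alignment - 1)


def is_comment_or_blank(line: str) -> bool:
    """True for empty lines and for lines that are entirely a ';' comment."""
    stripped = line.strip()
    return not stripped or stripped.startswith(";")


def drop_trailing_jump(
    lines: list[str], continue_label: str = "_JIT_CONTINUE"
) -> list[str]:
    """Remove a final `b <continue_label>`, which would jump zero bytes.

    Comment-only and blank lines after the branch are skipped rather than
    treated as instructions (the assumption being that this is what hid
    the branch from the pass before).
    """
    for index in range(len(lines) - 1, -1, -1):
        if is_comment_or_blank(lines[index]):
            continue
        # Ignore any trailing ';' comment on the instruction line itself.
        parts = lines[index].split(";", 1)[0].split()
        if parts == ["b", continue_label]:
            return lines[:index] + lines[index + 1:]
        break  # The last real instruction is not the jump; keep everything.
    return lines
```

With this sketch, a stencil whose code ends at byte 13 would have its data placed at `align_up(13)`, and a trailing `b _JIT_CONTINUE` followed only by `;` comments would be dropped. The label is a parameter because symbol spelling is platform-dependent.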
Thanks. The PR looks great, thanks for making the JIT so much clearer to reason about.
Agent-Hellboy pushed a commit to Agent-Hellboy/cpython that referenced this pull request