gh-106581: Split CALL_PY_EXACT_ARGS into uops #107760

Merged
gvanrossum merged 20 commits into python:main from gvanrossum:call-uops
Aug 16, 2023

@gvanrossum gvanrossum commented Aug 8, 2023

Member

This is only the first step for doing CALL in Tier 2. The next step involves tracing into the called code object. After that we'll have to do the remaining CALL specialization. Finally we'll have to tweak various things like KW_NAMES, and possibly move the NULL (for method calls) above the callable (that's 107788). But those are things for future PRs.

Note: this moves setting frame->return_offset directly in front of DISPATCH_INLINED(), to make it easier to move it into _PUSH_FRAME.
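
As a rough illustration of what splitting a specialized call into micro-ops means, here is a toy Python model, not CPython's actual C code. The uop names `check_exact_args` and `init_frame`, the `Frame` class, and the `state` dictionary are all assumptions made up for this sketch; only the idea of a final frame-pushing step that also records the return offset comes from the description above.

```python
from dataclasses import dataclass


@dataclass
class Frame:
    name: str
    local_values: list
    return_offset: int = 0  # where the caller resumes after the call returns


class Deopt(Exception):
    """Raised by a guard micro-op when the fast path does not apply."""


def check_exact_args(state):
    # Guard (hypothetical name): the callee must take exactly as many
    # positional arguments as were passed.
    if state["callable"].__code__.co_argcount != len(state["args"]):
        raise Deopt("argument count mismatch")


def init_frame(state):
    # Action (hypothetical name): build the callee frame, but do not make
    # it the current frame yet.
    func = state["callable"]
    state["new_frame"] = Frame(func.__name__, list(state["args"]))


def push_frame(state):
    # Record the caller's resume point immediately before switching frames,
    # then make the callee frame current.
    caller = state["frames"][-1]
    caller.return_offset = state["return_offset"]
    state["frames"].append(state.pop("new_frame"))


# The "macro" instruction is just the ordered sequence of its micro-ops.
CALL_EXACT_ARGS = (check_exact_args, init_frame, push_frame)


def make_state(func, args, return_offset=4):
    return {
        "callable": func,
        "args": tuple(args),
        "return_offset": return_offset,
        "frames": [Frame("caller", [])],
    }


def run_macro(uops, state):
    for uop in uops:
        uop(state)
    return state
```

Keeping the guards, the frame construction, and the frame push as separate steps is what lets a second-tier interpreter reuse each step individually; in this toy, a failed guard simply raises `Deopt` before any frame is created.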

@brandtbucher brandtbucher left a comment

Member

Thanks for tackling this, it definitely doesn't look easy. It's sort of a bummer that we need to special-case this much stuff, but I also don't see a nicer way of handling these issues than what you have here.

A few comments and questions, mostly for my own understanding:

@gvanrossum gvanrossum left a comment

Member Author

I'll add that assert; then I'll review your PR, and hopefully you can then merge that, and I can handle the merge fallout.

@gvanrossum gvanrossum marked this pull request as draft August 9, 2023 17:41
@gvanrossum

Member Author

Made this back into a draft; I need to (a) wait for Brandt's gh-107788, then (b) redo the split and tooling changes using Mark's ideas.

@brandtbucher

Member

The CALL PR has been merged.

@ambv

ambv commented Aug 11, 2023

Contributor

Closing and re-opening to retrigger CLA checks. Sorry for the noise.

@ambv ambv closed this Aug 11, 2023
@ambv ambv reopened this Aug 11, 2023
Instead, the special case is an opcode using SAVE_FRAME_STATE().
Introducing #if TIER_ONE and #if TIER_TWO so we can implement
_PUSH_FRAME differently for both tiers.
Instead, we special-case SAVE_IP:
- Its Tier 2 expansion sets oparg to the instruction offset
- In Tier 1 it is a no-op (and skipped if present in a macro)
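
The SAVE_IP special case in the list above can be pictured with a small Python sketch. This is a toy model under stated assumptions, not the real code generator: the part names `_GUARD_CALLABLE` and `_INIT_FRAME`, both function names, and the `(uop, operand)` pair representation are invented for illustration.

```python
# Parts of a hypothetical macro instruction. Only SAVE_IP and _PUSH_FRAME
# are names taken from this PR; the other two are made up for illustration.
MACRO_PARTS = ["_GUARD_CALLABLE", "_INIT_FRAME", "SAVE_IP", "_PUSH_FRAME"]


def expand_tier1(parts):
    """Tier 1: SAVE_IP is a no-op, so it is skipped when the macro is expanded."""
    return [name for name in parts if name != "SAVE_IP"]


def expand_tier2(parts, oparg, instr_offset):
    """Tier 2: emit (uop, operand) pairs; SAVE_IP carries the instruction offset."""
    return [
        (name, instr_offset if name == "SAVE_IP" else oparg)
        for name in parts
    ]
```

For example, `expand_tier2(MACRO_PARTS, oparg=2, instr_offset=10)` gives every part the operand 2 except SAVE_IP, which gets 10.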
@gvanrossum gvanrossum marked this pull request as ready for review August 13, 2023 03:31
@gvanrossum

Member Author

@markshannon I was hoping you'd review this. I added _Py_EnterRecursivePy which was the last thing on my TODO list.

Unless you'd rather review #107925, which includes this (and #107793, which is the intermediate stage).

@markshannon markshannon left a comment

Member

I'm uneasy about the introduction of the TIER_ONE and TIER_TWO macros.
It is a principle of the overall design that there is a single source of truth for the semantics of bytecodes.

It might appear that I'm being dogmatic, but the need for something like those macros often indicates an underlying problem that should be fixed independently.

In this case the problem is the cframe. Loading and saving the IP needs to be handled specially anyway, and saving and loading the SP should be the same for both interpreters (but will need to be handled specially by the copy-and-patch compiler, so should be its own micro-op).
It is pushing the frame that differs. Removing cframe will fix that.

The cframe only exists as a performance hack to minimize the impact of tracing prior to PEP 669.

@gvanrossum

Member Author

Benchmark: 1.00x faster: https://github.com/faster-cpython/benchmarking-public/tree/main/results/bm-20230816-3.13.0a0-05af848

IOW it doesn't slow CALL_PY_EXACT_ARGS down, which is all I care about.

@gvanrossum gvanrossum merged commit dc8fdf5 into python:main Aug 16, 2023
@gvanrossum gvanrossum deleted the call-uops branch August 16, 2023 23:31
@bedevere-bot

⚠️⚠️⚠️ Buildbot failure ⚠️⚠️⚠️

Hi! The buildbot wasm32-emscripten node (pthreads) 3.x has failed when building commit dc8fdf5.

What do you need to do:

  1. Don't panic.
  2. Check the buildbot page in the devguide if you don't know what the buildbots are or how they work.
  3. Go to the page of the buildbot that failed (https://buildbot.python.org/all/#builders/1050/builds/2796) and take a look at the build logs.
  4. Check if the failure is related to this commit (dc8fdf5) or if it is a false positive.
  5. If the failure is related to this commit, please, reflect that on the issue and make a new Pull Request with a fix.

You can take a look at the buildbot page here:

https://buildbot.python.org/all/#builders/1050/builds/2796

Summary of the results of the build (if available):

== Tests result: ENV CHANGED ==

329 tests OK.

10 slowest tests:

  • test_math: 2 min 9 sec
  • test_hashlib: 1 min 59 sec
  • test_tarfile: 1 min 9 sec
  • test_unparse: 49.6 sec
  • test_io: 41.3 sec
  • test_tokenize: 40.0 sec
  • test_unicodedata: 28.3 sec
  • test_capi: 27.8 sec
  • test_fstring: 24.5 sec
  • test_pickle: 23.0 sec

1 test altered the execution environment:
test_capi

117 tests skipped:
test.test_asyncio.test_base_events
test.test_asyncio.test_buffered_proto
test.test_asyncio.test_context
test.test_asyncio.test_eager_task_factory
test.test_asyncio.test_events test.test_asyncio.test_futures
test.test_asyncio.test_futures2 test.test_asyncio.test_locks
test.test_asyncio.test_pep492
test.test_asyncio.test_proactor_events
test.test_asyncio.test_protocols test.test_asyncio.test_queues
test.test_asyncio.test_runners
test.test_asyncio.test_selector_events
test.test_asyncio.test_sendfile test.test_asyncio.test_server
test.test_asyncio.test_sock_lowlevel test.test_asyncio.test_ssl
test.test_asyncio.test_sslproto test.test_asyncio.test_streams
test.test_asyncio.test_subprocess
test.test_asyncio.test_taskgroups test.test_asyncio.test_tasks
test.test_asyncio.test_threads test.test_asyncio.test_timeouts
test.test_asyncio.test_transports
test.test_asyncio.test_unix_events test.test_asyncio.test_waitfor
test.test_asyncio.test_windows_events
test.test_asyncio.test_windows_utils test__xxinterpchannels
test__xxsubinterpreters test_asyncgen test_clinic test_cmd_line
test_concurrent_futures test_contextlib_async test_ctypes
test_curses test_dbm_gnu test_dbm_ndbm test_devpoll test_doctest
test_docxmlrpc test_dtrace test_embed test_epoll test_faulthandler
test_fcntl test_file_eintr test_fork1 test_ftplib test_gdb
test_generated_cases test_grp test_httplib test_httpservers
test_idle test_imaplib test_interpreters test_ioctl test_kqueue
test_launcher test_lzma test_mmap test_multiprocessing_fork
test_multiprocessing_forkserver test_multiprocessing_main_handling
test_multiprocessing_spawn test_openpty test_pdb
test_perf_profiler test_perfmaps test_poll test_poplib test_pty
test_pwd test_readline test_regrtest test_repl test_resource
test_select test_selectors test_smtplib test_smtpnet test_socket
test_socketserver test_ssl test_stable_abi_ctypes test_startfile
test_subprocess test_sys_settrace test_syslog test_tcl
test_tkinter test_tools test_ttk test_ttk_textonly test_turtle
test_urllib2 test_urllib2_localnet test_urllib2net test_urllibnet
test_venv test_wait3 test_wait4 test_webbrowser test_winconsoleio
test_winreg test_winsound test_wmi test_wsgiref test_xmlrpc
test_xxlimited test_zipfile64 test_zipimport_support test_zoneinfo

Total duration: 26 min 4 sec

Traceback logs:
Traceback (most recent call last):
  File "/opt/buildbot/bcannon-wasm/3.x.bcannon-wasm.emscripten-node-pthreads/build/Lib/test/test_capi/test_watchers.py", line 532, in watcher
    raise MyError("testing 123")

gvanrossum added a commit that referenced this pull request Aug 17, 2023
This finishes the work begun in gh-107760. When, while projecting a superblock, we encounter a call to a short, simple function, the superblock will now enter the function using `_PUSH_FRAME`, continue through it, and leave it using `_POP_FRAME`, and then continue through the original code. Multiple frame pushes and pops are even possible. It is also possible to stop appending to the superblock in the middle of a called function, when running out of space or encountering an unsupported bytecode.
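
The projection behaviour described in that commit message can be sketched with a toy Python model. Everything here is an assumption for illustration: the `(opname, arg)` instruction format, the `project` function, the `functions` lookup table, and the opcode names other than `_PUSH_FRAME` and `_POP_FRAME`. It is not how CPython's optimizer is actually written.

```python
SUPPORTED = frozenset({"LOAD", "ADD", "CALL", "RETURN"})


def project(code, functions, max_len=16):
    """Project a linear trace starting at the first instruction of `code`.

    `code` is a list of (opname, arg) pairs. `functions` maps a call target
    name to that function's own instruction list.
    """
    trace = []
    callers = []  # saved (code, resume_index) pairs, one per pushed frame
    pc = 0
    while pc < len(code):
        op, arg = code[pc]
        # Stop when out of space or on an unsupported instruction. This can
        # happen in the middle of a called function.
        if len(trace) >= max_len or op not in SUPPORTED:
            break
        if op == "CALL":
            if arg not in functions:
                break  # unknown callee: treat like an unsupported instruction
            trace.append(("_PUSH_FRAME", arg))
            callers.append((code, pc + 1))
            code, pc = functions[arg], 0  # continue tracing inside the callee
            continue
        if op == "RETURN":
            if not callers:
                break  # returning from the outermost function ends the trace
            trace.append(("_POP_FRAME", None))
            code, pc = callers.pop()  # resume the caller after its CALL
            continue
        trace.append((op, arg))
        pc += 1
    return trace
```

Because callers are kept on a stack, nested calls produce multiple frame pushes and pops in a single trace, and the `max_len` check bounds the trace even if a function calls itself.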

Labels

interpreter-core (Objects, Python, Grammar, and Parser dirs) skip news
