◐ Shell
clean mode source ↗

Message 287865 - Python tracker

> # builtin min and max doesn't support FASTCALL
> (...): 1.16x slower (+16%)

Hum, the slowdown is not negligible, even if functools.reduce() is rarely used.

Do you think that it would be insane to have two code paths instead?

* New FASTCALL path if func suports FASTCALL calling _PyObject_FastCall()
* Current code path using cached args tuple otherwise

For example, you can use args==NULL marker for the FASTCALL path. Maybe we need a _PyObject_SupportFastCall() function which would return 1 for Python functions and C functions, except of C functions with METH_VARARGS flag set.

My expectation is a speedup for functions supporting FASTCALL, but *no slowdown* for functions not supporting FASTCALL.

--

property_descr_get() also caches args tuple. I started my work on FASTCALL because this optimization caused bugs (but also because I wanted to make Python faster, but that's a different topic ;-)).

In the past, the _pickle module also used cached args tuple, but the cache was removed because it was vulnerable to race conditions.

For this issue, I suggest to leave functools.reduce() unchanged, but open a new issue to discuss what to do with code still using a cached args tuple.

My long term goal with FASTCALL was to remove completely cached tuple used to call functions. I wrote tp_fastcall for that, but tp_fastcall (issue #29259) was rejected. The rejected tp_fastcall blocked my long term plan, we have to find a different approach.

Maybe we should add support for FASTCALL for a little bit more functions, and later simply remove the optimization (hack, cached tuple)?