Thank you Demur. I left few comments on Rietveld (this is only the quick glance).
Good news that this change allows to simplify BUILD_MAP_UNPACK_WITH_CALL. But using 15th bit means using EXTENDED_ARG. I think it would be simpler to just push an empty tuple with LOAD_CONST. The same number of opcodes, but simpler implementation and nicer disassemble output.
Mark's idea about using oparg of the general variant for specifying a number of positional arguments without packing them in a tuple looks interesting. But I don't bother about this case. According to the statistics of running CPython tests *args or *kwargs are used only in 0.4% of calls. This is hardly worth optimizing. I'm going to collect more detailed statistics for number of args and function/method distinguishing.
As about PyFunction_CallSimple() etc, see issue26814 and issue27128. _PyObject_FastCall() protocol and new calling protocol should cover the case of new CALL_FUNCTION and may be CALL_FUNCTION_KW (>99% of calls in sum).