The fastest patch (inline2.patch) has a negligible impact on benchmarks. The purpose of an optimization is to make Python faster, which is not the case here, so I'm closing the issue.
Using timeit, the largest speedup is 1.29x. Using performance, spectral_norm is 1.07x faster and pybench.SimpleLongArithmetic is 1.06x faster. I consider spectral_norm and pybench.SimpleLongArithmetic to be microbenchmarks, and so not representative of a real application.
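For readers who want to reproduce this kind of figure, here is a minimal sketch of how a ratio like "1.29x" can be derived from timeit measurements. The helper names (best_time, speedup) and the benchmarked statement are hypothetical and are not the ones used for the numbers above; the idea is to run the same script under the reference build and the patched build, then divide the two timings.

```python
import timeit


def best_time(stmt, setup="pass", number=100_000, repeat=5):
    """Return the best per-loop time (in seconds) for stmt.

    The minimum over several repeats is used because it is the
    measurement least disturbed by system noise.
    """
    timings = timeit.repeat(stmt, setup=setup, number=number, repeat=repeat)
    return min(timings) / number


def speedup(reference, patched):
    """Return how many times faster `patched` is than `reference`."""
    return reference / patched


if __name__ == "__main__":
    # Hypothetical micro-benchmark statement: small integer addition.
    # Run this script once per interpreter build and note the result.
    t = best_time("x + y", setup="x = 1; y = 2")
    print(f"best time per loop: {t * 1e9:.1f} ns")

    # Illustrative numbers only (not measured): 129 ns on the reference
    # build versus 100 ns on the patched build.
    print(f"{speedup(129e-9, 100e-9):.2f}x faster")
```

A ratio obtained this way only describes the single statement being timed, which is why it can be much larger than the effect seen on whole-program benchmarks.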
The issue was fun, thank you for playing the micro-optimization game with me ;-) Let's move on to more interesting optimizations with a larger impact on more realistic workloads, like caching global variables, optimizing method calls, fastcalls, etc.