This looks quite impressive, so sorry for jumping straight in with
criticism. I've benchmarked the things I worked on, and I can't see
any speedups, only some significant slowdowns. This is on 64-bit Linux
with a Core 2 Duo, both versions compiled with just `./configure && make`:
Modules/_decimal/tests/bench.py:
--------------------------------
Not much change for floats and decimal.py, but an 8-10% slowdown for _decimal!
Telco benchmark [1]:
--------------------
4% slowdown.
Memoryview:
-----------
./python -m timeit -n 10000000 -s "x = memoryview(bytearray(b'x'*10000))" "x[:100]"
17% (!) slowdown.
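For anyone reproducing this, the same memoryview measurement can also be driven from the `timeit` module. This is a minimal sketch, assuming you run it once under each build and compare the two numbers. The loop count is reduced from the 10,000,000 used above so it finishes quickly, and the helper names `best_time_per_loop` and `slowdown_pct` are hypothetical, not part of any existing tool:

```python
import timeit

# Same statement and setup as the command-line benchmark above.
SETUP = "x = memoryview(bytearray(b'x'*10000))"
STMT = "x[:100]"


def best_time_per_loop(number=1_000_000, repeat=5):
    """Return the best (minimum) time per loop in seconds.

    Similar in spirit to `python -m timeit`, which also reports the best run.
    `number` is an assumption chosen to keep the sketch fast.
    """
    times = timeit.repeat(STMT, setup=SETUP, number=number, repeat=repeat)
    return min(times) / number


def slowdown_pct(baseline, patched):
    """Relative slowdown of `patched` versus `baseline`, in percent."""
    return (patched - baseline) / baseline * 100.0


if __name__ == "__main__":
    print(f"{best_time_per_loop() * 1e9:.1f} nsec per loop")
```

Given the per-loop times from the unpatched and patched interpreters, `slowdown_pct(old, new)` yields the percentage quoted above. Taking the minimum of several repeats reduces noise from other processes.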
Did I perhaps miss some option to turn on the optimizations?
[1] http://www.bytereef.org/mpdecimal/quickstart.html#telco-benchmark