Three files:
(1) bit_length7.patch just does the numbits->bit_length renaming;
otherwise it's the same as numbits-6b.patch.
(2) bit_length7_opt.patch uses the fast bitcount method that Raymond
pointed out.
(3) bit_length_pybench.patch adds a test for bit_length to pybench.
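
For readers without the patch to hand, here is a minimal sketch of one
common fast bit-count technique: a small lookup table combined with
multi-bit shifts. It is an assumption that this resembles the method
Raymond pointed out; the actual code in bit_length7_opt.patch may
differ. The names bit_length_table and bits_in_ulong are hypothetical.

```c
/* Bit lengths of the values 0..31; e.g. entry 5 is 3 because 5 == 0b101. */
static const unsigned char bit_length_table[32] = {
    0, 1, 2, 2, 3, 3, 3, 3,
    4, 4, 4, 4, 4, 4, 4, 4,
    5, 5, 5, 5, 5, 5, 5, 5,
    5, 5, 5, 5, 5, 5, 5, 5
};

/* Hypothetical helper (not taken from the patch): number of bits needed
   to represent d, with 0 for d == 0. */
static int
bits_in_ulong(unsigned long d)
{
    int bits = 0;

    /* While d has at least 6 bits, strip 6 bits per iteration instead
       of shifting one bit at a time. */
    while (d >= 32) {
        bits += 6;
        d >>= 6;
    }
    /* d is now in 0..31, so the table supplies the remaining count. */
    return bits + bit_length_table[d];
}
```

Compared with a loop that shifts one bit per iteration, this does
roughly a sixth of the iterations plus one table lookup.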
On my system (OS X 10.5.5/Core 2 Duo), on a 32-bit non-debug build of
the trunk, pybench shows a 4-5% speedup for the optimized version
over the basic one.
(I also tried a version that uses gcc's __builtin_clzl function; this
was around 10% faster than the basic version, but of course it's not
portable.)
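
As an illustration of the __builtin_clzl idea, the sketch below
computes a bit length from the count of leading zeros. It is not the
code that was benchmarked; the function name bit_length_clz and the
fallback loop for other compilers are assumptions added for this
example.

```c
#include <limits.h>

/* Hypothetical helper: number of bits needed to represent x,
   with 0 for x == 0. */
static int
bit_length_clz(unsigned long x)
{
    if (x == 0)
        return 0;   /* __builtin_clzl(0) is undefined, so handle it here */
#if defined(__GNUC__)
    /* Width of unsigned long minus the number of leading zero bits. */
    return (int)(sizeof(unsigned long) * CHAR_BIT) - __builtin_clzl(x);
#else
    /* Portable fallback: shift one bit at a time. */
    {
        int bits = 0;
        while (x != 0) {
            bits++;
            x >>= 1;
        }
        return bits;
    }
#endif
}
```

The #if guard is what the portability concern amounts to in practice:
compilers that do not define __GNUC__ take the slower loop.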