> Indeed, that seems to be much faster. In the mean time, do you mind if I
> steal the code? :-)
Not at all!
I guess constant factors could well appear and/or disappear when
recoding in C; it might well be worth trying to implement both
methods to see which is faster.
There are some semi-obvious tricks available that might speed up the
recursive version. For one thing, all shifts should be in multiples
of 15 bits (because longs are stored in base 2**15). For another, it
ought to be possible to avoid the single-bit normalization shifts
every time the number of bits ('n') is odd---one can do a single
shift at the beginning of the calculation instead.