◐ Shell
clean mode source ↗

Message 171378 - Python tracker

Quoting Steven D'Aprano on c.l.p.

"But add a call to replace, and things are very different:

[steve@ando ~]$ python2.7 -m timeit -s "s = 'b'*1000" "s.replace('b', 'a')"
100000 loops, best of 3: 9.3 usec per loop
[steve@ando ~]$ python3.2 -m timeit -s "s = 'b'*1000" "s.replace('b', 'a')"
100000 loops, best of 3: 5.43 usec per loop
[steve@ando ~]$ python3.3 -m timeit -s "s = 'b'*1000" "s.replace('b', 'a')"
100000 loops, best of 3: 18.3 usec per loop


Three times slower, even for pure-ASCII strings. I get comparable results for Unicode. Notice how slow Python 2.7 is:

[steve@ando ~]$ python2.7 -m timeit -s "s = u'你'*1000" "s.replace(u'你', u'a')"
10000 loops, best of 3: 65.6 usec per loop
[steve@ando ~]$ python3.2 -m timeit -s "s = '你'*1000" "s.replace('你', 'a')"
100000 loops, best of 3: 2.79 usec per loop
[steve@ando ~]$ python3.3 -m timeit -s "s = '你'*1000" "s.replace('你', 'a')"
10000 loops, best of 3: 23.7 usec per loop

Even with the performance regression, it is still over twice as fast as
Python 2.7.

Nevertheless, I think there is something here. The consequences are nowhere near as dramatic as jmf claims, but it does seem that replace() has taken a serious performance hit. Perhaps it is unavoidable, but perhaps not.

If anyone else can confirm similar results, I think this should be raised as a performance regression.

Quoting Serhiy Storchaka in response.

"Yes, I confirm, it's a performance regression. It should be avoidable.
Almost any PEP393 performance regression can be avoided. At least for
such corner case. Just no one has yet optimized this case."