Message 176459 - Python tracker

There's actually enormous backtracking here.  Try this much shorter regexp and you'll see much the same behavior:

re_utf8 = r'^([\x00-\x7f]+)*$'

That's the original re_utf8 with all but the first alternative removed.

Passing s[0:34] appears to "work" because the slice drops the trailing \x8d that prevents the regexp from matching the whole string.  When the regexp cannot match the whole string, the engine has to try all the futile combinations implied by the nested quantifiers before it can give up, and that takes a very long time: a run of n ASCII characters can be split between the inner + and the outer * in 2^(n-1) ways, so the work grows exponentially with the length of the run.  As the much simpler re_utf8 above shows, it's not the alternatives in the regexp that matter here, it's the nested quantifiers.
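
A minimal sketch of how one might observe this, assuming Python 3 str patterns rather than the byte strings the original report most likely used. The helper timed_match and the flat pattern are hypothetical names introduced only for comparison; they are not from the original message. The run lengths are deliberately small, since each extra character should roughly double the time the nested pattern needs to fail.

```python
import re
import time

# Nested quantifiers, as in the shortened re_utf8 above.
nested = re.compile(r'^([\x00-\x7f]+)*$')
# Hypothetical flat variant (an assumption, not from the original message):
# it accepts the same strings but uses a single quantifier.
flat = re.compile(r'^[\x00-\x7f]*$')


def timed_match(pattern, text):
    """Return (matched, seconds) for one match attempt."""
    start = time.perf_counter()
    matched = pattern.match(text) is not None
    return matched, time.perf_counter() - start


# Keep n small: the nested pattern's failure time grows very quickly with n.
for n in (12, 14, 16, 18):
    text = 'a' * n + '\x8d'  # ASCII run followed by a character the class rejects
    nested_ok, nested_secs = timed_match(nested, text)
    flat_ok, flat_secs = timed_match(flat, text)
    print(f'n={n:2d}  nested={nested_ok} {nested_secs:.4f}s  '
          f'flat={flat_ok} {flat_secs:.6f}s')
```

Both patterns should reject every one of these strings. The difference is only in how long the rejection takes, which is the point about nested quantifiers made above. Exact timings depend on the machine and the Python version, so none are quoted here.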