gh-91404: Improve `re` performance #91495

brandtbucher

This makes a few performance improvements in sre_lib.h:

- Like the main VM, the regex engine now uses computed gotos on supported platforms.
- Two read- and write-heavy members of the ctx pointer (ctx->pattern and ctx->ptr) are lifted into local variables.

(The diff looks gnarly, but everything inside the main switch is just mechanical replacements to support these two changes. It's not really that bad.)
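For readers unfamiliar with the first change, here is a minimal sketch of computed-goto dispatch. It is not the code from this PR: the `TOY_*` opcodes, `toy_match`, and the `TARGET`/`DISPATCH` macros are hypothetical names chosen for illustration, and the real engine's opcode set is much larger. It assumes GCC or Clang for the "labels as values" extension and falls back to a plain switch elsewhere.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy opcodes; these are NOT the real SRE opcodes. */
enum { TOY_LITERAL, TOY_ANY, TOY_SUCCESS };

/* "Labels as values" is a GCC/Clang extension, so only use it there. */
#if defined(__GNUC__)
#define TOY_USE_COMPUTED_GOTOS 1
#define TARGET(OP) TARGET_##OP
#define DISPATCH() goto *targets[*pattern++]
#else
#define TOY_USE_COMPUTED_GOTOS 0
#define TARGET(OP) case OP
#define DISPATCH() goto dispatch
#endif

/* Return 1 if the toy program `pattern` matches at the start of `text`,
   0 otherwise. Assumes a well-formed program ending in TOY_SUCCESS. */
static int
toy_match(const unsigned char *pattern, const char *text, size_t len)
{
    const char *ptr = text;
    const char *end = text + len;
    assert(pattern != NULL && text != NULL);

#if TOY_USE_COMPUTED_GOTOS
    /* One label address per opcode, indexed by opcode number. */
    static void *targets[] = {
        &&TARGET_TOY_LITERAL, &&TARGET_TOY_ANY, &&TARGET_TOY_SUCCESS,
    };
    DISPATCH();
#else
dispatch:
    switch (*pattern++)
#endif
    {
        TARGET(TOY_LITERAL):
            /* The next pattern byte is the character to match. */
            if (ptr >= end || (unsigned char)*ptr != *pattern) {
                return 0;
            }
            pattern++;
            ptr++;
            DISPATCH();
        TARGET(TOY_ANY):
            /* Match any single character. */
            if (ptr >= end) {
                return 0;
            }
            ptr++;
            DISPATCH();
        TARGET(TOY_SUCCESS):
            return 1;
    }
    return 0;  /* Unknown opcode (only reachable in the switch build). */
}
```

With computed gotos, each opcode body ends in its own indirect jump instead of returning to one shared switch, which is the property the dispatch change relies on. The handler bodies are identical in both builds, which is why the replacements inside the main switch can be mechanical.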

It yields nice improvements on all of the expected benchmarks, and a 1% improvement overall:

Slower (6):
- scimark_monte_carlo: 398 ms +- 14 ms -> 406 ms +- 15 ms: 1.02x slower
- chaos: 407 ms +- 13 ms -> 413 ms +- 11 ms: 1.01x slower
- scimark_fft: 1.95 sec +- 0.02 sec -> 1.97 sec +- 0.02 sec: 1.01x slower
- tornado_http: 570 ms +- 17 ms -> 577 ms +- 17 ms: 1.01x slower
- sympy_expand: 2.77 sec +- 0.02 sec -> 2.79 sec +- 0.02 sec: 1.01x slower
- meteor_contest: 618 ms +- 11 ms -> 622 ms +- 12 ms: 1.01x slower

Faster (14):
- regex_v8: 158 ms +- 9 ms -> 132 ms +- 11 ms: 1.20x faster
- regex_effbot: 19.2 ms +- 1.0 ms -> 17.5 ms +- 0.9 ms: 1.10x faster
- regex_dna: 1.36 sec +- 0.01 sec -> 1.24 sec +- 0.01 sec: 1.09x faster
- pycparser: 7.07 sec +- 0.10 sec -> 6.73 sec +- 0.10 sec: 1.05x faster
- pidigits: 1.20 sec +- 0.01 sec -> 1.16 sec +- 0.01 sec: 1.04x faster
- pickle_pure_python: 1.86 ms +- 0.13 ms -> 1.80 ms +- 0.13 ms: 1.03x faster
- scimark_sparse_mat_mult: 28.8 ms +- 1.5 ms -> 27.9 ms +- 1.5 ms: 1.03x faster
- fannkuch: 2.35 sec +- 0.03 sec -> 2.28 sec +- 0.02 sec: 1.03x faster
- float: 449 ms +- 13 ms -> 439 ms +- 11 ms: 1.02x faster
- xml_etree_iterparse: 625 ms +- 15 ms -> 613 ms +- 13 ms: 1.02x faster
- spectral_norm: 594 ms +- 11 ms -> 587 ms +- 13 ms: 1.01x faster
- xml_etree_parse: 943 ms +- 14 ms -> 934 ms +- 17 ms: 1.01x faster
- 2to3: 1.56 sec +- 0.01 sec -> 1.55 sec +- 0.01 sec: 1.01x faster
- sympy_sum: 946 ms +- 14 ms -> 938 ms +- 13 ms: 1.01x faster

Benchmark hidden because not significant (42): chameleon, crypto_pyaes, deltablue, django_template, dulwich_log, go, hexiom, html5lib, json, json_dumps, json_loads, logging_format, logging_silent, logging_simple, mako, nbody, nqueens, pathlib, pickle, pickle_dict, pickle_list, pyflate, python_startup, python_startup_no_site, raytrace, regex_compile, richards, scimark_lu, scimark_sor, sqlalchemy_declarative, sqlalchemy_imperative, sqlite_synth, sympy_integrate, sympy_str, telco, thrift, unpack_sequence, unpickle, unpickle_list, unpickle_pure_python, xml_etree_generate, xml_etree_process

Geometric mean: 1.01x faster

Maybe re won't be slower in 3.11 after all! 🙃

markshannon

Nice speedup.

I wonder how much of the speedup would be achieved by just moving ctx->pattern and ctx->ptr into local variables and leaving the dispatch alone. I'm wondering this for a couple of reasons:

- This paper suggests that it is the memory read, not the branch prediction, that has the larger impact on performance.
- We are concerned about dispatching on Windows, and it might be good to apply the results of that here as well.

How easy would it be to apply the changes that move ctx->pattern and ctx->ptr, without the dispatching changes?
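To make the locals-only variant concrete, here is a rough sketch of lifting the two members into local variables while leaving a plain switch in place. `toy_context`, the `TOY_*` opcodes, and both function names are hypothetical; the real context struct in sre_lib.h has more fields, and the real change also has to keep ctx in sync wherever state is saved or restored, which is not shown.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, heavily simplified stand-in for the engine's context. */
typedef struct {
    const unsigned char *pattern;  /* next opcode to execute */
    const char *ptr;               /* current position in the subject */
} toy_context;

enum { TOY_SKIP, TOY_STOP };  /* hypothetical opcodes */

/* Before: every step loads and stores through the ctx pointer. */
static size_t
run_via_ctx(toy_context *ctx, const char *end)
{
    size_t steps = 0;
    assert(ctx != NULL);
    for (;;) {
        switch (*ctx->pattern++) {
        case TOY_SKIP:
            if (ctx->ptr >= end) {
                return steps;
            }
            ctx->ptr++;
            steps++;
            break;
        default:  /* TOY_STOP */
            return steps;
        }
    }
}

/* After: the hot members live in locals and are written back once. */
static size_t
run_via_locals(toy_context *ctx, const char *end)
{
    size_t steps = 0;
    assert(ctx != NULL);
    const unsigned char *pattern = ctx->pattern;
    const char *ptr = ctx->ptr;
    for (;;) {
        switch (*pattern++) {
        case TOY_SKIP:
            if (ptr >= end) {
                goto done;
            }
            ptr++;
            steps++;
            break;
        default:  /* TOY_STOP */
            goto done;
        }
    }
done:
    /* Publish the final state so callers see the same ctx as before. */
    ctx->pattern = pattern;
    ctx->ptr = ptr;
    return steps;
}
```

Both functions leave ctx in the same final state; the second one gives the compiler a better chance of keeping pattern and ptr in registers, since it no longer has to assume that other stores might alias the struct.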

serhiy-storchaka

> How easy would it be to apply the changes that move ctx->pattern and ctx->ptr, without the dispatching changes?

@brandtbucher, could you show results with disabled computed gotos? Only regex-related benchmarks are interesting.

brandtbucher

With only computed gotos:

- regex_v8: 158 ms +- 9 ms -> 144 ms +- 11 ms: 1.09x faster
- regex_effbot: 19.2 ms +- 1.0 ms -> 18.2 ms +- 0.9 ms: 1.05x faster
- regex_dna: 1.36 sec +- 0.01 sec -> 1.30 sec +- 0.01 sec: 1.05x faster

With only new locals:

- regex_v8: 158 ms +- 9 ms -> 142 ms +- 12 ms: 1.11x faster
- regex_effbot: 19.2 ms +- 1.0 ms -> 18.2 ms +- 1.0 ms: 1.05x faster
- regex_dna: 1.36 sec +- 0.01 sec -> 1.22 sec +- 0.01 sec: 1.11x faster

Combined:

- regex_v8: 158 ms +- 9 ms -> 132 ms +- 11 ms: 1.20x faster
- regex_effbot: 19.2 ms +- 1.0 ms -> 17.5 ms +- 0.9 ms: 1.10x faster
- regex_dna: 1.36 sec +- 0.01 sec -> 1.24 sec +- 0.01 sec: 1.09x faster

For regex_v8 and regex_effbot, it looks like each optimization contributes about 50% of the speedup. regex_dna seems to benefit mostly from the new locals, for some reason (perhaps because the switch dispatch for that one is already highly stable/predictable?).
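The arithmetic behind these ratios can be checked with a small sketch. The helper names `speedup` and `share_of_saving` are made up for illustration; the inputs are the mean timings from the tables above, and the error bars (roughly ±1 to ±12 ms, depending on the benchmark) are ignored, so the shares are only rough.

```c
#include <assert.h>

/* Speedup as reported in the tables: time before divided by time after. */
static double
speedup(double before, double after)
{
    assert(before > 0.0 && after > 0.0);
    return before / after;
}

/* Fraction of the combined time saving that one optimization
   achieves on its own (ignores measurement noise). */
static double
share_of_saving(double before, double after_single, double after_combined)
{
    assert(before > after_combined);
    return (before - after_single) / (before - after_combined);
}
```

For regex_v8, `share_of_saving(158, 144, 132)` is about 0.54 for computed gotos and `share_of_saving(158, 142, 132)` is about 0.62 for the new locals. The shares do not have to add up to exactly 1, since the two savings are not strictly additive and the timings are noisy. For regex_dna the locals-only mean (1.22 sec) is slightly below the combined mean (1.24 sec), so the share exceeds 1 there.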

I think we should keep both.

serhiy-storchaka

Excellent! Before merging, could you compare it with 3.10? If the difference is the same, it would be worth adding a note about the 10-20% speedup to the NEWS and What's New files.

This is a backport of the upstream 3.11 improvement, python/cpython#91495. I only backported the ctx->pattern -> pattern and ctx->ptr -> ptr part, because using computed gotos actually decreased performance slightly on the opt build.

brandtbucher added 9 commits April 12, 2022 10:16

Try out computed gotos in the regex engine

89e4e5e

Reduce indirection

bc378d8

More of that

fa98784

Rearrange some stuff

48a7f1c

Whitespace

6f8a9a0

Generate sre_targets.h

03d3194

Catch up with main

24fa5a1

blurb add

0d30328

fixup

fdcde9f

brandtbucher added the performance, stdlib, and topic-regex labels Apr 13, 2022

brandtbucher requested a review from serhiy-storchaka April 13, 2022 03:57

bedevere-bot added the awaiting core review label Apr 13, 2022

brandtbucher mentioned this pull request Apr 13, 2022

Possible slowdown of regex searching in 3.11 #91404

Closed

brandtbucher self-assigned this Apr 13, 2022

serhiy-storchaka reviewed Apr 14, 2022

What's New

d38743f

ghost mentioned this pull request Apr 15, 2022

gh-91524: Speed up the regular expression substitution #91525

Merged

serhiy-storchaka approved these changes Apr 15, 2022

bedevere-bot added awaiting merge and removed awaiting core review labels Apr 15, 2022

brandtbucher merged commit 1b34b56 into python:main Apr 15, 2022

bedevere-bot removed the awaiting merge label Apr 15, 2022

undingen mentioned this pull request Apr 26, 2022

re: speedup matching by storing some state in local variables pyston/pyston#205

Merged

serhiy-storchaka added the 3.11 label May 19, 2022

brandtbucher deleted the sre-goto branch July 21, 2022 19:54

brandtbucher merged 10 commits into python:main from brandtbucher:sre-goto

brandtbucher commented Apr 13, 2022 (edited)

markshannon commented Apr 13, 2022 (edited)

serhiy-storchaka commented Apr 13, 2022

brandtbucher commented Apr 13, 2022 (edited)

serhiy-storchaka commented Apr 14, 2022

5 participants
