[3.13] gh-149079: Fix O(n^2) canonical ordering in unicodedata.normalize() (GH-149080) by encukou · Pull Request #150780 · python/cpython

bedevere-app

…normalize() (pythonGH-149080)

Replace the insertion sort used for canonical ordering of combining
characters with a hybrid approach: insertion sort for short runs (< 20)
and counting sort for longer runs, reducing worst-case complexity from
O(n^2) to O(n). This prevents denial of service via crafted Unicode
strings with many combining characters in alternating CCC order.
(cherry picked from commit 991224b)

Co-authored-by: Seth Larson <seth@python.org>
Co-authored-by: ch4n3-yoon <ch4n3.yoon@gmail.com>
Co-authored-by: Seokchan Yoon <13852925+ch4n3-yoon@users.noreply.github.com>
Co-authored-by: Stan Ulbrych <stan@python.org>
Co-authored-by: Bénédikt Tran <10796600+picnixz@users.noreply.github.com>
Co-authored-by: Petr Viktorin <encukou@gmail.com>
Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>
Co-authored-by: Maurycy Pawłowski-Wieroński <maurycy@maurycy.com>

Bot mentioned this pull request

Jun 2, 2026

encukou deleted the backport-991224b-3.13 branch

June 4, 2026 11:14