gh-146192: Add base32 support to binascii #146193
Merged
serhiy-storchaka merged 8 commits into python:main on Mar 22, 2026
Add base32 encoder and decoder functions implemented in C to `binascii` and use them to greatly improve the performance and reduce the memory usage of the existing base32 codec functions in `base64`. No API or documentation changes are necessary with respect to any functions in `base64`, and all existing unit tests for those functions continue to pass without modification. Resolves: pythongh-146192
- Use the new `alphabet` parameter in `binascii`
- Remove `binascii.a2b_base32hex()` and `binascii.b2a_base32hex()`
- Change value for `.. versionadded::` ReST directive in docs for new `binascii` functions to "next" instead of "3.15"
db96a3f to bf1308f (March 20, 2026 16:01)
serhiy-storchaka
left a comment
Member
I added some suggestions, but the core LGTM.
Please add assertions for new alphabets in test_constants.
- Update docs to refer to "Base 32" and "Base32"
- Update docs to better explain `binascii.a2b_base32()`
- Inline helper function in `base64`
- Add forgotten tests for presence of alphabet module globals
gpshead
reviewed
Mar 22, 2026
serhiy-storchaka
left a comment
Member
Please also add the What's New entry.
- Revise docs
- Add whatsnew entry
- Minor whitespace change in tests
Referring to a group of 8 bytes as an "octet" may cause confusion, because the term is already commonly used in some languages to refer to a group of 8 bits (i.e. a byte). "Octa" is a suitable preexisting alternative for a group of 64 bits [1] (used by Knuth himself, at that). "Octad" was considered, but it, too, historically refers to a byte.

Also rename "quintet" to "quint". "Pentad" was considered, but it historically refers to a group of 5 bits.

[1] https://en.wikipedia.org/wiki/Units_of_information
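To make the grouping behind these names concrete, here is a minimal pure-Python sketch of how one 5-byte input group (a "quint") maps to one 8-byte output group (an "octa") under the standard RFC 4648 alphabet. The names `B32_ALPHABET` and `encode_quint` are hypothetical and chosen for illustration; this is not the C code from the PR.

```python
import base64

# RFC 4648 standard Base32 alphabet (the name is illustrative, not from the PR).
B32_ALPHABET = b"ABCDEFGHIJKLMNOPQRSTUVWXYZ234567"


def encode_quint(quint: bytes) -> bytes:
    """Encode exactly 5 input bytes (a "quint") into 8 output bytes (an "octa")."""
    if len(quint) != 5:
        raise ValueError("a quint is exactly 5 bytes")
    bits = int.from_bytes(quint, "big")  # 5 bytes = 40 bits
    # Split the 40 bits into eight 5-bit values, most significant first,
    # and look each one up in the alphabet.
    return bytes(B32_ALPHABET[(bits >> shift) & 0x1F] for shift in range(35, -1, -5))


octa = encode_quint(b"hello")
print(octa)  # b'NBSWY3DP'
print(octa == base64.b32encode(b"hello"))  # True
```

Because 5 bytes and 8 characters both cover exactly 40 bits, a full quint never needs padding; only a trailing partial group does.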
serhiy-storchaka
approved these changes
Mar 22, 2026
serhiy-storchaka
left a comment
Member
LGTM. 👍
- Reword NEWS.d entry to "Base32" instead of "base-32". No prior entries have ever mentioned "base-64", etc., but they have mentioned "Base64", etc., so this is more consistent.
- Reword whatsnew entry to "Base32" instead of "Base 32". No prior entries have ever mentioned "Base 64", etc., and there is an entry a little further up mentioning "Ascii85, Base85, and Z85", so this is more consistent.
- Add a whatsnew entry in Optimizations > base64 & binascii section.
- Whitespace change in `binascii.c`.
When decoding invalid length (1, 3 or 6 mod 8) + no padding, mention the invalid length instead of the improper padding in the exception message to match what the base64 decoder does. Additionally, move the logic for setting the exception message (back) outside the "slow path" loop; if we do end up checking canonicity of decoder input, it will feel (subjectively) better to have several checks grouped together after the loop.
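The "1, 3 or 6 mod 8" figure follows from the arithmetic of the encoding: n input bytes need ceil(8n / 5) characters before padding. A short sketch of that derivation, using hypothetical helper names that do not appear in the PR:

```python
def unpadded_length(n_bytes: int) -> int:
    """Number of Base32 characters needed for n_bytes of input, without '=' padding."""
    return (n_bytes * 8 + 4) // 5  # integer form of ceil(8n / 5)


# Input lengths 0..4 (mod 5) cover every possible trailing partial group.
VALID_RESIDUES = {unpadded_length(n) % 8 for n in range(5)}
INVALID_RESIDUES = set(range(8)) - VALID_RESIDUES


def has_valid_unpadded_length(encoded: bytes) -> bool:
    """Check only the length of the data once trailing '=' padding is stripped."""
    return len(encoded.rstrip(b"=")) % 8 in VALID_RESIDUES


print(sorted(VALID_RESIDUES))    # [0, 2, 4, 5, 7]
print(sorted(INVALID_RESIDUES))  # [1, 3, 6]
```

An unpadded input whose length falls in the second set cannot have come from any byte string, which is why reporting the length is more accurate than reporting the padding.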
serhiy-storchaka
approved these changes
Mar 22, 2026
gpshead
approved these changes
Mar 22, 2026
gpshead
left a comment
Member
nice work!
serhiy-storchaka merged commit b4e5bc2 into python:main on Mar 22, 2026
50 of 51 checks passed
Contributor
Author
@serhiy-storchaka, @gpshead, thanks for the quick review! Doing more of this sort of thing might be fun. Stay safe out there.
CuriousLearner
added a commit
to CuriousLearner/cpython
that referenced
this pull request
Mar 23, 2026
…8577

* 'main' of github.com:python/cpython:
  pythongh-146197: Run -m test.pythoninfo on the Emscripten CI (python#146332)
  pythongh-146325: Use `test.support.requires_fork` in test_fastpath_cache_cleared_in_forked_child (python#146330)
  pythongh-146197: Add Emscripten to CI (python#146198)
  pythongh-143387: Raise an exception instead of returning None when metadata file is missing. (python#146234)
  pythongh-108907: ctypes: Document _type_ codes (pythonGH-145837)
  pythongh-146175: Soft-deprecate outdated macros; convert internal usage (pythonGH-146178)
  pythongh-146056: Rework ref counting in treebuilder_handle_end() (python#146167)
  Add a warning about untrusted input to `configparser` docs (python#146276)
  pythongh-145264: Do not ignore excess Base64 data after the first padded quad (pythonGH-145267)
  pythongh-146308: Fix error handling issues in _remote_debugging module (python#146309)
  pythongh-146192: Add base32 support to binascii (pythonGH-146193)
  pythongh-135953: Properly obtain main thread identifier in Gecko Collector (python#146045)
  pythongh-143414: Implement unique reference tracking for JIT, optimize unpacking of such tuples (pythonGH-144300)
  pythongh-146261: Fix bug in `_Py_uop_sym_set_func_version` (pythonGH-146291)
  pythongh-145144: Add more tests for UserList, UserDict, etc (pythonGH-145145)
  pythongh-143959: Fix test_datetime if _datetime is unavailable (pythonGH-145248)
  pythongh-146245: Fix reference and buffer leaks via audit hook in socket module (pythonGH-146248)
  pythongh-140049: Colorize exception notes in `traceback.py` (python#140051)
  Update docs for pythongh-146056 (pythonGH-146213)
ljfp
pushed a commit
to ljfp/cpython
that referenced
this pull request
Apr 25, 2026
Add base32 encoder and decoder functions implemented in C to the binascii module and use them to greatly improve the performance and reduce the memory usage of the existing base32 codec functions in the base64 module.
Synopsis
Add base32 encoder and decoder functions implemented in C to `binascii` and use them to greatly improve the performance and reduce the memory usage of the existing base32 codec functions in `base64`.

No API or documentation changes are necessary with respect to any functions in `base64`, and all existing unit tests for those functions continue to pass without modification.

Resolves: gh-146192
Discussion
The base32-related functions in `base64` are now wrappers for the new functions in `binascii`, as envisioned in the docs:

Comments and questions are welcome.
Benchmarks
Benchmark script
Unmodified mainline CPython
With this PR
Encoding performance is improved by ~150x, decoding performance is improved by ~200x,
and no auxiliary memory is used.
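The author's benchmark script and its output are not reproduced on this page. As a rough sketch only, one way to time the public `base64` functions on two interpreter builds is shown below; the function name `time_base32`, the payload, and the sizes are all assumptions and not taken from the PR.

```python
import base64
import timeit


def time_base32(size: int = 1 << 16, number: int = 5) -> dict:
    """Return average seconds per call for Base32 encode and decode of `size` bytes."""
    # Hypothetical payload: a repeating 0..255 pattern truncated to `size` bytes.
    data = (bytes(range(256)) * (size // 256 + 1))[:size]
    encoded = base64.b32encode(data)
    return {
        "encode_s": timeit.timeit(lambda: base64.b32encode(data), number=number) / number,
        "decode_s": timeit.timeit(lambda: base64.b32decode(encoded), number=number) / number,
    }


if __name__ == "__main__":
    for name, seconds in time_base32().items():
        print(f"{name}: {seconds:.6f}")
```

Running the same script under unmodified mainline CPython and under a build with this PR would give a like-for-like comparison; absolute numbers depend on the machine and build options.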
📚 Documentation preview 📚: https://cpython-previews--146193.org.readthedocs.build/