gh-101178: Add Ascii85, Base85, and Z85 support to binascii #102753
serhiy-storchaka merged 45 commits into
It's a year later, and Z85 support has been added to
For reference, this is the benchmark run that led me to do so.
# After merging main but before adding Z85 support to this PR
(cpython-b85) $ python bench_b85.py 64
b64encode : 67108864 b in 0.121 s (527.435 MB/s) using 42.667 MB
b64decode : 89478488 b in 0.309 s (276.188 MB/s) using 56.889 MB
a85encode : 67108864 b in 0.297 s (215.150 MB/s) using 0.000 MB
a85decode : 83886080 b in 0.205 s (390.751 MB/s) using 0.000 MB
b85encode : 67108864 b in 0.106 s (604.359 MB/s) using 0.000 MB
b85decode : 83886080 b in 0.204 s (393.040 MB/s) using 0.000 MB
z85encode : 67108864 b in 0.204 s (313.610 MB/s) using 80.000 MB
z85decode : 83886080 b in 0.300 s (266.670 MB/s) using 100.000 MB
The existing Z85 implementation translates from the standard base85 alphabet to Z85 after the fact and within Python, so it was already benefiting from this PR but with substantial performance and memory usage overhead. That overhead is now gone.
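To illustrate the "translate after the fact" approach described in this comment, here is a minimal sketch. It is not the code from `base64` or from this PR; the helper names `z85encode_via_translate` and `z85decode_via_translate` are hypothetical. The two alphabets are the ones used by `base64.b85encode()` and by the ZeroMQ Z85 specification.

```python
import base64

# Alphabet used by base64.b85encode() / b85decode().
B85_ALPHABET = (b"0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ"
                b"abcdefghijklmnopqrstuvwxyz!#$%&()*+-;<=>?@^_`{|}~")
# Alphabet from the ZeroMQ Z85 specification.
Z85_ALPHABET = (b"0123456789abcdefghijklmnopqrstuvwxyz"
                b"ABCDEFGHIJKLMNOPQRSTUVWXYZ.-:+=^!/*?&<>()[]{}@%$#")

TO_Z85 = bytes.maketrans(B85_ALPHABET, Z85_ALPHABET)
FROM_Z85 = bytes.maketrans(Z85_ALPHABET, B85_ALPHABET)


def z85encode_via_translate(data):
    """Hypothetical helper: encode as base85, then remap every digit to Z85."""
    # The intermediate base85 bytes object is the extra temporary allocation.
    return base64.b85encode(data).translate(TO_Z85)


def z85decode_via_translate(text):
    """Hypothetical helper: remap Z85 digits to base85, then decode.

    This sketch does not reject bytes that are outside the Z85 alphabet.
    """
    return base64.b85decode(text.translate(FROM_Z85))


sample = bytes([0x86, 0x4F, 0xD2, 0x6F, 0xB5, 0x59, 0xF7, 0x5B])
print(z85encode_via_translate(sample))  # b'HelloWorld', the Z85 spec test vector
```

Each call produces a full-size intermediate copy of the encoded data, which is consistent with the extra memory reported for `z85encode` and `z85decode` in the run above.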
71f1955 to 7b4aba1 (March 19, 2024 09:27)
Add Ascii85, base85, and Z85 encoders and decoders to `binascii`, replacing the existing pure Python implementations in `base64`.
No API or documentation changes are necessary with respect to `base64.a85encode()`, `b85encode()`, etc., and all existing unit tests for those functions continue to pass without modification.
Note that attempting to decode Ascii85 or base85 data of length 1 mod 5 (after accounting for Ascii85 quirks) now produces an error, as no encoder would emit such data. This should be the only significant externally visible difference compared to the old implementation.
Resolves: pythongh-101178
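The claim that no encoder emits data of length 1 mod 5 follows from the grouping: every 4 input bytes become 5 digits, and a trailing partial group of r bytes (1 to 3) becomes r + 1 digits. The sketch below checks this against the existing `base64.b85encode()`; the helper `b85_encoded_length` is a hypothetical name introduced only for this illustration.

```python
import base64


def b85_encoded_length(n):
    """Hypothetical helper: length of unpadded base85 output for n input bytes."""
    full, rest = divmod(n, 4)
    # A partial group of `rest` bytes needs rest + 1 digits, so the tail
    # contributes 0, 2, 3, or 4 digits, never exactly 1.
    return 5 * full + (rest + 1 if rest else 0)


for n in range(41):
    encoded = base64.b85encode(bytes(range(1, n + 1)))
    assert len(encoded) == b85_encoded_length(n)
    assert len(encoded) % 5 != 1
```

Ascii85 adds further cases (the `z` shortcut, optional framing), which is what "after accounting for Ascii85 quirks" refers to; those are not modeled here.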
7b4aba1 to 05ae5ad (April 21, 2025 05:16)
PR has been rebased onto main at 78cfee6 with squashing.
I believe you have to document this change.
Fair point, I could do that. In case anyone argues for keeping the old behavior (silently ignoring length 1 mod 5), I won't do it just yet.
If we were strictly following PEP-0399, _base64 would be a C module for accelerated functions in base64. Due to historical reasons, those should actually go in binascii instead. We still want to preserve the existing Python code in base64. Parting out facilities for accessing the C functions into a module named _base64 shouldn't risk a naming conflict and will simplify testing.
This is done differently to PEP-0399 to minimize the number of changed lines.
As we're now keeping the existing Python base 85 functions, the C implementations should behave exactly the same, down to exception type and wording. It is also no longer an error to try to decode data of length 1 mod 5.
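As a rough illustration of the "optional C accelerator with a pure-Python fallback" idea discussed in these commit messages, the sketch below looks up `binascii.a2b_base85` (named elsewhere in this PR) and falls back to `base64.b85decode()` when it is missing. The helper `b85decode_any` is hypothetical, the assumption that `a2b_base85` accepts a single positional bytes argument is mine, and this is not how the PR itself wires things up.

```python
import base64
import binascii

# a2b_base85 is absent from Python versions that predate this PR,
# so look it up defensively instead of importing it directly.
_c_decode = getattr(binascii, "a2b_base85", None)


def b85decode_any(data):
    """Hypothetical helper: use the C decoder if present, else pure Python."""
    if _c_decode is not None:
        # Assumption: the C function takes the encoded data positionally.
        return _c_decode(data)
    return base64.b85decode(data)


payload = b"hello world"
assert b85decode_any(base64.b85encode(payload)) == payload
```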
The PR has been updated to preserve the existing base 85 Python functions in
Importing update_wrapper() from functools to copy attributes is expensive. Do it ourselves for only the most relevant ones.
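A minimal sketch of copying "only the most relevant" attributes by hand, without importing `functools`. Which attributes the PR actually copies is not shown in this conversation; the set below and the helper name `copy_basic_metadata` are assumptions for illustration.

```python
def copy_basic_metadata(wrapper, wrapped):
    """Hypothetical stand-in for functools.update_wrapper()."""
    # Assumed attribute set; functools.update_wrapper() copies more
    # (for example __dict__ and __wrapped__).
    for attr in ("__module__", "__name__", "__qualname__", "__doc__"):
        try:
            setattr(wrapper, attr, getattr(wrapped, attr))
        except AttributeError:
            pass
    return wrapper


def reference(data):
    """Reference implementation."""
    return bytes(data)


def fast(data):
    return bytes(data)


copy_basic_metadata(fast, reference)
print(fast.__name__, "-", fast.__doc__)
```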
This requires some code duplication, but oh well.
Using a decorator complicates function signature introspection.
Do we really need to test the legacy API twice?
Include an integer overflow check for Ascii85.
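Why Ascii85 needs an overflow check: five base-85 digits can represent values up to 85**5 - 1 = 4437053124, which exceeds the largest 32-bit value, 4294967295. The sketch below shows the check for a single group in Python; the function name `a85_decode_group` is hypothetical and this is not the C code from the PR. In C the test has to be arranged so that the 32-bit accumulator never wraps, whereas Python integers are unbounded and a simple range test suffices.

```python
def a85_decode_group(group):
    """Hypothetical helper: decode exactly five Ascii85 digits to four bytes."""
    if len(group) != 5:
        raise ValueError("expected exactly 5 Ascii85 digits")
    value = 0
    for ch in group:
        if not 33 <= ch <= 117:  # valid digits are '!' through 'u'
            raise ValueError(f"non-Ascii85 digit: {ch!r}")
        value = value * 85 + (ch - 33)
    if value > 0xFFFFFFFF:
        # The group encodes a number that does not fit in 32 bits.
        raise ValueError("Ascii85 overflow")
    return value.to_bytes(4, "big")


print(a85_decode_group(b"s8W-!"))  # b'\xff\xff\xff\xff', the largest valid group
```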
Performance gains of up to 8% for a2b_ascii85() and 25% for a2b_base85() and a2b_z85() were observed.
The PR has been updated to address most reviewer comments. I even found some minor decoding performance gains. A bit more polish and this one will be done!
serhiy-storchaka left a comment
Thank you for your update @kangtastic. I added support for ignorechars in the Base64 decoder, so similar code can now be used in the Ascii85 decoder. Also note the changes in the Ascii85 documentation.
- Use the same names for parameters in `binascii` and `base64`.
- Remove unneeded parameters.
- Reuse the `ignorechar()` function for `ignorechars`.
- Update the Ascii85 documentation -- refer to "bytes-like object" for "ignorechars". We can now guarantee this.
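For readers unfamiliar with `ignorechars`, the sketch below uses the existing `base64.a85decode()` keyword argument of that name. It only demonstrates the documented `base64` behavior, not the new `binascii` functions, whose signatures are not shown in this conversation.

```python
import base64

data = b"binascii and base64"
encoded = base64.a85encode(data)
# Simulate line wrapping by inserting a newline in the middle of the text.
wrapped = encoded[:10] + b"\n" + encoded[10:]

# The default ignorechars contains ASCII whitespace, so the newline is skipped.
assert base64.a85decode(wrapped) == data

# With an empty ignorechars, the newline is not a valid Ascii85 digit.
try:
    base64.a85decode(wrapped, ignorechars=b"")
except ValueError:
    rejected = True
else:
    rejected = False
print("newline rejected:", rejected)
```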
@serhiy-storchaka, sorry I was busy and didn't get to your feedback this past week. Looks good. One last thing from me - I noticed you changed the clinic input for the
No problem, I pushed my changes because there are several other issues waiting for this change (Base32 codec,
There is also a bug in the
Merged 45d4a34 into python:main on Feb 6, 2026
Thanks for the explanation about clinic input. And thanks again for accepting my PR and making the process pleasant. The timeline is indeed unfortunate but also isn't anyone's fault.
…thonGH-102753)
Add Ascii85, Base85, and Z85 encoders and decoders to binascii, replacing the existing pure Python implementations in base64.
This makes the codecs two orders of magnitude faster and consume two orders of magnitude less memory.
Note that attempting to decode Ascii85 or Base85 data of length 1 mod 5 (after accounting for Ascii85 quirks) now produces an error, as no encoder would emit such data. This should be the only significant externally visible difference compared to the old implementation.
Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>
@kangtastic, are you also interested in implementing the Base32 codec in C? This will not only improve its performance and memory usage, but also make it easier to add more features in the future. I'm going to do it eventually, but I'm short on time right now. If you're interested, I promise a quick review.
@serhiy-storchaka, I'm also short on time for the next ~2 weeks, but I might be interested after that. I'll ping you in this issue when I know for sure.
@serhiy-storchaka, I'm interested in writing that Base32 implementation after all. Let me know if it's still needed/if you're looking for anything specific.
It would be great! I asked you because I see that you are quite capable of this, and it might also be interesting for you. As for anything specific, I suggest to keep the code for
@serhiy-storchaka, OK, I'll get started and do it as you suggest.
See also #145980. It would be enough to only implement a single pair of functions for Base32.
@serhiy-storchaka, did you want me to wait for #145981 to be finalized and merged before I create an issue and a PR for the base32 accelerator? It's almost finished except for testing, etc.
Synopsis
Add Ascii85, Base85, and Z85 encoder and decoder functions implemented in C to `binascii` and use them to greatly improve the performance and reduce the memory usage of the existing Ascii85, Base85, and Z85 codec functions in `base64`.
No API or documentation changes are necessary with respect to any functions in `base64`, and all existing unit tests for those functions continue to pass without modification.
Resolves: gh-101178
Discussion
The base85-related functions in `base64` are now wrappers for the new functions in `binascii`, as envisioned in the docs:
Parting out Ascii85 from Base85 and Z85 was warranted in my testing despite the code duplication due to the various performance-murdering special cases in Ascii85.
Comments and questions are welcome.
Benchmarks
Updated December 28, 2025.
The old pure-Python implementation is two orders of magnitude slower and uses over O(40n) temporary memory.
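The `bench_b85.py` script is not shown in this conversation. As a rough idea of how throughput and peak temporary memory could be measured, here is a small sketch using `time.perf_counter()` and `tracemalloc`; the `measure` helper and the 64 KiB payload are assumptions, and timings taken while `tracemalloc` is active are skewed by the tracing overhead.

```python
import base64
import time
import tracemalloc


def measure(func, payload):
    """Hypothetical helper: return (output size, seconds, peak traced bytes)."""
    tracemalloc.start()
    start = time.perf_counter()
    result = func(payload)
    elapsed = time.perf_counter() - start
    # Peak covers everything allocated during the call, including the result.
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return len(result), elapsed, peak


payload = bytes(range(256)) * 256  # 64 KiB of non-repeating-looking data
size, elapsed, peak = measure(base64.b85encode, payload)
print(f"b85encode : {len(payload)} b in {elapsed:.3f} s, "
      f"peak {peak / 2**20:.3f} MiB traced")
```

Comparing the peak figure against the input size on an interpreter with and without the C implementation would give a number comparable to the memory claim above.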