gh-101178: Add Ascii85, Base85, and Z85 support to binascii #102753

kangtastic

Synopsis

Add Ascii85, Base85, and Z85 encoder and decoder functions, implemented in C, to binascii, and use them to greatly improve the performance and reduce the memory usage of the existing Ascii85, Base85, and Z85 codec functions in base64.

No API or documentation changes are necessary with respect to any functions in base64, and all existing unit tests for those functions continue to pass without modification.

Resolves: gh-101178

Discussion

The base85-related functions in base64 are now wrappers for the new functions in binascii, as envisioned in the docs:

The binascii module contains a number of methods to convert between binary and various ASCII-encoded binary representations. Normally, you will not use these functions directly but use wrapper modules like uu or base64 instead. The binascii module contains low-level functions written in C for greater speed that are used by the higher-level modules.
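As a small illustration of the wrapper relationship the docs describe, the sketch below compares a base64 function with the binascii primitive it is built on. It deliberately uses the long-standing Base64 pair only; the Base85 functions this PR adds to binascii are not called here, since their availability depends on the interpreter version.

```python
import base64
import binascii

data = b"binascii does the work"

# base64.b64encode() is a thin wrapper over binascii.b2a_base64().
# newline=False suppresses the trailing newline that b2a_base64() appends by default.
high_level = base64.b64encode(data)
low_level = binascii.b2a_base64(data, newline=False)

print(high_level == low_level)
```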

Parting out Ascii85 from Base85 and Z85 was warranted in my testing, despite the code duplication, because of the various performance-murdering special cases in Ascii85.
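For readers unfamiliar with those special cases, here is a minimal sketch using only the public base64 API. It shows three Ascii85 features that plain Base85 does not have: the "z" abbreviation, the optional "y" abbreviation, and Adobe framing. It says nothing about how the C code in this PR handles them internally.

```python
import base64

zeros = bytes(4)
spaces = b" " * 4

# Ascii85 abbreviates an all-zero 4-byte group to a single "z" ...
a85_zeros = base64.a85encode(zeros)
# ... and, with foldspaces=True, a group of four spaces to "y".
a85_spaces = base64.a85encode(spaces, foldspaces=True)
# Adobe framing wraps the payload in <~ and ~>.
a85_framed = base64.a85encode(zeros, adobe=True)

# Base85 has no such shortcuts: every 4-byte group becomes 5 characters.
b85_zeros = base64.b85encode(zeros)

print(a85_zeros, a85_spaces, a85_framed, b85_zeros)
```

Each of these cases is an extra branch in the Ascii85 encode and decode loops, which is consistent with the slightly lower Ascii85 throughput in the benchmark figures.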

Comments and questions are welcome.

Benchmarks

Updated December 28, 2025.

# bench_b85.py

# Note: EXTREMELY SLOW on unmodified mainline CPython
#       when tracing malloc on the base-85 functions.

import base64
import sys
import timeit
import tracemalloc

funcs = [(base64.b64encode, base64.b64decode),  # sanity check/comparison
         (base64.a85encode, base64.a85decode),
         (base64.b85encode, base64.b85decode),
         (base64.z85encode, base64.z85decode)]

def mb(n):
    return f"{n / 1024 / 1024:.3f} MB"

def stats(func, data, t, m):
    name, n, bps = func.__qualname__, len(data), len(data) / t
    print(f"{name} : {n} b in {t:.3f} s ({mb(bps)}/s) using {mb(m)}")

if __name__ == "__main__":
    data = b"a" * int(sys.argv[1]) * 1024 * 1024
    for fenc, fdec in funcs:
        tracemalloc.start()
        enc = fenc(data)
        menc = tracemalloc.get_traced_memory()[1] - len(enc)
        tracemalloc.stop()
        tenc = timeit.timeit("fenc(data)", number=1, globals=globals())
        stats(fenc, data, tenc, menc)

        tracemalloc.start()
        dec = fdec(enc)
        mdec = tracemalloc.get_traced_memory()[1] - len(dec)
        tracemalloc.stop()
        tdec = timeit.timeit("fdec(enc)", number=1, globals=globals())
        stats(fdec, enc, tdec, mdec)

# Python 3.15.0a3+ (heads/main:0efbad60e13, Dec 28 2025, 11:02:16)
# ./configure --enable-optimizations --with-lto

# Unmodified
$ time ./python bench_b85.py 64
b64encode : 67108864 b in 0.092 s (693.266 MB/s) using 42.667 MB
b64decode : 89478488 b in 0.234 s (364.961 MB/s) using 56.889 MB
a85encode : 67108864 b in 7.163 s (8.935 MB/s) using 2664.401 MB
a85decode : 83886080 b in 14.478 s (5.526 MB/s) using 3332.254 MB
b85encode : 67108864 b in 6.965 s (9.189 MB/s) using 2664.401 MB
b85decode : 83886080 b in 10.082 s (7.935 MB/s) using 3332.254 MB
z85encode : 67108864 b in 7.245 s (8.834 MB/s) using 2664.102 MB
z85decode : 83886080 b in 9.666 s (8.277 MB/s) using 3332.254 MB

real    9m44.382s
user    9m27.271s
sys     0m12.747s


# With this PR
b64encode : 67108864 b in 0.085 s (753.375 MB/s) using 42.667 MB
b64decode : 89478488 b in 0.230 s (371.282 MB/s) using 56.889 MB
a85encode : 67108864 b in 0.094 s (681.709 MB/s) using 0.000 MB
a85decode : 83886080 b in 0.191 s (418.019 MB/s) using 0.000 MB
b85encode : 67108864 b in 0.075 s (850.118 MB/s) using 0.000 MB
b85decode : 83886080 b in 0.141 s (567.490 MB/s) using 0.000 MB
z85encode : 67108864 b in 0.074 s (864.559 MB/s) using 0.000 MB
z85decode : 83886080 b in 0.173 s (462.854 MB/s) using 0.000 MB

real    0m1.865s
user    0m1.726s
sys     0m0.126s

The old pure-Python implementation is two orders of magnitude slower and uses temporary memory of more than 40 times the input size (about 2664 MB for a 64 MB input).
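The byte counts in the decode rows above are the sizes of the encoded payloads. As a quick arithmetic check, the following sketch derives them from the standard expansion ratios (4 characters per 3 bytes for Base64, 5 characters per 4 bytes for the base-85 family); it is only a sanity check on the figures, not part of the benchmark.

```python
n = 64 * 1024 * 1024  # 67108864 input bytes, as in the runs above

# Base64: every group of 3 input bytes (the last one padded) becomes 4 characters.
b64_len = 4 * ((n + 2) // 3)

# Base85 family: every 4 input bytes become 5 characters.
# n is a multiple of 4 here, so there is no partial group to account for.
b85_len = n // 4 * 5

print(b64_len, b85_len)
```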

ghost

All commit authors signed the Contributor License Agreement.

kangtastic

It's a year later, and Z85 support has been added to base64 in the meantime. So while bringing this PR up to date with main, I added Z85 support to it as well.

For reference, this is the benchmark run that led me to do so.

# After merging main but before adding Z85 support to this PR
(cpython-b85) $ python bench_b85.py 64
b64encode : 67108864 b in 0.121 s (527.435 MB/s) using 42.667 MB
b64decode : 89478488 b in 0.309 s (276.188 MB/s) using 56.889 MB
a85encode : 67108864 b in 0.297 s (215.150 MB/s) using 0.000 MB
a85decode : 83886080 b in 0.205 s (390.751 MB/s) using 0.000 MB
b85encode : 67108864 b in 0.106 s (604.359 MB/s) using 0.000 MB
b85decode : 83886080 b in 0.204 s (393.040 MB/s) using 0.000 MB
z85encode : 67108864 b in 0.204 s (313.610 MB/s) using 80.000 MB
z85decode : 83886080 b in 0.300 s (266.670 MB/s) using 100.000 MB

The existing Z85 implementation translates from the standard base85 alphabet to Z85 after the fact and within Python, so it was already benefiting from this PR but with substantial performance and memory usage overhead. That overhead is now gone.
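The translate-after-encoding approach described above can be sketched as follows. The helper name is hypothetical and the code is an illustration of the technique, not the actual base64 source; the two alphabets are the published Base85 and Z85 (ZeroMQ RFC 32) alphabets.

```python
import base64

# Alphabets for Base85 (as used by base64.b85encode) and Z85 (ZeroMQ RFC 32).
B85_ALPHABET = (b"0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ"
                b"abcdefghijklmnopqrstuvwxyz!#$%&()*+-;<=>?@^_`{|}~")
Z85_ALPHABET = (b"0123456789abcdefghijklmnopqrstuvwxyz"
                b"ABCDEFGHIJKLMNOPQRSTUVWXYZ.-:+=^!/*?&<>()[]{}@%$#")

# One 256-entry table that maps each Base85 character to its Z85 counterpart.
B85_TO_Z85 = bytes.maketrans(B85_ALPHABET, Z85_ALPHABET)

def z85encode_via_translate(data):
    """Hypothetical helper: encode as Base85, then re-map the alphabet."""
    encoded = base64.b85encode(data)      # first full-size output buffer
    return encoded.translate(B85_TO_Z85)  # second full-size output buffer

# Test vector from the Z85 specification.
print(z85encode_via_translate(bytes.fromhex("864FD26FB559F75B")))
```

The second buffer created by `translate()` is the same size as the encoded output, which matches the 80 MB of extra memory in the z85encode row above.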

python-cla-bot

All commit authors signed the Contributor License Agreement.

Add Ascii85, base85, and Z85 encoders and decoders to `binascii`, replacing the existing pure Python implementations in `base64`.

No API or documentation changes are necessary with respect to `base64.a85encode()`, `b85encode()`, etc., and all existing unit tests for those functions continue to pass without modification.

Note that attempting to decode Ascii85 or base85 data of length 1 mod 5 (after accounting for Ascii85 quirks) now produces an error, as no encoder would emit such data. This should be the only significant externally visible difference compared to the old implementation.

Resolves: pythongh-101178

kangtastic

PR has been rebased onto main at 78cfee6 with squashing.

sergey-miryanov

> Note that attempting to decode Ascii85, base85, or Z85 data of length 1 mod 5 now produces an error, as no encoder would emit such data. This should be the only significant externally visible difference compared to the old implementations.

I believe you have to document this change.

kangtastic

> > Note that attempting to decode Ascii85, base85, or Z85 data of length 1 mod 5 now produces an error, as no encoder would emit such data. This should be the only significant externally visible difference compared to the old implementations.
>
> I believe you have to document this change.

Fair point, I could do that.

In case anyone argues for keeping the old behavior (silently ignoring length 1 mod 5), I won't do it just yet.

If we were strictly following PEP-0399, _base64 would be a C module for accelerated functions in base64. Due to historical reasons, those should actually go in binascii instead. We still want to preserve the existing Python code in base64. Parting out facilities for accessing the C functions into a module named _base64 shouldn't risk a naming conflict and will simplify testing.

This is done differently to PEP-0399 to minimize the number of changed lines.

As we're now keeping the existing Python base 85 functions, the C implementations should behave exactly the same, down to exception type and wording. It is also no longer an error to try to decode data of length 1 mod 5.
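For contrast with the arrangement described above, the conventional PEP 399 idiom looks roughly like the sketch below: a pure-Python function is always defined and is replaced by a C version when an accelerator module can be imported. The module name `_hypothetical_hex_accel` and the function are made up for illustration and do not correspond to anything in this PR.

```python
# Pure-Python implementation, always available.
def encode_hex(data):
    return data.hex().encode("ascii")

try:
    # `_hypothetical_hex_accel` is an invented module name; if it existed,
    # its C implementation would shadow the Python one defined above.
    from _hypothetical_hex_accel import encode_hex
except ImportError:
    pass  # no accelerator available: keep the pure-Python version

print(encode_hex(b"\x01\xff"))
```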

kangtastic

The PR has been updated to preserve the existing base 85 Python functions in base64 and modify the new base 85 C functions in binascii to closely match their behavior. Notably, trying to decode data of length 1 mod 5 is no longer an error.
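To illustrate why input of length 1 mod 5 is unusual in the first place, the sketch below collects the lengths that a Base85 encoder actually produces. A full group of 4 bytes yields 5 characters, and a trailing partial group of 1 to 3 bytes yields 2 to 4 characters, so a remainder of exactly one character never occurs. The check uses only `base64.b85encode()` and makes no claim about how any particular decoder treats such input.

```python
import base64

# Collect the residues (mod 5) of encoded lengths for inputs of 0..40 bytes.
# Base85 has no abbreviations, so the lengths depend only on the input size.
residues = {len(base64.b85encode(bytes(n))) % 5 for n in range(41)}

print(sorted(residues))
```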

kangtastic

The PR has been updated to address most reviewer comments. I even found some minor decoding performance gains. A bit more polish and this one will be done!

serhiy-storchaka

Thank you for your update, @kangtastic. I added support for ignorechars to the Base64 decoder, so similar code can now be used in the Ascii85 decoder. Also note the changes in the Ascii85 documentation.

Use the same names for parameters in binascii and base64.
Remove unneeded parameters.
Reuse the ignorechar() function for ignorechars.
Update the Ascii85 documentation -- refer to "bytes-like object" for "ignorechars". We can now guarantee this.

kangtastic

@serhiy-storchaka, sorry I was busy and didn't get to your feedback this past week. Looks good.

One last thing from me - I noticed you changed the clinic input for the ignorechars param in binascii.a2b_base64() to ignorechars: Py_buffer(py_default="<unrepresentable>") = None. I found a discussion thread about <unrepresentable> being for C arguments that are NULL, but it looks like the way you have it, ignorechars actually isn't NULL on the C side in binascii.a2b_base64(), so I don't know what's going on and if this change should be carried over to binascii.a2b_ascii85() or not.

serhiy-storchaka

No problem. I pushed my changes because several other issues are waiting for this change (Base32 codec, wrapcol and ignorechars for more codecs, maybe other options). Your work is great; sorry that it took several years to review it.

binascii.a2b_base64() is a special case. It is non-strict by default, but this is a vulnerability. Since ignorechars is a new parameter, we can switch to strict mode by default if it is used. So showing ignorechars=b'' in the signature would be a lie, because passing ignorechars=b'' and not passing the ignorechars argument have different semantics. So we use a special placeholder. This is temporary: once we make binascii.a2b_base64() strict by default, we can just make b'' the default value for ignorechars. The Ascii85 codec does not have such a legacy.
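The distinction between "argument omitted" and "argument passed as b''" can be sketched in pure Python with a sentinel default. Everything here is hypothetical: `toy_b64decode` and `_UNSET` are illustration-only names, and the function merely imitates the described semantics on top of `base64.b64decode()`; it is not the Argument Clinic declaration or the C code under discussion.

```python
import base64
import binascii

_UNSET = object()  # hypothetical sentinel meaning "ignorechars was not passed"

def toy_b64decode(data, ignorechars=_UNSET):
    """Illustration only: omitted argument = legacy lenient mode, any value = strict mode."""
    if ignorechars is _UNSET:
        # Legacy behaviour: non-alphabet characters are silently discarded.
        return base64.b64decode(data)
    # Strict behaviour: drop only the explicitly listed bytes, reject anything else.
    cleaned = bytes(c for c in data if c not in ignorechars)
    return base64.b64decode(cleaned, validate=True)

for kwargs in ({}, {"ignorechars": b"\n"}, {"ignorechars": b""}):
    try:
        print(kwargs, toy_b64decode(b"aG\nk=", **kwargs))
    except binascii.Error as exc:
        print(kwargs, "rejected:", exc)
```

Because the omitted case and the `b""` case behave differently, no bytes literal shown as the default in the signature would describe both, which is the reason given above for the placeholder.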

There is also a bug in the Py_buffer converter. I only discovered it while working on these issues. It only accepts None as the default value, even though None is not a valid value for Py_buffer. It should accept NULL and a bytes literal. Then these declarations would look more natural.

kangtastic

Thanks for the explanation about clinic input. And thanks again for accepting my PR and making the process pleasant. The timeline is indeed unfortunate but also isn't anyone's fault.

…thonGH-102753) Add Ascii85, Base85, and Z85 encoders and decoders to binascii, replacing the existing pure Python implementations in base64. This makes the codecs two orders of magnitude faster and consume two orders of magnitude less memory. Note that attempting to decode Ascii85 or Base85 data of length 1 mod 5 (after accounting for Ascii85 quirks) now produces an error, as no encoder would emit such data. This should be the only significant externally visible difference compared to the old implementation. Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>

serhiy-storchaka

@kangtastic, are you also interested in implementing the Base32 codec in C? This will not only improve its performance and memory usage, but also make it easier to add more features in the future.

I'm going to do it eventually, but I'm short on time right now. If you're interested, I promise a quick review.

kangtastic

@serhiy-storchaka, I'm also short on time for the next ~2 weeks, but I might be interested after that. I'll ping you in this issue when I know for sure.

kangtastic

@serhiy-storchaka, I'm interested in writing that Base32 implementation after all. Let me know if it's still needed/if you're looking for anything specific.

serhiy-storchaka

It would be great!

I asked you because I see that you are quite capable of this, and it might also be interesting for you.

As for anything specific, I suggest keeping the code for casefold and map01 in Python for now, like in the Base16 codec.
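One way to read that suggestion is that casefold and map01 stay as input normalization steps in Python, and only the strict decoding moves to C. The sketch below shows that split under stated assumptions: the wrapper name is hypothetical, and `base64.b32decode()` stands in for whatever C-level decoder is eventually written.

```python
import base64

def b32decode_with_options(s, casefold=False, map01=None):
    """Hypothetical wrapper: normalize in Python, then call a strict decoder."""
    if map01 is not None:
        # Map the digit 0 to the letter O, and the digit 1 to the given letter (I or L).
        s = s.translate(bytes.maketrans(b"01", b"O" + map01))
    if casefold:
        s = s.upper()
    return base64.b32decode(s)  # stand-in for a C-level strict decoder

print(b32decode_with_options(b"mzxw6===", casefold=True))
print(b32decode_with_options(b"0A======", map01=b"L"))
```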

kangtastic

@serhiy-storchaka, OK, I'll get started and do it as you suggest.

serhiy-storchaka

See also #145980. It would be enough to only implement a single pair of functions for Base32.

kangtastic

@serhiy-storchaka, did you want me to wait for #145981 to be finalized and merged before I create an issue and a PR for the base32 accelerator? It's almost finished except for testing, etc.

serhiy-storchaka

You can open an issue and start working on a PR, then rebase your branch after merging #145981. Just leave the code for base32hex for now.

I am also working on a PR which depends on #145981 and on your future code.

bedevere-bot added the awaiting review label Mar 16, 2023

kangtastic changed the title ~~Add Ascii85 and base85 support to binascii~~ Mar 16, 2023

bedevere-bot mentioned this pull request Mar 16, 2023

base64.b85encode uses significant amount of RAM #101178

Closed

arhadthedev added the stdlib Standard Library Python modules in the Lib/ directory label Mar 23, 2023

kangtastic force-pushed the gh-101178-rework-base85 branch from 71f1955 to 7b4aba1 Compare March 19, 2024 09:27

kangtastic force-pushed the gh-101178-rework-base85 branch from 7b4aba1 to 05ae5ad Compare April 21, 2025 05:16

kangtastic changed the title ~~gh-101178: Add Ascii85 and base85 support to binascii~~ Apr 21, 2025

kangtastic changed the title ~~gh-101178: Add Ascii85. base85, and Z85 support to binascii~~ Apr 21, 2025

AA-Turner reviewed Apr 24, 2025

kangtastic added 5 commits April 26, 2025 06:37

Restore base64.py

aa06c5d

Test both Python and C codepaths in base64 …

6c0e4a3

This is done differently to PEP-0399 to minimize the number of changed lines.

Match behavior between Python and C base 85 functions …

ce4773c

As we're now keeping the existing Python base 85 functions, the C implementations should behave exactly the same, down to exception type and wording. It is also no longer an error to try to decode data of length 1 mod 5.

Add Z85 tests to binascii

4072e3b

Update generated files

bc9217f

AA-Turner reviewed Apr 27, 2025

kangtastic added 4 commits April 27, 2025 19:55

Avoid importing functools …

2c40ba0

Importing update_wrapper() from functools to copy attributes is expensive. Do it ourselves for only the most relevant ones.

Avoid circular import in _base64 …

fd9eaf7

This requires some code duplication, but oh well.

Do not use a decorator for changing exception type …

4746d18

Using a decorator complicates function signature introspection.

Test Python and C codepaths in base64 using mixins …

d075593

Do we really need to test the legacy API twice?
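The idea behind the "Avoid importing functools" commit in this batch, copying only the most relevant attributes by hand instead of calling `functools.update_wrapper()`, can be sketched as follows. The helper and function names are hypothetical, and the choice of attributes is an assumption; the commit itself may copy a different set.

```python
def copy_basic_metadata(wrapper, wrapped):
    """Hypothetical helper: copy a few attributes without importing functools."""
    for attr in ("__name__", "__qualname__", "__doc__"):
        setattr(wrapper, attr, getattr(wrapped, attr))
    return wrapper

def original(data):
    """Encode data (placeholder body)."""
    return bytes(data)

def replacement(data):
    return original(data)

copy_basic_metadata(replacement, original)
print(replacement.__name__, "-", replacement.__doc__)
```

Unlike `functools.update_wrapper()`, this sketch does not set `__wrapped__` or merge `__dict__`, which is the trade-off for skipping the import.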

kangtastic added 3 commits January 17, 2026 18:30

Allow up to sys.maxsize output length when decoding base 85 …

da165d1

Include an integer overflow check for Ascii85.

Defer base 85 overflow check during decoding …

bf32f99

Performance gains of up to 8% for a2b_ascii85() and 25% for a2b_base85() and a2b_z85() were observed.

Merge branch 'main' into pythongh-101178-rework-base85

74f6ceb

serhiy-storchaka mentioned this pull request Jan 21, 2026

Incorrect documentation for ignorechars in base64.a85decode() #144027

Closed

serhiy-storchaka reviewed Jan 30, 2026

serhiy-storchaka and others added 8 commits February 6, 2026 11:06

Merge branch 'main' into pythongh-101178-rework-base85

4ba3e50

Rename parameters to match the base64 module.

99e0bde

Remove parameters strict_mode and newline.

cc6d485

Optimize ignorechars cache.

30f54a1

Harmonize documentation.

37df735

Add What's New entries.

56a02b2

Polish tests.

adb1922

Rename internal Base 85 codec functions to match Base 64 helpers

0730fdf

bedevere-app Bot removed the awaiting review label Feb 6, 2026

kangtastic mentioned this pull request Mar 20, 2026

C accelerator for Base32 character encoding #146192

Closed

Conversation

kangtastic commented Mar 16, 2023 • edited

ghost commented Mar 16, 2023 • edited by ghost

kangtastic commented Mar 19, 2024 • edited

python-cla-bot Bot commented Apr 18, 2025 • edited

kangtastic commented Apr 21, 2025

sergey-miryanov commented Apr 21, 2025

kangtastic commented Apr 21, 2025

kangtastic commented Apr 27, 2025

kangtastic commented Jan 18, 2026

serhiy-storchaka left a comment

kangtastic commented Feb 6, 2026

serhiy-storchaka commented Feb 6, 2026

kangtastic commented Feb 7, 2026

serhiy-storchaka commented Feb 24, 2026

kangtastic commented Feb 24, 2026

kangtastic commented Mar 13, 2026

serhiy-storchaka commented Mar 13, 2026

kangtastic commented Mar 14, 2026

serhiy-storchaka commented Mar 15, 2026

kangtastic commented Mar 19, 2026

serhiy-storchaka commented Mar 19, 2026

8 participants
