gh-139871: Optimize bytearray construction with encoding #142243

cmaloney

When a `str` is encoded in `bytearray.__init__`, the encoder tends to create a new, unique `bytes` object. Rather than allocating new memory and copying those bytes, use the already-created `bytes` object as the bytearray's backing storage. The bigger the `str`, the bigger the saving.

Mean +- std dev: [main_encoding] 497 us +- 9 us -> [encoding] 14.2 us +- 0.3 us: 34.97x faster

```python
import pyperf

runner = pyperf.Runner()

runner.timeit(
    name="encode",
    setup="a = 'a' * 1_000_000",
    stmt="bytearray(a, encoding='utf8')")
```
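From Python code the change should be invisible: per the `bytearray` documentation, a `str` source is converted with `str.encode()`, so `bytearray(s, encoding=...)` holds the same contents as encoding first and then building the bytearray. The sketch below is an illustration, not a test from the PR. It checks that equivalence for a few sample strings (chosen arbitrarily) and confirms the result stays an ordinary, mutable `bytearray`.

```python
# Sketch: the construction form optimized by the PR must match an explicit
# encode step, and the result must behave like any other bytearray.
samples = ["", "a", "ascii text", "h\u00e9llo w\u00f6rld", "\u20ac" * 3]

for s in samples:
    direct = bytearray(s, encoding="utf8")    # str + encoding path (this PR)
    via_encode = bytearray(s.encode("utf8"))  # encode first, then construct
    assert direct == via_encode
    assert isinstance(direct, bytearray)

# errors= is forwarded to the encoder exactly as with str.encode()
assert bytearray("caf\u00e9", encoding="ascii", errors="replace") == bytearray(b"caf?")

# The bytearray stays fully mutable whatever buffer it started from.
ba = bytearray("abc", encoding="utf8")
ba[0] = ord("x")   # overwrite in place
ba += b"def"       # grow in place
assert ba == bytearray(b"xbcdef")
```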

Issue: Add .take_bytes([n]) a zero-copy path from bytearray to bytes #139871
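The linked issue proposes `.take_bytes([n])` as a zero-copy path from `bytearray` to `bytes`. This PR applies roughly the same idea in the other direction, from a freshly encoded `bytes` object to a `bytearray`. Here is a minimal sketch of calling it. It makes several assumptions: that the method exists as `bytearray.take_bytes()` only on newer CPython (3.15+), and that calling it with no argument returns all contents and leaves the bytearray empty. The helper name `drain_to_bytes` is hypothetical. On versions without the method, it falls back to an ordinary copy.

```python
# Sketch: hand a bytearray's whole contents off as an immutable bytes object.
# Assumption: bytearray.take_bytes() is only present on newer CPython (3.15+);
# older versions take the copying fallback below.
def drain_to_bytes(buf: bytearray) -> bytes:
    """Hypothetical helper: return all of buf as bytes and leave buf empty."""
    take = getattr(buf, "take_bytes", None)
    if take is not None:
        # No argument: take everything (assumed zero-copy per the issue title).
        return take()
    data = bytes(buf)  # fallback: copies the buffer
    buf.clear()
    return data


buf = bytearray(b"header:")
buf += "payload".encode("utf8")
out = drain_to_bytes(buf)
assert out == b"header:payload"
assert buf == bytearray()
```

The `getattr` check keeps the sketch runnable on any Python 3 version. Only the first branch is the zero-copy path the issue describes.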


cmaloney

cc: @vstinner This construction form doesn't appear often in the CPython codebase, but it does exist in other codebases.

I think this one is a safe subset of #141862. I hope to revisit that PR eventually, but it will definitely take several steps to get it working just right.

vstinner

LGTM. It seems to be safe to pick the bytes object in this case.
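Reusing the encoder's result is only safe when that `bytes` object is not visible anywhere else. CPython can hand out shared `bytes` objects for some small results (the empty `b""` is one example). A useful sanity check is that mutating a freshly built `bytearray` never leaks into later encodes. The sketch below is illustrative and not a test from the PR. It does not show how the PR decides the object is unique; it only exercises the observable requirement, using arbitrary sample strings.

```python
# Illustrative sketch: a bytearray built from a str must own its buffer, even
# for tiny inputs where the encoder's result could be a shared bytes object.
SAMPLES = ["", "a", "z", "abc", "\u00e9"]

for text in SAMPLES:
    ba = bytearray(text, encoding="utf8")  # may reuse the encoder's result
    ba += b"!"                             # grow in place
    ba[0] = ord("#")                       # overwrite the first byte in place
    # A fresh encode must still round-trip, i.e. the mutations above did not
    # write into a bytes object that anything else can see.
    assert text.encode("utf8").decode("utf8") == text
    assert len(ba) == len(text.encode("utf8")) + 1
```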

vstinner

Merged, thanks.


bedevere-app Bot added the awaiting review label Dec 3, 2025

bedevere-app Bot mentioned this pull request Dec 3, 2025

Add .take_bytes([n]) a zero-copy path from bytearray to bytes #139871

Closed

cmaloney added the skip news label Dec 3, 2025

cmaloney mentioned this pull request Dec 3, 2025

gh-139871: Optimize bytearray unique bytes iconcat #141862

Open

Merge branch 'main' into ba_tb_encoding

663ed88

vstinner approved these changes Dec 11, 2025


bedevere-app Bot added awaiting merge and removed awaiting review labels Dec 11, 2025

bedevere-app Bot removed the awaiting merge label Dec 15, 2025

cmaloney deleted the ba_tb_encoding branch December 15, 2025 17:01

vstinner merged 2 commits into python:main from cmaloney:ba_tb_encoding

cmaloney commented Dec 3, 2025 • edited by bedevere-app Bot

cmaloney commented Dec 11, 2025

vstinner left a comment

vstinner commented Dec 15, 2025


2 participants
