gh-139871: Optimize bytearray construction with encoding #142243
Merged
vstinner merged 2 commits into python:main on Dec 15, 2025
When a `str` is encoded in `bytearray.__init__`, the encoder tends to
create a new, unique bytes object. Rather than allocating new memory and
copying the bytes, use the already created bytes object as the bytearray's
backing storage. The bigger the `str`, the bigger the saving.
Mean +- std dev: [main_encoding] 497 us +- 9 us -> [encoding] 14.2 us +- 0.3 us: 34.97x faster
```python
import pyperf

runner = pyperf.Runner()
runner.timeit(
    name="encode",
    setup="a = 'a' * 1_000_000",
    stmt="bytearray(a, encoding='utf8')",
)
```
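For readers without `pyperf` installed, the following is a minimal sketch of the construction form under discussion, using only the standard library. It checks that `bytearray(text, encoding=...)` yields the same contents as encoding first and copying the result, then takes a rough timing. The helper names `build_with_encoding` and `build_via_bytes` are hypothetical and exist only for illustration; the timing is not a substitute for the `pyperf` numbers above and will vary by machine and Python version.

```python
import timeit


def build_with_encoding(text: str, encoding: str = "utf8") -> bytearray:
    """Construction form this PR targets: encode inside bytearray.__init__."""
    return bytearray(text, encoding=encoding)


def build_via_bytes(text: str, encoding: str = "utf8") -> bytearray:
    """Explicit two-step form: encode to bytes, then build a bytearray from it."""
    return bytearray(text.encode(encoding))


sample = "a" * 1_000_000

# Both forms must produce identical contents regardless of how the
# backing storage is obtained internally.
result = build_with_encoding(sample)
assert result == build_via_bytes(sample)

# The result is an ordinary mutable bytearray.
result.append(0x62)
assert len(result) == len(sample) + 1

# Rough timing only; use pyperf for stable measurements.
runs = 10
elapsed = timeit.timeit(lambda: build_with_encoding(sample), number=runs)
print(f"bytearray(a, encoding='utf8'): {elapsed / runs * 1e6:.1f} us per call")
```

The optimization is an internal detail of how the storage is obtained, so the observable behavior of both helpers should be the same before and after the change; only the time per call is expected to differ.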
Contributor
Author
cc: @vstinner this construction form doesn't appear much in the CPython codebase but does exist in other codebases. I think this one is a safe subset of #141862; I hope to revisit that eventually, but it will definitely take many steps to get working just right.
vstinner (Member) approved these changes on Dec 11, 2025
LGTM. It seems to be safe to pick the bytes object in this case.
vstinner merged commit 14e6052 into python:main on Dec 15, 2025
83 of 85 checks passed
Member
Merged, thanks.
fatelei pushed a commit to fatelei/cpython that referenced this pull request on Dec 16, 2025
`.take_bytes([n])` a zero-copy path from `bytearray` to `bytes` #139871