bpo-41279: Add StreamReaderBufferedProtocol #21446

tontinton

https://bugs.python.org/issue41279

I got much better performance on await reader.read() using this branch on Linux (see the figures in the comments of the server.py script).

I tested with a small server and client:
server.py:

import asyncio
import contextlib
import time


async def client_connected(reader, writer):
    # perf_counter() is monotonic, so it is better suited to timing than time.time()
    start = time.perf_counter()

    with contextlib.closing(writer):
        for i in range(1000):
            # The farther this read size is from 65536, the larger the speedup from global_buffer.
            # On Linux:
            # 65536 * 2 -> about 160% better performance
            # 4096 -> about 155% better performance
            # 65536 -> about 125% better performance

            # On Windows, 65536 gives the same performance for some reason, which is interesting,
            # but any other value is somewhat faster; for example, 65536 * 2 gives about 120% better performance.
            await reader.read(65536 * 2)

    print(f'{time.perf_counter() - start}')


async def main():
    # global_buffer is the keyword argument introduced by this branch
    server = await asyncio.start_server(client_connected, '127.0.0.1', 8888, global_buffer=True)

    addr = server.sockets[0].getsockname()
    print(f'Serving on {addr}')

    async with server:
        await server.serve_forever()


if __name__ == "__main__":
    asyncio.run(main())

client.py:

import asyncio
import contextlib


async def flood(ip, port):
    message = b'A' * 1024 * 64  # tweak this parameter as much as you want
    reader, writer = await asyncio.open_connection(ip, port)
    with contextlib.closing(writer):
        while True:
            writer.write(message)
            await writer.drain()


if __name__ == "__main__":
    asyncio.run(flood('127.0.0.1', 8888))

tontinton

I still need to fix the test_start_tls_client_buf_proto_1 test

tzickel

I am not sure global_buffer is the right name. I am also not sure whether it is better, but maybe use buffer_size=0 (which can be the default) instead of adding another variable?

Also, maybe reword "Use carefully as each client connected to the server will allocate a 64k
buffer which means that if you know you will have a lot of clients at the
same time, you will run out of memory."?

For example, state that when this buffer is used, each stream pre-allocates a buffer (64k by default) that is kept for the lifetime of the stream, even when no data is passed, which might strain memory under heavy concurrent stream usage.
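To put rough numbers on that concern, here is a small back-of-the-envelope sketch. The helper name `preallocated_bytes` is made up for illustration; the only figure taken from the discussion is the 64k default buffer size.

```python
def preallocated_bytes(connections: int, buffer_size: int = 64 * 1024) -> int:
    """Bytes reserved by per-stream read buffers, whether or not data flows."""
    return connections * buffer_size


# Print the reserved memory for a few connection counts.
for n in (100, 10_000, 100_000):
    mib = preallocated_bytes(n) / (1024 * 1024)
    print(f"{n:>7} streams -> {mib:,.2f} MiB reserved")
```

The cost grows linearly with the number of open streams, which is why the wording should mention the buffer's lifetime rather than only its size.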

tzickel

BTW, speaking of this performance, I've tried coding a zero allocation + zero copying data structure in Python to get rid of those allocations (by using an optional global pool) and copies:

https://github.com/tzickel/chunkedbuffer

In theory this can be used directly in the readinto part, and be exposed outside in the streamreader interface itself (without another copying like here).

tontinton

> BTW, speaking of this performance, I've tried coding a zero allocation + zero copying data structure in Python to get rid of those allocations (by using an optional global pool) and copies:
>
> https://github.com/tzickel/chunkedbuffer
>
> In theory this can be used directly in the readinto part, and be exposed outside in the streamreader interface itself (without another copying like here).

I thought about doing it inside streamreader, so I wrote it locally and saw an improvement, but not as much as in this branch (I was thinking of opening a separate PR for that after this one).

A memory pool is also a good idea, but it requires a lot of planning: where it would sit, and what would happen if the pool runs out of memory while we still need to make a read call. That's another issue.
I do agree that in the long term this is the best solution.
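As one possible answer to the "pool is exhausted but a read still has to happen" question, here is a minimal sketch. `BufferPool` and its policy (fall back to a fresh allocation, then drop the extra buffer on release) are assumptions for illustration only, not something proposed in this PR or taken from chunkedbuffer.

```python
class BufferPool:
    """Hypothetical fixed-capacity pool that falls back to plain allocation."""

    def __init__(self, count: int, size: int) -> None:
        self._capacity = count
        self._size = size
        self._free = [bytearray(size) for _ in range(count)]
        self.fallback_allocations = 0

    def acquire(self) -> bytearray:
        if self._free:
            return self._free.pop()
        # Pool exhausted: the read must still proceed, so allocate normally.
        self.fallback_allocations += 1
        return bytearray(self._size)

    def release(self, buf: bytearray) -> None:
        # Keep at most `count` buffers; extra fallback buffers are dropped.
        if len(self._free) < self._capacity:
            self._free.append(buf)

    @property
    def free_count(self) -> int:
        return len(self._free)


pool = BufferPool(count=2, size=16)
held = [pool.acquire() for _ in range(3)]   # the third request overflows the pool
for buf in held:
    pool.release(buf)
```

With this policy an exhausted pool degrades to today's allocate-per-read behaviour instead of blocking or failing, at the cost of losing the performance benefit under peak load.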

> I am not sure global_buffer is the correct name. Also not sure if it's better or not, but maybe buffer_size=0 (which can be the default) instead of using another variable ?

You are probably right, I'll just do buffer_size=0, and no I don't think it should be default.

1st1

This is one high quality PR, thanks for working on it. The fixes in proactor_events.py alone are great. I've left a couple comments.

1st1

@asvetlov Can you also take a look at this?

1st1

@asvetlov is going to take a look in a couple of days. Let's give this time until next Tuesday.

bedevere-bot

A Python core developer has requested some changes be made to your pull request before we can consider merging it. If you could please address their requests along with any other requests in other reviews from core developers that would be appreciated.

Once you have made the requested changes, please leave a comment on this pull request containing the phrase I have made the requested changes; please review again. I will then notify any core developers who have left a review that you're ready for them to take another look at this pull request.

bedevere-bot

Thanks for making the requested changes!

@1st1, @asvetlov: please review the changes made to this pull request.

tontinton

Hey, just remembered that I did this, bumping! :)

This class gets better performance as BufferedProtocol uses read_into into a buffer allocated before, instead of allocating a new buffer each time read is called.
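For readers unfamiliar with the interface, a minimal sketch of a BufferedProtocol that reuses one pre-allocated buffer might look like the following. The class name `PreallocatedReader` is hypothetical and this is not the StreamReaderBufferedProtocol from this PR; only `asyncio.BufferedProtocol` and its `get_buffer()`, `buffer_updated()` and `eof_received()` callbacks are the real API. The loop at the bottom drives the protocol by hand, roughly as a transport would.

```python
import asyncio


class PreallocatedReader(asyncio.BufferedProtocol):
    """Hypothetical protocol: one buffer allocated up front and reused."""

    def __init__(self, size: int = 64 * 1024) -> None:
        self._buffer = bytearray(size)          # allocated once per connection
        self._view = memoryview(self._buffer)
        self.received = bytearray()

    def get_buffer(self, sizehint: int) -> memoryview:
        # The transport writes incoming bytes directly into this view.
        return self._view

    def buffer_updated(self, nbytes: int) -> None:
        # Only the first nbytes of the buffer are valid for this call.
        self.received += self._view[:nbytes]

    def eof_received(self) -> bool:
        return False


# Drive the protocol by hand, standing in for the transport.
proto = PreallocatedReader(size=8)
for chunk in (b"hello", b"world"):
    buf = proto.get_buffer(-1)
    buf[:len(chunk)] = chunk
    proto.buffer_updated(len(chunk))
```

The same view object is returned on every call, so no new buffer is allocated per read; the copy into `received` is only there to make the sketch observable.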

…pythonGH-21446) The transport did not know how to use the proper API exported by BufferedProtocol. Added a new callback function that calls get_buffer() and buffer_updated() instead of data_received() when the protocol given to it is a BufferedProtocol. This is exactly how _SelectorSocketTransport handles a BufferedProtocol.
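The dispatch described here can be sketched roughly as follows. The helper `deliver` and the two demo protocol classes are hypothetical and are not the callback added in this commit; they only illustrate choosing between data_received() and the get_buffer()/buffer_updated() pair based on the protocol type.

```python
import asyncio


def deliver(protocol: asyncio.BaseProtocol, data: bytes) -> None:
    """Hypothetical helper: hand received bytes to either protocol flavour."""
    if not isinstance(protocol, asyncio.BufferedProtocol):
        protocol.data_received(data)
        return
    view = memoryview(data)
    while len(view):
        buf = protocol.get_buffer(len(view))
        n = min(len(buf), len(view))
        if n == 0:
            raise RuntimeError("get_buffer() returned an empty buffer")
        buf[:n] = view[:n]              # copy as much as fits
        protocol.buffer_updated(n)
        view = view[n:]


class Plain(asyncio.Protocol):
    def __init__(self) -> None:
        self.seen = bytearray()

    def data_received(self, data: bytes) -> None:
        self.seen += data


class Buffered(asyncio.BufferedProtocol):
    def __init__(self) -> None:
        self._buf = bytearray(4)        # deliberately small to force several rounds
        self.seen = bytearray()

    def get_buffer(self, sizehint: int) -> bytearray:
        return self._buf

    def buffer_updated(self, nbytes: int) -> None:
        self.seen += self._buf[:nbytes]


plain, buffered = Plain(), Buffered()
for proto in (plain, buffered):
    deliver(proto, b"abcdefghij")
```

Both protocols end up with the same bytes; only the delivery path differs.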

…port (pythonGH-21446) In the __init__ function, if the protocol is an instance of BufferedProtocol, we call get_buffer on the protocol to get its buffer instead of creating a buffer object. In addition, _loop_reading now calls _data_received as soon as there is actual data, instead of calling it only after adding a recv_into event. The reason for this change is that read_into could call its callback immediately, overwriting the data in the buffer before we actually call _data_received on it; this fixes the potential issue of missed data.
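The ordering problem can be illustrated with a toy simulation. This is not the proactor code: `loop_reading` below is a made-up function in which every read completes immediately into a single shared buffer, which is the situation the commit message describes.

```python
def loop_reading(chunks, deliver_before_next_read):
    """Simulate reads that complete immediately into one shared buffer."""
    buffer = bytearray(4)          # single buffer owned by the protocol
    delivered = []
    pending = 0                    # bytes in `buffer` not yet delivered

    for chunk in chunks:
        if pending and deliver_before_next_read:
            delivered.append(bytes(buffer[:pending]))
            pending = 0
        # The next read completes at once and writes over the buffer.
        buffer[:len(chunk)] = chunk
        if pending:
            # Too late: the previous bytes have already been overwritten.
            delivered.append(bytes(buffer[:pending]))
        pending = len(chunk)

    delivered.append(bytes(buffer[:pending]))
    return delivered


chunks = [b"AAAA", b"BBBB", b"CCCC"]
fixed = loop_reading(chunks, deliver_before_next_read=True)
broken = loop_reading(chunks, deliver_before_next_read=False)
```

In the `broken` ordering the first chunk never reaches the consumer and a later one is seen twice, which is the "missed data" case; delivering before issuing the next read avoids it.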

tontinton

I'll fix the ssl tests sometime this weekend

arhadthedev

@tontinton You can turn your PR into a draft and not worry about timeframes.

…pythonGH-21446) When calling set_protocol, you can now change the type of the protocol from BufferedProtocol to Protocol or vice versa. start_tls needed this feature, as it could read into a buffered protocol at first and then change the protocol to SSLProto, which is a regular protocol.
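A minimal sketch of the idea, assuming the transport caches which read path to use: `SketchTransport` and `read_path` are hypothetical names, and this is not the transport code changed in this commit. The point is only that the protocol type has to be re-checked on every set_protocol call.

```python
import asyncio


class SketchTransport:
    """Hypothetical stand-in for a transport; not asyncio's implementation."""

    def __init__(self, protocol: asyncio.BaseProtocol) -> None:
        self.set_protocol(protocol)

    def set_protocol(self, protocol: asyncio.BaseProtocol) -> None:
        self._protocol = protocol
        # Re-evaluate on every swap so the read path matches the new protocol.
        self._buffered = isinstance(protocol, asyncio.BufferedProtocol)

    def read_path(self) -> str:
        if self._buffered:
            return "get_buffer/buffer_updated"
        return "data_received"


transport = SketchTransport(asyncio.BufferedProtocol())
before = transport.read_path()
transport.set_protocol(asyncio.Protocol())   # e.g. swapping in a regular protocol
after = transport.read_path()
```

If the flag were computed only in `__init__`, the transport would keep calling get_buffer() on a protocol that no longer provides it after the swap.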

github-actions

This PR is stale because it has been open for 30 days with no activity.

tontinton requested review from 1st1 and asvetlov as code owners July 11, 2020 17:17

the-knights-who-say-ni added the CLA signed label Jul 11, 2020

bedevere-bot added the awaiting review label Jul 11, 2020

tontinton changed the title ~~Fix issue 41279~~ Jul 11, 2020

tontinton changed the title ~~bpo-41273: Convert StreamReaderProtocol to a BufferedProtocol~~ Jul 11, 2020

tontinton force-pushed the fix-issue-41279 branch 2 times, most recently from d594f7a to af23da7 Compare July 11, 2020 17:38

tontinton changed the title ~~bpo-41279: Convert StreamReaderProtocol to a BufferedProtocol~~ Jul 11, 2020

tontinton force-pushed the fix-issue-41279 branch from af23da7 to c533f08 Compare July 14, 2020 20:49

tontinton force-pushed the fix-issue-41279 branch from c533f08 to 806b335 Compare July 14, 2020 22:28

tontinton force-pushed the fix-issue-41279 branch from 806b335 to 62da6f0 Compare July 14, 2020 22:41

tontinton force-pushed the fix-issue-41279 branch from 62da6f0 to ac1418d Compare July 14, 2020 22:54

tontinton force-pushed the fix-issue-41279 branch from ac1418d to 71ea0a6 Compare July 15, 2020 08:04

1st1 reviewed Jul 16, 2020

mpaolini reviewed Jul 16, 2020

tontinton force-pushed the fix-issue-41279 branch from 71ea0a6 to 2a4a9eb Compare July 16, 2020 23:30

mpaolini reviewed Jul 17, 2020

tontinton force-pushed the fix-issue-41279 branch from 2a4a9eb to 38704db Compare July 17, 2020 10:45

tontinton force-pushed the fix-issue-41279 branch from 0f02cac to 90b9e44 Compare July 19, 2020 23:21

1st1 reviewed Jul 22, 2020

1st1 approved these changes Jul 29, 2020

tzickel reviewed Jul 30, 2020

asvetlov requested changes Aug 3, 2020

tontinton mannequin mentioned this pull request Apr 10, 2022

asyncio: proactor read transport: use recv_into instead of recv #85445

Closed

gst reviewed Jul 7, 2022

tontinton added 3 commits July 8, 2022 16:40

bpo-41279: Add StreamReaderBufferedProtocol (pythonGH-21446) …

9a65cfe

tontinton added 3 commits July 8, 2022 22:41

bpo-41279: Add BufferedProtocol to proactor transport tests (pythonGH-21446)

9e81f1a

bpo-41279: Add BufferedProtocol to unix read pipe transport tests (pythonGH-21446)

3dc4d18

tontinton mannequin mentioned this pull request Apr 10, 2022

Add a StreamReaderBufferedProtocol #85451

Open

msoxzw mentioned this pull request Jan 28, 2023

asyncio tcp transport on Windows reads bytearray instead of bytes #99941

Closed

Conversation

tontinton commented Jul 11, 2020 (edited)

tontinton commented Jul 19, 2020

tzickel commented Jul 20, 2020

tzickel commented Jul 20, 2020 (edited)

tontinton commented Jul 20, 2020

1st1 left a comment

1st1 commented Jul 22, 2020

1st1 commented Jul 29, 2020

bedevere-bot commented Aug 3, 2020

bedevere-bot commented Aug 8, 2020

tontinton commented Oct 18, 2021

tontinton commented Jul 8, 2022

arhadthedev commented Jul 8, 2022

github-actions Bot commented Apr 8, 2026

11 participants
