Rustls integration improvements#7946
Conversation
📝 WalkthroughWalkthroughUnifies socket waiting with SockWaitKind/sock_wait and deadline handling, updates PySocket call sites, migrates SSL code to rustls::Connection with TLS-record-aware socket reads and cached socket methods, tightens handshake/SNI flows, updates error mapping, and adds two regression tests for short/fragmented recv behavior. ChangesSocket Wait API and SSL/TLS Integration
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes Possibly related PRs
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 inconclusive)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
Sorry, something went wrong.
There was a problem hiding this comment.
Actionable comments posted: 2
🧹 Nitpick comments (2)
extra_tests/snippets/stdlib_urllib_https_misaligned_recv.py (2)
205-216: 💤 Low valueDocument the
-17TLS overhead constant.
max(1, n - 17)derives plaintext sizes from the captured wire record sizes by subtracting an assumed 17 bytes of TLS 1.3 framing overhead (5-byte record header + 1-byte content type + 11-byte AEAD tag minus padding, depending on cipher). A short comment would make the intent obvious to future readers, since the magic number is otherwise opaque.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@extra_tests/snippets/stdlib_urllib_https_misaligned_recv.py` around lines 205 - 216, The code uses a magic constant "-17" when computing plaintext_sizes from TLS_RECORD_BODY_SIZES (plaintext_sizes = [max(1, n - 17) ...]) which is unclear; update the code by adding a brief inline comment next to plaintext_sizes (or above its definition) that explains the origin of 17 (5‑byte TLS record header + 1‑byte content type + ~11‑byte AEAD tag for TLS 1.3 framing) and notes that this is an approximation for the captured wire record sizes so future readers understand the subtraction, referencing variables TLS_RECORD_BODY_SIZES and plaintext_sizes.
175-193: 💤 Low valueHandshake/read loops don't detect peer closing the connection.
If
conn.recv(65536)returnsb""(client disconnects mid-handshake or before sending\r\n\r\n),incoming.write(b"")is a no-op and the nexttls.do_handshake()/tls.read()re-raisesSSLWantReadError, looping forever untilguard_timeoutaborts the process. Theguard_timeoutsafety net covers this, but the failure mode would be a 20-second hang plus a process abort rather than a clean test failure with a useful traceback. Consider breaking out on empty recv (e.g. raise/return) so a regression in the client surfaces directly.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@extra_tests/snippets/stdlib_urllib_https_misaligned_recv.py` around lines 175 - 193, The handshake and read loops using tls.do_handshake() and tls.read() never detect a peer close because conn.recv(65536) returning b"" is ignored by incoming.write(), causing infinite SSLWantReadError retries; modify both loops (the block handling SSLWantReadError in the handshake loop and the SSLWantReadError handler in the request/read loop) to check the result of conn.recv(65536) and if it is b"" immediately break/raise an exception (or return) so the function fails fast instead of looping until guard_timeout; keep using drain_outgoing(outgoing, conn) as before but ensure you handle the empty-recv case consistently to surface client disconnects cleanly.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@extra_tests/snippets/stdlib_ssl_short_recv.py`:
- Line 2: Remove the unused top-level import "random" from the file (the import
statement referencing the random module in stdlib_ssl_short_recv.py) since it is
not referenced anywhere and causes a Flake8 F401; simply delete that import line
so the module no longer imports random unnecessarily.
In `@extra_tests/snippets/stdlib_urllib_https_misaligned_recv.py`:
- Around line 144-148: The timeout guard in function guard_timeout prints the
wrong test filename; update the stderr message to reference the current file
name "stdlib_urllib_https_misaligned_recv.py" instead of
"stdlib_urllib_https_short_recv.py" so the printed line in guard_timeout
correctly identifies the failing test when it times out.
---
Nitpick comments:
In `@extra_tests/snippets/stdlib_urllib_https_misaligned_recv.py`:
- Around line 205-216: The code uses a magic constant "-17" when computing
plaintext_sizes from TLS_RECORD_BODY_SIZES (plaintext_sizes = [max(1, n - 17)
...]) which is unclear; update the code by adding a brief inline comment next to
plaintext_sizes (or above its definition) that explains the origin of 17 (5‑byte
TLS record header + 1‑byte content type + ~11‑byte AEAD tag for TLS 1.3 framing)
and notes that this is an approximation for the captured wire record sizes so
future readers understand the subtraction, referencing variables
TLS_RECORD_BODY_SIZES and plaintext_sizes.
- Around line 175-193: The handshake and read loops using tls.do_handshake() and
tls.read() never detect a peer close because conn.recv(65536) returning b"" is
ignored by incoming.write(), causing infinite SSLWantReadError retries; modify
both loops (the block handling SSLWantReadError in the handshake loop and the
SSLWantReadError handler in the request/read loop) to check the result of
conn.recv(65536) and if it is b"" immediately break/raise an exception (or
return) so the function fails fast instead of looping until guard_timeout; keep
using drain_outgoing(outgoing, conn) as before but ensure you handle the
empty-recv case consistently to surface client disconnects cleanly.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yml
Review profile: CHILL
Plan: Pro
Run ID: 36cac236-b9a6-408d-b2e0-d26dd9385242
📒 Files selected for processing (8)
crates/host_env/src/socket.rscrates/stdlib/src/socket.rscrates/stdlib/src/ssl.rscrates/stdlib/src/ssl/cert.rscrates/stdlib/src/ssl/compat.rscrates/stdlib/src/ssl/error.rsextra_tests/snippets/stdlib_ssl_short_recv.pyextra_tests/snippets/stdlib_urllib_https_misaligned_recv.py
💤 Files with no reviewable changes (2)
- crates/stdlib/src/ssl/error.rs
- crates/host_env/src/socket.rs
Sorry, something went wrong.
There was a problem hiding this comment.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
crates/stdlib/src/ssl.rs (1)
1894-1895: ⚠️ Potential issue | 🟠 Major | ⚡ Quick winCache the bound socket methods from the wrapped object.
Looking up
send/recvonsocket.sockethard-wires base-class dispatch and skips overrides on socket subclasses or instance-attached methods. That changes I/O behavior for wrapped custom sockets compared to resolving the methods fromargs.sock.🐛 Proposed fix
- let socket_mod = vm.import("socket", 0)?; - let socket_class = socket_mod.get_attr("socket", vm)?; - // Create _SSLSocket instance let ssl_socket = PySSLSocket { sock: args.sock.clone(), - sock_send_method: socket_class.get_attr("send", vm)?, - sock_recv_method: socket_class.get_attr("recv", vm)?, + sock_send_method: args.sock.get_attr("send", vm)?, + sock_recv_method: args.sock.get_attr("recv", vm)?, tls_record_header_buf: vm .ctx .new_bytearray(Vec::with_capacity(TLS_RECORD_HEADER_SIZE)) .into(),- let socket_mod = vm.import("socket", 0)?; - let socket_class = socket_mod.get_attr("socket", vm)?; - // Create _SSLSocket instance with BIO mode let ssl_socket = PySSLSocket { sock: vm.ctx.none(), // No socket in BIO mode - sock_send_method: socket_class.get_attr("send", vm)?, - sock_recv_method: socket_class.get_attr("recv", vm)?, + sock_send_method: vm.ctx.none(), + sock_recv_method: vm.ctx.none(), tls_record_header_buf: vm.ctx.none(),- self.sock_recv_method - .call((self.sock.clone(), vm.ctx.new_int(size)), vm) + self.sock_recv_method.call((vm.ctx.new_int(size),), vm) ... - self.sock_send_method - .call((self.sock.clone(), vm.ctx.new_bytes(data.to_vec())), vm) + self.sock_send_method + .call((vm.ctx.new_bytes(data.to_vec()),), vm)Also applies to: 1975-1976
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@crates/stdlib/src/ssl.rs` around lines 1894 - 1895, The code currently caches send/recv using socket_class.get_attr which forces base-class lookup and ignores overrides on the wrapped instance; change the lookup to retrieve the bound methods from the actual wrapped object (args.sock.get_attr("send", vm) and args.sock.get_attr("recv", vm)) so instance-attached methods and subclass overrides are honored, and apply the same change for the other occurrence that sets sock_send_method/sock_recv_method; preserve the existing error handling semantics when get_attr returns an error.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Outside diff comments:
In `@crates/stdlib/src/ssl.rs`:
- Around line 1894-1895: The code currently caches send/recv using
socket_class.get_attr which forces base-class lookup and ignores overrides on
the wrapped instance; change the lookup to retrieve the bound methods from the
actual wrapped object (args.sock.get_attr("send", vm) and
args.sock.get_attr("recv", vm)) so instance-attached methods and subclass
overrides are honored, and apply the same change for the other occurrence that
sets sock_send_method/sock_recv_method; preserve the existing error handling
semantics when get_attr returns an error.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yml
Review profile: CHILL
Plan: Pro
Run ID: 629fa543-a407-4c56-9f0a-d36885bf549f
📒 Files selected for processing (5)
crates/stdlib/src/ssl.rscrates/stdlib/src/ssl/cert.rscrates/stdlib/src/ssl/compat.rsextra_tests/snippets/stdlib_ssl_short_recv.pyextra_tests/snippets/stdlib_urllib_https_misaligned_recv.py
✅ Files skipped from review due to trivial changes (1)
- crates/stdlib/src/ssl/cert.rs
Sorry, something went wrong.
Use method references cached during socket creation.
Previous algorithm didn't take into account that recv() may return less data than requested even for blocking sockets.
First of all, existing code does not really work and this leads to an infinite loop: RustPython#7891 Second, this should never happen when rustls used properly (wrt wants_read() and wants_write()) and thus all such errors are implementation bugs that must be properly fixed.
There was a problem hiding this comment.
Actionable comments posted: 3
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
crates/stdlib/src/ssl.rs (1)
1842-1867: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick winReuse
validate_hostname()in_wrap_socket.This path no longer enforces the 253-character limit that
_wrap_bio()andset_server_hostname()apply, so oversized hostnames now fail later with a different error surface instead of the expected earlyValueError.Suggested fix
Some(hostname_str) => { let hostname = hostname_str.as_str(); - - // Validate hostname - if hostname.is_empty() { - return Err(vm.new_value_error("server_hostname cannot be an empty string")); - } - - // Check if it starts with a dot - if hostname.starts_with('.') { - return Err(vm.new_value_error("server_hostname cannot start with a dot")); - } - - // IP addresses are allowed - // SNI will not be sent for IP addresses - - // Check for NULL bytes - if hostname.contains('\0') { - return Err(vm.new_type_error("embedded null character")); - } - + validate_hostname(hostname, vm)?; Some(hostname.to_string()) }🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@crates/stdlib/src/ssl.rs` around lines 1842 - 1867, The hostname validation in _wrap_socket currently duplicates checks but omits the 253-character limit enforced by validate_hostname(); update the _wrap_socket path (the code handling args.server_hostname in ssl.rs) to call the existing validate_hostname(hostname_str) utility instead of reimplementing checks so the same length, dot, empty, and null-byte validations are applied and the correct ValueError is raised consistently (replace the manual checks in the Some(hostname_str) branch with a call to validate_hostname and use its result to produce the Some(String) or appropriate VM error).
🧹 Nitpick comments (1)
crates/stdlib/src/ssl/compat.rs (1)
756-758: ⚡ Quick winImport
VERIFY_X509_PARTIAL_CHAINinstead of redefining0x80000.This file already shares verification flags through
_ssl; keeping a second literal copy here makes the two paths drift-prone the next time those constants move.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@crates/stdlib/src/ssl/compat.rs` around lines 756 - 758, Remove the local const VERIFY_X509_PARTIAL_CHAIN = 0x80000 and instead reference the shared constant from the existing _ssl module; update the imports/usages so the conditional uses _ssl::VERIFY_X509_PARTIAL_CHAIN (or whichever symbol is exported from the _ssl shim) in the expression options.verify_flags & VERIFY_X509_PARTIAL_CHAIN != 0 and keep the rest of the PartialChainVerifier wrapping logic unchanged so the code relies on the single shared constant instead of a duplicated literal.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@extra_tests/snippets/stdlib_ssl_short_recv.py`:
- Around line 55-56: Replace the overly broad except BaseException clause with
except Exception so system-level exceptions propagate: change "except
BaseException as exc:" to "except Exception as exc:" (or "except Exception:" if
you prefer not to reuse exc) in the try/except that appends to server_errors,
leaving the server_errors.append(exc) logic intact.
In `@extra_tests/snippets/stdlib_urllib_https_misaligned_recv.py`:
- Around line 183-196: When handling ssl.SSLWantReadError in the client loop
(locations using tls.read, drain_outgoing, conn.recv and incoming.write), check
the return value of conn.recv(...) and if it returns b"" call
incoming.write_eof() instead of incoming.write(b"") so EOF is propagated into
the MemoryBIO and TLS can progress; keep calling drain_outgoing(outgoing, conn)
as before. Also replace the server-side catch-all "except BaseException as exc:"
with "except Exception as exc:" to satisfy the BLE001 lint warning.
- Around line 221-222: Replace the overly broad except BaseException handler
with except Exception to avoid catching SystemExit/KeyboardInterrupt; locate the
try/except that appends to server_errors (the lines containing "except
BaseException as exc" and "server_errors.append(exc)") and change it to "except
Exception as exc" so only regular exceptions from the test thread are captured.
---
Outside diff comments:
In `@crates/stdlib/src/ssl.rs`:
- Around line 1842-1867: The hostname validation in _wrap_socket currently
duplicates checks but omits the 253-character limit enforced by
validate_hostname(); update the _wrap_socket path (the code handling
args.server_hostname in ssl.rs) to call the existing
validate_hostname(hostname_str) utility instead of reimplementing checks so the
same length, dot, empty, and null-byte validations are applied and the correct
ValueError is raised consistently (replace the manual checks in the
Some(hostname_str) branch with a call to validate_hostname and use its result to
produce the Some(String) or appropriate VM error).
---
Nitpick comments:
In `@crates/stdlib/src/ssl/compat.rs`:
- Around line 756-758: Remove the local const VERIFY_X509_PARTIAL_CHAIN =
0x80000 and instead reference the shared constant from the existing _ssl module;
update the imports/usages so the conditional uses
_ssl::VERIFY_X509_PARTIAL_CHAIN (or whichever symbol is exported from the _ssl
shim) in the expression options.verify_flags & VERIFY_X509_PARTIAL_CHAIN != 0
and keep the rest of the PartialChainVerifier wrapping logic unchanged so the
code relies on the single shared constant instead of a duplicated literal.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yml
Review profile: CHILL
Plan: Pro
Run ID: 66de22b5-fbbb-4a02-9b0b-4db2e1ac1cb6
📒 Files selected for processing (8)
crates/host_env/src/socket.rscrates/stdlib/src/socket.rscrates/stdlib/src/ssl.rscrates/stdlib/src/ssl/cert.rscrates/stdlib/src/ssl/compat.rscrates/stdlib/src/ssl/error.rsextra_tests/snippets/stdlib_ssl_short_recv.pyextra_tests/snippets/stdlib_urllib_https_misaligned_recv.py
💤 Files with no reviewable changes (2)
- crates/stdlib/src/ssl/error.rs
- crates/host_env/src/socket.rs
✅ Files skipped from review due to trivial changes (1)
- crates/stdlib/src/ssl/cert.rs
Sorry, something went wrong.
1) Ensure that socket_wait() called from TLS glue code allows threads 2) Ensure that socket_wait() called from TLS glue code properly handles EINTR on *nix 3) Ensure that select() or poll() error conditions are checked 4) Use poll() on *nix so socket descriptor values are not limited
TLS handshake cannot be infinite. Any infinite loop here is a serious bug in implementation and should be fixed properly. This code triggers in some cases (very short reads) with misleading `ssl_error.SSLWantReadError: The operation did not complete (read)`.
* Deduplicate verify flags / record-size constants * Larger "max encrypted TLS record length"
There was a problem hiding this comment.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
crates/stdlib/src/ssl.rs (1)
4414-4433: ⚠️ Potential issue | 🟠 Major | ⚡ Quick winDon't treat EOF/reset as a successful
close_notify.This helper now returns
truefor empty reads and arbitrary socket errors, and it dropsread_tls()/process_new_packets()failures.shutdown()breaks on thattrue, so a ragged EOF or TCP reset can completeunwrap()successfully instead of surfacing the protocol error.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@crates/stdlib/src/ssl.rs` around lines 4414 - 4433, The helper try_read_close_notify_socket currently treats an empty read and arbitrary socket errors as a successful close_notify; instead, when sock_recv_at_most_one_tls_record returns Ok(data) and data.is_empty() you must NOT return true — return false (or propagate an error) so shutdown() does not consider the TLS shutdown complete; likewise, do not ignore errors from conn.read_tls() and conn.process_new_packets(): check their Results and if they Err, propagate that error to the VM (or return false) instead of swallowing it; finally, for Err(e) from sock_recv_at_most_one_tls_record, only return false for blocking IO errors (is_blocking_io_error), but for other socket errors propagate/raise the error rather than returning true. Ensure you update the logic in try_read_close_notify_socket to examine conn.read_tls and conn.process_new_packets results and to treat EOF/reset and non-blocking socket errors as errors (not successful close_notify).
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@crates/stdlib/src/ssl/compat.rs`:
- Line 1081: The BIO-mode read in compat.rs currently calls
socket.sock_recv(SSL3_RT_MAX_PACKET_SIZE, vm) and assumes the entire chunk is
consumed, which can return multiple TLS records from MemoryBIO; update the code
that handles socket.sock_recv (and the surrounding BIO-path) to apply the same
record-boundary logic used for socket reads: parse the returned buffer for a
single TLS record, consume only the first record and preserve any unread suffix
bytes back into the MemoryBIO (or buffer) for future reads instead of discarding
them; ensure the fix references the same handling around SSL3_RT_MAX_PACKET_SIZE
and the MemoryBIO consumer so BIO callers maintain the one-record-per-read
invariant.
---
Outside diff comments:
In `@crates/stdlib/src/ssl.rs`:
- Around line 4414-4433: The helper try_read_close_notify_socket currently
treats an empty read and arbitrary socket errors as a successful close_notify;
instead, when sock_recv_at_most_one_tls_record returns Ok(data) and
data.is_empty() you must NOT return true — return false (or propagate an error)
so shutdown() does not consider the TLS shutdown complete; likewise, do not
ignore errors from conn.read_tls() and conn.process_new_packets(): check their
Results and if they Err, propagate that error to the VM (or return false)
instead of swallowing it; finally, for Err(e) from
sock_recv_at_most_one_tls_record, only return false for blocking IO errors
(is_blocking_io_error), but for other socket errors propagate/raise the error
rather than returning true. Ensure you update the logic in
try_read_close_notify_socket to examine conn.read_tls and
conn.process_new_packets results and to treat EOF/reset and non-blocking socket
errors as errors (not successful close_notify).
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yml
Review profile: CHILL
Plan: Pro
Run ID: 6b063cdf-afc5-4a66-9280-e74cb3ff1b50
📒 Files selected for processing (8)
crates/host_env/src/socket.rscrates/stdlib/src/socket.rscrates/stdlib/src/ssl.rscrates/stdlib/src/ssl/cert.rscrates/stdlib/src/ssl/compat.rscrates/stdlib/src/ssl/error.rsextra_tests/snippets/stdlib_ssl_short_recv.pyextra_tests/snippets/stdlib_urllib_https_misaligned_recv.py
💤 Files with no reviewable changes (2)
- crates/stdlib/src/ssl/error.rs
- crates/host_env/src/socket.rs
✅ Files skipped from review due to trivial changes (1)
- crates/stdlib/src/ssl/cert.rs
Sorry, something went wrong.
ShaharNaveh
left a comment
There was a problem hiding this comment.
TYSM!
thank you for looking into this:)
Sorry, something went wrong.
youknowone
left a comment
There was a problem hiding this comment.
Thank you so much!
Sorry, something went wrong.
2a16360
into
RustPython:main
May 22, 2026
Primary goal of this PR is to fix #7891
Changes:
import socketon each send()/recv() when using rustlsssl_error.SSLWantReadError: The operation did not complete (read).vm.allow_threads())Please note that this is no way near complete and current implementation probably still has a ton of bugs.
Comments
It is a current behavior and it is not that easy to change. I tried to naively replace it with calls to sock.send()/sock.recv() and it resulted in deadlocks.
Summary by CodeRabbit
Bug Fixes
Refactor
Tests