Fix SSL deferred error by youknowone · Pull Request #6371 · RustPython/RustPython
Summary by CodeRabbit
-
Bug Fixes
- Certificate verification failures are now correctly propagated and reported (with contextual messages) so clients receive proper TLS alerts.
- Deferred verification errors are surfaced after related I/O completes to ensure consistent error timing.
- Improved detection and handling of closed-connection scenarios, treating certain socket errors as EOF-like.
-
Refactor
- Unified and simplified SSL/TLS handshake flow and lock management to reduce deadlock risk and improve consistency.
✏️ Tip: You can customize this high-level summary in your review settings.
Walkthrough
Simplifies TLS handshake lock handling, unifies client/server handshake flow, moves deferred certificate-verification checks to occur after I/O (read/write/handshake), and improves socket error detection to treat certain connection-closed errors as EOF.
Changes
| Cohort / File(s) | Summary |
|---|---|
Handshake & I/O control flow crates/stdlib/src/ssl.rs |
Consolidates client/server handshake into a single path using an is_client flag, removes per-branch guard-drop patterns, drops/reacquires the connection lock earlier, reworks read/write loops and write-flush semantics, and moves post-handshake deferred-cert checks after I/O completion. |
Certificate verification propagation crates/stdlib/src/ssl/cert.rs |
DeferredClientCertVerifier::verify_client_cert now records a formatted per-error message and returns the original verification error to rustls instead of returning success on failure. |
Socket/exception mapping & EOF detection crates/stdlib/src/ssl/compat.rs |
Adds is_connection_closed_error(exc, vm) helper to classify connection-closed socket errors (ConnectionAborted/Reset or ECONNABORTED/ECONNRESET) as EOF-like, uses IntoPyException conversion for I/O errors, and checks queued TLS alerts before mapping socket errors to SslError::Eof. |
Sequence Diagram(s)
sequenceDiagram
autonumber
participant ClientApp as Client
participant TLS as TLSConn (rustls)
participant Socket as Socket
participant PyLayer as Python VM
Note over ClientApp,TLS: Handshake/read/write unified flow (is_client flag)
ClientApp->>TLS: initiate handshake / read / write
TLS->>Socket: perform socket I/O (send/recv)
Socket-->>TLS: I/O result (data or error)
alt data received
TLS->>TLS: process rustls packets
TLS->>PyLayer: check_deferred_cert_error (post-ssl_read or post-write or after handshake)
PyLayer-->>TLS: deferred cert error (optional)
TLS->>ClientApp: return decrypted/plain data
else socket error
TLS->>TLS: check queued TLS alerts
TLS->>PyLayer: is_connection_closed_error?
alt connection-closed
TLS->>ClientApp: return EOF (SslError::Eof)
else other error
TLS->>ClientApp: map and raise Python I/O exception via IntoPyException
end
end
Estimated code review effort
🎯 4 (Complex) | ⏱️ ~40 minutes
- Areas needing careful review:
crates/stdlib/src/ssl.rs: lock/drop timing changes and unified handshake control flow (deadlock/race checks).crates/stdlib/src/ssl/cert.rs: ensure propagated verification errors produce correct TLS alerts and server behavior.crates/stdlib/src/ssl/compat.rs: correctness ofis_connection_closed_erroracross platforms and errno handling; verify queued-alert checks before mapping EOF.
Suggested reviewers
- coolreader18
- ShaharNaveh
- arihant2math
Poem
🐇 I hopped through locks with nimble paws,
Unified paths and cleaner laws.
Deferred certs now speak their name,
EOFs found true, not just a claim.
A little rabbit cheers the change—hip, hop, hooray! 🎉
Pre-merge checks and finishing touches
✅ Passed checks (3 passed)
| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title 'Fix SSL deferred error' is concise and directly references the main change: addressing deferred certificate verification error handling in SSL operations. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 90.00% which is sufficient. The required threshold is 80.00%. |
✨ Finishing touches
- 📝 Generate docstrings
🧪 Generate unit tests (beta)
- Create PR with unit tests
- Post copyable unit tests in a comment
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.
Comment @coderabbitai help to get the list of available commands and usage tips.
Code has been automatically formatted
The code in this PR has been formatted using cargo fmt --all.
Please pull the latest changes before pushing again:
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (1)
crates/stdlib/src/ssl.rs (1)
2602-2604: The explicit drop on line 2604 has no effect.The statement
let _ = conn;does not release the lock early. Sinceconnis a reference (&TlsConnection), this line is a no-op. TheMutexGuardis held by theif letexpression and is automatically dropped at the end of the block at line 2617.If the intent is to release the lock immediately after calling
is_session_resumed(), you need to drop the guard itself, not the borrowed reference:-if let Some(ref conn) = *self.connection.lock() { +let was_resumed = { + let conn_guard = self.connection.lock(); + if let Some(ref conn) = *conn_guard { + conn.is_session_resumed() + } else { + false + } +}; +if was_resumed { + *self.session_was_reused.lock() = was_resumed; + - let was_resumed = conn.is_session_resumed(); - let _ = conn; // Release lock ASAP - *self.session_was_reused.lock() = was_resumed;However, since the remaining operations in this block (lines 2607-2616) don't hold other locks and execute quickly, the current approach may be acceptable. The comment is misleading though.
📜 Review details
Configuration used: Path: .coderabbit.yml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
crates/stdlib/src/ssl.rs(5 hunks)crates/stdlib/src/ssl/cert.rs(1 hunks)crates/stdlib/src/ssl/compat.rs(4 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.rs
📄 CodeRabbit inference engine (.github/copilot-instructions.md)
**/*.rs: Follow the default rustfmt code style by runningcargo fmtto format Rust code
Always run clippy to lint Rust code (cargo clippy) before completing tasks and fix any warnings or lints introduced by changes
Follow Rust best practices for error handling and memory management
Use the macro system (pyclass,pymodule,pyfunction, etc.) when implementing Python functionality in Rust
Files:
crates/stdlib/src/ssl.rscrates/stdlib/src/ssl/compat.rscrates/stdlib/src/ssl/cert.rs
🧬 Code graph analysis (2)
crates/stdlib/src/ssl.rs (1)
crates/stdlib/src/ssl/compat.rs (5)
ssl_do_handshake(1196-1340)ssl_read(1348-1445)data(1612-1614)writer(319-324)new(1668-1670)
crates/stdlib/src/ssl/compat.rs (1)
crates/stdlib/src/socket.rs (1)
errno(1836-1842)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (7)
- GitHub Check: Run rust tests (windows-latest)
- GitHub Check: Check Rust code with clippy
- GitHub Check: Ensure compilation on various targets
- GitHub Check: Run snippets and cpython tests (ubuntu-latest)
- GitHub Check: Run rust tests (macos-latest)
- GitHub Check: Run snippets and cpython tests (windows-latest)
- GitHub Check: Run snippets and cpython tests (macos-latest)
🔇 Additional comments (8)
crates/stdlib/src/ssl/cert.rs (1)
1059-1079: LGTM! Critical fix for proper TLS alert propagation.The previous behavior (storing error but returning success) would cause the TLS handshake to appear successful even when certificate verification failed, preventing proper TLS alerts from being sent to clients. Now the error is both stored for Python-side reporting AND returned to rustls for proper protocol-level error handling.
The
if let Err(ref e)pattern correctly borrows the error without consumingresult, allowing the subsequentreturn resultto work correctly.crates/stdlib/src/ssl/compat.rs (4)
27-27: LGTM!Adding the
IntoPyExceptionimport enables idiomatic conversion of I/O errors to Python exceptions.
562-562: LGTM!Using
IntoPyExceptiontrait for I/O error conversion is the idiomatic approach in RustPython.
1531-1552: LGTM! Well-structured connection-closed detection.The function correctly handles both:
- High-level Python exception types (
ConnectionAbortedError,ConnectionResetError)- Low-level
OSErrorwith specific errno values (ECONNABORTED,ECONNRESET)This comprehensive check ensures connection-closed scenarios are detected regardless of how the underlying error is wrapped.
1589-1609: LGTM! Important fix for SSL error handling, especially on Windows.The error handling order is correct:
- Check for queued TLS alerts first (Lines 1596-1598) - ensures SSL-level errors take precedence over transport errors
- Map connection-closed errors to
Eof(Lines 1604-1606) - correctly treats unexpected TCP closure as SSL protocol violation- Propagate other errors as Python exceptions (Line 1607)
This addresses the Windows-specific race condition where TCP RST can arrive before TLS alerts are read, which would otherwise cause confusing connection errors instead of proper SSL errors.
crates/stdlib/src/ssl.rs (3)
3212-3251: LGTM! Good refactoring to prevent deadlock.The changes successfully reduce lock hold time and prevent potential deadlock during SNI callback execution. Key improvements:
- The lock is explicitly dropped after
ssl_do_handshakereturns (line 3215), before any Python callback invocation or recursive calls.- The
is_clientflag is correctly derived from the connection type.- The server-side SNI callback restart logic correctly clears the connection and recursively calls
do_handshakeafter the lock is released.This aligns with the PR objective of fixing deferred certificate errors by ensuring locks are managed properly during handshake.
3336-3347: LGTM! Correct placement of deferred certificate error check.The refactoring correctly moves the
check_deferred_cert_error()call to afterssl_readcompletes (line 3347), while the connection lock is released after line 3342. This ensures:
- The deferred error (which may be set during I/O in
ssl_read) is checked after the I/O operation completes.- The connection lock is not held during error checking/raising, preventing potential deadlock.
- The error is checked before processing the read result, ensuring errors are surfaced to the caller.
This aligns with the PR's goal of fixing deferred SSL errors by checking them at the appropriate time.
3432-3498: LGTM! Correct placement of deferred certificate error check in write path.The refactoring correctly moves the
check_deferred_cert_error()call to after all write operations complete (line 3498), while the connection lock is released after line 3494. This ensures:
- The deferred certificate verification error (which may be set during TLS I/O operations) is checked after all write operations complete.
- The connection lock is not held during error checking/raising.
- The write is atomic - the lock is held for the entire write operation, preventing interleaving with other operations.
The chunked write implementation (lines 3441-3493) correctly:
- Writes data in 16KB chunks to avoid filling the TLS buffer
- Flushes TLS data to the socket/BIO after each chunk
- Handles both BIO and socket modes appropriately
This aligns with the PR's goal of fixing deferred SSL errors.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (2)
crates/stdlib/src/ssl/cert.rs (1)
1065-1079: Client-cert verification behavior change is good; update the “always succeeds” comment.Returning
resultwhile also recording the message fixes the TLS alert behavior and keeps the error available for Python-side checks. However, the comment onDeferredClientCertVerifierstill claims this verifier “always succeeds during handshake but stores verification errors for later retrieval,” which is no longer accurate after this change. Please update that comment (and optionally the type name) so future readers don’t assume the handshake always succeeds for TLS 1.3 client-auth.crates/stdlib/src/ssl.rs (1)
3435-3503: Chunked write/flush loop is reasonable; deferred error check placement is appropriate.Writing in 16 KiB chunks through
conn.writer(), flushing pending TLS data via BIO or socket untilwants_write()is false, and only then callingcheck_deferred_cert_errorprovides clearer, more robust write semantics while still honoring TLS buffering limits. The extra allocations for the temporaryVec<u8>in socket mode are acceptable; if this ever shows up in profiles, you can optimize by reusing a buffer, but it isn’t urgent.
📜 Review details
Configuration used: Path: .coderabbit.yml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
crates/stdlib/src/ssl.rs(5 hunks)crates/stdlib/src/ssl/cert.rs(1 hunks)crates/stdlib/src/ssl/compat.rs(4 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- crates/stdlib/src/ssl/compat.rs
🧰 Additional context used
📓 Path-based instructions (1)
**/*.rs
📄 CodeRabbit inference engine (.github/copilot-instructions.md)
**/*.rs: Follow the default rustfmt code style by runningcargo fmtto format Rust code
Always run clippy to lint Rust code (cargo clippy) before completing tasks and fix any warnings or lints introduced by changes
Follow Rust best practices for error handling and memory management
Use the macro system (pyclass,pymodule,pyfunction, etc.) when implementing Python functionality in Rust
Files:
crates/stdlib/src/ssl.rscrates/stdlib/src/ssl/cert.rs
🧬 Code graph analysis (1)
crates/stdlib/src/ssl.rs (1)
crates/stdlib/src/ssl/compat.rs (4)
ssl_do_handshake(1196-1340)ssl_read(1348-1445)data(1612-1614)new(1668-1670)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (11)
- GitHub Check: Run snippets and cpython tests on wasm-wasi
- GitHub Check: Ensure compilation on various targets
- GitHub Check: Check the WASM package and demo
- GitHub Check: Run tests under miri
- GitHub Check: Check Rust code with clippy
- GitHub Check: Run rust tests (ubuntu-latest)
- GitHub Check: Run rust tests (windows-latest)
- GitHub Check: Run snippets and cpython tests (windows-latest)
- GitHub Check: Run snippets and cpython tests (ubuntu-latest)
- GitHub Check: Run snippets and cpython tests (macos-latest)
- GitHub Check: Run rust tests (macos-latest)
🔇 Additional comments (3)
crates/stdlib/src/ssl.rs (3)
2598-2620: Session-resume tracking and stats update look correct.Using a short-lived
connection.lock()to queryis_session_resumed()and then updatingsession_was_reusedand the server-side counters outside that critical section keeps locking tight while correctly reflecting resume state and accept/hit stats.
3215-3254: Unified client/server handshake flow simplifies control and preserves SNI restart semantics.The new pattern—lock connection once, compute
is_client, callssl_do_handshake, then drop the guard and branch on client vs server—removes duplicated paths without changing observable behavior. The SNISniCallbackRestartcase still runs the Python callback with no connection lock held, clearsself.connection, and re-entersdo_handshake, which matches the documented OpenSSL-style restart behavior.
3339-3351: Read path now checks deferred cert errors at the right time.Acquiring the connection lock only for the
ssl_readcall and then invokingcheck_deferred_cert_erroron the successfulOk(n)path cleanly separates TLS I/O from Python exception raising while ensuring any deferred client-cert failure is surfaced immediately after plaintext is delivered.