{{ message }}
Support CODDTest in sqlancer#1054
Open
DerZc wants to merge 19 commits into
Open
Conversation
mrigger
reviewed
Jan 19, 2025
mrigger
left a comment
Contributor
There was a problem hiding this comment.
Thanks for the PR! Added some initial comments. I think it's quite challenging to review such a large PR. Do you think we could split it into multiple smaller ones (referencing this as the main PR)? For example, as a first step, we could add some of the new expression nodes and the visit methods.
Sorry, something went wrong.
fm4v
added a commit
to fm4v/sqlancer
that referenced
this pull request
May 15, 2026
The initial CERT and CODDTest implementations diverged from their papers in ways that defeated the test signal: CERT was using actual row counts from running the queries and a bidirectional mutator framework. Per Ba and Rigger, ICSE 2024 the property under test is `EstCard(Q', D) <= EstCard(Q, D)` -- the *estimator's* projection, with Q' strictly more restrictive than Q, and "CERT eschews executing queries". This rewrite: * Reads cardinality from `EXPLAIN ESTIMATE`, summing `rows` across the per-table tuples it returns. The query is never executed. * Restricts mutations to one direction. `mutateWhere`/`mutateAnd` always AND-tighten or introduce a WHERE; `mutateOr` drops an OR operand (per the paper's restrictive-OR rule) or falls back to AND; `mutateDistinct` only promotes ALL -> DISTINCT. All return `increase=false`. * Skips runs where the estimator returns nothing (non-MergeTree engines, `ORDER BY tuple()`, unsupported expressions), and skips runs where the structural-similarity gate on `EXPLAIN PLAN` shows too much drift. CODDTest was toggling random optimizer flags via per-query `SETTINGS` clauses and comparing results. Per Zhang and Rigger, SIGMOD 2025 the oracle is constant-folding-driven: take a subexpression E in Q, evaluate E to a value V via an auxiliary query A, build a folded query F by substituting V for E, then assert results of Q and F are identical. This rewrite implements the scalar-subquery variant (same as DuckDBCODDTestOracle in the upstream PR sqlancer#1054): aux: SELECT min(c)/max(c) FROM t -> V Q: SELECT * FROM t WHERE col op (SELECT min/max(c) FROM t) F: SELECT * FROM t WHERE col op V Only `Int32`/`String` columns are folded since they are the only types the existing schema generator and `ClickHouseSchema.getConstant` support; NULL auxiliary results are skipped (NULL-propagation would make the predicate UNKNOWN for every row and the equivalence does not hold). Verified locally against a release ClickHouse 26.5 server: * CERT: ~6 q/s effective (most attempts skip because no estimate responds to the random mutation), 0 false positives in a 30s window. * CODDTest: ~22 q/s, 96-97% successful statements, 0 false positives. `mvn checkstyle:check` clean, `mvn package -Dmaven.test.skip=true` succeeds. Papers: CERT https://doi.org/10.1145/3597503.3639076 CODDTest https://doi.org/10.1145/3709674
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.
No description provided.