The initial CERT and CODDTest implementations diverged from their
papers in ways that defeated the test signal:
CERT was using actual row counts from running the queries and a
bidirectional mutator framework. Per Ba and Rigger, ICSE 2024 the
property under test is `EstCard(Q', D) <= EstCard(Q, D)` -- the
*estimator's* projection, with Q' strictly more restrictive than Q,
and "CERT eschews executing queries". This rewrite:
* Reads cardinality from `EXPLAIN ESTIMATE`, summing `rows` across
the per-table tuples it returns. The query is never executed.
* Restricts mutations to one direction. `mutateWhere`/`mutateAnd`
always AND-tighten or introduce a WHERE; `mutateOr` drops an OR
operand (per the paper's restrictive-OR rule) or falls back to AND;
`mutateDistinct` only promotes ALL -> DISTINCT. All return
`increase=false`.
* Skips runs where the estimator returns nothing (non-MergeTree
engines, `ORDER BY tuple()`, unsupported expressions), and skips
runs where the structural-similarity gate on `EXPLAIN PLAN` shows
too much drift.
CODDTest was toggling random optimizer flags via per-query `SETTINGS`
clauses and comparing results. Per Zhang and Rigger, SIGMOD 2025 the
oracle is constant-folding-driven: take a subexpression E in Q,
evaluate E to a value V via an auxiliary query A, build a folded
query F by substituting V for E, then assert results of Q and F are
identical. This rewrite implements the scalar-subquery variant
(same as DuckDBCODDTestOracle in the upstream PR sqlancer#1054):
aux: SELECT min(c)/max(c) FROM t -> V
Q: SELECT * FROM t WHERE col op (SELECT min/max(c) FROM t)
F: SELECT * FROM t WHERE col op V
Only `Int32`/`String` columns are folded since they are the only
types the existing schema generator and `ClickHouseSchema.getConstant`
support; NULL auxiliary results are skipped (NULL-propagation would
make the predicate UNKNOWN for every row and the equivalence does
not hold).
Verified locally against a release ClickHouse 26.5 server:
* CERT: ~6 q/s effective (most attempts skip because no estimate
responds to the random mutation), 0 false positives in a 30s
window.
* CODDTest: ~22 q/s, 96-97% successful statements, 0 false positives.
`mvn checkstyle:check` clean, `mvn package -Dmaven.test.skip=true`
succeeds.
Papers:
CERT https://doi.org/10.1145/3597503.3639076
CODDTest https://doi.org/10.1145/3709674