◐ Shell
clean mode source ↗

Support CODDTest in sqlancer by DerZc · Pull Request #1054 · sqlancer/sqlancer

@fm4v

The initial CERT and CODDTest implementations diverged from their
papers in ways that defeated the test signal:

CERT was using actual row counts from running the queries and a
bidirectional mutator framework. Per Ba and Rigger, ICSE 2024 the
property under test is `EstCard(Q', D) <= EstCard(Q, D)` -- the
*estimator's* projection, with Q' strictly more restrictive than Q,
and "CERT eschews executing queries". This rewrite:

* Reads cardinality from `EXPLAIN ESTIMATE`, summing `rows` across
  the per-table tuples it returns. The query is never executed.
* Restricts mutations to one direction. `mutateWhere`/`mutateAnd`
  always AND-tighten or introduce a WHERE; `mutateOr` drops an OR
  operand (per the paper's restrictive-OR rule) or falls back to AND;
  `mutateDistinct` only promotes ALL -> DISTINCT. All return
  `increase=false`.
* Skips runs where the estimator returns nothing (non-MergeTree
  engines, `ORDER BY tuple()`, unsupported expressions), and skips
  runs where the structural-similarity gate on `EXPLAIN PLAN` shows
  too much drift.

CODDTest was toggling random optimizer flags via per-query `SETTINGS`
clauses and comparing results. Per Zhang and Rigger, SIGMOD 2025 the
oracle is constant-folding-driven: take a subexpression E in Q,
evaluate E to a value V via an auxiliary query A, build a folded
query F by substituting V for E, then assert results of Q and F are
identical. This rewrite implements the scalar-subquery variant
(same as DuckDBCODDTestOracle in the upstream PR sqlancer#1054):

  aux:    SELECT min(c)/max(c) FROM t                            -> V
  Q:      SELECT * FROM t WHERE col op (SELECT min/max(c) FROM t)
  F:      SELECT * FROM t WHERE col op V

Only `Int32`/`String` columns are folded since they are the only
types the existing schema generator and `ClickHouseSchema.getConstant`
support; NULL auxiliary results are skipped (NULL-propagation would
make the predicate UNKNOWN for every row and the equivalence does
not hold).

Verified locally against a release ClickHouse 26.5 server:

* CERT: ~6 q/s effective (most attempts skip because no estimate
  responds to the random mutation), 0 false positives in a 30s
  window.
* CODDTest: ~22 q/s, 96-97% successful statements, 0 false positives.

`mvn checkstyle:check` clean, `mvn package -Dmaven.test.skip=true`
succeeds.

Papers:
  CERT     https://doi.org/10.1145/3597503.3639076
  CODDTest https://doi.org/10.1145/3709674