fix(bigquery): Prefer query over table in get_table_query_string#6360

Merged
ntkathole merged 2 commits into feast-dev:master from Jwrede:fix/bigquery-source-query-priority
May 3, 2026
@Jwrede Jwrede commented May 2, 2026

Contributor

Summary

Fixes #6200

When both table and query are set on a BigQuerySource, get_table_query_string() silently ignores query and always returns the table reference. This makes it impossible to use a custom read query (e.g., for deduplication via QUALIFY) on a PushSource, since PushSource requires table for offline writes.

Root cause: get_table_query_string() checks if self.table first — since table is always truthy when set, query is never reached.

Fix: Invert the priority — prefer query when present (it's more specific and intentionally provided), fall back to table. The write path (offline_write_batch()) accesses .table directly and is unaffected.

Also applies the same fix to get_table_column_names_and_types() so schema inference uses the query when both are set, matching the actual read path.
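The priority inversion described above can be illustrated with a minimal sketch. `SketchSource` is a hypothetical stand-in, not the Feast `BigQuerySource` class, and the backtick-quoted table format is an assumption; only the parenthesised query form is stated in this PR's commit message.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SketchSource:
    """Hypothetical stand-in for a source with optional table and query."""

    table: Optional[str] = None
    query: Optional[str] = None

    def query_string_before(self) -> str:
        # Old priority: table wins whenever it is set,
        # so a query supplied alongside it is never reached.
        if self.table:
            return f"`{self.table}`"  # quoting style is an assumption
        return f"({self.query})"

    def query_string_after(self) -> str:
        # New priority: an explicitly provided query wins,
        # and table is only the fallback.
        if self.query:
            return f"({self.query})"
        return f"`{self.table}`"


both = SketchSource(table="proj.ds.events", query="SELECT * FROM proj.ds.events")
print(both.query_string_before())
print(both.query_string_after())
```

With both fields set, the first call ignores the query and the second returns it wrapped in parentheses; a table-only source behaves the same under either ordering.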

Changes

  • bigquery_source.py: Swap condition order in get_table_query_string() and get_table_column_names_and_types() to prefer query over table
  • test_bigquery.py: Add 4 unit tests covering table-only, query-only, both-set, and write-path-unaffected scenarios
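As a rough illustration of the four scenarios listed above, the following pytest-style sketch exercises them against hypothetical helpers. `read_target`, `write_target`, and `make_source` are invented for this example and are not the functions or tests added in the PR; the backtick quoting of table names is also an assumption.

```python
from types import SimpleNamespace


def make_source(table=None, query=None):
    # Hypothetical stand-in object exposing only .table and .query.
    return SimpleNamespace(table=table, query=query)


def read_target(source):
    # Read path after the fix: prefer query, fall back to table.
    if source.query:
        return f"({source.query})"
    return f"`{source.table}`"


def write_target(source):
    # Write path: reads .table directly and never consults the query.
    return source.table


def test_table_only():
    assert read_target(make_source(table="proj.ds.events")) == "`proj.ds.events`"


def test_query_only():
    assert read_target(make_source(query="SELECT 1")) == "(SELECT 1)"


def test_both_set_prefers_query():
    src = make_source(table="proj.ds.events", query="SELECT 1")
    assert read_target(src) == "(SELECT 1)"


def test_write_path_unaffected():
    src = make_source(table="proj.ds.events", query="SELECT 1")
    assert write_target(src) == "proj.ds.events"
```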

Test plan

  • All 11 existing + new unit tests pass (pytest sdk/python/tests/unit/infra/offline_stores/test_bigquery.py)
  • Integration test with BigQuery PushSource (requires GCP credentials)

@Jwrede Jwrede requested review from a team and sudohainguyen as code owners May 2, 2026 22:40
@ntkathole ntkathole changed the title fix(bigquery): prefer query over table in get_table_query_string May 3, 2026
@Jwrede Jwrede force-pushed the fix/bigquery-source-query-priority branch from fc290f7 to af72355 Compare May 3, 2026 05:33
Jwrede commented May 3, 2026

Contributor Author

The failing test (test_online_write_batch_async_skip_dedup_single_pipeline in test_redis.py) is unrelated to this PR — it's an event-loop issue introduced in 2e50da0 that affects macOS and Python 3.12. My changes only touch bigquery_source.py and test_bigquery.py.

The 3.10 and 3.11 Ubuntu runs pass cleanly.

@ntkathole

Member

@Jwrede `__init__` from bigquery_source.py says "Exactly one of 'table' and 'query' must be specified."?

Jwrede commented May 3, 2026

Contributor Author

Good catch, the docstring says "Exactly one of 'table' and 'query' must be specified" but the actual validation (line 67) only enforces at least one:

if table is None and query is None:
    raise ValueError('No "table" or "query" argument provided.')

It has never rejected both being set. I'll push an update to fix the docstring to match reality.
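To make the "at least one" behaviour concrete, here is a small sketch that wraps the quoted condition in a hypothetical helper, `check_source_args`. The helper name and the sample arguments are illustrative only; the condition and error message are copied from the snippet above.

```python
from typing import Optional


def check_source_args(table: Optional[str], query: Optional[str]) -> None:
    # Same condition as the quoted check: only the "neither given" case fails.
    if table is None and query is None:
        raise ValueError('No "table" or "query" argument provided.')


# Each of these passes, including the case where both are set.
check_source_args("proj.ds.events", None)
check_source_args(None, "SELECT 1")
check_source_args("proj.ds.events", "SELECT 1")

# Only the empty case is rejected.
try:
    check_source_args(None, None)
except ValueError as exc:
    rejected = str(exc)
```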

Jwrede added 2 commits May 3, 2026 06:03
When both `table` and `query` are set on a BigQuerySource,
`get_table_query_string()` now returns the query (wrapped in parens)
instead of the table reference. This allows PushSource users to
provide a custom read query (e.g. for deduplication) while keeping
`table` for offline writes via `offline_write_batch()`.

Also applies the same priority inversion to
`get_table_column_names_and_types()` so schema inference matches the
actual read path.

Closes feast-dev#6200

Signed-off-by: Jonathan Wrede <wrede.jonathan00@gmail.com>

The validation only enforces at least one of table/query, not exactly
one. Update the docstring to document the supported behavior when both
are set.

Signed-off-by: Jonathan Wrede <wrede.jonathan00@gmail.com>
@Jwrede Jwrede force-pushed the fix/bigquery-source-query-priority branch from 39f2047 to 7db33e0 Compare May 3, 2026 06:03
@ntkathole ntkathole merged commit 77ed779 into feast-dev:master May 3, 2026
23 of 27 checks passed
Development

Successfully merging this pull request may close these issues.

BigQuerySource.get_table_query_string() silently ignores query when table is also set