fix(bigquery): Prefer query over table in get_table_query_string #6360
fc290f7 to af72355 (May 3, 2026 05:33)
The failing test ( The 3.10 and 3.11 Ubuntu runs pass cleanly.
@Jwrede init from bigquery_source.py says
Good catch. The docstring says "Exactly one of 'table' and 'query' must be specified", but the actual validation (line 67) only enforces at least one:

    if table is None and query is None:
        raise ValueError('No "table" or "query" argument provided.')

It has never rejected both being set. I'll push an update to fix the docstring to match reality.
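To make the "at least one" versus "exactly one" distinction concrete, here is a minimal sketch of the check quoted above. The function name `check_table_or_query` is hypothetical and used only for illustration; it is not part of Feast.

```python
from typing import Optional


def check_table_or_query(table: Optional[str], query: Optional[str]) -> None:
    """Hypothetical helper mirroring the quoted validation; not Feast code."""
    # Only the case where neither argument is given is rejected.
    if table is None and query is None:
        raise ValueError('No "table" or "query" argument provided.')


# All three of these pass, including the case where both are set.
check_table_or_query("project.dataset.events", None)
check_table_or_query(None, "SELECT 1")
check_table_or_query("project.dataset.events", "SELECT 1")

# Only the neither-set case raises.
try:
    check_table_or_query(None, None)
    rejected = False
except ValueError:
    rejected = True
```

An "exactly one" check would additionally have to raise when both arguments are provided, which this condition never does.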
When both `table` and `query` are set on a BigQuerySource, `get_table_query_string()` now returns the query (wrapped in parens) instead of the table reference. This allows PushSource users to provide a custom read query (e.g. for deduplication) while keeping `table` for offline writes via `offline_write_batch()`.

Also applies the same priority inversion to `get_table_column_names_and_types()` so schema inference matches the actual read path.

Closes feast-dev#6200

Signed-off-by: Jonathan Wrede <wrede.jonathan00@gmail.com>

The validation only enforces at least one of table/query, not exactly one. Update the docstring to document the supported behavior when both are set.

Signed-off-by: Jonathan Wrede <wrede.jonathan00@gmail.com>
39f2047 to 7db33e0 (May 3, 2026 06:03)
77ed779 into feast-dev:master (May 3, 2026)
Summary
Fixes #6200
When both `table` and `query` are set on a `BigQuerySource`, `get_table_query_string()` silently ignores `query` and always returns the table reference. This makes it impossible to use a custom read query (e.g., for deduplication via `QUALIFY`) on a `PushSource`, since `PushSource` requires `table` for offline writes.

Root cause: `get_table_query_string()` checks `if self.table` first. Since `table` is always truthy when set, `query` is never reached.

Fix: Invert the priority. Prefer `query` when present (it's more specific and intentionally provided), fall back to `table`. The write path (`offline_write_batch()`) accesses `.table` directly and is unaffected.

Also applies the same fix to `get_table_column_names_and_types()` so schema inference uses the query when both are set, matching the actual read path.

Changes
- `bigquery_source.py`: Swap condition order in `get_table_query_string()` and `get_table_column_names_and_types()` to prefer `query` over `table`
- `test_bigquery.py`: Add 4 unit tests covering table-only, query-only, both-set, and write-path-unaffected scenarios

Test plan
pytest sdk/python/tests/unit/infra/offline_stores/test_bigquery.py)
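For readers skimming the thread, the priority inversion described in the summary can be sketched roughly as follows. This is a simplified stand-in, not the code from `bigquery_source.py`: the class `SketchSource`, the helper `write_target()`, the backtick quoting of the table reference, and the example table and query strings are all assumptions made for illustration.

```python
from typing import Optional


class SketchSource:
    """Hypothetical stand-in for BigQuerySource; not the Feast implementation."""

    def __init__(self, table: Optional[str] = None, query: Optional[str] = None):
        self.table = table
        self.query = query

    def get_table_query_string(self) -> str:
        # Read path: prefer the query when present, wrapped in parentheses.
        if self.query:
            return f"({self.query})"
        # Fallback: the table reference (backtick quoting is an assumption).
        return f"`{self.table}`"

    def write_target(self) -> Optional[str]:
        # Hypothetical helper standing in for a write path that reads .table directly.
        return self.table


# The four scenarios named in the change list, using made-up identifiers.
table_only = SketchSource(table="project.dataset.events")
query_only = SketchSource(query="SELECT * FROM project.dataset.events")
both = SketchSource(
    table="project.dataset.events",
    query="SELECT * FROM project.dataset.events WHERE event_id IS NOT NULL",
)

print(table_only.get_table_query_string())
print(query_only.get_table_query_string())
print(both.get_table_query_string())
print(both.write_target())
```

With the old ordering (checking `self.table` first), the `both` case would have returned the table reference and the custom query would never have been used on the read path.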