feat(spark): Add compute-on-read support for BatchFeatureView in get_…#6357
Merged
ntkathole merged 7 commits into feast-dev:master on May 3, 2026
franciscojavierarceo
approved these changes
May 1, 2026
…historical_features Signed-off-by: Siddhesh Khairnar <khairnarsiddhesh4057@gmail.com>
fcdf0e6 to
11d69be
May 2, 2026 04:28
Signed-off-by: Siddhesh Khairnar <khairnarsiddhesh4057@gmail.com>
ntkathole
reviewed
May 2, 2026
ntkathole
reviewed
May 2, 2026
…n logic Signed-off-by: Siddhesh Khairnar <khairnarsiddhesh4057@gmail.com>
ntkathole
reviewed
May 3, 2026
ntkathole
reviewed
May 3, 2026
…ew naming Signed-off-by: Siddhesh Khairnar <khairnarsiddhesh4057@gmail.com>
ntkathole
reviewed
May 3, 2026
…V source resolution Signed-off-by: Siddhesh Khairnar <khairnarsiddhesh4057@gmail.com>
ntkathole
approved these changes
May 3, 2026
ntkathole
merged commit
630d9f8
into
feast-dev:master
May 3, 2026
25 of 26 checks passed
3 tasks
What this PR does / why we need it:
When using @batch_feature_view with TransformationMode.PYTHON in the Spark offline store, get_historical_features() fails with UNRESOLVED_COLUMN errors. This occurs because the PIT join SQL reads directly from the raw batch_source and expects transformed feature columns (e.g., aggregated outputs) to already exist in the source data. However, BFV transformations are only executed during feast materialize, not during offline retrieval.
This PR introduces compute-on-read support for BatchFeatureView in SparkOfflineStore. Before generating the PIT join SQL, BFVs with a UDF are detected and their transformations are applied:
- … feature_transformation.udf() (same function used during materialization)
- … table_subquery in the query context with the temp view name
This enables reuse of BFV definitions during offline training without requiring pre-materialization or external ETL pipelines. The entire pipeline remains fully distributed in Spark.
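The compute-on-read step described here can be sketched roughly as follows. This is an illustrative outline, not the code merged in this PR: QueryContext, ViewSpec, apply_bfv_transformations and the bfv_<name>_transformed naming scheme are hypothetical stand-ins, and session is assumed to behave like a pyspark.sql.SparkSession (a sql() method returning a DataFrame that supports createOrReplaceTempView()). Reading the source with SELECT * FROM table_subquery is likewise an assumption about how the raw data is loaded.

```python
from dataclasses import dataclass, replace
from typing import Callable, List, Optional


@dataclass(frozen=True)
class QueryContext:
    """Hypothetical stand-in for the per-feature-view query context."""
    name: str
    table_subquery: str


@dataclass(frozen=True)
class ViewSpec:
    """Hypothetical stand-in for a feature view.

    udf is None for a plain FeatureView and a DataFrame -> DataFrame
    callable for a BatchFeatureView that carries a transformation.
    """
    name: str
    udf: Optional[Callable] = None


def apply_bfv_transformations(
    session, views: List[ViewSpec], contexts: List[QueryContext]
) -> List[QueryContext]:
    """Point table_subquery at a transformed temp view for views with a UDF.

    Contexts whose view has no UDF are returned unchanged.
    """
    views_by_name = {view.name: view for view in views}
    result = []
    for ctx in contexts:
        view = views_by_name.get(ctx.name)
        if view is None or view.udf is None:
            # Plain feature view: keep reading from the raw source.
            result.append(ctx)
            continue
        # Assumption: the raw source can be read through its subquery string.
        source_df = session.sql(f"SELECT * FROM {ctx.table_subquery}")
        # Run the same transformation function used for materialization.
        transformed_df = view.udf(source_df)
        # Hypothetical naming scheme for the temp view.
        temp_view = f"bfv_{ctx.name}_transformed"
        transformed_df.createOrReplaceTempView(temp_view)
        # The PIT join SQL will now select from the transformed data.
        result.append(replace(ctx, table_subquery=temp_view))
    return result
```

Because the transformed DataFrame is only registered as a temp view and never collected, the work stays lazy and distributed until the PIT join query itself runs.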
Which issue(s) this PR fixes:
Fixes #6345
Checks
… git commit -s)
Testing Strategy
Added 7 unit tests covering:
- … table_subquery replaced with temp view
- … FeatureView passes through unchanged
Misc
Changes:
sdk/python/feast/infra/offline_stores/contrib/spark_offline_store/spark.py:
- … BatchFeatureView import
- … _apply_bfv_transformations() helper function
- … get_historical_features() between query context construction and PIT join SQL generation
sdk/python/tests/unit/infra/offline_stores/contrib/spark_offline_store/test_spark_bfv_compute_on_read.py (new):