feat(spark): SparkSource query+path and pre-computed offline read for BatchFeatureView #6440
Open
abhijeet-dhumal wants to merge 8 commits into
a98e23b to 57b2489 (May 27, 2026 14:50)
57b2489 to e30e146 (May 29, 2026 08:57)
abhijeet-dhumal added a commit to abhijeet-dhumal/feast that referenced this pull request (May 29, 2026)
…orical_features The function and its call were removed in this PR but the replacement (_apply_bfv_transformations_for_historical) lives in a separate PR (feast-dev#6440). Removing it here would silently return raw untransformed features for any BatchFeatureView with a Python UDF via the standard get_historical_features() API path (FeatureStore → passthrough_provider → SparkOfflineStore). Restoring the function and its call until feast-dev#6440 lands. Signed-off-by: abhijeet-dhumal <abhijeetdhumal652@gmail.com>
SparkSource previously required exactly one of table/query/path. This relaxes the constraint to allow query + path together:
- query: used for reading raw data during materialization
- path: used for offline write-back (offline=True) and as pre-computed read source in get_historical_features
Signed-off-by: abhijeet-dhumal <abhijeetdhumal652@gmail.com>
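A minimal sketch of the relaxed constraint described in this commit message, for illustration only. The function name `validate_source_args` is hypothetical and is not the code in this PR. It also assumes that `query + path` is the only multi-argument combination that becomes valid, which the commit message implies but does not state outright.

```python
from typing import Optional


def validate_source_args(
    table: Optional[str] = None,
    query: Optional[str] = None,
    path: Optional[str] = None,
) -> None:
    """Raise ValueError unless the argument combination is allowed.

    Hypothetical helper: allowed combinations are exactly one of
    table/query/path, or query and path together (assumption).
    """
    provided = {
        name
        for name, value in (("table", table), ("query", query), ("path", path))
        if value is not None
    }
    if len(provided) == 1 or provided == {"query", "path"}:
        return
    raise ValueError(
        "Specify exactly one of table, query, or path, "
        f"or query and path together; got {sorted(provided) or 'nothing'}"
    )


# Usage: both of these pass without raising.
validate_source_args(table="features")
validate_source_args(query="SELECT * FROM raw_events", path="/data/features")
```

Under these assumptions, `table + query`, `table + path`, and all three together are still rejected, as is passing nothing.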
Signed-off-by: abhijeet-dhumal <abhijeetdhumal652@gmail.com>
Signed-off-by: abhijeet-dhumal <abhijeetdhumal652@gmail.com>
Signed-off-by: abhijeet-dhumal <abhijeetdhumal652@gmail.com>
… get_historical_features Signed-off-by: abhijeet-dhumal <abhijeetdhumal652@gmail.com>
abhijeet-dhumal added a commit to abhijeet-dhumal/feast that referenced this pull request (Jun 1, 2026)
…orical_features The function and its call were removed in this PR but the replacement (_apply_bfv_transformations_for_historical) lives in a separate PR (feast-dev#6440). Removing it here would silently return raw untransformed features for any BatchFeatureView with a Python UDF via the standard get_historical_features() API path (FeatureStore → passthrough_provider → SparkOfflineStore). Restoring the function and its call until feast-dev#6440 lands. Signed-off-by: abhijeet-dhumal <abhijeetdhumal652@gmail.com>
e30e146 to 4316349 (June 1, 2026 07:48)
jyejare
reviewed
Jun 1, 2026
jyejare
left a comment
Collaborator
This PR adds support for SparkSource with combined query+path configuration and pre-computed offline reads for BatchFeatureView. The changes enable reading from materialized offline stores to avoid expensive UDF re-execution. While the feature is useful, there are several security vulnerabilities and error handling gaps that need attention.
4316349 to 5312075 (June 2, 2026 12:45)
Signed-off-by: abhijeet-dhumal <abhijeetdhumal652@gmail.com>
Catch FileNotFoundError and PermissionError separately for the expected fallback cases (path not yet materialized, or no access). Unexpected errors now emit a distinct RuntimeWarning instead of being silently swallowed by a bare except Exception. Signed-off-by: abhijeet-dhumal <abhijeetdhumal652@gmail.com>
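The error-handling pattern this commit describes could look roughly like the sketch below. The name `read_with_fallback` and both callables are hypothetical stand-ins, not the PR's code. The sketch also assumes that unexpected errors still fall back to the live query after the warning, and that the two expected cases fall back silently. The commit message confirms neither.

```python
import warnings
from typing import Callable, TypeVar

T = TypeVar("T")


def read_with_fallback(
    read_precomputed: Callable[[], T],
    run_live_query: Callable[[], T],
) -> T:
    """Try the pre-computed read first; fall back to the live query on failure."""
    try:
        return read_precomputed()
    except FileNotFoundError:
        # Expected case: the path has not been materialized yet.
        return run_live_query()
    except PermissionError:
        # Expected case: the path exists but is not readable.
        return run_live_query()
    except Exception as exc:
        # Unexpected case: surface it instead of swallowing it silently.
        # Falling back afterwards is an assumption of this sketch.
        warnings.warn(
            f"Unexpected error reading pre-computed features, "
            f"falling back to live query: {exc!r}",
            RuntimeWarning,
            stacklevel=2,
        )
        return run_live_query()
```

Keeping the expected cases in their own `except` clauses means a genuinely surprising failure (for example a corrupt file) produces a visible `RuntimeWarning` rather than looking like a first run.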
jyejare
approved these changes
Jun 2, 2026
2f6910e to 9eb3d29 (June 3, 2026 08:03)
ntkathole
reviewed
Jun 3, 2026
ntkathole
reviewed
Jun 3, 2026
ntkathole
reviewed
Jun 3, 2026
The 1.5x speedup assertion for convert_response_to_dict is consistently flaky on macOS CI runners (getting 1.26-1.34x) due to variable load. 1.2x is still a meaningful regression guard without being brittle. Signed-off-by: abhijeet-dhumal <abhijeetdhumal652@gmail.com> Co-authored-by: Cursor <cursoragent@cursor.com>
9eb3d29 to e7fc883 (June 9, 2026 07:35)
What this PR does / why we need it

get_historical_features() on a BatchFeatureView re-runs the full UDF on raw data every call. For embedding pipelines, that's 20–40 min of compute per training run even though features already exist from the last materialize.

Fix: Route get_historical_features() to read pre-computed parquet from batch_source.path instead of re-executing the UDF.

To support this, SparkSource now accepts query + path together:
- query: raw data read during materialize()
- path: write-back target and pre-computed read source for get_historical_features()

Also allows BatchFeatureView with online=False, offline=True (offline-only) to skip the online validation check in get_historical_features(), so it can be used purely for training data without configuring an online store.

Falls back to live query if path doesn't exist yet (first run before any materialization).

Which issue(s) this PR fixes

N/A. Enables efficient training data retrieval for BatchFeatureView embedding pipelines without re-running UDFs.

Checks

git commit -s)

Testing Strategy

get_historical_features() reads from parquet, not UDF, after materialization
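To illustrate the routing described above (read from path when it exists, otherwise fall back to the live query), here is a minimal sketch. `SourceConfig` and `pick_historical_read` are hypothetical names, not Feast APIs. A local filesystem check stands in for whatever existence check the Spark offline store actually performs.

```python
import os
import tempfile
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class SourceConfig:
    """Hypothetical stand-in for a source configured with query + path."""

    query: Optional[str] = None
    path: Optional[str] = None


def pick_historical_read(source: SourceConfig) -> Tuple[str, str]:
    """Return ("path", location) for a pre-computed read, else ("query", sql)."""
    if source.path and os.path.exists(source.path):
        # Materialized output is present: read it instead of re-running the UDF.
        return ("path", source.path)
    if source.query:
        # First run, before any materialization: fall back to the live query.
        return ("query", source.query)
    raise ValueError("source has neither a materialized path nor a query")


# Usage: the same config routes differently before and after materialization.
with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "features.parquet")
    config = SourceConfig(query="SELECT * FROM raw_events", path=target)
    before = pick_historical_read(config)  # routed to the query
    os.mkdir(target)  # simulate a materialized parquet directory
    after = pick_historical_read(config)  # routed to the path
```

The point of the design is that callers keep a single source definition, and the choice between the expensive and cheap read is made at retrieval time.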