feat: Offline Store historical features retrieval based on datetime range in dask #5717

Merged
ntkathole merged 11 commits into feast-dev:master from aniketpalu:RHOAIENG-37451 on Dec 8, 2025
@aniketpalu (Contributor) commented Nov 11, 2025

What this PR does / why we need it:

  • Adds start/end-only historical retrieval to Dask offline store, enabling users to fetch features over a time range without providing an entity_df.

  • Makes entity_df optional in DaskOfflineStore.get_historical_features and accepts start_date/end_date via kwargs.

  • In non-entity mode:

    • Defaults end_date to now (UTC); derives start_date from max TTL across requested feature views (fallback 30 days).
    • Synthesizes a minimal one-row entity_df with only event_timestamp, to reuse existing join/metadata flow without scanning sources.
    • Falls back to cross-join when join keys aren’t present in the synthetic entity_df, avoiding KeyError and producing a snapshot as-of end_date with TTL + dedup for correctness.
    • Adds a unit test for Dask non-entity retrieval to assert API acceptance and job construction.
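The date defaulting described for non-entity mode can be sketched roughly as follows. This is a minimal illustration and not the code in `dask.py`: the function name `resolve_date_range`, its parameters, the `FALLBACK_WINDOW` constant, and the choice to ignore missing or zero TTLs are all assumptions made for the example.

```python
from datetime import datetime, timedelta, timezone
from typing import Iterable, Optional, Tuple

# Fallback lookback window when no feature view defines a usable TTL
# (the PR description mentions a 30-day fallback).
FALLBACK_WINDOW = timedelta(days=30)


def resolve_date_range(
    start_date: Optional[datetime],
    end_date: Optional[datetime],
    ttls: Iterable[Optional[timedelta]],
    now: Optional[datetime] = None,
) -> Tuple[datetime, datetime]:
    """Hypothetical helper: fill in a missing start/end for a range query."""
    # end_date defaults to the current time in UTC.
    end = end_date or now or datetime.now(timezone.utc)
    if start_date is not None:
        return start_date, end
    # Assumption: TTLs that are None or zero mean "no TTL" and are skipped.
    usable = [ttl for ttl in ttls if ttl is not None and ttl > timedelta(0)]
    # start_date is derived from the largest TTL across the feature views.
    window = max(usable) if usable else FALLBACK_WINDOW
    return end - window, end
```

Called with `end_date` set and TTLs of one day and seven days, the helper returns a start seven days before the end; with no TTLs at all it falls back to the 30-day window.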

Which issue(s) this PR fixes:

RHOAIENG-37451

Misc

…atatime range for Dask

Signed-off-by: Aniket Paluskar <apaluska@redhat.com>
@aniketpalu requested a review from a team as a code owner November 11, 2025 15:17
@aniketpalu changed the title RHOAIENG-37451:Offline Store historical features retrieval based on d… Nov 11, 2025

Copilot AI (Contributor) left a comment

Pull Request Overview

This PR adds non-entity mode historical feature retrieval to the Dask offline store, enabling users to retrieve features over a time range (start_date/end_date) without providing an entity_df.

Key changes:

  • Makes entity_df optional in DaskOfflineStore.get_historical_features and accepts start_date/end_date via kwargs
  • Synthesizes a minimal one-row entity_df with only the event_timestamp column to reuse existing join and metadata logic
  • Implements cross-join fallback when join keys are absent from the synthetic entity_df, relying on TTL filtering and deduplication for correctness
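To make the cross-join fallback concrete, here is a small sketch using plain pandas rather than Dask. It is not the PR's implementation: the function name `snapshot_as_of`, the helper column `entity_ts`, and the column names in the usage example are hypothetical. It builds a one-row entity frame holding only a timestamp, cross-joins it with the source rows, keeps rows inside the TTL window ending at `end_date`, and then keeps the latest row per join key.

```python
from datetime import timedelta
from typing import List, Optional

import pandas as pd


def snapshot_as_of(
    source_df: pd.DataFrame,
    end_date: pd.Timestamp,
    ttl: Optional[timedelta],
    join_keys: List[str],
    ts_col: str = "event_timestamp",
) -> pd.DataFrame:
    """Hypothetical sketch: latest row per key as of end_date, within the TTL."""
    # Synthetic one-row entity frame carrying only a timestamp.
    entity_df = pd.DataFrame({"entity_ts": [pd.Timestamp(end_date)]})
    # No join keys exist on the entity side, so pair every source row with it.
    joined = source_df.merge(entity_df, how="cross")
    # TTL filter: keep rows no newer than end_date and no older than end_date - ttl.
    in_window = joined[ts_col] <= joined["entity_ts"]
    if ttl is not None:
        in_window &= joined[ts_col] >= joined["entity_ts"] - ttl
    # Dedup: keep the most recent surviving row for each join key.
    latest = (
        joined[in_window]
        .sort_values(ts_col)
        .drop_duplicates(subset=join_keys, keep="last")
    )
    return latest.drop(columns="entity_ts").reset_index(drop=True)


# Usage with made-up data: two drivers, one stale row outside the 30-day TTL.
source = pd.DataFrame(
    {
        "driver_id": [1, 1, 2, 2],
        "event_timestamp": pd.to_datetime(
            ["2025-01-01", "2025-01-10", "2024-11-01", "2025-01-05"], utc=True
        ),
        "rating": [4.0, 4.5, 3.0, 3.5],
    }
)
snapshot = snapshot_as_of(
    source, pd.Timestamp("2025-01-15", tz="UTC"), timedelta(days=30), ["driver_id"]
)
```

Because the entity frame has a single row, the cross join does not multiply the source rows; it only attaches the as-of timestamp to each of them, which is why the TTL filter and dedup step carry the correctness burden.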

Reviewed Changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 6 comments.

| File | Description |
| --- | --- |
| sdk/python/feast/infra/offline_stores/dask.py | Implements non-entity mode by making entity_df optional, synthesizing a minimal entity_df when None, and using cross-join logic when join keys are absent |
| sdk/python/tests/unit/infra/offline_stores/test_dask_non_entity.py | Adds unit test verifying that the API accepts start_date/end_date parameters in non-entity mode and returns a valid RetrievalJob |

@aniketpalu changed the title feat: Offline Store historical features retrieval based on d… Nov 13, 2025

@jyejare (Collaborator) left a comment

ACK pending comments

@aniketpalu requested a review from jyejare November 25, 2025 09:19

@jyejare (Collaborator) left a comment

LGTM

@ntkathole merged commit a16582a into feast-dev:master Dec 8, 2025
17 checks passed
franciscojavierarceo pushed a commit that referenced this pull request Dec 16, 2025
# [0.58.0](v0.57.0...v0.58.0) (2025-12-16)

### Bug Fixes

* Add java proto ([#5719](#5719)) ([fc3ea20](fc3ea20))
* Add possibility to force full features names for materialize ops ([#5728](#5728)) ([55c9c36](55c9c36))
* Fixed file registry cache sync ([09505d4](09505d4))
* Handle hyphon in sqlite project name ([#5575](#5575)) ([#5749](#5749)) ([b8346ff](b8346ff))
* Pinned substrait to fix protobuf issue ([d0ef4da](d0ef4da))
* Set TLS certificate annotation only on gRPC service ([#5715](#5715)) ([75d13db](75d13db))
* SQLite online store deletes tables from other projects in shared registry scenarios ([#5766](#5766)) ([fabce76](fabce76))
* Validate not existing entity join keys for preventing panic ([0b93559](0b93559))

### Features

* Add annotations for pod templates ([534e647](534e647))
* Add Pytorch template ([#5780](#5780)) ([6afd353](6afd353))
* Add support for extra options for stream source ([#5618](#5618)) ([18956c2](18956c2))
* Added matched_tag field search api results with fuzzy search capabilities ([#5769](#5769)) ([4a9ffae](4a9ffae))
* Added support for enabling metrics in Feast Operator ([#5317](#5317)) ([#5748](#5748)) ([a8498c2](a8498c2))
* Configure CacheTTLSecondscache,CacheMode for file-based registry in Feast Operator([#5708](#5708)) ([#5744](#5744)) ([f25f83b](f25f83b))
* Implemented Tiling Support for Time-Windowed Aggregations ([#5724](#5724)) ([7a99166](7a99166))
* Offline Store historical features retrieval based on datetime range for spark ([#5720](#5720)) ([27ec8ec](27ec8ec))
* Offline Store historical features retrieval based on datetime range in dask ([#5717](#5717)) ([a16582a](a16582a))
* Production ready feast operator with v1 apiversion ([#5771](#5771)) ([49359c6](49359c6))
* Support for Map value data type ([#5768](#5768)) ([#5772](#5772)) ([b99a8a9](b99a8a9))
antznette1 pushed a commit to antznette1/feast that referenced this pull request Jan 3, 2026
# [0.58.0](feast-dev/feast@v0.57.0...v0.58.0) (2025-12-16)

Signed-off-by: Anthonette Adanyin <106275232+antznette1@users.noreply.github.com>
franciscojavierarceo pushed a commit that referenced this pull request Jan 5, 2026
# [0.58.0](v0.57.0...v0.58.0) (2025-12-16)

Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>