
feat: Add non-entity retrieval support for ClickHouse offline store #6066

Merged
ntkathole merged 1 commit into feast-dev:master from YassinNouh21:feat/clickhouse-non-entity-retrieval
Mar 10, 2026

@YassinNouh21 YassinNouh21 commented Mar 5, 2026

Collaborator

What this PR does / why we need it:

Adds support for non-entity historical retrieval (entity_df=None) in the ClickHouse offline store, bringing it to parity with the PostgreSQL offline store.

Changes:

  • Updated ClickhouseOfflineStore.get_historical_features() to accept entity_df=None with optional start_date/end_date kwargs
  • When entity_df is None, a synthetic single-row DataFrame is created using the provided date range (or sensible defaults: end_date=now, start_date derived from max TTL or 30 days)
  • Added 3 unit tests covering: both dates provided, end_date only (start from TTL), and no dates (defaults to now)
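
The default handling described in the list above can be sketched in plain Python. This is a minimal illustration and not the code from the PR: `resolve_date_range`, `DEFAULT_LOOKBACK`, and the `ttls` argument are hypothetical names, and the fallback order (largest TTL first, then 30 days) is an assumption based on the description.

```python
from datetime import datetime, timedelta, timezone
from typing import Iterable, Optional, Tuple

# Assumed fallback when no feature view defines a TTL (the description mentions 30 days).
DEFAULT_LOOKBACK = timedelta(days=30)


def resolve_date_range(
    start_date: Optional[datetime],
    end_date: Optional[datetime],
    ttls: Iterable[Optional[timedelta]],
    now: Optional[datetime] = None,
) -> Tuple[datetime, datetime]:
    """Fill in missing bounds for non-entity retrieval (hypothetical helper)."""
    if end_date is None:
        # end_date defaults to the current time.
        end_date = now if now is not None else datetime.now(timezone.utc)
    if start_date is None:
        # start_date is derived from the largest positive TTL, else 30 days.
        known = [t for t in ttls if t is not None and t > timedelta(0)]
        lookback = max(known) if known else DEFAULT_LOOKBACK
        start_date = end_date - lookback
    return start_date, end_date


end = datetime(2023, 6, 1, tzinfo=timezone.utc)
# Only end_date given: start comes from the largest TTL (7 days here).
start_from_ttl, _ = resolve_date_range(None, end, [timedelta(days=7), timedelta(days=1)])
# No usable TTL: start falls back to 30 days before end_date.
start_fallback, _ = resolve_date_range(None, end, [None])
```

Passing `now` explicitly keeps the helper deterministic in tests, which is presumably what the three unit tests listed above need as well.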

Usage:

from datetime import datetime

# fs is assumed to be an existing feast.FeatureStore instance.
fs.get_historical_features(
    features=["driver_stats:conv_rate"],
    entity_df=None,
    start_date=datetime(2023, 1, 1),
    end_date=datetime(2023, 6, 1),
)

Which issue(s) this PR fixes:

Fixes #5835

Test plan

  • Unit tests pass (pytest tests/unit/infra/offline_stores/test_clickhouse.py)
  • Ruff lint and format checks pass
  • E2E verification of full FeatureStore -> Provider -> ClickhouseOfflineStore chain
  • No regression on existing entity_df mode
  • DCO sign-off included


@YassinNouh21 YassinNouh21 requested a review from a team as a code owner March 5, 2026 01:34
@YassinNouh21 YassinNouh21 self-assigned this Mar 5, 2026
devin-ai-integration[bot]

This comment was marked as resolved.

@YassinNouh21 YassinNouh21 force-pushed the feat/clickhouse-non-entity-retrieval branch 3 times, most recently from b3f62c1 to 1e4cd7c Compare March 5, 2026 01:55
devin-ai-integration[bot]

This comment was marked as resolved.

@franciscojavierarceo

Member

Why not have an integration test?

@devin-ai-integration devin-ai-integration Bot left a comment

Contributor


Devin Review found 1 new potential issue.

View 7 additional findings in Devin Review.


@YassinNouh21 YassinNouh21 force-pushed the feat/clickhouse-non-entity-retrieval branch from 8fd4f11 to 8dd703c Compare March 5, 2026 13:34
@YassinNouh21

Collaborator Author

@franciscojavierarceo @ntkathole please take another look!

Enable get_historical_features() to be called with entity_df=None
by passing end_date kwarg instead. When entity_df is None, a synthetic
single-row DataFrame is created using end_date (defaults to now).
The PIT join window is controlled by end_date and TTL.

Includes integration test against a real ClickHouse container.

Fixes feast-dev#5835

Signed-off-by: yassinnouh21 <yassinnouh21@gmail.com>
@ntkathole ntkathole force-pushed the feat/clickhouse-non-entity-retrieval branch from 4662b91 to ee93086 Compare March 10, 2026 02:56
@ntkathole ntkathole merged commit 4d08ddc into feast-dev:master Mar 10, 2026
21 of 25 checks passed
YassinNouh21

This comment was marked as outdated.

YassinNouh21 added a commit to YassinNouh21/feast that referenced this pull request Mar 15, 2026
…_df for non-entity retrieval

The non-entity retrieval path created a synthetic entity_df using
pd.date_range(start=start_date, ...)[:1], which placed start_date as
the event_timestamp. Since PIT joins use MAX(entity_timestamp) as the
upper bound for feature data filtering, using start_date made end_date
unreachable — no features after start_date would be returned.

Fix: use [end_date] directly, matching the ClickHouse implementation
(PR feast-dev#6066) and the Dask offline store behavior.

Signed-off-by: yassinnouh21 <yassinnouh21@gmail.com>
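
The upper-bound problem described in the commit message above can be reproduced with a few lines of plain Python. This is only a sketch of the filtering rule as the message states it, not the PostgreSQL query itself: `feature_rows` and `rows_visible_to_join` are hypothetical, and the real join also applies TTL and entity-key conditions that are omitted here.

```python
from datetime import datetime

# Hypothetical feature rows: (event_timestamp, conv_rate).
feature_rows = [
    (datetime(2023, 1, 15), 0.10),
    (datetime(2023, 3, 1), 0.20),
    (datetime(2023, 5, 20), 0.30),
]

start_date = datetime(2023, 2, 1)
end_date = datetime(2023, 6, 1)


def rows_visible_to_join(entity_timestamps, rows):
    """Keep rows at or before MAX(entity_timestamp), the upper bound described above."""
    upper_bound = max(entity_timestamps)
    return [row for row in rows if row[0] <= upper_bound]


# Old behavior: the synthetic row carries start_date, so later rows are filtered out.
visible_before_fix = rows_visible_to_join([start_date], feature_rows)
# Fixed behavior: the synthetic row carries end_date, so the whole range is reachable.
visible_after_fix = rows_visible_to_join([end_date], feature_rows)
```

With `start_date` as the only entity timestamp, the rows from March and May never pass the filter, which matches the "no features after start_date would be returned" symptom.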
ntkathole pushed a commit to YassinNouh21/feast that referenced this pull request Mar 16, 2026
…_df for non-entity retrieval

franciscojavierarceo added a commit that referenced this pull request Mar 17, 2026
…rieval (#6110)

* fix(postgres): Use end_date instead of start_date in synthetic entity_df for non-entity retrieval

The non-entity retrieval path created a synthetic entity_df using
pd.date_range(start=start_date, ...)[:1], which placed start_date as
the event_timestamp. Since PIT joins use MAX(entity_timestamp) as the
upper bound for feature data filtering, using start_date made end_date
unreachable — no features after start_date would be returned.

Fix: use [end_date] directly, matching the ClickHouse implementation
(PR #6066) and the Dask offline store behavior.

Signed-off-by: yassinnouh21 <yassinnouh21@gmail.com>

* fix: preserve timestamp range for min_event_timestamp and fix formatting

The entity_df fix alone would cause min_event_timestamp to be computed
as end_date - TTL (instead of start_date - TTL), clipping valid data
from the query window. Override entity_df_event_timestamp_range to
(start_date, end_date) in non-entity mode so the full range is used.

Also fix ruff formatting in the test file.

Signed-off-by: yassinnouh21 <yassinnouh21@gmail.com>
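
The effect of the range override on the lower bound can be checked with simple date arithmetic. The sketch below assumes, as the message above says, that `min_event_timestamp` is the start of the entity timestamp range minus the TTL; the function here is a hypothetical stand-in for that computation, and the 7-day TTL is an arbitrary example value.

```python
from datetime import datetime, timedelta

start_date = datetime(2023, 1, 1)
end_date = datetime(2023, 6, 1)
ttl = timedelta(days=7)  # example value, not taken from the PR


def min_event_timestamp(timestamp_range, ttl):
    """Lower bound of the scan: earliest entity timestamp minus TTL (assumed rule)."""
    return timestamp_range[0] - ttl


# Range inferred from the single-row entity_df after the first fix: (end_date, end_date).
inferred = min_event_timestamp((end_date, end_date), ttl)
# Range overridden to (start_date, end_date) in non-entity mode.
overridden = min_event_timestamp((start_date, end_date), ttl)
```

In this example the inferred lower bound is 2023-05-25, which would clip everything between January and late May, while the overridden bound is 2022-12-25 and keeps the full requested window.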

* test: add integration test for non-entity retrieval

Signed-off-by: yassinnouh21 <yassinnouh21@gmail.com>

---------

Signed-off-by: yassinnouh21 <yassinnouh21@gmail.com>
Co-authored-by: Francisco Javier Arceo <arceofrancisco@gmail.com>
Anarion-zuo pushed a commit to Anarion-zuo/feast that referenced this pull request Mar 17, 2026
…rieval (feast-dev#6110)

Signed-off-by: aaronzuo <anarionzuo@outlook.com>
Shizoqua pushed a commit to Shizoqua/feast that referenced this pull request Mar 18, 2026
…rieval (feast-dev#6110)

Signed-off-by: Shizoqua <hr.lanreshittu@gmail.com>
aniketpalu pushed a commit to aniketpalu/feast that referenced this pull request Mar 23, 2026
…rieval (feast-dev#6110)

Signed-off-by: Aniket Paluskar <apaluska@redhat.com>
yuan1j pushed a commit to yuan1j/feast that referenced this pull request Apr 2, 2026
…rieval (feast-dev#6110)

Signed-off-by: yuanjun220 <1069645408@qq.com>
franciscojavierarceo pushed a commit that referenced this pull request Apr 7, 2026
# [0.61.0](v0.60.0...v0.61.0) (2026-04-07)

### Bug Fixes

* Add grpcio dependency group to transformation server Dockerfile ([2c2150a](2c2150a))
* Add https readiness check for rest-registry tests ([ea85e63](ea85e63))
* Add website build check for PRs and fix blog frontmatter YAML error ([#6079](#6079)) ([30a3a43](30a3a43))
* Added missing jackc/pgx/v5 entries ([94ad0e7](94ad0e7))
* Added MLflow metric charts across feature selection ([#6080](#6080)) ([a403361](a403361))
* Check duplicate names for feature view across types ([#5999](#5999)) ([95b9af8](95b9af8))
* Fix integration tests ([#6046](#6046)) ([02d5548](02d5548))
* Fix missing error handling for resource_counts endpoint ([d9706ce](d9706ce))
* Fix non-specific label selector on metrics service ([a1a160d](a1a160d))
* fix path feature_definitions.py ([7d7df68](7d7df68))
* Fix regstry Rest API tests intermittent failure ([d53a339](d53a339))
* Fixed IntegrityError on SqlRegistry ([#6047](#6047)) ([325e148](325e148))
* Fixed intermittent failures in get_historical_features ([c335ec7](c335ec7))
* Fixed pre-commit check ([114b7db](114b7db))
* Fixed the intermittent FeatureViewNotFoundException ([661ecc7](661ecc7))
* Fixed uv cache permission error for docker build on mac ([ad807be](ad807be))
* Fixes a `PydanticDeprecatedSince20` warning for trino_offline_store ([#5991](#5991)) ([abfd18a](abfd18a))
* Handle existing RBAC role gracefully in namespace registry ([b46a62b](b46a62b))
* Ignore ipynb files during apply ([#6151](#6151)) ([4ea123d](4ea123d))
* Integration test failures ([#6040](#6040)) ([9165870](9165870))
* Mount TLS volumes for init container ([080a9b5](080a9b5))
* **postgres:** Use end_date in synthetic entity_df for non-entity retrieval ([#6110](#6110)) ([088a802](088a802)), closes [#6066](#6066)
* Ray offline store tests are duplicated across 3 workflows ([54f705a](54f705a))
* Reenable tests ([#6036](#6036)) ([82ee7f8](82ee7f8))
* SSL/TLS mode by default for postgres connection ([4844488](4844488))
* Use commitlint pre-commit hook instead of a separate action ([35a81e7](35a81e7))

### Features

* Add Claude Code agent skills for Feast ([#6081](#6081)) ([1e5b60f](1e5b60f)), closes [#5976](#5976) [#6007](#6007)
* Add complex type support (Map, JSON, Struct) with schema validation ([#5974](#5974)) ([1200dbf](1200dbf))
* Add decimal to supported feature types ([#6029](#6029)) ([#6226](#6226)) ([cff6fbf](cff6fbf))
* Add feast apply init container to automate registry population on pod start ([#6106](#6106)) ([6b31a43](6b31a43))
* Add feature view versioning support to PostgreSQL and MySQL online stores ([#6193](#6193)) ([940e0f0](940e0f0)), closes [#6168](#6168) [#6169](#6169) [#2728](#2728)
* Add materialization, feature freshness, request latency, and push metrics to feature server ([2c6be18](2c6be18))
* Add metadata statistics to registry api ([ef1d4fc](ef1d4fc))
* Add non-entity retrieval support for ClickHouse offline store ([4d08ddc](4d08ddc)), closes [#5835](#5835)
* Add OnlineStore for MongoDB ([#6025](#6025)) ([bf4e3fa](bf4e3fa)), closes [golang/go#74462](golang/go#74462)
* Add Oracle DB as Offline store in python sdk & operator ([#6017](#6017)) ([9d35368](9d35368))
* Add RBAC aggregation labels to FeatureStore ClusterRoles ([daf77c6](daf77c6))
* Add ServiceMonitor auto-generation for Prometheus discovery ([#6126](#6126)) ([56e6d21](56e6d21))
* Add typed_features field to grpc write request (([#6117](#6117)) ([#6118](#6118)) ([eeaa6db](eeaa6db)), closes [#6116](#6116)
* Add UUID and TIME_UUID as feature types ([#5885](#5885)) ([#5951](#5951)) ([5d6e311](5d6e311))
* Add version indicators to lineage graph nodes ([#6187](#6187)) ([73805d3](73805d3))
* Add version tracking to FeatureView ([#6101](#6101)) ([ed4a4f2](ed4a4f2))
* Added Agent skills for AI Agents ([#6007](#6007)) ([99008c8](99008c8))
* Added CodeQL SAST scanning and detect-secrets pre-commit hook ([547b516](547b516))
* Added odfv transformations metrics ([8b5a526](8b5a526))
* Adding optional name to Aggregation (feast-dev[#5994](#5994)) ([#6083](#6083)) ([56469f7](56469f7))
* Created DocEmbedder class ([#5973](#5973)) ([0719c06](0719c06))
* Extended OIDC support to extract groups & namespaces and token injection with multiple methods ([#6089](#6089)) ([7c04026](7c04026))
* Feature Server High-Availability on Kubernetes ([#6028](#6028)) ([9c07b4c](9c07b4c)), closes [Hi#Availability](https://github.com/Hi/issues/Availability) [Hi#Availability](https://github.com/Hi/issues/Availability)
* **go:** Implement metrics and tracing for http and grpc servers ([#5925](#5925)) ([2b4ec9a](2b4ec9a))
* Horizontal scaling support to the Feast operator ([#6000](#6000)) ([3ec13e6](3ec13e6))
* Making feature view source optional (feast-dev[#6074](#6074)) ([#6075](#6075)) ([76917b7](76917b7))
* Replace ORJSONResponse with Pydantic response models for faster JSON serialization ([65cf03c](65cf03c))
* Support arm docker build ([#6061](#6061)) ([1e1f5d9](1e1f5d9))
* Support distinct count aggregation [[#6116](#6116)] ([3639570](3639570))
* Support HTTP in MCP ([#6109](#6109)) ([e72b983](e72b983))
* Support nested collection types (Array/Set of Array/Set) ([#5947](#5947)) ([#6132](#6132)) ([ab61642](ab61642))
* Support podAnnotations on Deployment pod template ([1b3cdc1](1b3cdc1))
* Use orjson for faster JSON serialization in feature server ([6f5203a](6f5203a))
* Utilize date partition column in BigQuery ([#6076](#6076)) ([4ea9b32](4ea9b32))

### Performance Improvements

* Online feature response construction in a single pass over read rows ([113fb04](113fb04))
* Optimize protobuf parsing in Redis online store ([#6023](#6023)) ([59dfdb8](59dfdb8))
* Optimize timestamp conversion in _convert_rows_to_protobuf ([33a2e95](33a2e95))
* Parallelize DynamoDB batch reads in sync online_read ([#6024](#6024)) ([9699944](9699944))
* Remove redundant entity key serialization in online_read ([d87283f](d87283f))
franciscojavierarceo pushed a commit that referenced this pull request Apr 8, 2026
# [0.62.0](v0.61.0...v0.62.0) (2026-04-08)

### Bug Fixes

* Added missing jackc/pgx/v5 entries ([94ad0e7](94ad0e7))
* Fix missing error handling for resource_counts endpoint ([d9706ce](d9706ce))
* fix path feature_definitions.py ([7d7df68](7d7df68))
* Fix regstry Rest API tests intermittent failure ([d53a339](d53a339))
* Fixed intermittent failures in get_historical_features ([c335ec7](c335ec7))
* Fixed the intermittent FeatureViewNotFoundException ([661ecc7](661ecc7))
* Handle existing RBAC role gracefully in namespace registry ([b46a62b](b46a62b))
* Ignore ipynb files during apply ([#6151](#6151)) ([4ea123d](4ea123d))
* Mount TLS volumes for init container ([080a9b5](080a9b5))
* **postgres:** Use end_date in synthetic entity_df for non-entity retrieval ([#6110](#6110)) ([088a802](088a802)), closes [#6066](#6066)
* SSL/TLS mode by default for postgres connection ([4844488](4844488))
* Sync v0.61-branch so v0.61.0 tag is reachable from master ([af66878](af66878))

### Features

* Add Claude Code agent skills for Feast ([#6081](#6081)) ([1e5b60f](1e5b60f)), closes [#5976](#5976) [#6007](#6007)
* Add decimal to supported feature types ([#6029](#6029)) ([#6226](#6226)) ([cff6fbf](cff6fbf))
* Add feast apply init container to automate registry population on pod start ([#6106](#6106)) ([6b31a43](6b31a43))
* Add feature view versioning support to PostgreSQL and MySQL online stores ([#6193](#6193)) ([940e0f0](940e0f0)), closes [#6168](#6168) [#6169](#6169) [#2728](#2728)
* Add metadata statistics to registry api ([ef1d4fc](ef1d4fc))
* Add Oracle DB as Offline store in python sdk & operator ([#6017](#6017)) ([9d35368](9d35368))
* Add RBAC aggregation labels to FeatureStore ClusterRoles ([daf77c6](daf77c6))
* Add ServiceMonitor auto-generation for Prometheus discovery ([#6126](#6126)) ([56e6d21](56e6d21))
* Add typed_features field to grpc write request (([#6117](#6117)) ([#6118](#6118)) ([eeaa6db](eeaa6db)), closes [#6116](#6116)
* Add UUID and TIME_UUID as feature types ([#5885](#5885)) ([#5951](#5951)) ([5d6e311](5d6e311))
* Add version indicators to lineage graph nodes ([#6187](#6187)) ([73805d3](73805d3))
* Add version tracking to FeatureView ([#6101](#6101)) ([ed4a4f2](ed4a4f2))
* Added Agent skills for AI Agents ([#6007](#6007)) ([99008c8](99008c8))
* Added odfv transformations metrics ([8b5a526](8b5a526))
* Created DocEmbedder class ([#5973](#5973)) ([0719c06](0719c06))
* Extended OIDC support to extract groups & namespaces and token injection with multiple methods ([#6089](#6089)) ([7c04026](7c04026))
* Replace ORJSONResponse with Pydantic response models for faster JSON serialization ([65cf03c](65cf03c))
* Support distinct count aggregation [[#6116](#6116)] ([3639570](3639570))
* Support HTTP in MCP ([#6109](#6109)) ([e72b983](e72b983))
* Support nested collection types (Array/Set of Array/Set) ([#5947](#5947)) ([#6132](#6132)) ([ab61642](ab61642))
* Support podAnnotations on Deployment pod template ([1b3cdc1](1b3cdc1))
* Utilize date partition column in BigQuery ([#6076](#6076)) ([4ea9b32](4ea9b32))

### Performance Improvements

* Online feature response construction in a single pass over read rows ([113fb04](113fb04))

Development

Successfully merging this pull request may close these issues.

ClickHouse - historical retrieval without entity dataframe

3 participants