perf(postgres): Optimizing feast offline Store for date-range multi-FV retrieval#6057

Vperiodt

What this PR does / why we need it:

Replaces the date-range multi-FV path in the Postgres offline store (previously base_entities + one LEFT JOIN LATERAL per feature view) with a set-based LOCF (Last Observation Carried Forward) implementation
Uses a single timeline: stack (UNION ALL of spine + feature rows), COUNT for group boundaries, FIRST_VALUE for forward-fill, then filter to spine and apply per-FV TTL.

Which issue(s) this PR fixes:

Fixes slow get_historical_features on the Postgres offline store for date-range retrieval with multiple feature views. The LATERAL/inequality joins had O(N×M) cost. This PR switches to a LOCF path: one stacked timeline of size L, sort O(L log L), and O(L) window passes, which makes total cost as O(L log L).

Misc

Scenario: get_historical_features(features=[...], start_date=2023-01-01, end_date=2023-01-07) with no entity DataFrame and two feature views:

driver_fv: entity driver_id, TTL 1 day, feature score
customer_fv: entity customer_id, TTL 0, feature amount

Inputs

Parameter	Value
`start_date`	2023-01-01
`end_date`	2023-01-07
`entity_df`	`None` (non-entity mode)
FVs	driver_fv (driver_id, score, TTL 1d), customer_fv (customer_id, amount, TTL 0)

1. Feature data window (`__data_raw`)

Feature data is explicitly pulled from lookback to end_date so LOCF can carry the last observation before start_date.

Time window used for feature data

FV	TTL		Time window
driver_fv	1 day		[lookback_start_date, end_date]
customer_fv	0		[start_date, end_date]

With lookback_start_date = start_date - max_ttl (e.g. 2022-12-31 for 1-day TTL), driver_fv rows before 2023-01-01 are now included so LOCF can fill correctly.

-- Per-FV __data_raw: feature rows from lookback_start_date..end_date
"driver_fv__data_raw" AS (
  SELECT "ts" AS event_timestamp, "driver_id", "score" AS "score"
  FROM "driver_table" AS sub
  WHERE "ts" BETWEEN '2023-01-01 00:00:00+00' - interval '1 day' AND '2023-01-07 00:00:00+00'
),
"customer_fv__data_raw" AS (
  SELECT "ts" AS event_timestamp, "customer_id", "amount" AS "amount"
  FROM "customer_table" AS sub
  WHERE "ts" BETWEEN '2023-01-01 00:00:00+00' AND '2023-01-07 00:00:00+00'
),

2. Spine: unified (entity, timestamp) grid

Spine is built from each FV’s __data / __data_raw over [start_date, end_date]. When FVs have different entity sets, the template emits NULL AS "entity" for the FV that doesn’t have that entity:

spine AS (
  SELECT DISTINCT d.event_timestamp, d."driver_id", NULL AS "customer_id"
  FROM "driver_fv__data_raw" d
  WHERE d.event_timestamp BETWEEN '2023-01-01 00:00:00+00' AND '2023-01-07 00:00:00+00'
  UNION
  SELECT DISTINCT d.event_timestamp, NULL AS "driver_id", d."customer_id"
  FROM "customer_fv__data_raw" d
  WHERE d.event_timestamp BETWEEN '2023-01-01 00:00:00+00' AND '2023-01-07 00:00:00+00'
),

Example spine shape (

Spine = distinct (event_timestamp, driver_id, customer_id) from both FVs; missing entity is NULL.

event_timestamp	driver_id	customer_id
2023-01-01 00:00:00+00	101	NULL
2023-01-01 00:00:00+00	NULL	201
2023-01-02 00:00:00+00	101	NULL
…	…	…

3. TTL in the final SELECT

-- driver_fv has TTL 86400 (1 day)
CASE WHEN (spine.event_timestamp - "driver_fv__f"."driver_fv__filled_ts") <= make_interval(secs => 86400)
     THEN "driver_fv__f"."score" ELSE NULL END AS "score",
-- customer_fv has TTL 0: no CASE
"customer_fv__f"."amount" AS "amount"

Example final output shape

After LOCF and TTL: one row per (event_timestamp, driver_id, customer_id); score is NULL when outside TTL.

event_timestamp	driver_id	customer_id	score	amount
2023-01-01 00:00:00+00	101	NULL	0.8	NULL
2023-01-01 00:00:00+00	NULL	201	NULL	99.5
2023-01-02 00:00:00+00	101	NULL	0.9	NULL
…	…	…	…	…

tokoko

@Vperiodt thanks for the PR. if it's not too much trouble, can you give a small example of result queries for say.. 2 fv scenario and highlight differences this PR introduces? there jinja templates are hard enough to read through even when one knows what the intent is 😄

Vperiodt

@Vperiodt thanks for the PR. if it's not too much trouble, can you give a small example of result queries for say.. 2 fv scenario and highlight differences this PR introduces? there jinja templates are hard enough to read through even when one knows what the intent is 😄

sure ! will add it in the description itself

YassinNouh21

The LOCF approach is a solid improvement over the LATERAL join path — O(L log L) window passes vs O(N×M) inequality joins is a real win for large date-range queries. The per-FV independent pipeline is clean and avoids cross-FV interference.

A few issues to address before merge — one pre-existing bug that this PR should fix since it's modifying the non-entity path, dead code in the template, and some test coverage gaps.

Vperiodt

hey ! @YassinNouh21 Thanks for your review earlier, it would be great if you can review it again

jyejare

@Vperiodt Could you please rebase the PR for integration tests run ?

Signed-off-by: Vanshika Vanshika <vvanshik@redhat.com> rh-pre-commit.version: 2.3.2 rh-pre-commit.check-secrets: ENABLED

jyejare

Review Summary

The LOCF (Last Observation Carried Forward) approach is a solid replacement for the LATERAL JOIN path — the per-FV independent forward-fill pipelines (__stacked → __grouped → __filled) correctly avoid cross-FV interference, and the edge cases are well handled (entityless FVs with PARTITION BY (SELECT NULL), different entity sets with NULL padding + IS NOT DISTINCT FROM, created_timestamp_column dedup via ROW_NUMBER). The make_interval(secs => ...) improvement applies to the existing entity_df path too, which is a nice bonus.

Test coverage is strong with 7 new test methods, all using sqlglot.parse validation.

However, CI is currently failing (lint-python and unit-test-python 3.10), so this is not merge-ready yet.

Blockers

CI failures — lint-python and unit-test-python (3.10, ubuntu-latest) are red on the latest run (Apr 13). Please investigate and fix.
Reuse compute_non_entity_date_range — as @ntkathole flagged, the inlined date-range logic should use the existing utility. This also eliminates the duplicate max_ttl_seconds computation. See inline comment for the suggested pattern.

Minor Items

Skip entity_df creation in LOCF path — the synthetic entity_df and dummy left_table_query_string are never used by the LOCF template. Consider skipping that overhead entirely.
The IS NOT DISTINCT FROM fix for NULL entity comparisons is an improvement over the old LATERAL path — worth calling out in the PR description as a correctness fix.

Signed-off-by: Vanshika Vanshika <vvanshik@redhat.com> rh-pre-commit.version: 2.3.2 rh-pre-commit.check-secrets: ENABLED

Vperiodt marked this pull request as ready for review March 4, 2026 07:06

Vperiodt requested a review from a team as a code owner March 4, 2026 07:06

This comment was marked as resolved.

Sign in to view

Vperiodt marked this pull request as draft March 5, 2026 17:18

Vperiodt marked this pull request as ready for review March 9, 2026 21:20

This comment was marked as resolved.

Sign in to view

ntkathole mentioned this pull request Mar 10, 2026

feat: Add non-entity retrieval support for ClickHouse offline store #6066

Merged

5 tasks

YassinNouh21 reviewed Mar 15, 2026

View reviewed changes

YassinNouh21 mentioned this pull request Mar 15, 2026

fix(postgres): Use end_date in synthetic entity_df for non-entity retrieval #6110

Merged

3 tasks

jyejare added the ok-to-test label Apr 7, 2026

Vperiodt added 9 commits April 10, 2026 15:42

Optimizing Date-Range Queries …

309cdac

Signed-off-by: Vanshika Vanshika <vvanshik@redhat.com> rh-pre-commit.version: 2.3.2 rh-pre-commit.check-secrets: ENABLED

fix issues …

3da5af3

Signed-off-by: Vanshika Vanshika <vvanshik@redhat.com> rh-pre-commit.version: 2.3.2 rh-pre-commit.check-secrets: ENABLED

fix lint …

34eed9a

Signed-off-by: Vanshika Vanshika <vvanshik@redhat.com> rh-pre-commit.version: 2.3.2 rh-pre-commit.check-secrets: ENABLED

featureview entities …

a4139b7

Signed-off-by: Vanshika Vanshika <vvanshik@redhat.com> rh-pre-commit.version: 2.3.2 rh-pre-commit.check-secrets: ENABLED

fix LOCF spine UNION and empty-entity SQL …

74a474c

Signed-off-by: Vanshika Vanshika <vvanshik@redhat.com> rh-pre-commit.version: 2.3.2 rh-pre-commit.check-secrets: ENABLED

added fixes …

6f754b8

Signed-off-by: Vanshika Vanshika <vvanshik@redhat.com> rh-pre-commit.version: 2.3.2 rh-pre-commit.check-secrets: ENABLED

updates test …

7594205

Signed-off-by: Vanshika Vanshika <vvanshik@redhat.com> rh-pre-commit.version: 2.3.2 rh-pre-commit.check-secrets: ENABLED

fix entity_df range …

f770c41

Signed-off-by: Vanshika Vanshika <vvanshik@redhat.com> rh-pre-commit.version: 2.3.2 rh-pre-commit.check-secrets: ENABLED

updated postgres.py …

f235eb8

Signed-off-by: Vanshika Vanshika <vvanshik@redhat.com> rh-pre-commit.version: 2.3.2 rh-pre-commit.check-secrets: ENABLED

Vperiodt force-pushed the patch-query branch from d96cee1 to f235eb8 Compare April 10, 2026 10:15

fix-lint …

8c66e6f

Signed-off-by: Vanshika Vanshika <vvanshik@redhat.com> rh-pre-commit.version: 2.3.2 rh-pre-commit.check-secrets: ENABLED

ntkathole reviewed Apr 13, 2026

View reviewed changes

jyejare mentioned this pull request Apr 16, 2026

feat: DuckDB historical retrieval without entity dataframe #6108

Open

jyejare reviewed Apr 16, 2026

View reviewed changes

Merge branch 'feast-dev:master' into patch-query

25bc86d

Vperiodt and others added 5 commits April 16, 2026 19:35

address reviews …

b6a4703

Signed-off-by: Vanshika Vanshika <vvanshik@redhat.com> rh-pre-commit.version: 2.3.2 rh-pre-commit.check-secrets: ENABLED

Merge branch 'feast-dev:master' into patch-query

d014b09

lint-fix …

69565bb

Signed-off-by: Vanshika Vanshika <vvanshik@redhat.com> rh-pre-commit.version: 2.3.2 rh-pre-commit.check-secrets: ENABLED

Merge branch 'feast-dev:master' into patch-query

b5d7c8f

Merge branch 'master' into patch-query

4db3b40

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

perf(postgres): Optimizing feast offline Store for date-range multi-FV retrieval#6057