feat: Add retrieve online documents v2 method into pgvector #5253

Merged
franciscojavierarceo merged 11 commits into feast-dev:master from YassinNouh21:feat/pgvector-retrieve-online-documents-v2
Apr 11, 2025

Conversation

@YassinNouh21

Collaborator

What this PR does / why we need it:

This PR enhances the PostgreSQL online store to support hybrid search capabilities, combining both vector similarity search and full-text search.

Specifically:

  • Introduces the ability to perform hybrid queries using both embeddings and keyword-based search.
  • Extends the retrieve_online_documents_v2 function to handle vector-only, text-only, and hybrid queries gracefully.
  • Improves feature retrieval by dynamically selecting features based on query type (distance, text_rank).
  • Adds comprehensive integration tests to validate:
    • Vector similarity search (L2 and cosine distance)
    • Full-text search
    • Hybrid search (vector + text)
    • Edge cases (non-matching queries, category filtering)

This update supports the broader goal of enabling more intelligent, contextual document retrieval in Feast's online stores.
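
The three query modes described above can be sketched as a single query builder. This is a minimal illustration, not the code merged in this PR: the helper name `build_search_query`, its parameters, and the table and column names are hypothetical, and ordering hybrid results by distance first and text rank second is an assumed choice. It relies only on pgvector's `<->` (L2) and `<=>` (cosine) distance operators and PostgreSQL's `to_tsvector`, `plainto_tsquery`, and `ts_rank` functions, and it returns the extra score columns (`distance`, `text_rank`) that apply to each query type.

```python
from typing import List, Optional, Sequence, Tuple

# pgvector distance operators: "<->" is L2 distance, "<=>" is cosine distance.
DISTANCE_OPERATORS = {"L2": "<->", "cosine": "<=>"}


def build_search_query(
    table: str,
    vector_column: str,
    text_column: str,
    embedding: Optional[Sequence[float]] = None,
    query_string: Optional[str] = None,
    distance_metric: str = "L2",
    top_k: int = 5,
) -> Tuple[str, List[object], List[str]]:
    """Return (sql, params, score_columns) for a vector, text, or hybrid search.

    Hypothetical helper: identifiers are interpolated directly, so they must
    come from trusted configuration, never from user input.
    """
    if embedding is None and query_string is None:
        raise ValueError("Provide an embedding, a query string, or both")

    select_parts = ["*"]
    select_params: List[object] = []
    where_parts: List[str] = []
    where_params: List[object] = []
    order_parts: List[str] = []
    score_columns: List[str] = []

    if embedding is not None:
        operator = DISTANCE_OPERATORS[distance_metric]
        # pgvector accepts a text literal such as '[0.1,0.2]' cast to vector.
        vector_literal = "[" + ",".join(str(float(x)) for x in embedding) + "]"
        select_parts.append(f"{vector_column} {operator} %s::vector AS distance")
        select_params.append(vector_literal)
        order_parts.append("distance ASC")
        score_columns.append("distance")

    if query_string is not None:
        tsvector = f"to_tsvector('english', {text_column})"
        tsquery = "plainto_tsquery('english', %s)"
        select_parts.append(f"ts_rank({tsvector}, {tsquery}) AS text_rank")
        select_params.append(query_string)
        # Only rows that match the keywords are kept in text and hybrid modes.
        where_parts.append(f"{tsvector} @@ {tsquery}")
        where_params.append(query_string)
        order_parts.append("text_rank DESC")
        score_columns.append("text_rank")

    sql = f"SELECT {', '.join(select_parts)} FROM {table}"
    if where_parts:
        sql += " WHERE " + " AND ".join(where_parts)
    sql += " ORDER BY " + ", ".join(order_parts) + " LIMIT %s"

    # Parameters follow placeholder order: SELECT list, WHERE clause, LIMIT.
    params = select_params + where_params + [top_k]
    return sql, params, score_columns
```

Passing only `embedding` yields a pure similarity query, passing only `query_string` yields a pure full-text query, and passing both yields a hybrid query whose result rows carry both score columns. The `%s` placeholders follow the psycopg parameter style, so the returned `sql` and `params` could be handed to a cursor's `execute` call.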

Which issue(s) this PR fixes:

Fixes #5115
Part of the roadmap to Introduce Feast NLP/LLM Add-On, enabling advanced search capabilities in vector databases.

Misc

@YassinNouh21 YassinNouh21 requested a review from a team as a code owner April 8, 2025 23:18
Signed-off-by: yassinnouh21 <yassinnouh21@gmail.com>
Signed-off-by: yassinnouh21 <yassinnouh21@gmail.com>
Signed-off-by: yassinnouh21 <yassinnouh21@gmail.com>
@YassinNouh21 YassinNouh21 force-pushed the feat/pgvector-retrieve-online-documents-v2 branch from a17a7fa to 776c327 Compare April 8, 2025 23:18
@YassinNouh21 YassinNouh21 changed the title feat: add retrieve online documents v2 method into pgvector Apr 8, 2025
@YassinNouh21 YassinNouh21 changed the title feat:Add retrieve online documents v2 method into pgvector Apr 8, 2025
…d requested features

Signed-off-by: Yassin Nouh <70436855+YassinNouh21@users.noreply.github.com>
Signed-off-by: Yassin Nouh <70436855+YassinNouh21@users.noreply.github.com>
Signed-off-by: Yassin Nouh <70436855+YassinNouh21@users.noreply.github.com>
Signed-off-by: Yassin Nouh <70436855+YassinNouh21@users.noreply.github.com>
Signed-off-by: Yassin Nouh <70436855+YassinNouh21@users.noreply.github.com>
@YassinNouh21 YassinNouh21 requested a review from ntkathole April 9, 2025 10:04
@YassinNouh21

Collaborator Author

@ntkathole can you take a quick look?

Signed-off-by: Yassin Nouh <70436855+YassinNouh21@users.noreply.github.com>
Signed-off-by: Yassin Nouh <70436855+YassinNouh21@users.noreply.github.com>
@YassinNouh21 YassinNouh21 force-pushed the feat/pgvector-retrieve-online-documents-v2 branch from 1cc3f0e to 55dec54 Compare April 9, 2025 16:10
@YassinNouh21

Collaborator Author

@franciscojavierarceo done, please take a look.

@ntkathole ntkathole left a comment

Member

Looks good to me!

@YassinNouh21

YassinNouh21 commented Apr 9, 2025

Collaborator Author

@franciscojavierarceo I think the CI failure comes from lines 222 to 225 of sdk/python/tests/integration/online_store/test_universal_online.py:

 sdk/python/tests/integration/online_store/test_universal_online.py
         # writes to online store via datasource (dataframe_source) materialization
         fs.materialize(
-            start_date=datetime.datetime.now() - timedelta(hours=12),
+            start_date=datetime.now() - timedelta(hours=12),
             end_date=_utc_now(),
         )

The failure is unrelated to the files changed in this PR.
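
The difference between the two spellings in the diff depends on how the module is imported. The sketch below assumes the test file uses `from datetime import datetime, timedelta`; the import block is not shown in this thread, so that is an assumption. Under that import the name `datetime` refers to the class, not the module, so `datetime.datetime.now()` raises an `AttributeError` while `datetime.now()` works.

```python
from datetime import datetime, timedelta

# With this import style, `datetime` is the class, so this call works and
# returns a naive local timestamp twelve hours in the past.
start_date = datetime.now() - timedelta(hours=12)

# The class has no attribute named `datetime`, so the module-style spelling
# fails under the same import.
try:
    datetime.datetime.now()
    module_style_works = True
except AttributeError:
    module_style_works = False
```

With `import datetime` instead, the situation is reversed: `datetime.datetime.now()` is the valid call and the bare `timedelta` name would need to be written as `datetime.timedelta`.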

@franciscojavierarceo franciscojavierarceo left a comment

Member

thanks for this! need to change the import or update the integration test

Signed-off-by: yassinnouh21 <yassinnouh21@gmail.com>
@YassinNouh21

Collaborator Author

@franciscojavierarceo we are ok to merge

@franciscojavierarceo franciscojavierarceo left a comment

Member

🚀🚀🚀

@franciscojavierarceo franciscojavierarceo merged commit 6770ee6 into feast-dev:master Apr 11, 2025
@YassinNouh21 YassinNouh21 deleted the feat/pgvector-retrieve-online-documents-v2 branch April 11, 2025 07:16
tchughesiv pushed a commit to tchughesiv/feast that referenced this pull request Apr 14, 2025
…v#5253)

* feat: add online document retrieval with hybrid search capabilities

Signed-off-by: yassinnouh21 <yassinnouh21@gmail.com>

* test: add integration tests for hybrid search and document retrieval

Signed-off-by: yassinnouh21 <yassinnouh21@gmail.com>

* fix formatting

Signed-off-by: yassinnouh21 <yassinnouh21@gmail.com>

* fix: Refactor string_fields assignment to filter features by dtype and requested features

Signed-off-by: Yassin Nouh <70436855+YassinNouh21@users.noreply.github.com>

* fix: improve query execution logic in postgres.py

Signed-off-by: Yassin Nouh <70436855+YassinNouh21@users.noreply.github.com>

* fix linter

Signed-off-by: Yassin Nouh <70436855+YassinNouh21@users.noreply.github.com>

* fix: simplify sorting logic in query execution

Signed-off-by: Yassin Nouh <70436855+YassinNouh21@users.noreply.github.com>

* fix formatting

Signed-off-by: Yassin Nouh <70436855+YassinNouh21@users.noreply.github.com>

* fix: update string feature check to use ValueType enumeration

Signed-off-by: Yassin Nouh <70436855+YassinNouh21@users.noreply.github.com>

* formatting

Signed-off-by: Yassin Nouh <70436855+YassinNouh21@users.noreply.github.com>

* fix datetime

Signed-off-by: yassinnouh21 <yassinnouh21@gmail.com>

---------

Signed-off-by: yassinnouh21 <yassinnouh21@gmail.com>
Signed-off-by: Yassin Nouh <70436855+YassinNouh21@users.noreply.github.com>
jfw-ppi pushed a commit to jfw-ppi/feast that referenced this pull request Jun 7, 2025
jfw-ppi pushed a commit to jfw-ppi/feast that referenced this pull request Jun 7, 2025
# [0.49.0](feast-dev/feast@v0.48.0...v0.49.0) (2025-04-29)

### Bug Fixes

* Adding brackets to unit tests ([c46fea3](feast-dev@c46fea3))
* Adding logic back for a step ([2bb240b](feast-dev@2bb240b))
* Adjustment for unit test action ([a6f78ae](feast-dev@a6f78ae))
* Allow get_historical_features with only On Demand Feature View ([feast-dev#5256](feast-dev#5256)) ([0752795](feast-dev@0752795))
* CI adjustment ([3850643](feast-dev@3850643))
* Embed Query configuration breaks when switching between DataFrame and SQL ([feast-dev#5257](feast-dev#5257)) ([32375a5](feast-dev@32375a5))
* Fix for proto issue in utils ([1b291b2](feast-dev@1b291b2))
* Fix milvus online_read ([feast-dev#5233](feast-dev#5233)) ([4b91f26](feast-dev@4b91f26))
* Fix tests ([431d9b8](feast-dev@431d9b8))
* Fixed Permissions object parameter in example ([feast-dev#5259](feast-dev#5259)) ([045c100](feast-dev@045c100))
* Java CI [feast-dev#12](feast-dev#12) ([d7e44ac](feast-dev@d7e44ac))
* Java PR [feast-dev#15](feast-dev#15) ([a5da3bb](feast-dev@a5da3bb))
* Java PR [feast-dev#16](feast-dev#16) ([e0320fe](feast-dev@e0320fe))
* Java PR [feast-dev#17](feast-dev#17) ([49da810](feast-dev@49da810))
* Materialization logs ([feast-dev#5243](feast-dev#5243)) ([4aa2f49](feast-dev@4aa2f49))
* Moving to custom github action for checking skip tests ([caf312e](feast-dev@caf312e))
* Operator - remove default replicas setting from Feast Deployment ([feast-dev#5294](feast-dev#5294)) ([e416d01](feast-dev@e416d01))
* Patch java pr [feast-dev#14](feast-dev#14) ([592526c](feast-dev@592526c))
* Patch update for test ([a3e8967](feast-dev@a3e8967))
* Remove conditional from steps ([995307f](feast-dev@995307f))
* Remove misleading HTTP prefix from gRPC endpoints in logs and doc ([feast-dev#5280](feast-dev#5280)) ([0ee3a1e](feast-dev@0ee3a1e))
* removing id ([268ade2](feast-dev@268ade2))
* Renaming workflow file ([5f46279](feast-dev@5f46279))
* Resolve `no pq wrapper` import issue ([feast-dev#5240](feast-dev#5240)) ([d5906f1](feast-dev@d5906f1))
* Update actions to remove check skip tests ([feast-dev#5275](feast-dev#5275)) ([b976f27](feast-dev@b976f27))
* Update docling demo ([446efea](feast-dev@446efea))
* Update java pr [feast-dev#13](feast-dev#13) ([fda7db7](feast-dev@fda7db7))
* Update java_pr ([fa138f4](feast-dev@fa138f4))
* Update repo_config.py ([6a59815](feast-dev@6a59815))
* Update unit tests workflow ([06486a0](feast-dev@06486a0))
* Updated docs for docling demo ([768e6cc](feast-dev@768e6cc))
* Updating action for unit tests ([0996c28](feast-dev@0996c28))
* Updating github actions to filter at job level ([0a09622](feast-dev@0a09622))
* Updating Java CI ([c7c3a3c](feast-dev@c7c3a3c))
* Updating java pr to skip tests ([e997dd9](feast-dev@e997dd9))
* Updating workflows ([c66bcd2](feast-dev@c66bcd2))

### Features

* Add date_partition_column_format for spark source ([feast-dev#5273](feast-dev#5273)) ([7a61d6f](feast-dev@7a61d6f))
* Add Milvus tutorial with Feast integration ([feast-dev#5292](feast-dev#5292)) ([a1388a5](feast-dev@a1388a5))
* Add pgvector tutorial with PostgreSQL integration ([feast-dev#5290](feast-dev#5290)) ([bb1cbea](feast-dev@bb1cbea))
* Add ReactFlow visualization for Feast registry metadata ([feast-dev#5297](feast-dev#5297)) ([9768970](feast-dev@9768970))
* Add retrieve online documents v2 method into  pgvector  ([feast-dev#5253](feast-dev#5253)) ([6770ee6](feast-dev@6770ee6))
* Compute Engine Initial Implementation ([feast-dev#5223](feast-dev#5223)) ([64bdafd](feast-dev@64bdafd))
* Enable write node for compute engine ([feast-dev#5287](feast-dev#5287)) ([f9baf97](feast-dev@f9baf97))
* Local compute engine ([feast-dev#5278](feast-dev#5278)) ([8e06dfe](feast-dev@8e06dfe))
* Make transform on writes configurable for ingestion ([feast-dev#5283](feast-dev#5283)) ([ecad170](feast-dev@ecad170))
* Offline store update pull_all_from_table_or_query to make timestampfield optional ([feast-dev#5281](feast-dev#5281)) ([4b94608](feast-dev@4b94608))
* Serialization version 2 deprecation notice ([feast-dev#5248](feast-dev#5248)) ([327d99d](feast-dev@327d99d))
* Vector length definition moved to Feature View from Config  ([feast-dev#5289](feast-dev#5289)) ([d8f1c97](feast-dev@d8f1c97))

Signed-off-by: Jacob Weinhold <29459386+j-wine@users.noreply.github.com>
Development

Successfully merging this pull request may close these issues.

Update Elastic Search, QDrant, and PGVector to retrieve_online_documents_v2 method

3 participants