feat: [Backend] Data Quality Monitoring with native compute, multi-backend support, REST API, CLI #6202
Merged
ntkathole merged 10 commits into feast-dev:master on Jun 9, 2026
Force-pushed from 4340dbb to 940a4af (March 31, 2026 10:54)
Force-pushed from d0b45bb to c06853e (April 21, 2026 14:00)
ntkathole reviewed May 6, 2026
ntkathole reviewed Jun 2, 2026
jyejare added a commit to jyejare/feast that referenced this pull request on Jun 3, 2026:

- Forward `set_baseline` parameter in `DQMJobManager.execute_job` for compute jobs so user intent to mark a computation as baseline is no longer silently dropped.
- Add `"1=1"` fallback when `ts_filter` is empty (both `start_date` and `end_date` are `None`) in BigQuery, PostgreSQL, and Snowflake monitoring compute to prevent invalid SQL generation.

Signed-off-by: Jitendra Yejare <11752425+jyejare@users.noreply.github.com>
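The `"1=1"` fallback in the commit above can be illustrated with a short sketch. The helper name `build_ts_filter`, the `%s` (DB-API "format") placeholder style, and the inclusive end bound are assumptions for illustration, not the PR's actual code; the sketch only shows why an always-true predicate keeps `WHERE` valid when both dates are `None`.

```python
from datetime import datetime
from typing import List, Optional, Tuple


def build_ts_filter(
    ts_column: str,
    start_date: Optional[datetime] = None,
    end_date: Optional[datetime] = None,
) -> Tuple[str, List[datetime]]:
    """Hypothetical helper: build a parameterized time-range predicate.

    Returns (predicate, params). With no bounds the predicate is the
    always-true "1=1", so "WHERE {predicate}" is never a bare, invalid WHERE.
    ts_column is interpolated directly and must be a trusted identifier.
    """
    conditions: List[str] = []
    params: List[datetime] = []
    if start_date is not None:
        conditions.append(f"{ts_column} >= %s")
        params.append(start_date)
    if end_date is not None:
        # Inclusive upper bound is an assumption for this sketch.
        conditions.append(f"{ts_column} <= %s")
        params.append(end_date)
    predicate = " AND ".join(conditions) if conditions else "1=1"
    return predicate, params


# Usage: the query stays valid even when no date range is given.
predicate, params = build_ts_filter("event_timestamp")
query = f"SELECT COUNT(*) FROM driver_stats WHERE {predicate}"
```

Returning placeholders plus a parameter list, rather than formatting dates into the SQL string, keeps the values out of the query text; each backend would still need its own placeholder syntax.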
Force-pushed from 0fe6970 to d114171 (June 3, 2026 12:36)
Force-pushed from 462da5b to 1d61e7c (June 4, 2026 14:22)
Signed-off-by: Jitendra Yejare <11752425+jyejare@users.noreply.github.com>
ntkathole approved these changes on Jun 9, 2026
ntkathole (Member) left a comment:
Looks good
ntkathole merged commit 5458c37 into feast-dev:master on Jun 9, 2026
36 of 37 checks passed
franciscojavierarceo pushed a commit that referenced this pull request on Jun 13, 2026:
# [0.64.0](v0.63.0...v0.64.0) (2026-06-13)

### Bug Fixes

* Add async_supported property to RedisOnlineStore ([9b088fe](9b088fe))
* Add missing feast init templates to operator CRD and enhance persistence documentation ([1941d4d](1941d4d))
* Allow to publish from reference branch ([5458ec8](5458ec8))
* API calls list ([4203eb7](4203eb7))
* **bigquery:** Enable list inference for parquet loads in offline_write_batch ([9243497](9243497)), closes [#5845](#5845)
* Bump grpcio dependencies ([07b4782](07b4782))
* **compute-engine/local:** Honor field_mapping on join keys in dedup + join nodes ([#6395](#6395)) ([bd01824](bd01824))
* **dynamodb:** Avoid tag race condition by using diff-based tag updates ([#6479](#6479)) ([bad2b7d](bad2b7d)), closes [#6418](#6418)
* **dynamodb:** Fix mypy type for _build_projection_expression return ([217b4da](217b4da))
* Fix intermittent async test failures for DynamoDB and Redis ([63c5eb1](63c5eb1))
* Fix mongodb blog title ([57d28d4](57d28d4))
* Fix shared SQL registry crash - avoid unnecessary UDF deserialization in proto cache building ([ac588d7](ac588d7))
* Fix SparkRetrievalJob.persist() failing for SparkSource ([209d7cd](209d7cd))
* Fixed formatting and image for mongo blog ([#6377](#6377)) ([f8389fb](f8389fb))
* Fixes for ray source ([7f592a4](7f592a4))
* **go:** skip registry refresh when cache_ttl_seconds <= 0 ([97ed40c](97ed40c))
* Handle array of strings columns in Athena materialization ([#6324](#6324)) ([4ed0278](4ed0278))
* make milvus VARCHAR max_length configurable, remove hardcoded 512 limit ([3b98c22](3b98c22))
* **operator:** Set appProtocol: grpc on registry gRPC Service ([#6367](#6367)) ([c9ae2b4](c9ae2b4))
* PyJWT 2.10+ added validation that rejects empty HMAC keys ([e756ffe](e756ffe))
* RemoteOnlineStore sends all features in a single HTTP request ([8f187dd](8f187dd))
* Remove registry proto dump to enforce RBAC and add permission checks to Commit/Refresh RPCs ([328431f](328431f))
* Remove selector migration job - no longer needed ([51c325e](51c325e))
* replace broken .claude skill symlink with correct relative path ([4541690](4541690))
* Replace selector label strip patch with migration Job for upgrade-safe selector uniqueness ([00dea50](00dea50))
* Scope feature view name conflict check to current project in file-based registry ([#6369](#6369)) ([a4fde83](a4fde83)), closes [#6209](#6209)
* **snowflake:** Stop double-quoting connection identifiers ([#6462](#6462)) ([e914d59](e914d59))
* **spark:** S3/GCS PyArrow filesystem resolution for staging paths ([#6442](#6442)) ([ae50414](ae50414))
* **trino:** Clean up temporary entity tables after retrieval ([#6381](#6381)) ([d86b13d](d86b13d)), closes [#6306](#6306)
* Update go-feature-server base image to Go 1.25 and fix operator Dockerfile COPY permissions ([86ef0bc](86ef0bc))

### Features

* [Backend] Data Quality Monitoring with native compute, multi-backend support, REST API, CLI ([#6202](#6202)) ([5458c37](5458c37))
* Add apache flink compute engine ([#6476](#6476)) ([9636d6a](9636d6a))
* Add demo noteboooks for users ([e362173](e362173))
* Add enabled/disabled toggle for feature views ([#6401](#6401)) ([5f1fa0d](5f1fa0d)), closes [#6395](#6395)
* Add Label View to init template ([ec272d5](ec272d5))
* Add mTLS support to remote registry gRPC client ([#6474](#6474)) ([c9602d8](c9602d8))
* Add Prometheus gauges for FeatureStore installation telemetry ([#6354](#6354)) ([1b681b7](1b681b7))
* Adds registry REST API endpoints for managing entities, data sources, and feature views ([#6413](#6413)) ([f77bd1d](f77bd1d))
* Allow CRUD on entities, data sources, and feature views from UI ([#6412](#6412)) ([2321c07](2321c07))
* Allow default openlineage configuration ([#6467](#6467)) ([276b6df](276b6df))
* **bigquery:** Support DATE-type event timestamp columns ([#6362](#6362)) ([753dee5](753dee5)), closes [#2530](#2530)
* **cli:** Add `feast projects delete` command (closes [#5095](#5095)) ([#6318](#6318)) ([1a4b96c](1a4b96c))
* Data Quality Monitoring added in feast UI ([#6422](#6422)) ([fa271be](fa271be))
* **dynamodb:** Use ProjectionExpression when requested_features is set ([0adc906](0adc906)), closes [#6058](#6058)
* Enhance DataSource and FeatureView modals with error handling and submission states ([96d7169](96d7169))
* Expose registry endpoints on feature server for MCP access ([f77981c](f77981c))
* Feast First-Class LabelView Implementation ([#6292](#6292)) ([c0e7e5d](c0e7e5d))
* Feast-MLflow Integration ([#6235](#6235)) ([7279c75](7279c75))
* Operational metrics for offline store and SOX metrics for both ([#6340](#6340)) ([65b1b80](65b1b80))
* Pre-compute feature service ([8011550](8011550))
* REST API-backed UI for RBAC compatibility and per-page lazy loading ([#6414](#6414)) ([6ae80af](6ae80af))
* Support non-string map key types ([#6382](#6382)) ([#6383](#6383)) ([728aa2e](728aa2e))
* Update FeatureStore CRD with DRA Fields ([01241e4](01241e4))

### Performance Improvements

* Cache feature view resolution in get_online_features to reduce per-request overhead ([55c2f18](55c2f18))
* Optimize feature serving latency with batched async Redis, cached checks fix ([103809a](103809a))
* Replace MessageToDict with optimized custom dict builder ([#6015](#6015)) ([9902064](9902064))
To see the monitoring UI in action, visit PR #6422 and check its demo.
What this PR does / why we need it:
This PR introduces comprehensive feature quality monitoring capabilities to Feast, enabling proactive tracking of feature distributions and data quality metrics. Currently, Feast has no built-in tools for monitoring feature health in production — ML teams must build custom solutions to detect issues like distribution shifts, elevated null rates, or degraded data quality before they silently impact model performance.
What it adds:
Core Monitoring Engine
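The engine's Python-based (PyArrow/NumPy) fallback, used for backends without native compute, has to produce per-feature statistics outside the database. The sketch below is a NumPy-only illustration of such a numeric summary; the function name `numeric_metrics`, the metric keys, the chosen percentiles, and the population standard deviation are assumptions, not the PR's `MetricsCalculator`.

```python
from typing import Any, Dict, Optional, Sequence

import numpy as np


def numeric_metrics(values: Sequence[Optional[float]], bins: int = 10) -> Dict[str, Any]:
    """Hypothetical fallback: summarize one numeric feature column with NumPy."""
    # Map None to NaN, then keep only finite values; in this sketch None,
    # NaN and +/-Inf all count as missing.
    arr = np.array([np.nan if v is None else v for v in values], dtype=float)
    valid = arr[np.isfinite(arr)]
    total = int(arr.size)
    null_count = total - int(valid.size)
    result: Dict[str, Any] = {
        "row_count": total,
        "null_count": null_count,
        "null_rate": null_count / total if total else None,
    }
    if valid.size == 0:
        # No usable data: emit None rather than NaN, which is not valid JSON.
        for key in ("mean", "stddev", "min", "max", "p50", "p90", "p99", "histogram"):
            result[key] = None
        return result
    # np.percentile interpolates linearly by default, like SQL PERCENTILE_CONT.
    p50, p90, p99 = np.percentile(valid, [50, 90, 99])
    counts, edges = np.histogram(valid, bins=bins)
    result.update(
        mean=float(valid.mean()),
        stddev=float(valid.std()),  # population std (ddof=0), an assumption
        min=float(valid.min()),
        max=float(valid.max()),
        p50=float(p50),
        p90=float(p90),
        p99=float(p99),
        histogram={"counts": counts.tolist(), "bin_edges": edges.tolist()},
    )
    return result
```

For example, `numeric_metrics([1.0, 2.0, 3.0, None])` reports a null rate of 0.25 and a median of 2.0.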
- Metrics are computed natively in the `OfflineStore` as the primary compute path, with a Python-based (PyArrow/NumPy) fallback for backends that don't implement native compute. This leverages the offline store as a compute engine (the same architecture as Feast materialization).
- Metrics are stored in the `OfflineStore` backend itself (no separate monitoring database). Six static methods on the `OfflineStore` base class (`compute_monitoring_metrics`, `get_monitoring_max_timestamp`, `ensure_monitoring_tables`, `save_monitoring_metrics`, `query_monitoring_metrics`, `clear_monitoring_baseline`) handle compute and storage.
- A Python metrics calculator (`MetricsCalculator`) provides backend-agnostic statistical computation as a fallback, supporting: … `PrimitiveFeastType` and `ValueType`

Multi-Backend Support (8 Offline Stores)
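The split between native compute and the Python fallback described above amounts to a dispatch rule: use the backend's own implementation when it exists, otherwise compute in Python. A minimal sketch, assuming (hypothetically) that a backend without native compute signals this by raising `NotImplementedError`; `BaseStore`, `SqlStore`, `python_fallback`, and `compute_metrics` are illustrative names, not Feast classes.

```python
from typing import Any, Callable, Dict, List

Metrics = Dict[str, Any]


class BaseStore:
    """Hypothetical stand-in for an offline store base class."""

    @staticmethod
    def compute_monitoring_metrics(values: List[float]) -> Metrics:
        # Default: this backend has no native implementation.
        raise NotImplementedError


class SqlStore(BaseStore):
    """Hypothetical backend that pushes the computation down to SQL."""

    @staticmethod
    def compute_monitoring_metrics(values: List[float]) -> Metrics:
        # A real backend would run dialect-specific SQL here.
        return {"row_count": len(values), "engine": "native"}


def python_fallback(values: List[float]) -> Metrics:
    return {"row_count": len(values), "engine": "python"}


def compute_metrics(
    store_cls: type,
    values: List[float],
    fallback: Callable[[List[float]], Metrics] = python_fallback,
) -> Metrics:
    """Prefer the backend's native compute; fall back to Python otherwise."""
    try:
        return store_cls.compute_monitoring_metrics(values)
    except NotImplementedError:
        return fallback(values)
```

One trade-off of signalling through an exception: a native implementation that raises `NotImplementedError` internally would also be routed to the fallback, so an explicit capability flag is a reasonable alternative.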
All six native monitoring methods are implemented for each backend with dialect-specific SQL:
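| Backend | Storage | Compute |
|---|---|---|
| PostgreSQL | `INSERT ON CONFLICT` | `PERCENTILE_CONT`, `WIDTH_BUCKET` |
| Snowflake | `MERGE` with `VARIANT` JSON | `APPROX_PERCENTILE`, `WIDTH_BUCKET` |
| BigQuery | `MERGE` into BQ tables | `APPROX_QUANTILES`, parameterized queries |
| Redshift | `MERGE` via Data API | `APPROXIMATE PERCENTILE_DISC` |
| Spark | … | `PERCENTILE_APPROX`, `spark.sql()` |
| Oracle | `MERGE FROM DUAL` | `PERCENTILE_CONT WITHIN GROUP` |
| DuckDB | … | `QUANTILE_CONT`, `HISTOGRAM` |
| … | … | `pyarrow.compute` + `numpy` |

Multi-Granularity Time-Series Metrics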
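Time-series metrics at these granularities require mapping each timestamp to the start of its bucket. The sketch below assumes Monday-based weeks, calendar months and quarters, and 14-day biweekly periods counted from an arbitrary Monday anchor (2001-01-01); the helper `bucket_start` is hypothetical and may not match how the PR aligns its buckets.

```python
from datetime import date, timedelta

GRANULARITIES = ("daily", "weekly", "biweekly", "monthly", "quarterly")

# Assumption: biweekly periods are counted in 14-day steps from this Monday.
_BIWEEKLY_ANCHOR = date(2001, 1, 1)


def bucket_start(d: date, granularity: str) -> date:
    """Return the first day of the time bucket that contains d."""
    if granularity == "daily":
        return d
    if granularity in ("weekly", "biweekly"):
        monday = d - timedelta(days=d.weekday())  # weeks start on Monday
        if granularity == "weekly":
            return monday
        weeks_since_anchor = (monday - _BIWEEKLY_ANCHOR).days // 7
        # Odd week offsets belong to the bucket that started a week earlier.
        return monday - timedelta(weeks=weeks_since_anchor % 2)
    if granularity == "monthly":
        return d.replace(day=1)
    if granularity == "quarterly":
        first_month = 3 * ((d.month - 1) // 3) + 1
        return date(d.year, first_month, 1)
    raise ValueError(f"unknown granularity: {granularity!r}")
```

Grouping rows by `bucket_start(ts.date(), granularity)` then yields one aggregate per period; SQL backends would express the same idea with their own date-truncation functions.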
- Supported granularities: `daily`, `weekly`, `biweekly`, `monthly`, `quarterly`

Batch + Log Data Source Support
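Log data uses flattened column names such as `driver_stats__conv_rate`, which this section says are parsed back into a feature view name and a feature name. A minimal sketch of that parsing, assuming the set of known feature view names is available and that internal columns (like `__log_timestamp`) start with a double underscore; `parse_log_column` is a hypothetical helper, and real log schemas may contain further columns that need filtering.

```python
from typing import Iterable, Optional, Tuple


def parse_log_column(
    column: str, known_views: Iterable[str]
) -> Optional[Tuple[str, str]]:
    """Split "driver_stats__conv_rate" into ("driver_stats", "conv_rate").

    Returns None for columns that are not feature columns.
    """
    if column.startswith("__"):
        return None  # internal columns such as __log_timestamp
    # Try longer view names first so a view named "a__b" wins over "a".
    for view in sorted(known_views, key=len, reverse=True):
        prefix = f"{view}__"
        if column.startswith(prefix) and len(column) > len(prefix):
            return view, column[len(prefix):]
    return None  # e.g. entity join keys such as driver_id
```

Matching against known view names, rather than blindly splitting on the first `__`, avoids mis-parsing when a view or feature name itself contains a double underscore.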
- Batch data is read from the feature view's `batch_source` via `OfflineStore.pull_all_from_table_or_query()`
- Log data is read from the `FeatureService.logging_config` destination, using `__log_timestamp` as the event timestamp
- Log column names (e.g., `driver_stats__conv_rate`) are parsed back to their original `feature_view_name` + `feature_name` for storage compatibility and drift detection
- A `data_source_type` column (`batch`/`log`) differentiates metrics in storage

Orchestration Service (`MonitoringService`)

- … `OfflineStore` instance for performance

NaN/Inf Sanitization
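Python's `json.dumps` raises `ValueError` on `NaN` or `Inf` when called with `allow_nan=False`, which is why metrics need sanitizing before they are served as JSON. A minimal recursive sanitizer, in the spirit of the final safety net on API read paths but not the PR's actual `_sanitize_floats()`, could look like this (the name `sanitize_floats` is hypothetical):

```python
import json
import math
from typing import Any


def sanitize_floats(value: Any) -> Any:
    """Recursively replace NaN and +/-Inf floats with None (JSON null)."""
    if isinstance(value, float):  # also covers numpy.float64, a float subclass
        return value if math.isfinite(value) else None
    if isinstance(value, dict):
        return {k: sanitize_floats(v) for k, v in value.items()}
    if isinstance(value, (list, tuple)):
        return [sanitize_floats(v) for v in value]
    return value


metrics = {"mean": float("nan"), "stddev": 0.5, "histogram": [1.0, float("inf")]}
# json.dumps(metrics, allow_nan=False) would raise ValueError here.
payload = json.dumps(sanitize_floats(metrics), allow_nan=False)
# payload == '{"mean": null, "stddev": 0.5, "histogram": [1.0, null]}'
```

Sanitizing both where SQL results are parsed and again before responses are serialized means a single missed code path cannot break the API.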
- Handles `NaN`/`Inf` float values that break JSON serialization:
  - `opt_float()` in `monitoring_utils.py`: sanitizes at SQL result parsing
  - `_sanitize_floats()` in `monitoring_service.py`: final safety net on all API read paths
- Prevents the error `Out of range float values are not JSON compliant: nan`

Shared Utilities (`monitoring_utils.py`)

- `monitoring_table_meta()`, `opt_float()`, `empty_numeric_metric()`, `empty_categorical_metric()`, `normalize_monitoring_row()`, `build_view_aggregate()`

DQM Job Engine (`DQMJobManager`)

- Supports three job types (`compute`, `baseline`, `auto_compute`)
- … `feast_monitoring_jobs` table
- Forwards `set_baseline` to the compute engine

REST API (`/monitoring/`)

| Method | Endpoint |
|---|---|
| `POST` | `/monitoring/compute` |
| `POST` | `/monitoring/auto_compute` |
| `POST` | `/monitoring/compute/transient` |
| `POST` | `/monitoring/compute/log` |
| `POST` | `/monitoring/auto_compute/log` |
| `GET` | `/monitoring/jobs/{job_id}` |
| `GET` | `/monitoring/metrics/features` |
| `GET` | `/monitoring/metrics/feature_views` |
| `GET` | `/monitoring/metrics/feature_services` |
| `GET` | `/monitoring/metrics/baseline` |
| `GET` | `/monitoring/metrics/timeseries` |

All endpoints support cascading filters:
`project`, `feature_service_name`, `feature_view_name`, `feature_name`, `granularity`, `data_source_type`, and date range.

RBAC is enforced using the existing `AuthzedAction.DESCRIBE` (read) and `AuthzedAction.UPDATE` (compute) actions.

CLI (`feast monitor run`)

Auto-Baseline on `feast apply`

- … `feast apply`
- … `feature_store.yaml`:

Feast Operator Support
- `DataQualityMonitoringConfig` added to `FeatureStoreSpec`
- … `data_quality_monitoring` section in `feature_store.yaml` when config is set
- … `make generate`

Documentation
- `docs/how-to-guides/feature-monitoring.md`: production setup, CLI usage, REST API reference, orchestrator integration (Airflow, KFP, cron, K8s CronJob), backend compatibility table
- `examples/monitoring/monitoring-quickstart.ipynb`: 12-step hands-on walkthrough with visualization examples
- `docs/SUMMARY.md` updated with links to both

Design decisions:
- `OfflineStore` compute + storage: each backend implements its own SQL push-down for metrics calculation and uses its native UPSERT/MERGE for storage. No separate monitoring database is needed.
- `/monitoring/` route rather than extending the existing `/metrics/` route: the existing metrics route serves registry inventory metadata, while monitoring serves statistical feature-quality data through a different data path.
- `data_quality_monitoring` config: sits alongside `materialization` and `openlineage` in `RepoConfig`, reflecting that it spans offline store compute/storage, the apply trigger, and the server API.

Which issue(s) this PR fixes:
Partially Fixes #5919
Checks
- … (`git commit -s`)

Testing Strategy
Test coverage (all passing):
- `test_metrics_calculator.py`
- `test_compute_correctness.py`
- `test_monitoring_integration.py`
- `repo_config_test.go`

Snyk SAST scan: 0 vulnerabilities across all new files.