
fix: Harden informer cache with label selectors and memory optimizations#6242

Merged
ntkathole merged 2 commits into feast-dev:master from jyejare:ugo-harden-cache on Apr 13, 2026
@jyejare jyejare commented Apr 8, 2026

Collaborator

Summary

The feast-operator's Owns() calls create cluster-wide informers for ConfigMaps, Deployments, Services, and other resource types. On clusters with a large number of these objects, the informer cache can grow beyond the operator's 256Mi memory limit, causing OOMKill and restarts.

Changes

ByObject label selectors for all owned resource types

Restrict informer caches to only objects with app.kubernetes.io/managed-by: feast-operator. Covers all 10 owned types: ConfigMap, Deployment, Service, ServiceAccount, PVC, RoleBinding, Role, CronJob, HPA, PDB. Extracted into newCacheOptions() for clarity.
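
The effect of such a selector can be modeled without any Kubernetes dependency. The following is a minimal, standard-library-only sketch of equality-based label selection; the `object` type and the `matches` and `filterCached` helpers are hypothetical stand-ins for illustration, not the operator's code or the controller-runtime API.

```go
package main

import "fmt"

// Label key and value as named in the PR description.
const (
	managedByKey   = "app.kubernetes.io/managed-by"
	managedByValue = "feast-operator"
)

// object is a hypothetical stand-in for a Kubernetes object's kind, name and labels.
type object struct {
	Kind   string
	Name   string
	Labels map[string]string
}

// matches reports whether every key/value pair in selector is present in labels
// (equality-based selection).
func matches(selector, labels map[string]string) bool {
	for k, v := range selector {
		if got, ok := labels[k]; !ok || got != v {
			return false
		}
	}
	return true
}

// filterCached returns only the objects a selector-restricted cache would keep.
func filterCached(objs []object, selector map[string]string) []object {
	var kept []object
	for _, o := range objs {
		if matches(selector, o.Labels) {
			kept = append(kept, o)
		}
	}
	return kept
}

func main() {
	objs := []object{
		{Kind: "ConfigMap", Name: "feast-registry", Labels: map[string]string{managedByKey: managedByValue}},
		{Kind: "ConfigMap", Name: "kube-root-ca.crt"}, // no labels: filtered out
		{Kind: "Deployment", Name: "other-app", Labels: map[string]string{managedByKey: "helm"}},
	}
	for _, o := range filterCached(objs, map[string]string{managedByKey: managedByValue}) {
		fmt.Printf("%s/%s\n", o.Kind, o.Name)
	}
}
```

In this model only the labeled ConfigMap survives the filter, so the number of unrelated objects in the cluster no longer affects what is held in memory.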

DefaultTransform: cache.TransformStripManagedFields()

Strip managedFields from all cached objects, reducing per-object memory footprint by ~30-50%.
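
As a rough illustration of what a strip transform does, the sketch below uses simplified, hypothetical types (`cachedObject`, `managedFieldsEntry`) rather than the real Kubernetes API types, and uses JSON-encoded size only as a proxy for in-memory footprint. It makes no claim about the actual savings in the operator.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// managedFieldsEntry is a simplified, hypothetical model of one managedFields entry.
type managedFieldsEntry struct {
	Manager   string `json:"manager"`
	Operation string `json:"operation"`
	FieldsV1  string `json:"fieldsV1"`
}

// cachedObject is a simplified, hypothetical model of an object held in a cache.
type cachedObject struct {
	Name          string               `json:"name"`
	Data          map[string]string    `json:"data,omitempty"`
	ManagedFields []managedFieldsEntry `json:"managedFields,omitempty"`
}

// stripManagedFields clears managedFields in place before the object is stored.
func stripManagedFields(o *cachedObject) *cachedObject {
	o.ManagedFields = nil
	return o
}

// encodedSize returns the JSON-encoded size of o, used here as a size proxy.
func encodedSize(o *cachedObject) int {
	b, err := json.Marshal(o)
	if err != nil {
		panic(err)
	}
	return len(b)
}

func main() {
	obj := &cachedObject{
		Name: "feast-config",
		Data: map[string]string{"feature_store.yaml": "project: demo"},
		ManagedFields: []managedFieldsEntry{
			{Manager: "kubectl", Operation: "Update", FieldsV1: `{"f:data":{"f:feature_store.yaml":{}}}`},
		},
	}
	before := encodedSize(obj)
	after := encodedSize(stripManagedFields(obj))
	fmt.Printf("encoded size: %d bytes before, %d bytes after\n", before, after)
}
```

A controller that reads objects from such a cache never sees managedFields, which is acceptable only if its reconcile logic does not depend on them.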

GOMEMLIMIT=230MiB

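Set the Go runtime soft memory limit to 230MiB (90% of the 256Mi container limit). As the heap approaches this limit the garbage collector runs more aggressively, which gives the process a chance to reclaim memory before the container is OOMKilled. This is defense-in-depth, not a replacement for the cache restrictions above.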
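
The 230MiB figure follows from the arithmetic 256 × 0.90 = 230.4, rounded down to a whole MiB. A minimal sketch of that calculation, plus a way to read back the limit the runtime is actually using, is shown here; `softLimitMiB` is a hypothetical helper, while `debug.SetMemoryLimit` is the standard library call (Go 1.19 and later).

```go
package main

import (
	"fmt"
	"runtime/debug"
)

// softLimitMiB returns the given percentage of a container memory limit,
// rounded down to whole MiB.
func softLimitMiB(containerLimitMiB, percent int64) int64 {
	return containerLimitMiB * percent / 100
}

func main() {
	limit := softLimitMiB(256, 90) // 256 * 90 / 100 = 230 (230.4 rounded down)
	fmt.Printf("GOMEMLIMIT=%dMiB\n", limit)

	// A negative argument leaves the limit unchanged and returns the current
	// value, which reflects GOMEMLIMIT if it was set in the environment.
	current := debug.SetMemoryLimit(-1)
	fmt.Printf("effective runtime limit: %d bytes\n", current)
}
```

Because the limit is soft, the runtime may still exceed it when live heap genuinely needs more memory, so it cannot prevent an OOMKill on its own.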

Additional changes

  • Add app.kubernetes.io/managed-by: feast-operator label to getLabels() so all FeatureStore-managed resources carry it
  • Introduce getSelectorLabels() for immutable selectors (Deployment spec.selector, Service spec.selector, TopologySpreadConstraints, PodAffinity) to avoid breaking existing resources on upgrade
  • Standardize notebook controller's managed-by label to app.kubernetes.io/managed-by
  • Use shared constants (services.ManagedByLabelKey/Value) throughout
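
The split between full labels and selector labels can be sketched as follows. This is a simplified illustration under stated assumptions: the function signatures and the `feast.dev/name` key are hypothetical and may differ from the operator's actual code; only the managed-by key and value come from the PR description.

```go
package main

import "fmt"

const (
	// ManagedByLabelKey and ManagedByLabelValue follow the PR description.
	ManagedByLabelKey   = "app.kubernetes.io/managed-by"
	ManagedByLabelValue = "feast-operator"

	// nameLabelKey is a hypothetical identity label used only for this sketch.
	nameLabelKey = "feast.dev/name"
)

// getSelectorLabels returns the stable label set used in selectors. It must not
// gain new keys across releases, because a Deployment's spec.selector cannot be
// changed after creation.
func getSelectorLabels(name string) map[string]string {
	return map[string]string{nameLabelKey: name}
}

// getLabels returns the metadata labels: the selector labels plus managed-by.
// Metadata labels are mutable, so adding a key here is safe on upgrade.
func getLabels(name string) map[string]string {
	labels := getSelectorLabels(name) // fresh map on every call, safe to extend
	labels[ManagedByLabelKey] = ManagedByLabelValue
	return labels
}

func main() {
	fmt.Println("selector:", getSelectorLabels("sample"))
	fmt.Println("metadata:", getLabels("sample"))
}
```

Since the metadata labels are a superset of the selector labels, existing selectors keep matching the pods they matched before the upgrade, while the new managed-by label makes the resources visible to the restricted cache.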

Test Results

Verified on a cluster pre-loaded with a large number of ConfigMaps:

| Metric | Before | After |
| --- | --- | --- |
| Memory usage | 254Mi (at limit) | 25Mi |
| Stability | OOMKilled, CrashLoopBackOff | Stable, no restarts |

Test plan

  • Deploy fixed operator on RHOAI 3.3.0 cluster
  • Verify memory usage stays well below 256Mi limit under load
  • Verify no OOMKill or CrashLoopBackOff
  • Run existing unit tests (make test) — all pass
  • Verify getSelectorLabels() prevents immutable selector breakage on upgrade

Summary by CodeRabbit

  • Chores
    • Operator cache now only watches/caches resources marked with the managed-by label, reducing resource usage.
    • Added GOMEMLIMIT=230MiB to the controller-manager container.
    • Standardized managed-by label constants and separated immutable selector labels from mutable metadata to improve stability.
  • Tests
    • Updated tests to use the shared label constants.


@jyejare jyejare requested a review from a team as a code owner April 8, 2026 17:15

@jyejare jyejare force-pushed the ugo-harden-cache branch 3 times, most recently from eab7bf4 to aa69c5b Compare April 8, 2026 18:34

@jyejare jyejare force-pushed the ugo-harden-cache branch from aa69c5b to 6a2995e Compare April 9, 2026 08:57

@jyejare jyejare force-pushed the ugo-harden-cache branch 5 times, most recently from b3237d2 to e1b57ef Compare April 10, 2026 11:10

@devin-ai-integration devin-ai-integration Bot left a comment


Devin Review found 1 new potential issue.


jyejare added 2 commits April 13, 2026 17:06
Signed-off-by: Jitendra Yejare <11752425+jyejare@users.noreply.github.com>
Signed-off-by: Jitendra Yejare <11752425+jyejare@users.noreply.github.com>

@ntkathole ntkathole left a comment


lgtm

@ntkathole merged commit 3f11356 into feast-dev:master on Apr 13, 2026
32 of 35 checks passed
korbonits pushed a commit to korbonits/feast that referenced this pull request Apr 13, 2026
…ons (feast-dev#6242)

* fix: Harden informer cache with label selectors and memory optimizations

Signed-off-by: Jitendra Yejare <11752425+jyejare@users.noreply.github.com>

* Additional Fixes on caching with PVC and HPA

Signed-off-by: Jitendra Yejare <11752425+jyejare@users.noreply.github.com>

---------

Signed-off-by: Jitendra Yejare <11752425+jyejare@users.noreply.github.com>
Signed-off-by: Alex Korbonits <alex@korbonits.com>