Support Data Labeling and LabelViews
Is your feature request related to a problem? Please describe.
Historically, Feast has played a key role in feature development, particularly in dataset preparation for model development and feature serving for online inference.
Pictorially, you can think of it like this:
Yet labels are the core piece of a training dataset that makes model training successful. Without labels, features are a waste of time (excluding semi/self-supervised learning).
Given the work on the compute engine, I propose expanding Feast to cover the entire training dataset preparation life cycle, including labels and their correction.
A proof of concept was developed in the UI to educate users about this; see #5410.
We should expand this properly so that users can define a LabelView in the online store that can be used to store labels explicitly.
Describe the solution you'd like
A LabelView that can be used to write data to the online and offline store.
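To make the intended semantics concrete, here is a minimal in-memory sketch, not a Feast API. It assumes the offline store keeps every label version (so corrections are appended rather than overwritten) and the online store keeps only the latest label per entity. The class `InMemoryLabelView` and its methods `write`, `get_online`, and `get_historical` are hypothetical names used purely for illustration.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Any, Dict, List, Optional, Tuple


@dataclass
class InMemoryLabelView:
    """Hypothetical stand-in for the proposed LabelView (not part of Feast)."""

    name: str
    # Offline store: append-only history of (entity_key, event_ts, labels).
    offline: List[Tuple[int, datetime, Dict[str, Any]]] = field(default_factory=list)
    # Online store: latest (event_ts, labels) per entity key.
    online: Dict[int, Tuple[datetime, Dict[str, Any]]] = field(default_factory=dict)

    def write(self, entity_key: int, event_ts: datetime, labels: Dict[str, Any]) -> None:
        # Always append to the offline history so corrections stay auditable.
        self.offline.append((entity_key, event_ts, dict(labels)))
        # Only replace the online value if this write is at least as recent.
        current = self.online.get(entity_key)
        if current is None or event_ts >= current[0]:
            self.online[entity_key] = (event_ts, dict(labels))

    def get_online(self, entity_key: int) -> Optional[Dict[str, Any]]:
        entry = self.online.get(entity_key)
        return None if entry is None else entry[1]

    def get_historical(self, entity_key: int, as_of: datetime) -> Optional[Dict[str, Any]]:
        # Point-in-time lookup: latest label written at or before `as_of`.
        candidates = [
            (ts, lbl) for key, ts, lbl in self.offline if key == entity_key and ts <= as_of
        ]
        if not candidates:
            return None
        return max(candidates, key=lambda c: c[0])[1]


view = InMemoryLabelView(name="customer_churn")
t1 = datetime(2024, 1, 1, tzinfo=timezone.utc)
t2 = datetime(2024, 2, 1, tzinfo=timezone.utc)
view.write(1001, t1, {"churned": False})
view.write(1001, t2, {"churned": True})  # a later correction of the label
```

In this sketch the online lookup returns the corrected label, while a historical lookup as of `t1` still returns the original one, which is the behavior a training pipeline would need to avoid label leakage.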
Describe alternatives you've considered
It could look something like:
    from datetime import timedelta

    from feast import Entity, Field, FileSource, ValueType
    from feast.types import Bool, Float32
    # LabelView is the class proposed in this issue; it does not exist in Feast yet.

    # 1) Define the entity the labels are keyed on
    customer = Entity(name="customer_id", value_type=ValueType.INT64)

    # 2) Point to your label data in e.g. Parquet
    label_source = FileSource(
        path="gs://my-bucket/churn_labels/*.parquet",
        timestamp_field="label_timestamp",
        created_timestamp_column="created_ts",
    )

    # 3) Declare the LabelView
    customer_churn = LabelView(
        name="customer_churn",
        entities=[customer],
        schema=[
            Field(name="churned", dtype=Bool),
            Field(name="risk_score", dtype=Float32),
        ],
        batch_source=label_source,
        ttl=timedelta(days=90),
        description="Customer churn flag and risk score for training/monitoring.",
    )
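As a rough illustration of how stored labels could drive training dataset preparation, the sketch below performs a point-in-time join in plain Python: for each label row it picks the most recent feature value observed at or before the label timestamp. This is only an assumption about how a LabelView might be consumed; the function `point_in_time_join` and the sample data are hypothetical, and TTL handling is deliberately left out.

```python
from bisect import bisect_right
from datetime import datetime
from typing import Any, Dict, List, Optional, Tuple

LabelRow = Tuple[int, datetime, Any]     # (entity_id, label_ts, label)
FeatureRow = Tuple[datetime, Any]        # (feature_ts, value), sorted by feature_ts


def point_in_time_join(
    labels: List[LabelRow],
    features: Dict[int, List[FeatureRow]],
) -> List[Tuple[int, datetime, Optional[Any], Any]]:
    """Attach to each label the latest feature value known at label time."""
    rows = []
    for entity_id, label_ts, label in labels:
        history = features.get(entity_id, [])
        timestamps = [ts for ts, _ in history]
        # Index of the first feature row strictly after label_ts.
        idx = bisect_right(timestamps, label_ts)
        value = history[idx - 1][1] if idx > 0 else None
        rows.append((entity_id, label_ts, value, label))
    return rows


features = {
    1001: [(datetime(2024, 1, 1), 3), (datetime(2024, 1, 20), 7)],
}
labels = [
    (1001, datetime(2024, 1, 15), False),
    (1001, datetime(2024, 2, 1), True),
    (1002, datetime(2024, 2, 1), False),  # no features recorded for this entity
]
training_rows = point_in_time_join(labels, features)
```

Each resulting row pairs a label with the feature value that was available when the label was observed, so later feature updates never leak into earlier training examples.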
