OnDemandFeatureView.feature_transformation.infer_features does not pass UDF outputs to python_type_to_feast_value_type
Expected Behavior
OnDemandFeatureView.feature_transformation.infer_features should be able to infer features from primitive python types for all supported feast data types, for all transformation backends.
Current Behavior
All on demand feature views are currently broken for list types, as there is no way to bypass schema inference.
Details
OnDemandFeatureView.feature_transformation.infer_features can only infer features whose types appear in the type map inside python_type_to_feast_value_type, i.e.
type_map = {
    "int": ValueType.INT64,
    "str": ValueType.STRING,
    "string": ValueType.STRING,  # pandas.StringDtype
    "float": ValueType.DOUBLE,
    "bytes": ValueType.BYTES,
    "float64": ValueType.DOUBLE,
    "float32": ValueType.FLOAT,
    "int64": ValueType.INT64,
    "uint64": ValueType.INT64,
    "int32": ValueType.INT32,
    "uint32": ValueType.INT32,
    "int16": ValueType.INT32,
    "uint16": ValueType.INT32,
    "uint8": ValueType.INT32,
    "int8": ValueType.INT32,
    "bool": ValueType.BOOL,
    "boolean": ValueType.BOOL,
    "timedelta": ValueType.UNIX_TIMESTAMP,
    "timestamp": ValueType.UNIX_TIMESTAMP,
    "datetime": ValueType.UNIX_TIMESTAMP,
    "datetime64[ns]": ValueType.UNIX_TIMESTAMP,
    "datetime64[ns, tz]": ValueType.UNIX_TIMESTAMP,
    "category": ValueType.STRING,
}
This is because a type such as ValueType.FLOAT_LIST has no entry in the dictionary above, and the value passed in is None, so the isinstance(value, dtype) checks fall through to the ValueError in python_type_to_feast_value_type.
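The fall-through can be sketched with a simplified stand-in for the type mapper. This is not Feast's actual code: lookup_value_type and the trimmed TYPE_MAP are hypothetical, and plain strings stand in for ValueType members. It only mirrors the behavior described in this issue, where the name lookup misses and there is no value left to inspect.

```python
from typing import Any, Optional

# Hypothetical, trimmed copy of the name-based map.
# Plain strings stand in for ValueType members.
TYPE_MAP = {"float64": "DOUBLE", "int64": "INT64", "str": "STRING"}


def lookup_value_type(type_name: str, value: Optional[Any] = None) -> str:
    """Resolve a type by name first, then by inspecting the value."""
    if type_name in TYPE_MAP:
        return TYPE_MAP[type_name]
    # A list type can only be recognised from an actual value.
    if isinstance(value, list) and value and all(isinstance(v, float) for v in value):
        return "DOUBLE_LIST"
    raise ValueError(
        f"Value with native type {type_name} cannot be converted into Feast value type"
    )


# Scalar column: the dtype name alone is enough.
scalar_type = lookup_value_type("float64")

# List column with a sample value: inference succeeds.
list_type = lookup_value_type("object", [0.5, 1.5])

# List column without a value (the situation in this issue): inference fails.
try:
    lookup_value_type("object")
    failed = False
except ValueError:
    failed = True
```

With a value supplied, the "object" dtype is no longer a dead end, which is why passing the sample values through matters.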
Steps to reproduce
Initialize a new repository:
Modify the sample on_demand_feature_view to return an array of floats instead of just floats, e.g.
diff --git a/true_garfish/feature_repo/example_repo.py b/true_garfish/feature_repo/example_repo.py
index 1f5b946..59d4501 100644
--- a/true_garfish/feature_repo/example_repo.py
+++ b/true_garfish/feature_repo/example_repo.py
@@ -16,7 +16,7 @@ from feast import (
 from feast.feature_logging import LoggingConfig
 from feast.infra.offline_stores.file_source import FileLoggingDestination
 from feast.on_demand_feature_view import on_demand_feature_view
-from feast.types import Float32, Float64, Int64
+from feast.types import Float32, Float64, Int64, Array
 
 # Define an entity for the driver. You can think of an entity as a primary key used to
 # fetch features.
@@ -72,15 +72,16 @@ input_request = RequestSource(
 @on_demand_feature_view(
     sources=[driver_stats_fv, input_request],
     schema=[
-        Field(name="conv_rate_plus_val1", dtype=Float64),
-        Field(name="conv_rate_plus_val2", dtype=Float64),
+        Field(name="conv_rate_plus_vals", dtype=Array(Float64)),
     ],
 )
 def transformed_conv_rate(inputs: pd.DataFrame) -> pd.DataFrame:
-    df = pd.DataFrame()
-    df["conv_rate_plus_val1"] = inputs["conv_rate"] + inputs["val_to_add"]
-    df["conv_rate_plus_val2"] = inputs["conv_rate"] + inputs["val_to_add_2"]
-    return df
+    result = {"conv_rate_plus_vals": []}
+    for _, row in inputs.iterrows():
+        result["conv_rate_plus_vals"].append(
+            [row["conv_rate"] + row["val_to_add"], row["conv_rate"] + row["val_to_add_2"]]
+        )
+    return pd.DataFrame(data=result)
Run feast apply, and you should get the following error:
Traceback (most recent call last):
  File "~/.../.venv/bin/feast", line 8, in <module>
    sys.exit(cli())
             ^^^^^
  File "~/.../.venv/lib/python3.12/site-packages/click/core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "~/.../.venv/lib/python3.12/site-packages/click/core.py", line 1078, in main
    rv = self.invoke(ctx)
         ^^^^^^^^^^^^^^^^
  File "~/.../.venv/lib/python3.12/site-packages/click/core.py", line 1688, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "~/.../.venv/lib/python3.12/site-packages/click/core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "~/.../.venv/lib/python3.12/site-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "~/.../.venv/lib/python3.12/site-packages/click/decorators.py", line 33, in new_func
    return f(get_current_context(), *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "~/.../.venv/lib/python3.12/site-packages/feast/cli.py", line 506, in apply_total_command
    apply_total(repo_config, repo, skip_source_validation)
  File "~/.../.venv/lib/python3.12/site-packages/feast/repo_operations.py", line 347, in apply_total
    apply_total_with_repo_instance(
  File "~/.../.venv/lib/python3.12/site-packages/feast/repo_operations.py", line 299, in apply_total_with_repo_instance
    registry_diff, infra_diff, new_infra = store.plan(repo)
                                           ^^^^^^^^^^^^^^^^
  File "~/.../.venv/lib/python3.12/site-packages/feast/feature_store.py", line 745, in plan
    self._make_inferences(
  File "~/.../.venv/lib/python3.12/site-packages/feast/feature_store.py", line 640, in _make_inferences
    odfv.infer_features()
  File "~/.../.venv/lib/python3.12/site-packages/feast/on_demand_feature_view.py", line 521, in infer_features
    inferred_features = self.feature_transformation.infer_features(
                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "~/.../.venv/lib/python3.12/site-packages/feast/transformation/pandas_transformation.py", line 47, in infer_features
    python_type_to_feast_value_type(f, type_name=str(dt))
  File "~/.../.venv/lib/python3.12/site-packages/feast/type_map.py", line 215, in python_type_to_feast_value_type
    raise ValueError(
ValueError: Value with native type object cannot be converted into Feast value type
Adding some debug statements inside python_type_to_feast_value_type, we get the following locals before the error was raised:
name='conv_rate_plus_vals'
value=None
recurse=True
type_name='object'
type(value)=<class 'NoneType'>
As mentioned before, this is because none of the transformation backends pass values to the type mapper, e.g. the pandas backend in this case.
Specifications
- Version: 0.39.0
- Platform: arm64
- Subsystem: MacOS
Possible Solution
- Pass the sample values generated for type inference through to the type mapper
- Update the type mapper to handle lists that are two levels deep. Primitive UDF outputs are wrapped in either an np.array or a list of length 1, so list-valued outputs arrive two levels deep, with the inner list holding the feature values.
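A minimal sketch of how the two proposals might fit together, assuming the sample value reaches the mapper wrapped in a plain list of length 1. The name infer_from_sample, the SCALAR_TYPES table, and the string type names are hypothetical and are not part of Feast; a real change would also need to cover the np.array wrapper and the remaining Feast types.

```python
from typing import Any, List

# Hypothetical scalar table; strings stand in for ValueType members.
SCALAR_TYPES = {
    bool: "BOOL",
    int: "INT64",
    float: "DOUBLE",
    str: "STRING",
    bytes: "BYTES",
}


def scalar_type_name(py_type: type) -> str:
    """Look up an exact Python type, raising ValueError when unsupported."""
    try:
        return SCALAR_TYPES[py_type]
    except KeyError:
        raise ValueError(f"unsupported type: {py_type.__name__}") from None


def infer_from_sample(sample: List[Any]) -> str:
    """Infer a type name from a length-1 wrapper around one UDF output."""
    if len(sample) != 1:
        raise ValueError("expected a wrapper of length 1")
    value = sample[0]
    if isinstance(value, list):
        # Second level: the inner list holds the feature values.
        if not value:
            raise ValueError("cannot infer an element type from an empty list")
        element_types = {type(v) for v in value}
        if len(element_types) != 1:
            raise ValueError("list elements have mixed types")
        return scalar_type_name(element_types.pop()) + "_LIST"
    return scalar_type_name(type(value))


scalar_result = infer_from_sample([1.5])        # one float output
list_result = infer_from_sample([[0.5, 1.5]])   # one list-of-floats output
```

Looking types up by exact type (rather than isinstance) keeps bool from being reported as INT64, since bool is a subclass of int in Python.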