Bug: TypeError / ValueError when materializing Array(String) feature views with Athena offline store
Description
Materializing feature views that contain Array(String) columns using the Athena offline store fails intermittently with one of two errors:
Error 1: ValueError: The truth value of an empty array is ambiguous
Triggered when an entity row has no values set for an array column (e.g. tags = []).
File ".../feast/type_map.py", line 772, in _python_value_to_proto_value
elif not pd.isnull(value):
ValueError: The truth value of an empty array is ambiguous. Use `array.size > 0` to check that an array is not empty.
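The failure mode can be reproduced outside Feast. The sketch below is only an illustration, not the Feast code: `scalar_style_null_check` is a hypothetical stand-in for the `not pd.isnull(value)` pattern. It uses a two-element array because that case raises on every NumPy version I am aware of. As far as I can tell, the empty-array variant with the exact message above only raises on recent NumPy releases, while older ones emit a deprecation warning instead.

```python
import numpy as np
import pandas as pd

# An object-dtype array, similar to what an Array(String) cell looks like
# after it has been read back from Arrow.
tags = np.array(["a", "b"], dtype=object)

# pd.isnull() answers element by element for arrays: one bool per item.
mask = pd.isnull(tags)


def scalar_style_null_check(value):
    """Hypothetical stand-in for the `not pd.isnull(value)` scalar pattern."""
    return not pd.isnull(value)


try:
    scalar_style_null_check(tags)
    raised = False
    message = ""
except ValueError as exc:
    # `not <bool array>` has no single truth value, so NumPy refuses.
    raised = True
    message = str(exc)
```

For real scalars the same check behaves as intended, which is why the problem only shows up once array-valued columns reach this code path.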
Error 2: TypeError: bad argument type for built-in operation
Triggered when any entity row has values in an array column (i.e. a non-empty array).
File ".../feast/type_map.py", line 905, in <listcomp>
ProtoValue(**{field_name: proto_type(val=value)})
TypeError: bad argument type for built-in operation
Root Cause
Arrow/Athena deserializes Array(String) feature columns as numpy.ndarray with object dtype rather than plain Python lists. Three code paths in type_map.py do not handle this:
- Scalar null-check (`_convert_scalar_values_to_proto`): The line `elif not pd.isnull(value)` calls `pd.isnull()` on a numpy array, which returns an array of bools; `not <array>` then raises `ValueError` because the truth value is ambiguous.
- Generic list conversion (`_convert_list_values_to_proto`): The call `proto_type(val=value)` passes the raw `numpy.ndarray` directly to the protobuf constructor. Protobuf rejects non-list types and raises `TypeError`. Additionally, Arrow nullable columns can produce `None` elements inside the ndarray, which protobuf `StringList` also rejects.
- Type validation (`_validate_collection_item_types`): `None` elements inside an ndarray fail the `type(item) in valid_types` check before they can be sanitized downstream.
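One way to keep the scalar null-check from ever seeing a container is a small guard in front of `pd.isnull()`. This is a minimal sketch under my own assumptions, not necessarily what the PR does: `is_null_scalar` is a hypothetical helper name, and treating every list, tuple, or ndarray as "not null" is a design choice made for the illustration.

```python
import numpy as np
import pandas as pd


def is_null_scalar(value):
    """Hypothetical helper: True only for scalar nulls such as None or NaN."""
    # Containers are values in their own right, even when empty, so they are
    # never handed to pd.isnull(), which would answer element by element.
    if isinstance(value, (list, tuple, np.ndarray)):
        return False
    # For scalars pd.isnull() returns a single bool-like value.
    return bool(pd.isnull(value))
```

With a guard like this, an empty `tags` array is classified as a present (empty) value rather than triggering the ambiguous truth-value error.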
Steps to Reproduce
- Define a `FeatureView` with an `Array(String)` field: `Field(name="tags", dtype=Array(String))`
- Materialize from Athena where some rows have non-empty arrays, some have empty arrays, and some have NULL values in array elements.
- Observe `ValueError` or `TypeError` in `feast/type_map.py`.
Expected Behavior
Materialization completes successfully. Array columns from Arrow/Athena are converted to proto-safe Python lists, with None elements replaced by an empty string.
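The conversion described here could look roughly like the following. It is a sketch only: `to_proto_safe_string_list` and its `null_replacement` parameter are hypothetical names introduced for illustration and are not part of Feast.

```python
import numpy as np


def to_proto_safe_string_list(value, null_replacement=""):
    """Hypothetical helper: turn an array-like cell into a plain list of str."""
    # ndarray.tolist() yields a plain Python list with Python-level items,
    # which is the shape a protobuf repeated field expects.
    if isinstance(value, np.ndarray):
        value = value.tolist()
    # Replace None items (from nullable Arrow columns) with the placeholder.
    return [null_replacement if item is None else item for item in value]


# Example: a cell holding one NULL element.
cell = np.array(["red", None, "blue"], dtype=object)
cleaned = to_proto_safe_string_list(cell)
```

An empty array simply becomes an empty list, so the same helper covers the `tags = []` case.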
Environment
- Feast version: (any version with the generic list proto conversion)
- Offline store: Athena
- Python: 3.11
- Feature column type: `Array(String)` (maps to `ValueType.STRING_LIST`)
Fix
I opened a PR with a fix for this issue: #6324