fix: Handle numpy ndarray in Array(String) materialization by SIDDHESH1564 · Pull Request #6327 · feast-dev/feast
What this PR does / why we need it:
Materializing feature views that contain Array(String) columns using the Athena offline store fails with TypeError or ValueError.
Root cause:
Arrow/Athena deserializes array columns as numpy.ndarray (object dtype) instead of Python lists, which breaks assumptions in feast/type_map.py.
Issues observed:
- _validate_collection_item_types: None elements inside ndarray fail strict type validation.
- _convert_list_values_to_proto: Passing numpy.ndarray directly to protobuf constructors (e.g. StringList) raises TypeError. None elements are also invalid for protobuf repeated fields.
- _convert_scalar_values_to_proto: pd.isnull(ndarray) returns an array of booleans; applying not to it raises ValueError ("truth value of an empty array is ambiguous").
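The pd.isnull failure mode can be reproduced outside Feast. The snippet below is a minimal sketch, not the code from feast/type_map.py: is_missing is a hypothetical helper, and treating whole collections as "not missing" is an assumption made only for illustration. The exact ValueError message depends on the array size and the NumPy version.

```python
import numpy as np
import pandas as pd

# An object-dtype array, similar in shape to what Arrow/Athena hands back
# for an Array(String) column according to the PR description.
arr = np.array(["a", None], dtype=object)

# pd.isnull is element-wise on arrays: the result is an ndarray of booleans,
# not a single bool.
mask = pd.isnull(arr)

# Applying `not` forces a single truth value out of a multi-element array,
# which NumPy refuses to do.
try:
    not mask
    raised = False
except ValueError:
    raised = True


def is_missing(value):
    """Hypothetical scalar-safe null check (illustration only, not Feast's API)."""
    # Assumption: a collection is never "missing" as a whole; its elements
    # are inspected separately by whatever handles list values.
    if isinstance(value, (list, tuple, np.ndarray)):
        return False
    return bool(pd.isnull(value))
```

With a guard like this, the scalar path only ever calls pd.isnull on true scalars, so the result is always a single boolean.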
Fix implemented:
- Convert numpy.ndarray → Python list using .tolist() before proto conversion
- Replace None elements with type-appropriate defaults: "" (string), 0 (int), 0.0 (float), False (bool)
- Allow None during intermediate validation in _validate_collection_item_types
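The fix steps above can be sketched in plain Python. This is an illustration under stated assumptions, not the PR's actual diff: normalize_list_value, DEFAULTS, and the kind labels are hypothetical names. Returning None for any empty collection generalizes the empty-ndarray case mentioned in the PR's tests, and that generalization is also an assumption.

```python
from typing import Any, List, Optional

# Defaults taken from the PR description; the string keys are hypothetical
# labels, not Feast ValueType members.
DEFAULTS = {"string": "", "int": 0, "float": 0.0, "bool": False}


def normalize_list_value(value: Any, kind: str) -> Optional[List[Any]]:
    """Turn a list-like feature value into a plain list without None elements."""
    if value is None:
        return None
    # numpy.ndarray exposes .tolist(); plain lists and tuples go through list().
    items = value.tolist() if hasattr(value, "tolist") else list(value)
    if not items:
        # Assumption: an empty collection is reported as null (None here).
        return None
    default = DEFAULTS[kind]
    # Protobuf repeated fields reject None, so substitute the typed default.
    return [default if item is None else item for item in items]
```

For example, normalize_list_value(["a", None], "string") yields ["a", ""], which a repeated string field can accept. One consequence of this design is that a missing element and a genuine default value (such as an empty string) become indistinguishable after conversion.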
Which issue(s) this PR fixes:
Fixes #6325
Checks
- I've made sure the tests are passing.
- My commits are signed off (git commit -s)
- My PR title follows conventional commits format
Testing Strategy
- Unit tests
- Integration tests
- Manual tests
- Testing is not required for this change
Details:
Added TestNdarrayListConversion in test_type_map.py with regression coverage for:
- ndarray string/int/double/bool list conversions
- Replacement of None elements with defaults
- Empty ndarray producing null ProtoValue
- Mixed batch scenarios (populated array + None + empty array)
Misc
Adds regression coverage to ensure stable handling of ndarray-backed array feature columns from Athena/Arrow going forward.