
fix: Handle numpy ndarray in Array(String) materialization by SIDDHESH1564 · Pull Request #6327 · feast-dev/feast

What this PR does / why we need it:

Materializing feature views that contain Array(String) columns using the Athena offline store fails with TypeError or ValueError.

Root cause:
Arrow/Athena deserializes array columns as numpy.ndarray (object dtype) instead of Python lists, which breaks assumptions in feast/type_map.py.

Issues observed:

  • _validate_collection_item_types: None elements inside ndarray fail strict type validation.
  • _convert_list_values_to_proto: Passing numpy.ndarray directly to protobuf constructors (e.g. StringList) raises TypeError. None elements are also invalid for protobuf repeated fields.
  • _convert_scalar_values_to_proto: pd.isnull(ndarray) returns an array of booleans rather than a single bool; applying not to that array raises ValueError ("truth value of an empty array is ambiguous").
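
The third failure mode can be reproduced outside Feast. The snippet below is a minimal sketch, not the code in feast/type_map.py: `cell` and `is_present` are hypothetical names, and the sample values are made up for illustration.

```python
import numpy as np
import pandas as pd

# An Array(String) cell as it might arrive from Arrow/Athena: an object-dtype
# ndarray rather than a Python list, possibly containing None.
cell = np.array(["a", None, "c"], dtype=object)

# pd.isnull is element-wise on arrays, so the result is an ndarray of booleans.
null_mask = pd.isnull(cell)


def is_present(value):
    """Naive scalar-style null check (hypothetical helper).

    Works for scalars, but raises ValueError when `value` is a
    multi-element ndarray, because `not` needs a single truth value.
    """
    return not pd.isnull(value)


try:
    is_present(cell)
    ambiguous = False
except ValueError:
    # NumPy refuses to reduce a multi-element boolean array to one bool.
    ambiguous = True
```

The same check succeeds on a scalar such as `"a"`, which is why the problem only appears once a column is backed by ndarrays.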

Fix implemented:

  • Convert numpy.ndarray → Python list using .tolist() before proto conversion
  • Replace None elements with type-appropriate defaults:
    • "" (string), 0 (int), 0.0 (float), False (bool)
  • Allow None during intermediate validation in _validate_collection_item_types
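
The conversion described above can be sketched as follows. This is an illustrative approximation under stated assumptions, not the PR's actual diff: `DEFAULTS` and `to_clean_list` are hypothetical names, and only the default values themselves come from the description.

```python
import numpy as np

# Hypothetical lookup of per-kind defaults; the values are the ones listed
# in the PR description.
DEFAULTS = {"string": "", "int": 0, "float": 0.0, "bool": False}


def to_clean_list(value, kind):
    """Return a plain Python list with None replaced by the default for `kind`.

    Hypothetical helper: ndarray input is converted with .tolist() so that
    protobuf repeated fields receive native Python objects.
    """
    if isinstance(value, np.ndarray):
        value = value.tolist()
    default = DEFAULTS[kind]
    return [default if item is None else item for item in value]


strings = to_clean_list(np.array(["a", None, "c"], dtype=object), "string")
ints = to_clean_list(np.array([1, 2]), "int")
floats = to_clean_list([None, 1.5], "float")
```

One consequence of this design is that a missing element and a genuine default value (for example an empty string) are no longer distinguishable after conversion.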

Which issue(s) this PR fixes:

Fixes #6325


Checks

  • I've made sure the tests are passing.
  • My commits are signed off (git commit -s)
  • My PR title follows conventional commits format

Testing Strategy

  • Unit tests
  • Integration tests
  • Manual tests
  • Testing is not required for this change

Details:
Added TestNdarrayListConversion in test_type_map.py with regression coverage for:

  • ndarray string/int/double/bool list conversions
  • Replacement of None elements with defaults
  • Empty ndarray producing null ProtoValue
  • Mixed batch scenarios (populated array + None + empty array)

Misc

Adds regression coverage to ensure stable handling of ndarray-backed array feature columns from Athena/Arrow going forward.

