No support for null UnixTimestamp
Expected Behavior
This script should return online features, with one null in last_purchase_date.
from feast import Entity, FeatureView
from feast.infra.offline_stores.file_source import FileSource
from feast.repo_config import RegistryConfig, RepoConfig
from feast import FeatureStore
from datetime import datetime
from feast.types import Int32, UnixTimestamp
from feast import Field
import pandas as pd

# create dataset
pd.DataFrame([
    {"user_id": 1, "event_timestamp": datetime(2022, 5, 1), "created": datetime(2022, 5, 1), "purchases": 3, "last_purchase_date": datetime(2022, 4, 23, 13, 4, 1)},
    {"user_id": 2, "event_timestamp": datetime(2022, 5, 2), "created": datetime(2022, 5, 2), "purchases": 1, "last_purchase_date": datetime(2022, 2, 1, 11, 4, 1)},
    {"user_id": 3, "event_timestamp": datetime(2022, 5, 2), "created": datetime(2022, 5, 2), "purchases": 0, "last_purchase_date": None},
]).to_parquet('user_stats.parquet')

user = Entity(name="user_id", description="user id")

user_stats_view = FeatureView(
    name="user_stats",
    entities=[user],
    source=FileSource(
        path="user_stats.parquet",
        timestamp_field="event_timestamp",
        created_timestamp_column="created",
    ),
    schema=[
        Field(name="purchases", dtype=Int32),
        Field(name="last_purchase_date", dtype=UnixTimestamp),
    ]
)

online_store_path = 'online_store.db'
registry_path = 'registry.db'

repo = RepoConfig(
    registry="registry.db",
    project='feature_store',
    provider="local",
    offline_store="file",
    use_ssl=True,
    is_secure=True,
    validate=True,
)

fs = FeatureStore(config=repo)
fs.apply([user, user_stats_view])
fs.materialize_incremental(end_date=datetime.utcnow())

entity_rows = [{"user_id": i} for i in range(1, 4)]

feature_df = fs.get_online_features(
    features=[
        "user_stats:purchases",
        "user_stats:last_purchase_date",
    ],
    entity_rows=entity_rows
).to_df()

print(feature_df)
Current Behavior
Materializing 1 feature views to 2022-06-16 20:30:41-06:00 into the sqlite online store.
Since the ttl is 0 for feature view user_stats, the start date will be set to 1 year before the current time.
user_stats from 2021-06-17 20:30:41-06:00 to 2022-06-16 20:30:41-06:00:
100%|████████████████████████████████████████████████████████████████| 3/3 [00:00<00:00, 420.36it/s]
Traceback (most recent call last):
File "null_timestamp_example.py", line 59, in <module>
feature_df = fs.get_online_features(
File "/Users/apope/feast/sdk/python/feast/online_response.py", line 79, in to_df
return pd.DataFrame(self.to_dict(include_event_timestamps))
File "/Users/apope/feast/sdk/python/feast/online_response.py", line 59, in to_dict
response[feature_ref] = [
File "/Users/apope/feast/sdk/python/feast/online_response.py", line 60, in <listcomp>
feast_value_type_to_python_type(v) for v in feature_vector.values
File "/Users/apope/feast/sdk/python/feast/type_map.py", line 74, in feast_value_type_to_python_type
val = datetime.fromtimestamp(val, tz=timezone.utc)
OSError: [Errno 84] Value too large to be stored in data type
Steps to reproduce
Specifications
- Version: 0.21.2
- Platform: Mac
- Subsystem:
Possible Solution
This happens because, during materialization, _python_datetime_to_int_timestamp() converts the NaT value to -9223372036854775808:
In [108]: from typing import cast, Sequence
     ...: import numpy as np
     ...: cast(Sequence[np.int_], np.array(['nat'], dtype='datetime64[ns]').astype('datetime64[s]').astype(np.int_))
Out[108]: array([-9223372036854775808])
Which is out of range for datetime.fromtimestamp():
In [109]: from datetime import datetime, timezone
     ...: datetime.fromtimestamp(-9223372036854775808, tz=timezone.utc)
Truncated Traceback (Use C-c C-$ to view full TB):
/tmp/ipykernel_58/1143168205.py in <module>
      1 from datetime import datetime, timezone
      2
----> 3 val = datetime.fromtimestamp(-9223372036854775808, tz=timezone.utc)

OSError: [Errno 75] Value too large for defined data type
A simple fix would be to leave the materialization logic as-is and, when deserializing in feast_value_type_to_python_type(), just catch this one value and return a null instead.
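A rough sketch of that idea, not the actual Feast code: the helper name timestamp_to_datetime_or_none and the constant NAT_AS_INT64 are made up for illustration, and the sketch assumes the stored value is an integer number of epoch seconds, as in the traceback above. The real change would live inside feast_value_type_to_python_type().

```python
from datetime import datetime, timezone
from typing import Optional

# The integer that NaT turned into during materialization (the minimum int64 value).
NAT_AS_INT64 = -(2**63)  # -9223372036854775808


def timestamp_to_datetime_or_none(val: int) -> Optional[datetime]:
    """Hypothetical helper: epoch seconds -> aware UTC datetime, or None for the NaT sentinel."""
    if val == NAT_AS_INT64:
        # This value can only have come from a null timestamp, so surface it as null.
        return None
    return datetime.fromtimestamp(val, tz=timezone.utc)


print(timestamp_to_datetime_or_none(NAT_AS_INT64))
print(timestamp_to_datetime_or_none(1650719041))
```

With a check like this, the third row of the example would come back with a null last_purchase_date instead of raising OSError. The trade-off is that the sentinel stays in the online store, so any other reader of the raw value would need the same check.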