feat: Add delta format to `FileSource`, add support for it in ibis/duckdb by tokoko · Pull Request #4123 · feast-dev/feast
What this PR does / why we need it:
This PR adds the delta format to FileSource (only parquet was supported before) and implements the logic for handling delta sources in the ibis offline store. It also adds integration tests for the duckdb offline store with delta sources; the duckdb tests are effectively run twice, once for parquet and once for delta.
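For illustration only, here is a minimal sketch of the kind of format dispatch described above. It does not use Feast or ibis; `read_parquet`, `read_delta`, `READERS`, and `read_source` are hypothetical stand-ins that only return a description of what would be read, and the path is made up.

```python
from typing import Callable, Dict


# Hypothetical stand-ins for real readers: they only describe what would be read.
def read_parquet(path: str) -> str:
    return f"parquet table at {path}"


def read_delta(path: str) -> str:
    return f"delta table at {path}"


# Registry mapping a format name to the reader that handles it.
READERS: Dict[str, Callable[[str], str]] = {
    "parquet": read_parquet,
    "delta": read_delta,
}


def read_source(path: str, file_format: str = "parquet") -> str:
    """Pick a reader by format name; unknown formats fail loudly."""
    try:
        reader = READERS[file_format]
    except KeyError:
        raise ValueError(f"unsupported file format: {file_format!r}") from None
    return reader(path)


# The same check runs once per format, mirroring tests that are run
# once for parquet and once for delta.
results = [read_source("data/driver_stats", fmt) for fmt in ("parquet", "delta")]
```

Keeping parquet as the default in this sketch reflects that it was the only supported format before; whether the real implementation dispatches this way is an assumption.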
Nice.
I was wondering how we can use delta lake. One question, how is it possible to use the delta format with the spark offline store or spark materialization?
SparkSource already supports it afaik. SparkSource can also be used to just describe the source as a table name without actually caring about the format (I think). While that is probably fine for the spark offline store, one downside is that it's tied to a single implementation.
What I'm hoping to achieve with FileSource is that we can have a single source type that can be read by multiple offline stores (dask, duckdb, spark and others) so you can define a set of sources in your feature repository that won't lock you to use a specific offline store only.
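A rough sketch of that idea, assuming nothing about Feast's actual classes: `SourceSpec`, `DuckDBLikeStore`, and `SparkLikeStore` are hypothetical names, and the `plan_read` methods only return strings describing what each engine would do with the same source definition.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SourceSpec:
    """Hypothetical engine-agnostic description of a file-based source."""
    path: str
    file_format: str  # e.g. "parquet" or "delta"
    timestamp_field: str


class DuckDBLikeStore:
    """Hypothetical offline store; describes how it would read the source."""
    name = "duckdb"

    def plan_read(self, source: SourceSpec) -> str:
        return f"{self.name}: scan {source.file_format} at {source.path}"


class SparkLikeStore:
    """A second hypothetical offline store consuming the same source."""
    name = "spark"

    def plan_read(self, source: SourceSpec) -> str:
        return f"{self.name}: load {source.file_format} from {source.path}"


# One source definition, reused unchanged by every store.
driver_stats = SourceSpec(
    path="data/driver_stats",
    file_format="delta",
    timestamp_field="event_timestamp",
)

plans = [store.plan_read(driver_stats) for store in (DuckDBLikeStore(), SparkLikeStore())]
```

The source carries only the path, format, and timestamp field, so swapping the offline store does not require redefining it.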
Makes sense. I guess pyspark needs a delta spark plugin installed in order to use the delta format?
tokoko deleted the delta-file-source branch
Yeah, delta data source needs to be configured beforehand in some way.
@tokoko nice PR!
I had kind of put this on the back burner :P and focused my energy more on delta-rs.