fix: Fix materialization when running on Spark cluster. by ckarwicki · Pull Request #3166 · feast-dev/feast
ckarwicki changed the title from "Fix materialization when running on Spark cluster." to "fix: Fix materialization when running on Spark cluster."
When materialization runs with the Spark offline store configured to use a cluster (`spark.master` pointing to an actual Spark master node), `self.to_spark_df().write.parquet(temp_dir, mode="overwrite")` creates the parquet files on the worker nodes, but `return pq.read_table(temp_dir)` is executed on the driver node, which cannot read files local to a worker. The proposed fix makes materialization work when run on a Spark cluster.

Signed-off-by: ckarwicki <104110169+ckarwicki-deloitte@users.noreply.github.com>
Signed-off-by: ckarwicki <71740096+ckarwicki@users.noreply.github.com>
Signed-off-by: ckarwicki <jdeveloper98@gmail.com> Signed-off-by: ckarwicki <104110169+ckarwicki-deloitte@users.noreply.github.com> Signed-off-by: ckarwicki <71740096+ckarwicki@users.noreply.github.com>