fix: Fix materialization when running on Spark cluster. by ckarwicki · Pull Request #3166 · feast-dev/feast

@ckarwicki changed the title from "Fix materialization when running on Spark cluster." to "fix: Fix materialization when running on Spark cluster."

Sep 1, 2022
When materialization runs with the Spark offline store configured to use a cluster (`spark.master` pointing to an actual Spark master node), `self.to_spark_df().write.parquet(temp_dir, mode="overwrite")` creates the Parquet files on a worker node, but `return pq.read_table(temp_dir)` executes on the driver node, which cannot read them from the worker. The proposed fix makes materialization work when run on a Spark cluster.
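The failure mode described above can be illustrated with a toy model that uses only the Python standard library. Two temporary directories stand in for the local disks of a worker and the driver. The helper names (`write_on_worker`, `read_on_driver`, `collect_to_driver`) and the sample row are hypothetical and exist only for this sketch; nothing here calls Spark or Feast, and it is not the code changed in this PR.

```python
import json
import tempfile
from pathlib import Path

# Hypothetical stand-ins: each "node" has its own local disk, modelled as a
# separate directory. No Spark or Feast API is involved.


def write_on_worker(worker_disk: Path, rel_path: str, rows: list) -> None:
    """Model a task writing its output to the worker's local filesystem."""
    target = worker_disk / rel_path
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(json.dumps(rows))


def read_on_driver(driver_disk: Path, rel_path: str) -> list:
    """Model the driver reading the same path from its own local filesystem."""
    target = driver_disk / rel_path
    if not target.exists():
        raise FileNotFoundError(f"{rel_path} does not exist on the driver")
    return json.loads(target.read_text())


def collect_to_driver(rows: list) -> list:
    """Model returning rows to the driver process instead of via local files."""
    return list(rows)


# Made-up sample data for the illustration.
rows = [{"driver_id": 1001, "conv_rate": 0.5}]

with tempfile.TemporaryDirectory() as worker_root, \
        tempfile.TemporaryDirectory() as driver_root:
    worker_disk, driver_disk = Path(worker_root), Path(driver_root)

    # The write lands on the worker's disk only.
    write_on_worker(worker_disk, "tmp/out.json", rows)

    # The driver looks up the same relative path on its own disk and finds nothing.
    try:
        read_on_driver(driver_disk, "tmp/out.json")
        driver_saw_file = True
    except FileNotFoundError:
        driver_saw_file = False

# Moving the rows through the driver process avoids the shared-path assumption.
collected = collect_to_driver(rows)
```

In PySpark terms, the collecting path would correspond to something like `DataFrame.toPandas()` followed by `pyarrow.Table.from_pandas(...)`, or to writing to storage that both driver and workers can reach. Which approach this PR takes is not stated in the description above.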

Signed-off-by: ckarwicki <104110169+ckarwicki-deloitte@users.noreply.github.com>
Signed-off-by: ckarwicki <71740096+ckarwicki@users.noreply.github.com>
Signed-off-by: ckarwicki <jdeveloper98@gmail.com>

kevjumba