fix: Pre-create S3A event log dir before SparkContext init #6317
c8351c5 to 448212d (April 22, 2026 15:40)
R-behera
left a comment
This looks like a useful guard for the S3A event log edge case, and the focused tests help. One follow-up worth considering is whether some Feast users rely on credentials or endpoint details only through Spark/Hadoop config rather than environment variables. If so, a short note or test around that path could prevent surprises when the pre-create step runs before Spark fully applies the config.
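To make the concern concrete, here is a minimal sketch of how S3A settings that live only in Spark/Hadoop config could be mapped to boto3 client arguments. The function name `boto3_kwargs_from_spark_conf` is hypothetical and is not taken from this PR. The config keys are the standard Hadoop S3A options with Spark's `spark.hadoop.` prefix, and the scheme handling for bare `host:port` endpoints is an assumption about how one might bridge the two libraries.

```python
from typing import Dict, Mapping

# Standard Hadoop S3A options (with Spark's "spark.hadoop." prefix) mapped to
# the keyword arguments accepted by boto3.client("s3", ...).
_S3A_TO_BOTO3 = {
    "spark.hadoop.fs.s3a.access.key": "aws_access_key_id",
    "spark.hadoop.fs.s3a.secret.key": "aws_secret_access_key",
    "spark.hadoop.fs.s3a.session.token": "aws_session_token",
    "spark.hadoop.fs.s3a.endpoint": "endpoint_url",
}


def boto3_kwargs_from_spark_conf(spark_conf: Mapping[str, str]) -> Dict[str, str]:
    """Hypothetical helper: collect boto3 client kwargs from a Spark config mapping.

    Keys that are absent or empty are skipped, so boto3 falls back to its
    default credential chain (environment variables, profiles, and so on).
    """
    kwargs: Dict[str, str] = {}
    for spark_key, boto_key in _S3A_TO_BOTO3.items():
        value = spark_conf.get(spark_key)
        if value:
            kwargs[boto_key] = value

    endpoint = kwargs.get("endpoint_url")
    if endpoint and "://" not in endpoint:
        # Assumption: S3A tolerates a bare host:port endpoint, while boto3
        # needs a full URL, so pick the scheme from the S3A SSL flag.
        ssl_flag = spark_conf.get("spark.hadoop.fs.s3a.connection.ssl.enabled", "true")
        scheme = "http" if str(ssl_flag).lower() == "false" else "https"
        kwargs["endpoint_url"] = f"{scheme}://{endpoint}"
    return kwargs
```

The result could then be passed as `boto3.client("s3", **kwargs)`. A test on this path would assert that a config containing only `spark.hadoop.fs.s3a.*` keys yields a fully specified client, with no reliance on environment variables.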
@abhijeet-dhumal Let's handle both the comment from devin and @R-behera's suggestion.
b60d47c to 19bdd11 (April 24, 2026 08:15)
@ntkathole Addressed both your comments ✅
@R-behera Good catch on the Spark/Hadoop config credentials path ✅
…prevent silent materialize failure

Spark's EventLogFileWriter.requireLogBaseDirAsDirectory() is called inside SparkContext.__init__. When spark.eventLog.dir points to an S3A path that doesn't exist yet (S3 has no real directories), SparkContext fails to initialise. From Feast's perspective the failure is silent, because _materialize_one() catches the exception and returns an ERROR job.

Add _ensure_s3a_event_log_dir() to utils.py: before building the SparkSession, check if the S3A prefix exists and write a zero-byte placeholder if it doesn't. Uses boto3 (already a Feast dep via S3 offline store). Non-fatal: logs a warning and lets Spark surface its own error if the write fails.

Signed-off-by: abhijeet-dhumal <abhijeetdhumal652@gmail.com>
… config, add session token support Signed-off-by: abhijeet-dhumal <abhijeetdhumal652@gmail.com>
…linting Signed-off-by: abhijeet-dhumal <abhijeetdhumal652@gmail.com>
Signed-off-by: abhijeet-dhumal <abhijeetdhumal652@gmail.com>
Signed-off-by: abhijeet-dhumal <abhijeetdhumal652@gmail.com>
22b7e8e to 70215e2 (April 27, 2026 11:54)
Merged 9feca77 into feast-dev:master on Apr 27, 2026
What this PR does / why we need it:
When spark.eventLog.enabled: "true" and spark.eventLog.dir points to an S3A path, feast materialize-incremental silently writes nothing to the online store and exits with code 0.
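The failure chain: SparkContext.__init__ calls Spark's EventLogFileWriter.requireLogBaseDirAsDirectory(); when spark.eventLog.dir does not exist yet, that check fails and SparkContext does not initialise; _materialize_one() then catches the exception and returns an ERROR job, so nothing surfaces to the user.
S3 has no real directories. An empty prefix is indistinguishable from "does not exist", so Spark's pre-flight check always fails on a fresh bucket.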
Which issue(s) this PR fixes:
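In get_or_create_new_spark_session() (compute_engines/spark/utils.py), before building the SparkSession, call _ensure_s3a_event_log_dir(), which:
No-ops for non-S3A paths (hdfs://, file://, etc.) and when event logging is disabled.
Checks whether the S3A prefix exists and writes a zero-byte placeholder if it does not.
Is non-fatal: logs a warning and lets Spark surface its own error if the write fails.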
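The guard described above could look roughly like the following. This is a sketch under stated assumptions, not the merged code: the function name `ensure_s3a_event_log_dir_sketch`, the injectable `s3_client` parameter, the boolean return value, and the `.keep` placeholder key are all hypothetical. Only the `list_objects_v2` and `put_object` calls are real boto3 S3 client methods.

```python
import logging
from typing import Any, Mapping, Optional
from urllib.parse import urlparse

logger = logging.getLogger(__name__)


def ensure_s3a_event_log_dir_sketch(
    spark_conf: Mapping[str, str], s3_client: Optional[Any] = None
) -> bool:
    """Illustrative guard. Returns True only if a placeholder object was written."""
    # No-op when event logging is disabled.
    if str(spark_conf.get("spark.eventLog.enabled", "false")).lower() != "true":
        return False

    # No-op for non-S3A paths (hdfs://, file://, and so on).
    parsed = urlparse(spark_conf.get("spark.eventLog.dir", ""))
    if parsed.scheme != "s3a" or not parsed.netloc:
        return False

    bucket = parsed.netloc
    prefix = parsed.path.strip("/")
    prefix = f"{prefix}/" if prefix else ""

    try:
        if s3_client is None:
            import boto3  # imported lazily so the no-op paths need no AWS setup

            s3_client = boto3.client("s3")

        # A "directory" exists on S3 only if at least one key has the prefix.
        listing = s3_client.list_objects_v2(Bucket=bucket, Prefix=prefix, MaxKeys=1)
        if listing.get("KeyCount", 0) > 0:
            return False

        # Zero-byte placeholder; the ".keep" name is an assumption of this sketch.
        s3_client.put_object(Bucket=bucket, Key=f"{prefix}.keep", Body=b"")
        return True
    except Exception as exc:  # non-fatal: let Spark surface its own error later
        logger.warning("Could not pre-create %s/%s: %s", bucket, prefix, exc)
        return False
```

Passing the client in as a parameter is only a convenience for testing with a stub; calling it with `{"spark.eventLog.enabled": "true", "spark.eventLog.dir": "s3a://bucket/spark-events"}` and no client would fall back to boto3's default credential chain.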
Checks
git commit -s)
Testing Strategy
Misc