fix(batch): retry R2 upload on transient failure in BatchPayloadProcessor by matt-aitken · Pull Request #3331 · triggerdotdev/trigger.dev

matt-aitken
## Summary
12 new features, 59 improvements, 17 bug fixes.

## Highlights

- Add support for setting TTL (time-to-live) defaults at the task level
and globally in trigger.config.ts, with per-trigger overrides still
taking precedence
([#3196](#3196))
- Large run outputs can now use a new API that allows switching object
storage providers.
([#3275](#3275))
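
The TTL precedence described in the first highlight can be sketched as a simple fallback chain. This is a minimal illustration, not the SDK's implementation: `resolveTtl`, `TtlSources`, and the field names are hypothetical, and the ordering of task-level over global defaults is an assumption (the notes only confirm that per-trigger overrides win).

```typescript
// Hypothetical helper, not part of the Trigger.dev SDK.
// TTL values are assumed to be a duration string ("10m") or a number of seconds.
type Ttl = string | number;

interface TtlSources {
  triggerTtl?: Ttl; // passed on an individual trigger call
  taskTtl?: Ttl;    // default declared on the task
  globalTtl?: Ttl;  // default declared in trigger.config.ts
}

function resolveTtl({ triggerTtl, taskTtl, globalTtl }: TtlSources): Ttl | undefined {
  // The most specific source wins. `??` falls through only on
  // null/undefined, so an explicit 0 is still respected.
  return triggerTtl ?? taskTtl ?? globalTtl;
}
```

For example, `resolveTtl({ taskTtl: "1h", globalTtl: "1d" })` picks the task-level value because no per-trigger override was supplied.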

## Improvements
- Add platform notifications support to the CLI. The `trigger dev` and
`trigger login` commands now fetch and display platform notifications
(info, warn, error, success) from the server. Includes discovery-based
filtering to conditionally show notifications based on project file
patterns, color markup rendering for styled terminal output, and a
non-blocking display flow with a spinner fallback for slow fetches. Use
`--skip-platform-notifications` flag with `trigger dev` to disable the
notification check.
([#3254](#3254))
- Add `get_span_details` MCP tool for inspecting individual spans within
a run trace.
([#3255](#3255))
  - New `get_span_details` tool returns full span attributes, timing,
    events, and AI enrichment (model, tokens, cost, speed)
  - Span IDs now shown in `get_run_details` trace output for easy
    discovery
  - New API endpoint `GET /api/v1/runs/:runId/spans/:spanId`
  - New `retrieveSpan()` method on the API client
- MCP server improvements: new tools and new flags.
([#3224](#3224))

  **New tools:**
  - `get_query_schema` — discover available TRQL tables and columns
  - `query` — execute TRQL queries against your data
  - `list_dashboards` — list built-in dashboards and their widgets
  - `run_dashboard_query` — execute a single dashboard widget query
  - `whoami` — show current profile, user, and API URL
  - `list_profiles` — list all configured CLI profiles
  - `switch_profile` — switch active profile for the MCP session
  - `start_dev_server` — start `trigger dev` in the background and stream
    output
  - `stop_dev_server` — stop the running dev server
  - `dev_server_status` — check dev server status and view recent logs

  **New API endpoints:**
  - `GET /api/v1/query/schema` — query table schema discovery
  - `GET /api/v1/query/dashboards` — list built-in dashboards

  **New features:**
  - `--readonly` flag hides write tools (`deploy`, `trigger_task`,
    `cancel_run`) so the AI cannot make changes
  - `read:query` JWT scope for query endpoint authorization
  - `get_run_details` trace output is now paginated with cursor support
  - MCP tool annotations (`readOnlyHint`, `destructiveHint`) for all tools

  **Context optimizations:**
  - `get_query_schema` now requires a table name and returns only one
    table's schema (was returning all tables)
  - `get_current_worker` no longer inlines payload schemas; use new
    `get_task_schema` tool instead
  - Query results formatted as text tables instead of JSON (~50% fewer
    tokens)
  - `cancel_run`, `list_deploys`, `list_preview_branches` formatted as
    text instead of raw JSON
  - Schema and dashboard API responses cached to avoid redundant fetches
- Adapted the CLI API client to propagate the trigger source via HTTP
headers.
([#3241](#3241))
- Propagate run tags to span attributes so they can be extracted
server-side for LLM cost attribution metadata.
([#3213](#3213))
- Add optional `hasPrivateLink` field to the dequeue message
organization object for private networking support
([#3264](#3264))
- Define and manage AI prompts with `prompts.define()`. Create typesafe
prompt templates with variables, resolve them at runtime, and manage
versions and overrides from the dashboard without redeploying.
([#3244](#3244))
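
For the `GET /api/v1/runs/:runId/spans/:spanId` endpoint listed above, the path can be assembled as in the sketch below. `spanDetailsUrl` is a hypothetical helper, the base URL is a placeholder, and authentication is left out because these notes do not describe it; the `retrieveSpan()` client method presumably wraps the same route, but its signature is not shown here.

```typescript
// Hypothetical helper: builds the span-details URL from the route in the notes.
function spanDetailsUrl(baseUrl: string, runId: string, spanId: string): string {
  // Encode both IDs so unexpected characters cannot alter the path.
  const path =
    "/api/v1/runs/" + encodeURIComponent(runId) +
    "/spans/" + encodeURIComponent(spanId);
  return new URL(path, baseUrl).toString();
}

// Placeholder values for illustration only.
const exampleUrl = spanDetailsUrl("https://example.com", "run_123", "span_456");
```

Span IDs to pass in can be read from the `get_run_details` trace output, which now includes them.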

## Bug fixes
- Fix dev CLI leaking build directories on rebuild, causing disk space
accumulation. Deprecated workers are now pruned (capped at 2 retained)
when no active runs reference them. The watchdog process also cleans up
`.trigger/tmp/` when the dev CLI is killed ungracefully (e.g. SIGKILL
from pnpm).
([#3224](#3224))
- Fix `--load` flag being silently ignored on local/self-hosted builds.
([#3114](#3114))
- Fixed `search_docs` tool failing due to renamed upstream Mintlify tool
(`SearchTriggerDev` → `search_trigger_dev`)
- Fixed `list_deploys` failing when deployments have null
`runtime`/`runtimeVersion` fields (#3139)
- Fixed `list_preview_branches` crashing due to incorrect response shape
access
- Fixed `metrics` table column documented as `value` instead of
`metric_value` in query docs
- Fixed dev CLI leaking build directories on rebuild — deprecated
workers now clean up their build dirs when their last run completes
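
The pruning rule from the dev CLI fix above (deprecated workers capped at 2 retained, pruned only when no active runs reference them) could look roughly like this. It is one plausible reading of that rule, not the CLI's code: the `DeprecatedWorker` shape, `selectWorkersToPrune`, and the "keep the most recently deprecated" ordering are all assumptions.

```typescript
// Hypothetical shape; the real CLI's worker bookkeeping is not shown in these notes.
interface DeprecatedWorker {
  id: string;
  deprecatedAt: number; // epoch millis when the worker was superseded
  activeRuns: number;   // runs still referencing this worker's build dir
}

function selectWorkersToPrune(
  workers: DeprecatedWorker[],
  maxRetained = 2
): DeprecatedWorker[] {
  // Newest first, so the retained slots go to the most recent workers.
  const newestFirst = [...workers].sort((a, b) => b.deprecatedAt - a.deprecatedAt);
  // Anything beyond the cap is a candidate, but only idle workers are pruned.
  return newestFirst.slice(maxRetained).filter((w) => w.activeRuns === 0);
}
```

A worker past the cap that still has active runs is skipped and becomes eligible on a later pass, once its last run completes.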

## Server changes

These changes affect the self-hosted Docker image and Trigger.dev Cloud:

- Add admin UI for viewing and editing feature flags (org-level
overrides and global defaults).
([#3291](#3291))
- AI prompt management dashboard and enhanced span inspectors.
  
  **Prompt management:**
  - Prompts list page with version status, model, override indicators, and
    24h usage sparklines
  - Prompt detail page with template viewer, variable preview, version
    history timeline, and override editor
  - Create, edit, and remove overrides to change prompt content or model
    without redeploying
  - Promote any code-deployed version to current
  - Generations tab with infinite scroll, live polling, and inline span
    inspector
  - Per-prompt metrics: total generations, avg tokens, avg cost, latency,
    with version-level breakdowns
  
  **AI span inspectors:**
  - Custom inspectors for `ai.generateText`, `ai.streamText`,
    `ai.generateObject`, `ai.streamObject` parent spans
  - `ai.toolCall` inspector showing tool name, call ID, and input
    arguments
  - `ai.embed` inspector showing model, provider, and input text
  - Prompt tab on AI spans linking to prompt version with template and
    input variables
  - Compact timestamp and duration header on all AI span inspectors
  
  **AI metrics dashboard:**
  - Operations, Providers, and Prompts filters on the AI Metrics dashboard
  - Cost by prompt widget
  - "AI" section in the sidebar with Prompts and AI Metrics links
  
  **Other improvements:**
  - Resizable panel sizes now persist across page refreshes
  - Fixed `<div>` inside `<p>` DOM nesting warnings in span titles and
    chat messages
([#3244](#3244))
- Add `allowRollbacks` query param to the promote deployment API to enable
version downgrades
([#3214](#3214))
- Pre-warm compute templates on deploy for orgs with compute access.
Required for projects using a compute region, background-only for
others.
([#3114](#3114))
- Add automatic LLM cost calculation for spans with GenAI semantic
conventions. When a span arrives with `gen_ai.response.model` and token
usage data, costs are calculated from an in-memory pricing registry
backed by Postgres and dual-written to both span attributes
(`trigger.llm.*`) and a new `llm_metrics_v1` ClickHouse table that
captures usage, cost, performance (TTFC, tokens/sec), and behavioral
(finish reason, operation type) metrics.
([#3213](#3213))
- Add API endpoint `GET /api/v1/runs/:runId/spans/:spanId` that returns
detailed span information including properties, events, AI enrichment
(model, tokens, cost), and triggered child runs.
([#3255](#3255))
- Multi-provider object storage with protocol-based routing for
zero-downtime migration
([#3275](#3275))
- Add IAM role-based auth support for object stores (no access keys
required).
([#3275](#3275))
- Add platform notifications to inform users about new features,
changelogs, and platform events directly in the dashboard.
([#3254](#3254))
- Add private networking support via AWS PrivateLink. Includes
BillingClient methods for managing private connections, org settings UI
pages for connection management, and supervisor changes to apply
`privatelink` pod labels for CiliumNetworkPolicy matching.
([#3264](#3264))
- Reduce run start latency by skipping the intermediate queue when
concurrency is available. This optimization is rolled out per-region and
enabled automatically for development environments.
([#3299](#3299))
- Extended the search filter on the environment variables page to match
on environment type (production, staging, development, preview) and
branch name, not just variable name and value.
([#3302](#3302))
- Set `application_name` on Prisma connections from `SERVICE_NAME` so DB
load can be attributed by service
([#3348](#3348))
- Fix transient R2/object store upload failures during `batchTrigger()`
item streaming.
  
  - Added p-retry (3 attempts, 500ms–2s exponential backoff) around
    `uploadPacketToObjectStore` in `BatchPayloadProcessor.process()` so
    transient network errors self-heal server-side rather than aborting the
    entire batch stream.
  - Removed `x-should-retry: false` from the 500 response on the batch
    items route so the SDK's existing 5xx retry path can recover if
    server-side retries are exhausted. Item deduplication by index makes
    full-stream retries safe.
([#3331](#3331))
- Concurrency-keyed queues now use a single master queue entry per base
queue instead of one entry per key. Prevents high-CK-count tenants from
consuming the entire parentQueueLimit window and starving other tenants
on the same shard.
([#3219](#3219))
- Reduce lock contention when processing large `batchTriggerAndWait`
batches. Previously, each batch item acquired a Redis lock on the parent
run to insert a `TaskRunWaitpoint` row, causing
`LockAcquisitionTimeoutError` with high concurrency (880 errors/24h in
prod). Since `blockRunWithCreatedBatch` already transitions the parent
to `EXECUTING_WITH_WAITPOINTS` before items are processed, the per-item
lock is unnecessary. The new `blockRunWithWaitpointLockless` method
performs only the idempotent CTE insert without acquiring the lock.
([#3232](#3232))
- Strip `secure` query parameter from `QUERY_CLICKHOUSE_URL` before
passing to ClickHouse client. This was already done for the main and
logs ClickHouse clients but was missing for the query client, causing a
startup crash with `Error: Unknown URL parameters: secure`.
([#3204](#3204))
- Fix `OrganizationsPresenter.#getEnvironment` matching the wrong
development environment on teams with multiple members. All dev
environments share the slug `"dev"`, so the previous `find` by slug
alone could return another member's environment. Now filters DEVELOPMENT
environments by `orgMember.userId` to ensure the logged-in user's dev
environment is selected.
([#3273](#3273))
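
For the batch upload fix in this list (#3331), the retry behaviour can be approximated with the sketch below. It is a hand-rolled stand-in for illustration, not p-retry and not the code in `BatchPayloadProcessor.process()`: `withRetry`, `backoffDelayMs`, and `RetryOptions` are hypothetical names, the growth factor of 2 is an assumption, and only the numbers (3 attempts, 500ms to 2s) come from the notes.

```typescript
// Hypothetical retry wrapper; names and the factor of 2 are assumptions.
interface RetryOptions {
  attempts: number;   // total tries, including the first
  minDelayMs: number; // delay before the first retry
  maxDelayMs: number; // upper bound on any single delay
  factor: number;     // multiplier applied per retry
}

const DEFAULTS: RetryOptions = { attempts: 3, minDelayMs: 500, maxDelayMs: 2000, factor: 2 };

// Delay to wait after the given 1-based attempt has failed.
function backoffDelayMs(attempt: number, opts: RetryOptions = DEFAULTS): number {
  const delay = opts.minDelayMs * Math.pow(opts.factor, attempt - 1);
  return Math.min(delay, opts.maxDelayMs);
}

const realSleep = (ms: number): Promise<void> =>
  new Promise((resolve) => setTimeout(resolve, ms));

async function withRetry<T>(
  fn: () => Promise<T>,
  opts: RetryOptions = DEFAULTS,
  sleep: (ms: number) => Promise<void> = realSleep
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 1; attempt <= opts.attempts; attempt++) {
    try {
      return await fn();
    } catch (err) {
      lastError = err;
      // Only wait when another attempt remains.
      if (attempt < opts.attempts) {
        await sleep(backoffDelayMs(attempt, opts));
      }
    }
  }
  // All attempts failed: surface the last error to the caller.
  throw lastError;
}
```

Once the wrapper gives up, the error still propagates, which matches the second half of the fix: the route no longer sends `x-should-retry: false`, so the SDK's 5xx retry path gets a chance to resend the stream.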

<details>
<summary>Raw changeset output</summary>

# Releases
## @trigger.dev/build@4.4.4

### Patch Changes

-   Updated dependencies:
    -   `@trigger.dev/core@4.4.4`

## trigger.dev@4.4.4

### Patch Changes

- Add platform notifications support to the CLI. The `trigger dev` and
`trigger login` commands now fetch and display platform notifications
(info, warn, error, success) from the server. Includes discovery-based
filtering to conditionally show notifications based on project file
patterns, color markup rendering for styled terminal output, and a
non-blocking display flow with a spinner fallback for slow fetches. Use
`--skip-platform-notifications` flag with `trigger dev` to disable the
notification check.
([#3254](#3254))

- Fix dev CLI leaking build directories on rebuild, causing disk space
accumulation. Deprecated workers are now pruned (capped at 2 retained)
when no active runs reference them. The watchdog process also cleans up
`.trigger/tmp/` when the dev CLI is killed ungracefully (e.g. SIGKILL
from pnpm).
([#3224](#3224))

- Fix `--load` flag being silently ignored on local/self-hosted builds.
([#3114](#3114))

- Add `get_span_details` MCP tool for inspecting individual spans within
a run trace.
([#3255](#3255))

    -   New `get_span_details` tool returns full span attributes, timing,
        events, and AI enrichment (model, tokens, cost, speed)
    -   Span IDs now shown in `get_run_details` trace output for easy
        discovery
    -   New API endpoint `GET /api/v1/runs/:runId/spans/:spanId`
    -   New `retrieveSpan()` method on the API client

- MCP server improvements: new tools, bug fixes, and new flags.
([#3224](#3224))

    **New tools:**

    -   `get_query_schema` — discover available TRQL tables and columns
    -   `query` — execute TRQL queries against your data
    -   `list_dashboards` — list built-in dashboards and their widgets
    -   `run_dashboard_query` — execute a single dashboard widget query
    -   `whoami` — show current profile, user, and API URL
    -   `list_profiles` — list all configured CLI profiles
    -   `switch_profile` — switch active profile for the MCP session
    -   `start_dev_server` — start `trigger dev` in the background and stream
        output
    -   `stop_dev_server` — stop the running dev server
    -   `dev_server_status` — check dev server status and view recent logs

    **New API endpoints:**

    -   `GET /api/v1/query/schema` — query table schema discovery
    -   `GET /api/v1/query/dashboards` — list built-in dashboards

    **New features:**

    -   `--readonly` flag hides write tools (`deploy`, `trigger_task`,
        `cancel_run`) so the AI cannot make changes
    -   `read:query` JWT scope for query endpoint authorization
    -   `get_run_details` trace output is now paginated with cursor support
    -   MCP tool annotations (`readOnlyHint`, `destructiveHint`) for all tools

    **Bug fixes:**

    -   Fixed `search_docs` tool failing due to renamed upstream Mintlify tool
        (`SearchTriggerDev` → `search_trigger_dev`)
    -   Fixed `list_deploys` failing when deployments have null
        `runtime`/`runtimeVersion` fields (#3139)
    -   Fixed `list_preview_branches` crashing due to incorrect response shape
        access
    -   Fixed `metrics` table column documented as `value` instead of
        `metric_value` in query docs
    -   Fixed dev CLI leaking build directories on rebuild — deprecated
        workers now clean up their build dirs when their last run completes

    **Context optimizations:**

    -   `get_query_schema` now requires a table name and returns only one
        table's schema (was returning all tables)
    -   `get_current_worker` no longer inlines payload schemas; use new
        `get_task_schema` tool instead
    -   Query results formatted as text tables instead of JSON (~50% fewer
        tokens)
    -   `cancel_run`, `list_deploys`, `list_preview_branches` formatted as
        text instead of raw JSON
    -   Schema and dashboard API responses cached to avoid redundant fetches

- Add support for setting TTL (time-to-live) defaults at the task level
and globally in trigger.config.ts, with per-trigger overrides still
taking precedence
([#3196](#3196))

- Adapted the CLI API client to propagate the trigger source via http
headers.
([#3241](#3241))

-   Updated dependencies:
    -   `@trigger.dev/core@4.4.4`
    -   `@trigger.dev/build@4.4.4`
    -   `@trigger.dev/schema-to-json@4.4.4`

## @trigger.dev/core@4.4.4

### Patch Changes

- Fix `list_deploys` MCP tool failing when deployments have null
`runtime` or `runtimeVersion` fields.
([#3224](#3224))

- Propagate run tags to span attributes so they can be extracted
server-side for LLM cost attribution metadata.
([#3213](#3213))

- Add `get_span_details` MCP tool for inspecting individual spans within
a run trace.
([#3255](#3255))

    -   New `get_span_details` tool returns full span attributes, timing,
        events, and AI enrichment (model, tokens, cost, speed)
    -   Span IDs now shown in `get_run_details` trace output for easy
        discovery
    -   New API endpoint `GET /api/v1/runs/:runId/spans/:spanId`
    -   New `retrieveSpan()` method on the API client

- MCP server improvements: new tools, bug fixes, and new flags.
([#3224](#3224))

    **New tools:**

    -   `get_query_schema` — discover available TRQL tables and columns
    -   `query` — execute TRQL queries against your data
    -   `list_dashboards` — list built-in dashboards and their widgets
    -   `run_dashboard_query` — execute a single dashboard widget query
    -   `whoami` — show current profile, user, and API URL
    -   `list_profiles` — list all configured CLI profiles
    -   `switch_profile` — switch active profile for the MCP session
    -   `start_dev_server` — start `trigger dev` in the background and stream
        output
    -   `stop_dev_server` — stop the running dev server
    -   `dev_server_status` — check dev server status and view recent logs

    **New API endpoints:**

    -   `GET /api/v1/query/schema` — query table schema discovery
    -   `GET /api/v1/query/dashboards` — list built-in dashboards

    **New features:**

    -   `--readonly` flag hides write tools (`deploy`, `trigger_task`,
        `cancel_run`) so the AI cannot make changes
    -   `read:query` JWT scope for query endpoint authorization
    -   `get_run_details` trace output is now paginated with cursor support
    -   MCP tool annotations (`readOnlyHint`, `destructiveHint`) for all tools

    **Bug fixes:**

    -   Fixed `search_docs` tool failing due to renamed upstream Mintlify tool
        (`SearchTriggerDev` → `search_trigger_dev`)
    -   Fixed `list_deploys` failing when deployments have null
        `runtime`/`runtimeVersion` fields (#3139)
    -   Fixed `list_preview_branches` crashing due to incorrect response shape
        access
    -   Fixed `metrics` table column documented as `value` instead of
        `metric_value` in query docs
    -   Fixed dev CLI leaking build directories on rebuild — deprecated
        workers now clean up their build dirs when their last run completes

    **Context optimizations:**

    -   `get_query_schema` now requires a table name and returns only one
        table's schema (was returning all tables)
    -   `get_current_worker` no longer inlines payload schemas; use new
        `get_task_schema` tool instead
    -   Query results formatted as text tables instead of JSON (~50% fewer
        tokens)
    -   `cancel_run`, `list_deploys`, `list_preview_branches` formatted as
        text instead of raw JSON
    -   Schema and dashboard API responses cached to avoid redundant fetches

- Large run outputs can use the new API which allows switching object
storage providers.
([#3275](#3275))

- Add optional `hasPrivateLink` field to the dequeue message
organization object for private networking support
([#3264](#3264))

- Add support for setting TTL (time-to-live) defaults at the task level
and globally in trigger.config.ts, with per-trigger overrides still
taking precedence
([#3196](#3196))

- Adapted the CLI API client to propagate the trigger source via http
headers.
([#3241](#3241))

## @trigger.dev/python@4.4.4

### Patch Changes

-   Updated dependencies:
    -   `@trigger.dev/sdk@4.4.4`
    -   `@trigger.dev/core@4.4.4`
    -   `@trigger.dev/build@4.4.4`

## @trigger.dev/react-hooks@4.4.4

### Patch Changes

-   Updated dependencies:
    -   `@trigger.dev/core@4.4.4`

## @trigger.dev/redis-worker@4.4.4

### Patch Changes

- Adapted the CLI API client to propagate the trigger source via http
headers.
([#3241](#3241))
-   Updated dependencies:
    -   `@trigger.dev/core@4.4.4`

## @trigger.dev/rsc@4.4.4

### Patch Changes

-   Updated dependencies:
    -   `@trigger.dev/core@4.4.4`

## @trigger.dev/schema-to-json@4.4.4

### Patch Changes

-   Updated dependencies:
    -   `@trigger.dev/core@4.4.4`

## @trigger.dev/sdk@4.4.4

### Patch Changes

- Define and manage AI prompts with `prompts.define()`. Create typesafe
prompt templates with variables, resolve them at runtime, and manage
versions and overrides from the dashboard without redeploying.
([#3244](#3244))
- Add support for setting TTL (time-to-live) defaults at the task level
and globally in trigger.config.ts, with per-trigger overrides still
taking precedence
([#3196](#3196))
- Adapted the CLI API client to propagate the trigger source via http
headers.
([#3241](#3241))
-   Updated dependencies:
    -   `@trigger.dev/core@4.4.4`

</details>

---------

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>