feat(providers): prompt caching for Anthropic + Azure-Anthropic by waleedlatif1 · Pull Request #5101 · simstudioai/sim


Mark the static request prefix (system prompt + tools) with an ephemeral
cache_control breakpoint so that repeated calls (agent tool-loops and multi-turn
conversations) reuse the cached prefix: cached input is ~90% cheaper and latency
is lower. Azure-Anthropic inherits this via the shared core.
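
A rough cost sketch of why reuse matters. It assumes Anthropic's pricing multipliers for the default 5-minute ephemeral cache (cache writes at 1.25x the base input price, cache reads at 0.1x); the helper name and the sample numbers are illustrative and not part of this PR.

```typescript
// Assumed multipliers relative to the base input-token price; check them
// against current provider pricing before relying on the numbers.
const CACHE_WRITE_MULTIPLIER = 1.25
const CACHE_READ_MULTIPLIER = 0.1

/**
 * Relative input cost of sending the same static prefix `calls` times,
 * measured in base-price prefix tokens. Hypothetical helper.
 */
function prefixCost(prefixTokens: number, calls: number, cached: boolean): number {
  if (calls <= 0) return 0
  if (!cached) return prefixTokens * calls
  // The first call writes the cache; every later call reads it.
  const write = prefixTokens * CACHE_WRITE_MULTIPLIER
  const reads = prefixTokens * CACHE_READ_MULTIPLIER * (calls - 1)
  return write + reads
}

// A 4,000-token prefix reused across a 5-step tool loop.
const uncachedLoop = prefixCost(4000, 5, false) // 20000
const cachedLoop = prefixCost(4000, 5, true) // 5000 + 1600 = 6600

// A one-shot call only pays the write surcharge (5000 vs. 4000 uncached).
const cachedOneShot = prefixCost(4000, 1, true)
```

Under these assumptions caching already wins on the second call (1.35x vs. 2x the prefix cost), while a single call costs 25% more, which is the case the gate is meant to avoid.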

- New providers/prompt-cache.ts gate: only caches when the static prefix is
  large enough to be cacheable AND likely reused (tools present, or a large
  system prompt), so a one-shot tool-less call never pays the cache-write
  surcharge. Kill switch: PROMPT_CACHE_DISABLED=true.
- anthropic/core.ts: convert system string -> a cached text block (after the
  structured-output concat, which assumes a string) and tag the last tool. Uses
  2 of Anthropic's 4 breakpoints; the tool-loop reuses the tagged payload.
- Outputs are unchanged; cost accounting already reads cache_read/creation
  tokens (buildAnthropicSegmentTokens), so usage stays accurate.
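
A minimal sketch of the gate described above, assuming a character-based token estimate and made-up thresholds. The names `shouldCachePrefix`, `estimateTokens`, and both constants are hypothetical; the actual values in providers/prompt-cache.ts are not stated here, and the environment kill switch is left out.

```typescript
// Hypothetical thresholds; the real values are not given in this PR text.
const MIN_CACHEABLE_TOKENS = 1024 // assumed minimum size of a cacheable prefix
const LARGE_SYSTEM_PROMPT_TOKENS = 2048 // assumed "likely reused" cutoff without tools

/** Crude estimate (~4 characters per token); an assumption, not a tokenizer. */
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4)
}

interface PrefixInfo {
  systemPrompt: string
  /** JSON-serializable tool definitions sent with the request. */
  tools: unknown[]
}

/** True only when the prefix is big enough to cache AND likely to be reused. */
function shouldCachePrefix({ systemPrompt, tools }: PrefixInfo): boolean {
  const systemTokens = estimateTokens(systemPrompt)
  const toolTokens = tools.length > 0 ? estimateTokens(JSON.stringify(tools)) : 0

  const largeEnough = systemTokens + toolTokens >= MIN_CACHEABLE_TOKENS
  // Tools imply a tool-loop (repeat calls); a large system prompt implies reuse.
  const likelyReused = tools.length > 0 || systemTokens >= LARGE_SYSTEM_PROMPT_TOKENS
  return largeEnough && likelyReused
}
```

With these assumed thresholds, a tool-less call with a mid-sized system prompt stays uncached even though it is technically cacheable, so it never pays the cache-write surcharge.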

Matches the AI SDK / LangChain / Spring AI convention (explicit breakpoints for
Claude; automatic for OpenAI/Gemini). Bedrock + OpenRouter to follow (they need
cache-token accounting alongside).

Bot reviewed Jun 16, 2026

…tubEnv

- anthropic/core.ts: gate on request.systemPrompt instead of payload.system, so
  the no-messages path (where the system text is relocated into a user message
  and payload.system is blanked) still caches the tools prefix. (Cursor review)
- prompt-cache.test.ts: manage the kill-switch env via vi.stubEnv/unstubAllEnvs
  instead of assigning undefined (which coerces to "undefined" and leaks across
  workers). Addresses the Greptile finding while satisfying biome's noDelete rule.

…elper

- Remove the PROMPT_CACHE_DISABLED kill switch — prompt caching is always on.
- Extract the Anthropic tagging into applyAnthropicPromptCache(payload, tools,
  systemPrompt) in anthropic/utils.ts: one place that gates and mutates the
  system block + last tool, replacing the two inline blocks in core.ts.
- Add direct unit tests for the helper (system→cached block, last-tool tagged,
  relocated/blanked-system still tags tools, below-threshold and tool-less cases
  untouched) so the actual payload mutation is verified, not just the gate.
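
The mutation the helper performs might look roughly like the sketch below. It is not the code from anthropic/utils.ts: `applyPromptCacheSketch`, `gate`, the local types, and the character thresholds are all hypothetical stand-ins, and only the payload shapes (a `text` block and a tool carrying `cache_control: { type: 'ephemeral' }`) follow the Anthropic Messages API.

```typescript
type CacheControl = { type: 'ephemeral' }

interface TextBlock {
  type: 'text'
  text: string
  cache_control?: CacheControl
}

interface ToolDef {
  name: string
  description?: string
  input_schema: Record<string, unknown>
  cache_control?: CacheControl
}

interface MessagesPayload {
  system?: string | TextBlock[]
  tools?: ToolDef[]
}

// Hypothetical thresholds standing in for the real gate.
const MIN_PREFIX_CHARS = 4096
const LARGE_SYSTEM_CHARS = 8192

function gate(systemPrompt: string, tools: ToolDef[]): boolean {
  const toolChars = tools.length > 0 ? JSON.stringify(tools).length : 0
  const largeEnough = systemPrompt.length + toolChars >= MIN_PREFIX_CHARS
  return largeEnough && (tools.length > 0 || systemPrompt.length >= LARGE_SYSTEM_CHARS)
}

/** Mutates `payload`/`tools` in place; returns true if any breakpoint was added. */
function applyPromptCacheSketch(
  payload: MessagesPayload,
  tools: ToolDef[],
  systemPrompt: string
): boolean {
  if (!gate(systemPrompt, tools)) return false
  let tagged = false

  // Breakpoint 1: a non-empty system string becomes one cached text block.
  // A blanked system (no-messages path) is left alone.
  if (typeof payload.system === 'string' && payload.system.length > 0) {
    payload.system = [
      { type: 'text', text: payload.system, cache_control: { type: 'ephemeral' } },
    ]
    tagged = true
  }

  // Breakpoint 2: tagging the last tool covers every tool definition before it.
  const lastTool = tools[tools.length - 1]
  if (lastTool) {
    lastTool.cache_control = { type: 'ephemeral' }
    tagged = true
  }
  return tagged
}
```

Keeping the gate and both mutations in one function means a test can assert directly on the resulting payload, which is what the unit tests listed above cover.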

No behavior change to outputs; verified on vitest 4.1.8 (CI's version).

…m and request prompt

Gate on max(final payload.system, request.systemPrompt) so caching fires both
when the no-messages path blanks payload.system (size via the request prompt)
and when prompt-based structured output appends a large schema to payload.system
(size via the final system string). Add a test for the schema-appended case.
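
The sizing rule can be sketched as a one-line helper. `gateSystemLength` is a hypothetical name, and measuring size in characters is an assumption made for illustration.

```typescript
/**
 * Size the gate on whichever system text is larger: the final payload.system
 * or the original request prompt. Hypothetical helper, measured in characters.
 */
function gateSystemLength(
  payloadSystem: string | undefined,
  requestSystemPrompt: string | undefined
): number {
  return Math.max(payloadSystem?.length ?? 0, requestSystemPrompt?.length ?? 0)
}

// No-messages path: payload.system was blanked, so the request prompt decides.
const blankedCase = gateSystemLength('', 'p'.repeat(9000)) // 9000

// Structured output: a large schema was appended to a short prompt,
// so the final system string decides.
const shortPrompt = 'Answer in JSON.'
const schemaCase = gateSystemLength(shortPrompt + 's'.repeat(9000), shortPrompt) // 9015
```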

Caught by Cursor Bugbot.

Drop the inline // comments in favor of TSDoc on the helper/gate. The gate-sizing
and call-ordering rationale now lives in applyAnthropicPromptCache's TSDoc; no
behavior change.

Drives the real executeAnthropicProviderRequest down the streaming path with only
the client injected via the createClient seam (real models/utils/attachments),
and asserts the request payload handed to messages.create carries a
cache_control-tagged system block for a large prompt and a plain string for a
small one. Closes the end-to-end wiring gap (AI-SDK-style request-body capture).
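
The request-body capture pattern can be sketched without the real provider code. Everything below is a hypothetical stand-in: `runRequestSketch` plays the role of the provider entry point, `makeCapturingClient` is the injected fake, and the threshold and model name are placeholders.

```typescript
type CacheControl = { type: 'ephemeral' }

interface TextBlock {
  type: 'text'
  text: string
  cache_control?: CacheControl
}

interface CreateParams {
  model: string
  max_tokens: number
  stream: boolean
  system: string | TextBlock[]
  messages: { role: 'user' | 'assistant'; content: string }[]
}

interface ClientLike {
  messages: { create(params: CreateParams): Promise<unknown> }
}

const LARGE_SYSTEM_CHARS = 8192 // hypothetical threshold

/** Stand-in for the provider entry point; only the client is injected. */
async function runRequestSketch(
  request: { model: string; systemPrompt: string; userText: string },
  createClient: () => ClientLike
): Promise<void> {
  const params: CreateParams = {
    model: request.model,
    max_tokens: 1024,
    stream: true,
    system: request.systemPrompt,
    messages: [{ role: 'user', content: request.userText }],
  }
  if (request.systemPrompt.length >= LARGE_SYSTEM_CHARS) {
    params.system = [
      { type: 'text', text: request.systemPrompt, cache_control: { type: 'ephemeral' } },
    ]
  }
  await createClient().messages.create(params)
}

/** Fake client that records each request body instead of calling the API. */
function makeCapturingClient(): { client: ClientLike; captured: CreateParams[] } {
  const captured: CreateParams[] = []
  const client: ClientLike = {
    messages: {
      create: async (params) => {
        captured.push(params)
        return {}
      },
    },
  }
  return { client, captured }
}
```

In a test, one would await `runRequestSketch(...)` with `() => client` as the seam and then assert that `captured[0].system` is a tagged block array for a large prompt and a plain string for a small one.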