feat(providers): prompt caching for Anthropic + Azure-Anthropic by waleedlatif1 · Pull Request #5101 · simstudioai/sim
Mark the static request prefix (system prompt + tools) with an ephemeral cache_control breakpoint so that repeated calls (agent tool-loops and multi-turn conversations) reuse the cached prefix (~90% cheaper cached input + lower latency). Azure-Anthropic inherits this via the shared core.
- New providers/prompt-cache.ts gate: only caches when the static prefix is large enough to be cacheable AND likely to be reused (tools present, or a large system prompt), so a one-shot tool-less call never pays the cache-write surcharge. Kill switch: PROMPT_CACHE_DISABLED=true.
- anthropic/core.ts: convert the system string into a cached text block (after the structured-output concat, which assumes a string) and tag the last tool. Uses 2 of Anthropic's 4 breakpoints; the tool-loop reuses the tagged payload.
- Outputs are unchanged; cost accounting already reads cache_read/creation tokens (buildAnthropicSegmentTokens), so usage stays accurate.
Matches the AI SDK / LangChain / Spring AI convention (explicit breakpoints for Claude; automatic for OpenAI/Gemini). Bedrock + OpenRouter to follow (they need cache-token accounting alongside).
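The gate described above can be sketched roughly as follows. This is a minimal illustration, not the code in providers/prompt-cache.ts: the function name `shouldCachePrefix`, the `PrefixInfo` shape, the character-based token estimate, and both thresholds are assumptions made for the example.

```typescript
// Hypothetical thresholds; the PR's actual values are not shown in this description.
const MIN_CACHEABLE_TOKENS = 1024
const LARGE_SYSTEM_PROMPT_TOKENS = 4096
// Rough heuristic (assumption): about 4 characters per token.
const CHARS_PER_TOKEN = 4

interface PrefixInfo {
  systemPrompt: string
  /** Serialized tool definitions, used only for size estimation. */
  toolsJson: string
  toolCount: number
}

function estimateTokens(text: string): number {
  return Math.ceil(text.length / CHARS_PER_TOKEN)
}

/**
 * Returns true only when the static prefix is both big enough to be cacheable
 * and likely to be reused (tools present, or a large system prompt).
 */
function shouldCachePrefix(prefix: PrefixInfo): boolean {
  const systemTokens = estimateTokens(prefix.systemPrompt)
  const prefixTokens = systemTokens + estimateTokens(prefix.toolsJson)

  const cacheable = prefixTokens >= MIN_CACHEABLE_TOKENS
  const likelyReused = prefix.toolCount > 0 || systemTokens >= LARGE_SYSTEM_PROMPT_TOKENS

  return cacheable && likelyReused
}
```

Under these assumed thresholds, a mid-sized system prompt with no tools is left uncached: it is large enough to cache, but a one-shot call would pay the cache-write surcharge without a later read to recoup it.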
…tubEnv
- anthropic/core.ts: gate on request.systemPrompt instead of payload.system, so the no-messages path (where the system text is relocated into a user message and payload.system is blanked) still caches the tools prefix. (Cursor review)
- prompt-cache.test.ts: manage the kill-switch env via vi.stubEnv/unstubAllEnvs instead of assigning undefined (which coerces to "undefined" and leaks across workers). Addresses the Greptile finding while satisfying biome's noDelete rule.
…elper
- Remove the PROMPT_CACHE_DISABLED kill switch; prompt caching is always on.
- Extract the Anthropic tagging into applyAnthropicPromptCache(payload, tools, systemPrompt) in anthropic/utils.ts: one place that gates and mutates the system block + last tool, replacing the two inline blocks in core.ts.
- Add direct unit tests for the helper (system→cached block, last-tool tagged, relocated/blanked-system still tags tools, below-threshold and tool-less cases untouched) so the actual payload mutation is verified, not just the gate.
No behavior change to outputs; verified on vitest 4.1.8 (CI's version).
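The payload mutation that the helper performs might look roughly like this. It is a simplified sketch, not the applyAnthropicPromptCache implementation: the name `tagPrefixForCaching`, the trimmed-down `Payload` types, and passing the gate decision in as a boolean are assumptions for illustration. The `cache_control: { type: 'ephemeral' }` marker and the array-of-text-blocks form of `system` follow Anthropic's Messages API.

```typescript
type CacheControl = { type: 'ephemeral' }

interface TextBlock {
  type: 'text'
  text: string
  cache_control?: CacheControl
}

interface ToolDef {
  name: string
  description?: string
  input_schema: Record<string, unknown>
  cache_control?: CacheControl
}

// Simplified stand-in for the provider's request payload (assumption).
interface Payload {
  system?: string | TextBlock[]
  tools?: ToolDef[]
}

/** Mutates the payload in place; `shouldCache` stands in for the gate's decision. */
function tagPrefixForCaching(payload: Payload, shouldCache: boolean): void {
  if (!shouldCache) return

  // Breakpoint 1: turn a non-empty system string into a single cached text block.
  if (typeof payload.system === 'string' && payload.system.length > 0) {
    payload.system = [
      { type: 'text', text: payload.system, cache_control: { type: 'ephemeral' } },
    ]
  }

  // Breakpoint 2: tag the last tool, so every tool definition before it is in the cached prefix.
  const tools = payload.tools
  if (tools && tools.length > 0) {
    const lastIndex = tools.length - 1
    tools[lastIndex] = { ...tools[lastIndex], cache_control: { type: 'ephemeral' } }
  }
}
```

In this sketch a blanked `system` string is left alone while the last tool is still tagged, which mirrors the relocated/blanked-system case the unit tests cover.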
…m and request prompt
Gate on max(final payload.system, request.systemPrompt) so caching fires both when the no-messages path blanks payload.system (size via the request prompt) and when prompt-based structured output appends a large schema to payload.system (size via the final system string). Add a test for the schema-appended case. Caught by Cursor Bugbot.
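A minimal sketch of that sizing rule, assuming size is measured in characters (the unit the real gate uses is not stated here) and using a hypothetical function name `gateSize`:

```typescript
/**
 * Size the gate on whichever is larger: the final system string in the payload
 * or the system prompt from the original request.
 */
function gateSize(
  finalPayloadSystem: string | undefined,
  requestSystemPrompt: string | undefined
): number {
  return Math.max(finalPayloadSystem?.length ?? 0, requestSystemPrompt?.length ?? 0)
}

// No-messages path: payload.system was blanked, so the request prompt provides the size.
const blankedCase = gateSize('', 'p'.repeat(6000))

// Structured-output path: a schema was appended, so the final system string provides the size.
const shortPrompt = 'Answer in JSON.'
const schemaAppendedCase = gateSize(shortPrompt + 's'.repeat(8000), shortPrompt)
```

Taking the maximum means neither path can shrink the measured size below what the other would report.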
Drop the inline // comments in favor of TSDoc on the helper/gate. The gate-sizing and call-ordering rationale now lives in applyAnthropicPromptCache's TSDoc; no behavior change.
Drives the real executeAnthropicProviderRequest down the streaming path with only the client injected via the createClient seam (real models/utils/attachments), and asserts the request payload handed to messages.create carries a cache_control-tagged system block for a large prompt and a plain string for a small one. Closes the end-to-end wiring gap (AI-SDK-style request-body capture).
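The request-body capture pattern can be illustrated with a self-contained sketch. Everything here is a stand-in: `executeRequestSketch` is a hypothetical placeholder for the real executeAnthropicProviderRequest, `createCapturingClient` is a hypothetical fake for the injected client, and the 4096-character threshold and model name are invented for the example.

```typescript
type CacheControl = { type: 'ephemeral' }
type SystemField = string | { type: 'text'; text: string; cache_control?: CacheControl }[]

interface RequestBody {
  model: string
  system: SystemField
  messages: { role: 'user'; content: string }[]
}

interface MessagesClient {
  messages: { create(body: RequestBody): Promise<unknown> }
}

// Hypothetical threshold, chosen only for this example.
const HYPOTHETICAL_MIN_CHARS = 4096

// Hypothetical stand-in for the provider entry point under test.
async function executeRequestSketch(client: MessagesClient, systemPrompt: string): Promise<unknown> {
  const system: SystemField =
    systemPrompt.length >= HYPOTHETICAL_MIN_CHARS
      ? [{ type: 'text', text: systemPrompt, cache_control: { type: 'ephemeral' } }]
      : systemPrompt
  return client.messages.create({
    model: 'example-model',
    system,
    messages: [{ role: 'user', content: 'hello' }],
  })
}

// Fake client that records every request body instead of calling the network.
function createCapturingClient(): { client: MessagesClient; captured: RequestBody[] } {
  const captured: RequestBody[] = []
  const client: MessagesClient = {
    messages: {
      create: async (body) => {
        captured.push(body)
        return { content: [] }
      },
    },
  }
  return { client, captured }
}
```

A test would await `executeRequestSketch` with the fake client and then inspect `captured[0].system`, expecting a tagged block array for a large prompt and a plain string for a small one.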