ci: Add weekly flaky test detector workflow by sl0thentr0py · Pull Request #6484 · getsentry/sentry-python

sentry-warden

Summary

Adds a weekly scheduled workflow (.github/workflows/flaky-test-detector.yml) that runs every Wednesday at 08:00 UTC and can also be triggered manually via workflow_dispatch. It:

1. (shell step) Inspects the last ~30 test.yml / ci.yml runs on master via gh run list and gh run view --log-failed, and writes the failed logs to ./ci-logs/.
2. (Claude step) Separates genuinely flaky tests (intermittent failures with timing, ordering, network, or global-state signals) from real regressions and infra noise, keeps at most the 5 clearest cases ranked by impact, and writes a summary to flaky-issue-body.md.
3. (shell step) Opens one summary issue from that file, listing each flaky test with its node ID, evidence (run IDs / failure frequency), a one-line root cause, and a short suggested fix.
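
The classification itself is delegated to the model, so the PR contains no code for it. Still, the "intermittent failure" signal and the cap of five can be illustrated with a minimal Python sketch. It assumes a hypothetical mapping from run ID to the set of test node IDs that failed in that run, and it ranks by failure frequency only, which is a simplification of "impact"; the function and parameter names are made up for illustration.

```python
from collections import defaultdict

MAX_REPORTED = 5  # the PR caps the issue at the 5 clearest cases


def rank_flaky_candidates(failures_by_run, total_runs, limit=MAX_REPORTED):
    """Return (node_id, run_ids) pairs for tests that failed intermittently.

    failures_by_run: hypothetical dict of run ID -> set of failed test node IDs.
    total_runs: number of runs inspected (about 30 in the PR).
    """
    runs_by_test = defaultdict(list)
    for run_id, failed_tests in failures_by_run.items():
        for node_id in failed_tests:
            runs_by_test[node_id].append(run_id)

    # A test that failed in every inspected run looks like a real regression
    # rather than a flake, so it is excluded here.
    candidates = [
        (node_id, sorted(run_ids))
        for node_id, run_ids in runs_by_test.items()
        if len(run_ids) < total_runs
    ]
    # Most frequent failures first; ties broken by node ID for stable output.
    candidates.sort(key=lambda item: (-len(item[1]), item[0]))
    return candidates[:limit]
```

A frequency count like this cannot tell a flake from infra noise or a regression that landed mid-window, which is presumably why the PR leaves the judgment to the Claude step and to human triage.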

It does not edit code or open a PR; the output is purely an informational issue for humans to triage.

Security

CI failure logs contain tracebacks and stdout controlled by whoever landed the commit, so they're untrusted input. The "treat logs as data" instruction in the prompt is assumed to be defeatable; the real protections are mechanical and keep the log-reading agent away from any credentialed write channel:

The three steps are kept separate on purpose. Log fetching (step 1) and issue creation (step 3) are plain non-LLM shell steps; only they touch a token.
The Claude step has no shell and no write token. Its --allowedTools is Read,Glob,Grep,Write,TodoWrite — no Bash. With no subprocess and no network tool it cannot run gh/curl/printenv, so a prompt injection in the logs cannot exfiltrate ANTHROPIC_API_KEY or GITHUB_TOKEN, nor create an issue directly. It can only read the pre-fetched logs and write the issue body to a file.
issues: write lives only in the final shell step, which never ingests untrusted log text. All other permissions are read-only.
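
Because the "no shell, no network" property rests entirely on the tool allowlist, it may be worth guarding against regressions. The sketch below is a hypothetical check, not part of the PR: it parses a comma-separated allowlist like the one quoted above and reports any tool from an assumed deny set. Bash comes from the PR text; WebFetch and WebSearch are included here only as an assumption about which other tools would give the agent a network channel.

```python
# Bash is named in the PR; the other entries are assumptions for illustration.
FORBIDDEN_TOOLS = frozenset({"Bash", "WebFetch", "WebSearch"})


def forbidden_tools_in(allowed_tools):
    """Return the sorted forbidden tools found in a comma-separated allowlist.

    Entries such as "Bash(gh:*)" are reduced to their base name ("Bash").
    This is a simplistic parser: it does not handle commas inside parentheses.
    """
    tools = {
        name.strip().split("(", 1)[0]
        for name in allowed_tools.split(",")
        if name.strip()
    }
    return sorted(tools & FORBIDDEN_TOOLS)


# The allowlist quoted in the PR description passes the check.
assert forbidden_tools_in("Read,Glob,Grep,Write,TodoWrite") == []
```

A check along these lines would only catch an accidental widening of the allowlist; it says nothing about whether the remaining tools are safe in every configuration.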

This avoids the single-step gh pitfall: giving the untrusted-data agent a gh write channel is itself an exfiltration vector (gh issue create --body "$ANTHROPIC_API_KEY"), and the subprocess env-scrub feature can't fix it because it's all-or-nothing and would break gh auth.

Required setup before this runs

One repo/org secret is needed:

ANTHROPIC_API_KEY — Anthropic API key for the model.

The workflow uses the default GITHUB_TOKEN (scoped to contents: read, actions: read, issues: write) — no PAT required.

Optionally create a flaky-test label so the issues are grouped; the workflow falls back to no label if it doesn't exist.
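
The label fallback can be pictured as a small decision made before calling gh issue create. The sketch below is an assumption about how such a fallback could look, not the PR's shell code: the function name, the default issue title, and the way the existing labels are obtained are hypothetical. Only the flaky-test label and the flaky-issue-body.md file come from the description above.

```python
def build_issue_command(existing_labels,
                        body_file="flaky-issue-body.md",
                        title="Weekly flaky test report",  # hypothetical title
                        label="flaky-test"):
    """Build the argv for `gh issue create`, adding --label only if it exists."""
    cmd = ["gh", "issue", "create", "--title", title, "--body-file", body_file]
    if label in existing_labels:
        # Only request the label when the repository already defines it.
        cmd += ["--label", label]
    return cmd
```

Passing the body through --body-file rather than --body keeps the model-written text out of the command line, which matches the PR's intent that the final step never interprets log-derived content.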