ci: Add weekly flaky test detector workflow by sl0thentr0py · Pull Request #6484 · getsentry/sentry-python
Summary
Adds a weekly scheduled workflow (.github/workflows/flaky-test-detector.yml) that runs every Wednesday at 08:00 UTC and can also be triggered manually via workflow_dispatch. It:
- (shell step) Inspects the last ~30 test.yml/ci.yml runs on master via gh run list / gh run view --log-failed and writes the failed logs to ./ci-logs/.
- (Claude step) Classifies genuinely flaky tests (intermittent failures, timing/ordering/network/global-state signals) vs. real regressions and infra noise, capped at the 5 clearest and ranked by impact, and writes a summary to flaky-issue-body.md.
- (shell step) Opens one summary issue from that file listing each flaky test with its node ID, evidence (run IDs / failure frequency), a one-line root cause, and a short suggested fix.
It does not edit code or open a PR; the output is purely an informational issue for humans to triage.
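The log-fetching step described above could look roughly like the sketch below. It is not the PR's actual script: the function name fetch_failed_logs, its arguments, and the one-file-per-run layout are assumptions made for illustration. It assumes an authenticated gh CLI and filters the most recent runs down to those that concluded with a failure.

```shell
# Illustrative sketch only; fetch_failed_logs and its defaults are hypothetical.
# Assumes `gh` is installed and authenticated (e.g. via GH_TOKEN in CI).
fetch_failed_logs() {
  local workflow="$1"              # e.g. test.yml or ci.yml
  local out_dir="${2:-./ci-logs}"  # where the failed logs are written
  local limit="${3:-30}"           # how many recent runs to inspect

  mkdir -p "$out_dir"

  # List the most recent runs on master and keep the IDs of the failed ones.
  gh run list \
    --workflow "$workflow" \
    --branch master \
    --limit "$limit" \
    --json databaseId,conclusion \
    --jq '.[] | select(.conclusion == "failure") | .databaseId' |
  while read -r run_id; do
    [ -n "$run_id" ] || continue
    # Save only the output of failed steps; tolerate runs whose logs are gone.
    gh run view "$run_id" --log-failed \
      > "$out_dir/$run_id.log" 2>/dev/null < /dev/null || true
  done
}
```

Calling fetch_failed_logs test.yml and then fetch_failed_logs ci.yml would leave one log file per failed run in ./ci-logs/ for the Claude step to read.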
Security
CI failure logs contain tracebacks and stdout controlled by whoever landed the commit, so they're untrusted input. The "treat logs as data" prompt instruction is assumed to be defeatable; the real protections are mechanical and keep the log-reading agent away from any credentialed write channel:
- The three steps are kept separate on purpose. Log fetching (step 1) and issue creation (step 3) are plain non-LLM shell steps; only they touch a token.
- The Claude step has no shell and no write token. Its --allowedTools is Read,Glob,Grep,Write,TodoWrite, with no Bash. With no subprocess and no network tool it cannot run gh/curl/printenv, so a prompt injection in the logs cannot exfiltrate ANTHROPIC_API_KEY or GITHUB_TOKEN, nor create an issue directly. It can only read the pre-fetched logs and write the issue body to a file.
- issues: write lives only in the final shell step, which never ingests untrusted log text. All other permissions are read-only.
This avoids the single-step gh pitfall: giving the untrusted-data agent a gh write channel is itself an exfiltration vector (gh issue create --body "$ANTHROPIC_API_KEY"), and the subprocess env-scrub feature can't fix it because it's all-or-nothing and would break gh auth.
Required setup before this runs
One repo/org secret is needed:
- ANTHROPIC_API_KEY: Anthropic API key for the model.
The workflow uses the default GITHUB_TOKEN (scoped to contents: read, actions: read, issues: write) — no PAT required.
Optionally create a flaky-test label so the issues are grouped; the workflow falls back to no label if it doesn't exist.
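The label fallback in the final shell step might be implemented along these lines. This is a sketch under stated assumptions, not the workflow's actual code: the function name open_flaky_issue, the default issue title, and the "skip when the body file is empty" guard are all hypothetical. It relies on gh issue create failing when asked to apply a label that does not exist, and retries without the label in that case.

```shell
# Illustrative sketch only; open_flaky_issue and its defaults are hypothetical.
# Assumes `gh` is authenticated with a token that has issues: write.
open_flaky_issue() {
  local body_file="${1:-flaky-issue-body.md}"  # written by the Claude step
  local title="${2:-Flaky test report}"        # assumed title, not from the PR
  local label="${3:-flaky-test}"

  # Assumption: if no body was produced, there is nothing to report.
  if [ ! -s "$body_file" ]; then
    echo "no issue body at $body_file; skipping" >&2
    return 0
  fi

  # Try with the label first; if that fails (for example because the label
  # does not exist), fall back to creating the issue without a label.
  gh issue create --title "$title" --body-file "$body_file" --label "$label" ||
    gh issue create --title "$title" --body-file "$body_file"
}
```

A simpler alternative is to create the label once up front (for example with gh label create flaky-test), in which case the fallback branch is never taken.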