SWE-agent
📣 News: mini-SWE-agent, the 100-line AI agent that still gets 65% on SWE-bench Verified!
📣 New benchmark: CodeClash evaluates SWE agents on goals, not tasks.
Software engineering agents, benchmarks, and models.
Built and maintained by researchers from Princeton University and Stanford University.
More information about the projects
Main projects:
- SWE-agent, a system that automatically solves GitHub issues using an LM agent.
- mini-SWE-agent, a 100-line AI agent that still gets 65% on SWE-bench Verified!
- SWE-bench, a benchmark for evaluating AI systems on real-world GitHub issues.
- SWE-smith, a toolkit for generating SWE training data at scale.
Also check out the supporting infrastructure for working with SWE-* projects
Repositories
- SWE-agent: Takes a GitHub issue and tries to automatically fix it, using your LM of choice. It can also be employed for offensive cybersecurity or competitive coding challenges. [NeurIPS 2024]
- mini-swe-agent: The 100-line AI agent that solves GitHub issues or helps you in your command line. Radically simple, no huge configs, no giant monorepo, but scores >74% on SWE-bench Verified!
- SWE-ReX: Sandboxed code execution for AI agents, locally or on the cloud. Massively parallel, easy to extend. Powering SWE-agent and more.
- minimal-agent-tutorial: Tutorial on how to build a minimal software engineering agent that still scores high on SWE-bench Verified.
- swe-agent-media: Hosting ground for README media and videos of all projects.