SWE-bench Live

swebench-live · v0.1.0

coding science

SWE-bench Live ported to the CUBE protocol: 1,895 continuously updated, contamination-resistant GitHub issue-resolution tasks drawn from many open-source repositories. Each task pairs a real issue with its merged fix. The agent receives the problem statement plus a git checkout at the base commit and must produce a patch that makes the upstream fail_to_pass tests pass without breaking the pass_to_pass tests. The task pool is refreshed continuously, which makes the benchmark useful for testing contamination resistance.
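The pass criterion described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the cube's actual evaluator: `TaskSpec` and `is_resolved` are hypothetical names, and only the `fail_to_pass` and `pass_to_pass` field names come from the description.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class TaskSpec:
    """Hypothetical container for the two test lists attached to one task."""
    fail_to_pass: List[str]  # tests the patch is expected to turn from failing to passing
    pass_to_pass: List[str]  # tests that already pass and must keep passing


def is_resolved(spec: TaskSpec, results: Dict[str, bool]) -> bool:
    """Return True only if every listed test passed after the patch was applied.

    `results` maps a test id to True (passed) or False (failed). A test that
    is missing from `results` is treated as failed (an assumption made here).
    """
    required = spec.fail_to_pass + spec.pass_to_pass
    return all(results.get(test_id, False) for test_id in required)


if __name__ == "__main__":
    spec = TaskSpec(fail_to_pass=["test_fix"], pass_to_pass=["test_existing"])
    # The fix works and nothing regressed.
    print(is_resolved(spec, {"test_fix": True, "test_existing": True}))
    # The fix works but an existing test now fails.
    print(is_resolved(spec, {"test_fix": True, "test_existing": False}))
```

Treating a missing result as a failure is the conservative choice: a patch that prevents a test from running should not count as resolving the task.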

By: @NicolasAG (Nicolas Gontier), @recursix (Alexandre Lacoste), @josancamon19 (Joan Cabezas)

Install

pip install swebench-live-cube
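After installing, one way to confirm which version is present is to query the package metadata from Python. The sketch below uses only the standard library. `installed_version` is a hypothetical helper name, and the distribution name is the one given in the install command.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version("swebench-live-cube")
    if found is None:
        print("swebench-live-cube is not installed in this environment")
    else:
        print(f"swebench-live-cube {found}")
```

Recording this version alongside any results makes it easier to tell whether a score change comes from the agent or from a package update.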

Version: 0.1.0 · PyPI page

Tasks: 1,895

Infra: local

Debug Task: Yes

Debug Agent: Yes

Feature Flags

async: no

streaming: no

multi_agent: no

multi_dim_reward: no

Legal

Wrapper license: MIT

Benchmark license: MIT (self-reported; verify before commercial use). Source: https://github.com/microsoft/SWE-bench-Live/blob/main/LICENSE

License information is self-reported by the cube developer and has not been verified by the AI Alliance. Always consult the source URL and original benchmark authors for authoritative terms.

Slow check not yet run. Stress test results will appear here after the async compliance check completes.

Reproducibility journal

This is a reproducibility journal, not a leaderboard.

Submissions document how reference agents and models score over time, across infrastructures, cube versions, and package versions. Use it to detect drift and validate environments. It is not a place to publish a new agent or a fine-tune in order to "win": there is no ranking, scores are self-reported, and submissions are unverified. To showcase a new agent or model, use ATLAS / EEE / your own benchmark page.

No submissions yet. Be the first: see how to submit.

Registry Entry (YAML)

id: swebench-live
name: "SWE-bench Live"
version: "0.1.0"
description: >
  SWE-bench Live ported to the CUBE protocol — 1,895 continuously-updated,
  contamination-resistant GitHub issue resolution tasks across many
  open-source repositories. Each task pairs a real issue with its merged
  fix; the agent receives the problem statement plus a git checkout at the
  base commit and must produce a patch that makes the upstream
  fail_to_pass tests pass without breaking pass_to_pass. The task pool
  is refreshed continuously, making the benchmark useful for testing
  contamination resistance.
package: swebench-live-cube
dev_install_url: "git+https://github.com/The-AI-Alliance/cube-harness#subdirectory=cubes/swebench-live-cube"

authors:
- github: NicolasAG
  name: Nicolas Gontier
- github: recursix
  name: Alexandre Lacoste
- github: josancamon19
  name: Joan Cabezas

legal:
  wrapper_license: MIT
  benchmark_license:
    reported: MIT
    source_url: "https://github.com/microsoft/SWE-bench-Live/blob/main/LICENSE"
    verified_by_original_authors: false

paper: "https://arxiv.org/abs/2505.23419"
getting_started_url: "https://swe-bench-live.github.io/"
tags:
- coding
- science
status: degraded
resources: []
task_count: 1895
has_debug_task: true
has_debug_agent: true
action_space: []
features:
  async: false
  streaming: false
  multi_agent: false
  multi_dim_reward: false
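
A registry entry like the one above can be sanity-checked once it has been parsed into a dictionary. The sketch below is illustrative only: `check_entry` is a hypothetical helper, the rules it applies are assumptions rather than the registry's actual validation, and loading the YAML (with whichever parser you prefer) is left out so the example stays within the standard library.

```python
from typing import Any, Dict, List

# Flag names as they appear under `features` in the entry.
FEATURE_FLAGS = ("async", "streaming", "multi_agent", "multi_dim_reward")


def check_entry(entry: Dict[str, Any]) -> List[str]:
    """Return a list of human-readable problems; an empty list means none were found."""
    problems: List[str] = []

    for key in ("id", "name", "version", "package"):
        value = entry.get(key)
        if not isinstance(value, str) or not value:
            problems.append(f"{key} must be a non-empty string")

    count = entry.get("task_count")
    # bool is a subclass of int in Python, so it is excluded explicitly.
    if not isinstance(count, int) or isinstance(count, bool) or count < 0:
        problems.append("task_count must be a non-negative integer")

    features = entry.get("features")
    if not isinstance(features, dict):
        features = {}
    for flag in FEATURE_FLAGS:
        if not isinstance(features.get(flag), bool):
            problems.append(f"features.{flag} must be true or false")

    return problems


if __name__ == "__main__":
    # Values copied from the entry above; only a subset of keys is shown.
    entry = {
        "id": "swebench-live",
        "name": "SWE-bench Live",
        "version": "0.1.0",
        "package": "swebench-live-cube",
        "task_count": 1895,
        "features": {
            "async": False,
            "streaming": False,
            "multi_agent": False,
            "multi_dim_reward": False,
        },
    }
    print(check_entry(entry))
```

Note that `version` is quoted in the entry, so a YAML parser returns it as a string; an unquoted value with a single dot, such as 0.1, would load as a float and fail the string check here.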