SWE-bench Live ported to the CUBE protocol — 1,895 continuously-updated, contamination-resistant GitHub issue resolution tasks across many open-source repositories. Each task pairs a real issue with its merged fix; the agent receives the problem statement plus a git checkout at the base commit and must produce a patch that makes the upstream fail_to_pass tests pass without breaking pass_to_pass. The task pool is refreshed continuously, making the benchmark useful for testing contamination resistance.
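The pass criterion described above can be sketched as a small check. This is an illustration only, not the scoring code shipped in swebench-live-cube: the function name `is_resolved`, the `test_status` mapping, and the `"passed"`/`"failed"` status strings are all assumptions made for the example.

```python
from typing import Iterable, Mapping


def is_resolved(
    test_status: Mapping[str, str],
    fail_to_pass: Iterable[str],
    pass_to_pass: Iterable[str],
) -> bool:
    """Return True if every listed test passed after the patch was applied.

    test_status maps a test id to "passed" or "failed" (hypothetical format).
    A test id missing from test_status is treated as not passed.
    """

    def passed(test_id: str) -> bool:
        return test_status.get(test_id) == "passed"

    # The patch must fix the issue: all fail_to_pass tests now pass.
    fixes_issue = all(passed(t) for t in fail_to_pass)
    # The patch must not regress: all pass_to_pass tests still pass.
    no_regressions = all(passed(t) for t in pass_to_pass)
    return fixes_issue and no_regressions


# Hypothetical test ids: the bug is fixed, but an existing test now fails.
status = {"tests/test_a.py::test_bug": "passed", "tests/test_a.py::test_old": "failed"}
print(is_resolved(status, ["tests/test_a.py::test_bug"], ["tests/test_a.py::test_old"]))  # False
```

Under this reading, a patch that fixes the issue but breaks a previously passing test does not count as a resolution.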
By: @NicolasAG (Nicolas Gontier), @recursix (Alexandre Lacoste), @josancamon19 (Joan Cabezas)
Install
pip install swebench-live-cube
Version: 0.1.0 · PyPI page
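After installing, one way to confirm which version is present in the current environment is to query the package metadata with the standard library. This is a minimal sketch; the helper name `installed_version` is hypothetical, and only the distribution name `swebench-live-cube` comes from this page.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str = "swebench-live-cube") -> Optional[str]:
    """Return the installed distribution version, or None if it is not installed."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# Prints the installed version string, or None when the package is absent.
print(installed_version())
```

Recording this value alongside a run makes it easier to compare results across package versions.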
Feature Flags
Legal
Reproducibility journal
This is a reproducibility journal, not a leaderboard.
Submissions document how reference agents and models score over time and across infrastructures, cube versions, and package versions. Use the journal to detect drift and validate environments. It is not a place to publish a new agent or fine-tune in order to "win": there is no ranking, scores are self-reported, and submissions are unverified. To showcase a new agent or model, use ATLAS / EEE / your own benchmark page.
No submissions yet. Be the first — see how to submit.
Registry Entry (YAML)
id: swebench-live
name: "SWE-bench Live"
version: "0.1.0"
description: >
  SWE-bench Live ported to the CUBE protocol — 1,895 continuously-updated,
  contamination-resistant GitHub issue resolution tasks across many
  open-source repositories. Each task pairs a real issue with its merged
  fix; the agent receives the problem statement plus a git checkout at the
  base commit and must produce a patch that makes the upstream
  fail_to_pass tests pass without breaking pass_to_pass. The task pool
  is refreshed continuously, making the benchmark useful for testing
  contamination resistance.
package: swebench-live-cube
dev_install_url: "git+https://github.com/The-AI-Alliance/cube-harness#subdirectory=cubes/swebench-live-cube"
authors:
  - github: NicolasAG
    name: Nicolas Gontier
  - github: recursix
    name: Alexandre Lacoste
  - github: josancamon19
    name: Joan Cabezas
legal:
  wrapper_license: MIT
  benchmark_license:
    reported: MIT
    source_url: "https://github.com/microsoft/SWE-bench-Live/blob/main/LICENSE"
    verified_by_original_authors: false
paper: "https://arxiv.org/abs/2505.23419"
getting_started_url: "https://swe-bench-live.github.io/"
tags:
  - coding
  - science
status: degraded
resources: []
task_count: 1895
has_debug_task: true
has_debug_agent: true
action_space: []
features:
  async: false
  streaming: false
  multi_agent: false
  multi_dim_reward: false
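A harness could consult the `features` block of the registry entry before scheduling a run. The sketch below assumes the entry has already been parsed into a plain dict (for example with a YAML parser); the helper `missing_features` is hypothetical and is not part of the CUBE API. The flag values are copied from the entry above.

```python
from typing import Iterable, List, Mapping

# Feature flags copied from the registry entry above.
features = {
    "async": False,
    "streaming": False,
    "multi_agent": False,
    "multi_dim_reward": False,
}


def missing_features(flags: Mapping[str, bool], required: Iterable[str]) -> List[str]:
    """Return the required feature names that the cube does not declare as supported.

    A flag that is absent from the mapping is treated as unsupported.
    """
    return sorted(name for name in required if not flags.get(name, False))


# A runner that needs async execution would see that this cube lacks it.
print(missing_features(features, ["async"]))  # ['async']
# A runner with no special requirements is compatible.
print(missing_features(features, []))  # []
```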