SWE-bench Verified ported to the CUBE protocol: 500 human-validated GitHub issues with test-based resolution criteria. It is the subset of the broader SWE-bench dataset curated by Princeton and OpenAI, in which every task was manually checked for an unambiguous problem statement and a reliable test-based reward signal. The agent receives the problem statement and a git checkout at the base commit, and must produce a patch that makes the upstream fail_to_pass tests pass without breaking the pass_to_pass tests.
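The resolution criterion described above can be sketched in a few lines of Python. This is a minimal illustration, not the package's actual API: the function name `is_resolved`, the `test_status` mapping, and the `"PASSED"` status string are all assumptions made for the example.

```python
from typing import Iterable, Mapping


def is_resolved(
    test_status: Mapping[str, str],
    fail_to_pass: Iterable[str],
    pass_to_pass: Iterable[str],
) -> bool:
    """Return True if a patch resolves a task (hypothetical helper).

    test_status maps a test name to its outcome after the patch is applied.
    A test that is missing from the mapping is treated as not passing.
    """

    def passed(name: str) -> bool:
        return test_status.get(name) == "PASSED"

    # Every previously failing target test must now pass ...
    fixes_issue = all(passed(name) for name in fail_to_pass)
    # ... and no previously passing test may regress.
    keeps_behavior = all(passed(name) for name in pass_to_pass)
    return fixes_issue and keeps_behavior


# Illustrative outcomes for three made-up test names.
status = {"test_a": "PASSED", "test_b": "PASSED", "test_c": "FAILED"}
print(is_resolved(status, ["test_a"], ["test_b"]))
print(is_resolved(status, ["test_a"], ["test_b", "test_c"]))
```

The first call reports a resolved task; the second does not, because `test_c` is listed under pass_to_pass and fails after the patch.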
By: @NicolasAG (Nicolas Gontier), @recursix (Alexandre Lacoste), @josancamon19 (Joan Cabezas)
Install
pip install swebench-verified-cube
Version: 0.1.0 · PyPI page
Feature Flags
Legal
Reproducibility journal
This is a reproducibility journal, not a leaderboard.
Submissions document how reference agents and models score over time and across infrastructures, cube versions, and package versions. Use the journal to detect drift and validate environments. It is not a place to publish a new agent or fine-tune in order to "win": there is no ranking, scores are self-reported, and submissions are unverified. To showcase a new agent or model, use ATLAS, EEE, or your own benchmark page.
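One rough way to decide whether two journal entries for the same reference agent differ by more than noise is a two-proportion z-test on the resolved counts. The sketch below is an assumption for illustration, not a procedure the journal prescribes. The function name `resolve_rate_drift` is hypothetical, and the test treats the two runs as independent samples, which is only an approximation when both runs use the same 500 tasks.

```python
import math


def resolve_rate_drift(resolved_a: int, resolved_b: int, n_tasks: int = 500) -> float:
    """Return the z-score of the change in resolve rate from run A to run B.

    Hypothetical helper: a pooled two-proportion z-test that assumes both
    runs were evaluated on n_tasks tasks each.
    """
    rate_a = resolved_a / n_tasks
    rate_b = resolved_b / n_tasks
    # Pooled resolve rate under the hypothesis that nothing changed.
    pooled = (resolved_a + resolved_b) / (2 * n_tasks)
    std_err = math.sqrt(pooled * (1 - pooled) * (2 / n_tasks))
    if std_err == 0:
        # Both runs resolved nothing or everything: no measurable drift.
        return 0.0
    return (rate_b - rate_a) / std_err


# Made-up counts: flag the pair when |z| exceeds the usual 1.96 threshold.
z = resolve_rate_drift(250, 200)
print(f"z = {z:.2f}, drift suspected: {abs(z) > 1.96}")
```

With per-task outcomes available, a paired test such as McNemar's would be a better fit than this aggregate comparison.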
No submissions yet. Be the first — see how to submit.
Registry Entry (YAML)
id: swebench-verified
name: "SWE-bench Verified"
version: "0.1.0"
description: >
  SWE-bench Verified ported to the CUBE protocol — 500 human-validated
  GitHub issues with test-based resolution criteria. Princeton + OpenAI's
  curated subset of the broader SWE-bench dataset where every task was
  manually checked for an unambiguous problem statement and a reliable
  test-based reward signal. The agent receives the problem statement +
  a git checkout at the base commit and must produce a patch that makes
  the upstream fail_to_pass tests pass without breaking pass_to_pass.
package: swebench-verified-cube
dev_install_url: "git+https://github.com/The-AI-Alliance/cube-harness#subdirectory=cubes/swebench-verified-cube"
authors:
- github: NicolasAG
  name: Nicolas Gontier
- github: recursix
  name: Alexandre Lacoste
- github: josancamon19
  name: Joan Cabezas
legal:
  wrapper_license: MIT
  benchmark_license:
    reported: MIT
    source_url: "https://github.com/SWE-bench/SWE-bench/blob/main/LICENSE"
    verified_by_original_authors: false
paper: "https://arxiv.org/abs/2310.06770"
getting_started_url: "https://openai.com/index/introducing-swe-bench-verified/"
tags:
- coding
status: degraded
resources: []
task_count: 500
has_debug_task: true
has_debug_agent: true
action_space: []
features:
  async: false
  streaming: false
  multi_agent: false
  multi_dim_reward: false