CUBE Registry: 7 benchmarks
CUBE — Common Unified Benchmark Environment

Any CUBE-compliant evaluation platform can discover and run these benchmarks without custom integration.

pip install <package> → ready to evaluate
| Benchmark | Version | Tags | Tasks | Infra | Debug | License | Status |
|---|---|---|---|---|---|---|---|
| MiniWob++ (miniwob) | 1.0.0 | web, gui | 125 | local | | MIT | Degraded |
| OSWorld (osworld) | 0.2.0 | os, gui, desktop, multimodal | 368 | aws | | CC-BY-4.0 | Degraded |
| SWE-bench Live (swebench-live) | 0.1.0 | coding, science | 1895 | local | | MIT | Degraded |
| SWE-bench Verified (swebench-verified) | 0.1.0 | coding | 500 | local | | MIT | Degraded |
| Terminal-Bench 2 (terminalbench2) | 0.1.0 | coding, os | 89 | local | | Apache-2.0 | Degraded |
| WebArena Verified (webarena-verified) | 1.0.0 | web, gui | 812 | local | | Apache-2.0 | Active |
| WorkArena (workarena) | 1.0.0 | web, gui | 333 | local | | Apache-2.0 | Degraded |
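
The listing above can be mirrored as plain data for filtering by tag, infrastructure, or status. The following is a minimal sketch only: `BenchmarkEntry`, `REGISTRY`, and `select` are hypothetical names chosen for illustration and are not part of any CUBE API; the values are copied from the listing.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class BenchmarkEntry:
    """One row of the registry listing (hypothetical structure, not a CUBE type)."""
    name: str
    identifier: str
    version: str
    tags: tuple
    tasks: int
    infra: str
    license: str
    status: str


# Values transcribed from the registry listing.
REGISTRY = [
    BenchmarkEntry("MiniWob++", "miniwob", "1.0.0", ("web", "gui"), 125, "local", "MIT", "Degraded"),
    BenchmarkEntry("OSWorld", "osworld", "0.2.0", ("os", "gui", "desktop", "multimodal"), 368, "aws", "CC-BY-4.0", "Degraded"),
    BenchmarkEntry("SWE-bench Live", "swebench-live", "0.1.0", ("coding", "science"), 1895, "local", "MIT", "Degraded"),
    BenchmarkEntry("SWE-bench Verified", "swebench-verified", "0.1.0", ("coding",), 500, "local", "MIT", "Degraded"),
    BenchmarkEntry("Terminal-Bench 2", "terminalbench2", "0.1.0", ("coding", "os"), 89, "local", "Apache-2.0", "Degraded"),
    BenchmarkEntry("WebArena Verified", "webarena-verified", "1.0.0", ("web", "gui"), 812, "local", "Apache-2.0", "Active"),
    BenchmarkEntry("WorkArena", "workarena", "1.0.0", ("web", "gui"), 333, "local", "Apache-2.0", "Degraded"),
]


def select(entries, *, tag: Optional[str] = None,
           infra: Optional[str] = None, status: Optional[str] = None):
    """Return identifiers of entries matching every filter that is not None."""
    matches = []
    for entry in entries:
        if tag is not None and tag not in entry.tags:
            continue
        if infra is not None and entry.infra != infra:
            continue
        if status is not None and entry.status != status:
            continue
        matches.append(entry.identifier)
    return matches


if __name__ == "__main__":
    # Benchmarks tagged "coding" that run on local infrastructure.
    print(select(REGISTRY, tag="coding", infra="local"))
    # Total number of tasks across all listed benchmarks.
    print(sum(entry.tasks for entry in REGISTRY))
```

Keeping the filters as keyword-only arguments makes each call self-describing, and an omitted filter simply matches every entry.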