CUBE Registry: 4 benchmarks
CUBE — Common Unified Benchmark Environment

Any CUBE-compliant evaluation platform can discover and run these benchmarks without custom integration.

pip install <package> → ready to evaluate
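The page does not say how a platform discovers an installed benchmark. One common Python pattern for "install, then discover" is package entry points, sketched below. The group name `cube.benchmarks` and the function `discover_benchmarks` are hypothetical and are not taken from the CUBE specification.

```python
from importlib.metadata import entry_points


def discover_benchmarks(group="cube.benchmarks"):
    """Return {name: entry_point} for installed packages that register
    under `group`. The group name is an assumption for illustration."""
    eps = entry_points()
    if hasattr(eps, "select"):
        # Python 3.10+: entry points are selectable by group.
        selected = eps.select(group=group)
    else:
        # Python 3.8/3.9: entry_points() returns a dict keyed by group.
        selected = eps.get(group, [])
    return {ep.name: ep for ep in selected}


if __name__ == "__main__":
    found = discover_benchmarks()
    # Prints the names of any packages registered under the hypothetical group.
    print(sorted(found))
```

If no installed package registers under that group, the function returns an empty dict rather than raising.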
| Benchmark | Version | Tags | Tasks | Infra | Debug | License | Status |
|---|---|---|---|---|---|---|---|
| MiniWob++ (miniwob) | 1.0.0 | web, gui | 125 | local | | MIT | Active |
| OSWorld (osworld) | 0.2.0 | os, gui, desktop, multimodal | 368 | aws | | CC-BY-4.0 | Active |
| WebArena Verified (webarena-verified) | 1.0.0 | web, gui | 812 | local | | Apache-2.0 | Active |
| WorkArena (workarena) | 1.0.0 | web, gui | 333 | local | | Apache-2.0 | Active |
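The listing above can also be handled as plain records, for example to pick the benchmarks that run on local infrastructure. The following is a minimal sketch: the `Benchmark` class and helper names are illustrative and not part of CUBE, while the field values are copied from the registry entries.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Benchmark:
    name: str
    package: str
    version: str
    tags: tuple
    tasks: int
    infra: str
    license: str


# Values transcribed from the registry listing.
REGISTRY = [
    Benchmark("MiniWob++", "miniwob", "1.0.0", ("web", "gui"), 125, "local", "MIT"),
    Benchmark("OSWorld", "osworld", "0.2.0",
              ("os", "gui", "desktop", "multimodal"), 368, "aws", "CC-BY-4.0"),
    Benchmark("WebArena Verified", "webarena-verified", "1.0.0",
              ("web", "gui"), 812, "local", "Apache-2.0"),
    Benchmark("WorkArena", "workarena", "1.0.0", ("web", "gui"), 333, "local", "Apache-2.0"),
]


def by_infra(registry, infra):
    """Benchmarks whose Infra column equals `infra`."""
    return [b for b in registry if b.infra == infra]


def by_tag(registry, tag):
    """Benchmarks that carry `tag` in their Tags column."""
    return [b for b in registry if tag in b.tags]


if __name__ == "__main__":
    local = by_infra(REGISTRY, "local")
    print([b.name for b in local])      # names of locally runnable benchmarks
    print(sum(b.tasks for b in local))  # total task count across them
```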